kubernetes-sigs / release-team-shadow-stats

Kubernetes release team shadow program application analysis
Apache License 2.0

Which diagrams can be made publicly available in a report - consult contribex #7

Closed leonardpahlke closed 1 year ago

leonardpahlke commented 2 years ago

To increase transparency about the release team's shadow program and to improve it, we would like to make some information publicly available (in a report format at the start of each release cycle). See the examples/plots folder for examples of the information that should be included in a report. Because applicants entrust us with their information, we have a responsibility to ensure that we do not share personal information. This needs to be coordinated with contribex.

What we need to clarify with contribex:

see tracking issue: https://github.com/kubernetes-sigs/release-team-shadow-stats/issues/4

leonardpahlke commented 2 years ago

cc k/sig-contribex chairs @alisondy @mrbobbytables

cc k/sig-release chairs @justaugustus @saschagrunert

mrbobbytables commented 2 years ago

I don't see anything too bad in there if those are the only types of reports being generated, but to be sure I'd ask the applicants to opt-in to sharing their aggregated data publicly and list which fields would be included in that.

/assign @jberkus

@jberkus do you have any thoughts or see anything that might be an issue?

leonardpahlke commented 2 years ago

> but to be sure I'd ask the applicants to opt-in to sharing their aggregated data publicly and list which fields would be included in that.

yes! see: https://github.com/kubernetes-sigs/release-team-shadow-stats/issues/10

jberkus commented 2 years ago

Are we asking about past shadows, or future applicants? Because those are two different situations.

leonardpahlke commented 2 years ago

For future applicants

jberkus commented 2 years ago

OK. My comments below. We'll also want to have CoCC take a look to see if there are potential diversity issues.

Also, we're going to have some new questions in the new shadow application, so we'll want charts for those.

leonardpahlke commented 2 years ago

> Also, we're going to have some new questions in the new shadow application, so we'll want charts for those.

+1

> We'll also want to have CoCC take a look to see if there are potential diversity issues.

cc CoCC Members: @celestehorgan @cpanato @karenhchu @palnabarun @vllry

jberkus commented 2 years ago

tag @reylejano to follow

reylejano commented 2 years ago

In my opinion, we should only store/publish the anonymized reports on GitHub e.g. https://github.com/kubernetes-sigs/release-team-shadow-stats/tree/main/examples/plots

My concern is that we currently have non-anonymized data in https://github.com/kubernetes-sigs/release-team-shadow-stats/tree/main/examples/applicants

cc: @jeremyrickard @cici37 @JamesLaverack @jrsapi

jberkus commented 2 years ago

Yeah, so we couldn't put that data into GitHub.

jeremyrickard commented 2 years ago

The data in https://github.com/kubernetes-sigs/release-team-shadow-stats/tree/main/examples/applicants is just fake data isn't it?

100% agree that we should only store/publish anonymized reports and should not store things like the examples data (which I am still assuming is just fake example data).

leonardpahlke commented 2 years ago

> My concern is that we currently have non-anonymized data in examples/applicants

There are no plans to make these md files publicly available. These files are generated for a different use case. They should only be made available to the release team role leads to help them read through the shadow applications; they are not part of the report. Only the diagrams/plots should be released.

@jeremyrickard yes, it's dummy / fake data :)

reylejano commented 2 years ago

@leonardpahlke Thanks for clearing up the question about the data in https://github.com/kubernetes-sigs/release-team-shadow-stats/tree/main/examples/applicants

Perhaps we should add a small README.md in there that states the data is fake

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

jberkus commented 2 years ago

@leonardpahlke what are we currently doing on this report?

leonardpahlke commented 2 years ago

@jberkus Currently, most efforts are on hold. I have been doing some work behind the scenes to test out some libraries, but that is not yet ready to get pushed. I will have time in a bit over a week to continue with this and close some more issues. (still planning to wrap it all up in 1.25)

leonardpahlke commented 2 years ago

/remove-lifecycle stale

leonardpahlke commented 2 years ago

@jberkus The charts for 1.26 are ready (see examples with dummy data https://github.com/kubernetes-sigs/release-team-shadow-stats/tree/main/examples/plots). If an applicant did not opt in, their data is left out of the charts. If a category has two or fewer applicants (e.g., 2 applicants who chose the US East time zone), that category is also left out, so US East would not appear in this case (see the concerns mentioned earlier).
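The two filtering rules described above (opt-in only, and dropping categories with two or fewer applicants) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the record layout, the `opt_in` and `timezone` field names, and the `chart_counts` helper are all assumptions made for the example.

```python
from collections import Counter

# Assumption: a category needs at least 3 applicants to be shown,
# i.e. categories with two or fewer applicants are dropped.
MIN_GROUP_SIZE = 3


def chart_counts(applicants, field, min_group_size=MIN_GROUP_SIZE):
    """Count values of `field` among opted-in applicants, omitting small groups."""
    # Rule 1: ignore applicants who did not opt in.
    opted_in = [a for a in applicants if a.get("opt_in")]
    counts = Counter(a[field] for a in opted_in if a.get(field))
    # Rule 2: drop categories that are too small to publish safely.
    return {value: n for value, n in counts.items() if n >= min_group_size}


# Dummy data (hypothetical field names).
applicants = [
    {"opt_in": True, "timezone": "Europe Central"},
    {"opt_in": True, "timezone": "Europe Central"},
    {"opt_in": True, "timezone": "Europe Central"},
    {"opt_in": True, "timezone": "US East"},
    {"opt_in": True, "timezone": "US East"},
    {"opt_in": False, "timezone": "US East"},  # not opted in, so never counted
]

print(chart_counts(applicants, "timezone"))
```

With this dummy data only "Europe Central" survives: US East has just two opted-in applicants, so it is omitted from the chart input.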

leonardpahlke commented 2 years ago

Is there any concern about sharing this information (the charts) in a report with the community?

jberkus commented 2 years ago

No, that was the purpose of the charts and why we got a release. The only data that was concerning (pronouns) has been removed.

Some specific critique on the current versions of the charts:

Two more general comments:

  1. It would be useful if each chart also had a table of numerical data
  2. One of the goals here is to compare who applied with who was accepted. So all of these charts need to filter applied/accepted

jberkus commented 2 years ago

Finally: I'd like a data dump to be available on request with PII removed, because there are other ad-hoc manipulations that folks might want to do.

leonardpahlke commented 2 years ago

Thanks, @jberkus, for your feedback & input :)

> • able to attend burndown meetings: the labels are getting cut off, and aren't understandable
> • able to attend release team meetings: same
> • previous roles: what even is this question? You'll need to expand the title so it's clear what's being measured.

Agree, formatting & rephrasing is needed here 👍

> timezone: this pie chart is still really hard to make any sense of, particularly since applicants were able to fill in ad-hoc entries. We'd need to normalize the data, ...

Right, normalizing data is something we should improve in future shadow application forms, this would make many things a lot easier :D (added to the 1.26 retro Agenda).

> ... and then probably render it in a bar chart in time zone order, for it to be useful

Yes, I also discussed this with @palnabarun (selecting the right chart to convey the message). I was thinking about a diagram in map / earth format for the timezone; the option with bar charts is also good (and maybe a bit easier to implement, too).
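As a rough sketch of the "normalize, then bar chart in time zone order" idea: free-text entries are mapped to a UTC offset and the counts are sorted by that offset, which gives the bar order directly. The alias table below is entirely hypothetical and uses standard-time offsets only (no daylight saving); entries it does not recognize are returned separately so they can be cleaned up by hand.

```python
from collections import Counter

# Hypothetical alias table: free-text answers mapped to a UTC offset in hours.
# Standard-time offsets only; a real table would need to be agreed on.
TIMEZONE_ALIASES = {
    "us east": -5,
    "est": -5,
    "cet": 1,
    "europe central": 1,
    "ist": 5.5,
    "india": 5.5,
}


def normalize_timezone(raw):
    """Return the UTC offset for a free-text entry, or None if unknown."""
    return TIMEZONE_ALIASES.get(raw.strip().lower())


def timezone_bars(raw_entries):
    """Return (offset, count) pairs sorted west to east, plus unmatched entries."""
    counts = Counter()
    unmatched = []
    for raw in raw_entries:
        offset = normalize_timezone(raw)
        if offset is None:
            unmatched.append(raw)  # left for manual cleanup
        else:
            counts[offset] += 1
    return sorted(counts.items()), unmatched


bars, unmatched = timezone_bars(["EST", " us east ", "CET", "India", "Mars"])
print(bars)       # bar chart input, ordered by UTC offset
print(unmatched)  # entries that need manual normalization
```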

> It would be useful if each chart also had a table of numerical data

Interesting idea, I can see that this view can be useful. Perhaps this can be placed in the report next to the chart.

> One of the goals here is to compare who applied with who was accepted. So all of these charts need to filter applied/accepted

We don't have this information yet because we "only" process the application form; the shadow selection comes afterwards (so we need manual input to display this information in charts).

Question/concern: do we really want to display this? How might this look in a chart?

IMHO, processing of accepted applicants could be something for the future. We can already see the "status of the shadow application program" from the above charts (and maybe some similar charts), which is the goal of the report (at least so far).

> I'd like a data dump to be available on request with PII removed, because there are other ad-hoc manipulations that folks might want to do.

Are you thinking of the raw data that is used to generate the diagrams (in JSON format or something similar)? We could publish this alongside the report. +1
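One possible shape for such a dump, sketched under assumptions: only opted-in applicants are exported, and only fields on an explicit allowlist are kept, so a newly added form question is never published by accident. The field names and the `export_dump` helper are hypothetical, not taken from the repository.

```python
import json

# Hypothetical allowlist: only these fields ever reach the public dump.
PUBLIC_FIELDS = ("timezone", "team", "previous_release_team_roles")


def export_dump(applicants, public_fields=PUBLIC_FIELDS):
    """Serialize opted-in applicants to JSON, keeping allowlisted fields only."""
    rows = [
        {field: a.get(field) for field in public_fields}
        for a in applicants
        if a.get("opt_in")
    ]
    return json.dumps(rows, indent=2)


# Dummy data (hypothetical field names).
applicants = [
    {"name": "Dummy Person", "email": "dummy@example.com", "opt_in": True,
     "timezone": "US East", "team": "Docs", "previous_release_team_roles": 0},
    {"name": "Other Person", "email": "other@example.com", "opt_in": False,
     "timezone": "CET", "team": "Bug Triage", "previous_release_team_roles": 1},
]

dump = export_dump(applicants)
print(dump)
```

Dropping direct identifiers does not by itself make rows anonymous: a rare combination of answers can still point to one person, so the small-category concern raised earlier in this thread applies to a row-level dump as well.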

jberkus commented 2 years ago

> Right, normalizing data is something we should improve in future shadow application forms, this would make many things a lot easier :D (added to the 1.26 retro Agenda).

You should modify the form also. But in the meantime we should clean up the data by hand.

> IMHO, processing of accepted applicants could be something for the future. We can already see the "status of the shadow application program" from the above charts (and maybe some similar charts), which is the goal of the report (at least so far).

Sure, this will have to be hand-entered. Shouldn't take that long. But the primary reason to analyze this data in the first place was to check for selection bias, no?
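A minimal sketch of what an applied-versus-accepted comparison could look like as a numerical table, assuming the accepted list has been hand-entered as discussed above. The record layout, the `team` field, and the `selection_table` helper are hypothetical.

```python
from collections import Counter


def selection_table(applied, accepted, field):
    """Return rows of (value, n_applied, n_accepted, acceptance_rate) for `field`."""
    applied_counts = Counter(a[field] for a in applied)
    accepted_counts = Counter(a[field] for a in accepted)
    rows = []
    for value in sorted(applied_counts):
        n_applied = applied_counts[value]
        n_accepted = accepted_counts.get(value, 0)
        rows.append((value, n_applied, n_accepted, round(n_accepted / n_applied, 2)))
    return rows


# Dummy data: `accepted` would be entered by hand after shadow selection.
applied = [{"team": "Docs"}] * 4 + [{"team": "Comms"}] * 2
accepted = [{"team": "Docs"}, {"team": "Comms"}]

for row in selection_table(applied, accepted, "team"):
    print(row)
```

Large differences in acceptance rate between categories would be a prompt for a closer look, not proof of bias, especially when the groups are small.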

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

leonardpahlke commented 1 year ago

/remove-lifecycle stale

leonardpahlke commented 1 year ago

Fix for the labels getting cut off: https://github.com/kubernetes-sigs/release-team-shadow-stats/pull/19

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

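This bot triages issues according to the following rules:

  • After 90d of inactivity, `lifecycle/stale` is applied
  • After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
  • After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

  • Reopen this issue with `/reopen`
  • Mark this issue as fresh with `/remove-lifecycle rotten`
  • Offer to help out with Issue Triage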

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 year ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/release-team-shadow-stats/issues/7#issuecomment-1546709559):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.