hackforla / data-science

The Hack For LA Data Science team is a Community of Practice within the LA brigade seeking to make analytical and machine learning services available to local communities and organizations.

CoP: Data Science: Survey: Repo Labels #160

Closed Neecolaa closed 1 year ago

Neecolaa commented 2 years ago

Overview

We need to survey labels across the organization so that we can rationalize them, set up automation, and run org-wide audits.

Additional Details

We already have an automation running on the github.com/hackforla/website repo that adds labels that start with "missing:" and lets the user know what other labels are required. The user can still add optional labels, but they must use the minimum set. We want to roll this automation out to all the teams, but in order to do so, they must all use the minimum labeling in the same way.

We have a kanban guide, but it's confusing to users if all the projects don't use similar labels, so we want to have a base set of labels that will be documented in our instructions.
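As a rough illustration of the "missing:" check described above, here is a minimal Python sketch. It is not the automation running on the website repo: the required prefixes (`role`, `size`, `feature`), the `missing: <prefix>` label format, and the function name `missing_labels` are all assumptions made for this example.

```python
# Hypothetical minimum set of label prefixes; the real list is not given in this issue.
REQUIRED_PREFIXES = ("role", "size", "feature")


def missing_labels(issue_labels, required_prefixes=REQUIRED_PREFIXES):
    """Return one "missing: <prefix>" label for each required prefix the issue lacks."""
    present = set()
    for name in issue_labels:
        # Only colon-separated labels such as "size: 2pt" carry a prefix.
        if ":" in name:
            present.add(name.split(":", 1)[0].strip().lower())
    # The "missing: <prefix>" format is an assumption for illustration.
    return [f"missing: {prefix}" for prefix in required_prefixes if prefix not in present]


if __name__ == "__main__":
    print(missing_labels(["size: 2pt", "bug"]))
```

A check like this only works across teams if every repo spells its prefixes the same way, which is why the survey comes first.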

Action Items

Resources/Instructions

Orgs to poll

Data Schema

Resources

JasonEb commented 2 years ago

GitHub Project REST API: https://docs.github.com/en/rest/projects

rbianchetti commented 2 years ago

Tks for sharing the API doc, I'll check it out!

rbianchetti commented 2 years ago

I've created a script that, given a set of organizations, gets all the repos --> issues --> labels.

Then, it exports the data, writing one row per each label per issue.

The output of the script is attached:

combined_csv.csv
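For readers who want to see the "one row per label per issue" shape, here is a minimal sketch in Python. It is not the script attached to this issue. It assumes the issues have already been fetched and look like GitHub REST API issue objects (a `number` plus a `labels` list whose entries have a `name`); the column names `org`, `repo`, `issue`, and `label` and both function names are hypothetical.

```python
import csv
import io

# Hypothetical column names; the schema of combined_csv.csv is not shown in this issue.
FIELDNAMES = ["org", "repo", "issue", "label"]


def label_rows(org, repo, issues):
    """Yield one row per label per issue; issues without labels yield nothing."""
    for issue in issues:
        for label in issue.get("labels", []):
            yield {"org": org, "repo": repo, "issue": issue["number"], "label": label["name"]}


def write_rows(rows, handle):
    """Write the rows as CSV with a header line to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample data shaped like already-fetched issues.
    sample_issues = [
        {"number": 160, "labels": [{"name": "size: 2pt"}, {"name": "role: data science"}]},
        {"number": 161, "labels": []},
    ]
    buffer = io.StringIO()
    write_rows(label_rows("hackforla", "data-science", sample_issues), buffer)
    print(buffer.getvalue())
```

Keeping the flattening separate from the fetching means the same rows can feed either a csv file or a later counting step.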

The next step is to analyze the labels by how often each one appears.

Thanks,

rbianchetti commented 2 years ago

A Jupyter notebook with the analysis of the most popular labels has been pushed to the 160-survey-repo-labels branch.

JasonEb commented 2 years ago

Apparent survey results: https://github.com/hackforla/data-science/tree/160-survey-repo-labels/labels-survey

akhaleghi commented 1 year ago

@rbianchetti ~hey, I noticed that you had the output for this script in a csv file. Can we have these put into a Google Sheets document so that other teams can more readily access the data through a web browser?~ I created the folder and copied the csv into a file

rbianchetti commented 1 year ago

Thanks! I've added the Jupyter notebook with the analysis to the folder: https://drive.google.com/file/d/1pV4heWqOvzwrj1JxxzcCIIlJqI5N68K9/view?usp=sharing

akhaleghi commented 1 year ago

Thanks so much Ren!

@rbianchetti @codemamma One thing that will be helpful is if we could get the script to break down counts of unique labels, like the highlighted column in this document (https://docs.google.com/spreadsheets/d/1DV9Q0DeIFqSQyQb-QEX1B7gHijNmh8Ue4OVKgf16ZYM/edit#gid=244294355). I wrote a script and used it on just the size labels to show what we need.

Basically, we want to do a groupby on the output from the script. Manisha, is this something you'd want to do?
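To make the groupby idea concrete, here is a minimal sketch that counts how often each unique label appears. It uses only the Python standard library; with pandas, something like `df.groupby("label").size()` should give the same counts. The `label` column name, the sample rows, and the function name `count_labels` are assumptions, not taken from the actual script output.

```python
from collections import Counter


def count_labels(rows, key="label"):
    """Count how many rows carry each distinct value of `key`."""
    return Counter(row[key] for row in rows)


if __name__ == "__main__":
    # Made-up rows in the one-row-per-label-per-issue shape.
    rows = [
        {"repo": "website", "issue": 1, "label": "size: 1pt"},
        {"repo": "website", "issue": 2, "label": "size: 1pt"},
        {"repo": "data-science", "issue": 3, "label": "size: 2pt"},
    ]
    for label, count in count_labels(rows).most_common():
        print(label, count)
```

Passing a different `key` (for example a repo column) would give counts per repo instead of per label.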

codemamma commented 1 year ago

Thank you Abe! I will work on it. I have also split the labelname column, and for now I will do some cleanup on the output csv file, which has repetitive label names, so it can be used for the label usage question.

Manisha

codemamma commented 1 year ago

Attached, please find the csv file you requested. Sorry for the delay; I was out of town on family trips. Let me know if this looks right to you!

akhaleghi commented 1 year ago

@codemamma Hi Manisha, where did you put the csv files?

codemamma commented 1 year ago

size_label.csv role_label.csv p-feature_label.csv feature_label.csv

codemamma commented 1 year ago

I am trying to upload the csv files to the shared drive, but I do not have access, so I have requested it.

codemamma commented 1 year ago

https://drive.google.com/drive/u/0/folders/1l8bMfhmUPG1O3nCLgvIE7jd_HlqAIb1s

akhaleghi commented 1 year ago

Thanks @codemamma

akhaleghi commented 1 year ago

@codemamma One more question. Does the GitHub API allow you to obtain the hex code of the color of the labels?

codemamma commented 1 year ago

No.

akhaleghi commented 1 year ago

@ExperimentsInHonesty Please see combined_csv. The feature, p-feature, size, and role labels are separated into different sheets and ready for the Ops team to analyze. @codemamma there are still some label types (e.g. "epic: ", "guide: ", etc.) that we need to look at. Is there a single csv that has the complete list of labels that is output from the script?

codemamma commented 1 year ago

Yes, @akhaleghi. I tried to push the notebook yesterday along with all the csv files. There is a distinct-labels csv (a list of all 326 labels) that can be combined to get more refined results. I had an authentication problem while pushing, so I am coordinating with @salice to do it tonight.

codemamma commented 1 year ago

label.csv: here is a single csv with all labels, @akhaleghi.

akhaleghi commented 1 year ago

@codemamma thanks Manisha! So one more thing: this csv seems to capture only labels that have colon-separated names (e.g. "size: 2pt", "role: data science", etc.) but excludes labels that aren't in that format. For example, in the Data Science repo we have labels for "bug", "dependency", and "enhancement" that don't appear on the list, and I suppose either "Label" or "LabelStatus" would be blank for those. Is it possible to get those labels as well, perhaps in a different CSV document with one fewer column?
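One way to keep labels without a colon is to split on the first colon only when one is present. The sketch below is an assumption about how that could work, not the notebook's actual logic: the function name `split_label` is hypothetical, and how its two parts map onto the "Label" and "LabelStatus" columns is not specified in this thread.

```python
def split_label(name):
    """Split "prefix: value" into (prefix, value); plain labels get an empty prefix."""
    prefix, sep, value = name.partition(":")
    if not sep:
        # No colon at all, e.g. "bug": keep the label instead of dropping it.
        return ("", name.strip())
    return (prefix.strip(), value.strip())


if __name__ == "__main__":
    for name in ["size: 2pt", "role: data science", "bug", "enhancement"]:
        print(split_label(name))
```

Rows with an empty prefix could then be written to a separate csv with one fewer column, as suggested above.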

codemamma commented 1 year ago

I can check and see if I can filter it out, @akhaleghi.

akhaleghi commented 1 year ago

@codemamma were you able to filter out those existing labels?

akhaleghi commented 1 year ago

Hi @codemamma, are there any recent updates to this issue?

akhaleghi commented 1 year ago

To do: Determine if all labels are included in output from 8/2

@codemamma I am moving this back to the backlog but if you'd like to work on it more please let me know

codemamma commented 1 year ago

@akhaleghi, sorry for the delayed response. I had trouble logging in to GitHub. I pushed the final notebook 22 days ago with final changes for review. Can you please check if the file is accessible?

ExperimentsInHonesty commented 1 year ago

@akhaleghi what is the next step here? Is there a demo of the data I need to see, will a slide be made, what?

akhaleghi commented 1 year ago

@ExperimentsInHonesty I believe the next step is for Ops to review the output and decide on standards for using the labels based on the issue here. I'll drop a note in that issue, but it looks like no one is assigned to it, so I'm not sure if it will be seen.

ExperimentsInHonesty commented 1 year ago

@akhaleghi @jossus657

We on the ops team are really looking forward to helping you with this issue, but we are finding it cumbersome to navigate through all the comments in order to find the google sheet we should be looking at. Please have someone from the DS team take the time to review this issue and all its comments and summarize the links in the top part of the issue and then let us know by dropping a note in the ops channel with a link to this issue.

jossus657 commented 1 year ago

Updates 1/26

jossus657 commented 1 year ago

Updates 2/16