Open kimberlytanyh opened 1 year ago
After you finish drafting this issue, add the label "Ready for Product".
@kimberlytanyh Add a step for adding the data to a Google Sheet on the Team Google Drive, and add a link to the destination folder under the Resources section.
Weekly Update:
@kimberlytanyh we are in the process of changing the labels on issues currently labeled "Complexity: Good second issue" to "good first issue"
Complexity: Good second issue
Issue or PR good first issues
instead of first grouping "good first issue" and "Complexity: Good second issue" together
@ExperimentsInHonesty Thank you for the heads up! I will adjust my code accordingly for the next rerun of the analysis.
Weekly Update:
Progress: Found a way to identify pull requests among the issues retrieved through the GitHub API. Will redo all analyses and try to improve the accuracy of the datasets.
Blockers: None
Availability: Saturday
ETA: ~6 hours
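For reference, the GitHub REST API's issue endpoints return pull requests alongside issues, and pull requests can be recognized by the `pull_request` key that only they carry. Below is a minimal sketch of that filter, assuming the items have already been fetched and parsed from JSON. The function name and the sample records are hypothetical and are not the code used in this analysis.

```python
def split_issues_and_pulls(items):
    """Separate true issues from pull requests in a list of issue dicts."""
    issues, pulls = [], []
    for item in items:
        # Only pull requests carry a "pull_request" key in the issues payload.
        if "pull_request" in item:
            pulls.append(item)
        else:
            issues.append(item)
    return issues, pulls


# Hypothetical sample records shaped like the API response.
sample = [
    {"number": 101, "title": "Pre-work checklist"},
    {"number": 102, "title": "Fix footer", "pull_request": {"url": "..."}},
]
issues, pulls = split_issues_and_pulls(sample)
print([i["number"] for i in issues], [p["number"] for p in pulls])
```

Filtering before any counting keeps pull requests from inflating the per-author issue totals.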
@ExperimentsInHonesty As discussed in the Sunday Team Meeting, below are the labels to be added to prework/tracking issues for better data analysis:
Team Member Progression
Weekly Update:
Progress: Looked into data pipeline options to automate data updates for visualizations in Looker.
Blockers: Discussing preferred approach
Availability: Friday-Sunday this week
ETA: 6-10 hours
Weekly Update:
Progress: Streamlining the data cleaning code in the Jupyter Notebook and adding automation components. Going to try using the Google Sheets API to create a data source for the Looker dashboard.
Blockers: Scheduling the notebook to run automatically. Deciding on the best data source for Looker (in the midst of scheduling a working session with Chelsey, Karina, and Sophie).
Availability: Mon, Friday-Sunday next week
ETA: 6-10 hours
Weekly Update:
Progress: Created a repository with Sophia, Chelsey, and Karina to automate running the Python data cleaning script using GitHub Actions. Next steps are to clean up the existing code for automation and data accuracy, try the Google Sheets API, and establish a data source for Looker.
Concepts/tools used to set up the automatic daily run of the Python data cleaning script (in case we want to set up a wiki in the future):
Blockers:
Availability: Weekend and Mon-Fri next week, 12PM-7PM
ETA: 15+ hours
@kimberlytanyh
Please add update using the below template (even if you have a pull request). Afterwards, remove the 'To Update !' label and add the 'Status: Updated' label.
If you need help, be sure to either: 1) place your issue in the developer meeting discussion column and ask for help at your next meeting, 2) put a "Status: Help Wanted" label on your issue and pull request, or 3) put up a request for assistance on the #hfla-site channel. Please note that including your questions in the issue comments- along with screenshots, if applicable- will help us to help you. Here and here are examples of well-formed questions.
You are receiving this comment because your last comment was before Tuesday, May 30, 2023 at 12:15 AM PST.
@kimberlytanyh
Please add update using the below template (even if you have a pull request). Afterwards, remove the '2 weeks inactive' label and add the 'Status: Updated' label.
You are receiving this comment because your last comment was before Tuesday, June 6, 2023 at 12:16 AM PST.
Progress: Changing one more section of the code for automation and double-checking the accuracy of the data after cleaning (need to improve the accuracy of crediting the right number of small issues for agenda issues that have multiple assignees). Next step is to add the Python script for automation, then clean and create the dataset for the live dashboard on the number of issues available.
Blockers: None yet.
Availability: 6-8 hours
ETA: A few more weeks, since this is an evolving and ongoing issue.
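One possible way to handle the multi-assignee crediting mentioned above is to give every assignee on an issue one credit for each complexity label on that issue. This is only a sketch under that assumption, and the actual crediting rule may differ. `credit_assignees` and the sample data are hypothetical, and the "Complexity: Small/Medium/Large" label names are assumed rather than taken from this thread.

```python
from collections import Counter, defaultdict

# "good first issue" and "Complexity: Good second issue" appear in this thread;
# the remaining label names are assumptions for illustration.
COMPLEXITY_LABELS = {
    "good first issue",
    "Complexity: Good second issue",
    "Complexity: Small",
    "Complexity: Medium",
    "Complexity: Large",
}


def credit_assignees(issues):
    """Return {login: Counter({label: count})}, crediting every assignee."""
    credits = defaultdict(Counter)
    for issue in issues:
        names = [
            label["name"]
            for label in issue.get("labels", [])
            if label["name"] in COMPLEXITY_LABELS
        ]
        # Each assignee gets one credit per complexity label on the issue.
        for assignee in issue.get("assignees", []):
            for name in names:
                credits[assignee["login"]][name] += 1
    return credits


# Hypothetical issues shaped like GitHub REST API issue objects.
sample = [
    {
        "number": 1,
        "labels": [{"name": "Complexity: Small"}, {"name": "role: front end"}],
        "assignees": [{"login": "alice"}, {"login": "bob"}],
    },
    {
        "number": 2,
        "labels": [{"name": "good first issue"}],
        "assignees": [{"login": "alice"}],
    },
]
credits = credit_assignees(sample)
```

Crediting each assignee in full (rather than splitting one credit between them) is a design choice; a fractional split would only change the increment.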
@kimberlytanyh
Please add update using the below template (even if you have a pull request). Afterwards, remove the '2 weeks inactive' label and add the 'Status: Updated' label.
You are receiving this comment because your last comment was before Tuesday, June 27, 2023 at 12:17 AM PST.
Progress: Completed documentation of the process for the live issue availability dashboard (for GitHub class).
Left to do: Edit the Python script to add in data from other columns, add it to the repository for automation, and finish creating the dashboard.
Blockers: None yet. Might have to consult the Data Science COP about running the automation script automatically.
Availability: 21 hours next week, Mon-Fri.
ETA: Within the next week or two.
Dependency
4921. Resume when the dashboard is ready
Overview
We need to collect data on the authors of all the prework issues in our repository to perform data analysis.
Action Items
[x] Find URL for GitHub REST API documentation and add it to the resources below
[x] Read relevant sections in GitHub API documentation on retrieving data with REST APIs
[x] Search for other resources on platforms or libraries and syntax to use to retrieve data with GitHub REST APIs
[x] Download Postman to retrieve needed JSON data via GitHub REST API (based on online tutorials)
[x] Read documentation on rate limiting
[x] Retrieve data on all prework issues (date range from Nov 1, 2021 to now) using REST API in Jupyter Notebook
[x] Put JSON data in a tabular format and clean data
[x] Get distribution of issues completed by each complexity level for each prework author: Put data in columns: GitHub Handle, Date Prework Closed, No. of Good First Issues Completed, No. of Good Second Issues Completed, No. of Small Complexity Issues Completed, No. of Medium Complexity Issues Completed, No. of Large Complexity Issues Completed
[x] Export data as Excel file and add to Google Drive folder (GitHub Data Analysis)
[x] Manually check accuracy of numbers in dataset/spreadsheet
[x] Write documentation on process and considerations (not complete yet)
[x] Duplicate data in another spreadsheet and perform the following analysis:
[x] Perform above analysis again on only closed prework issues.
[x] In Google Sheets, clean data and get the number and percentage of closed large issues that were unassigned
[x] Create Google spreadsheet with list of issues that have more than one complexity label and unassigned closed large issues.
[x] Perform cohort analysis on closed prework authors
[x] Research how to connect data to Looker Studio so that new data flows in and Looker visualizations update automatically.
[x] Create a new repository, with Sophia and Chelsey's help, containing a GitHub Actions workflow that runs on a cron schedule so that the Python script runs automatically every day for fresh data.
[ ] Add automation components to Python script and verify data cleaning accuracy.
[x] Create Looker dashboard with data pulled in.
[ ] Refine the Looker dashboard so that it is more intuitive
[ ] Investigate correlation between number of issues available and cohort performance:
Might be separated into another issue
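Two of the checks in the action items above (issues with more than one complexity label, and the share of closed large issues that were unassigned) can be sketched in plain Python. This is a minimal illustration, not the notebook code used for the analysis. The helper names and sample data are hypothetical, and the "Complexity: Small/Medium/Large" label names are assumptions.

```python
# Label names other than the two quoted in this thread are assumed.
LARGE = "Complexity: Large"
COMPLEXITY_LABELS = {
    "good first issue",
    "Complexity: Good second issue",
    "Complexity: Small",
    "Complexity: Medium",
    LARGE,
}


def complexity_labels(issue):
    """Return the complexity label names attached to one issue dict."""
    return [
        label["name"]
        for label in issue.get("labels", [])
        if label["name"] in COMPLEXITY_LABELS
    ]


def multi_complexity_issues(issues):
    """Numbers of issues that carry more than one complexity label."""
    return [i["number"] for i in issues if len(complexity_labels(i)) > 1]


def unassigned_large_share(issues):
    """Return (count, percentage) of closed large issues with no assignee."""
    closed_large = [
        i for i in issues
        if i.get("state") == "closed" and LARGE in complexity_labels(i)
    ]
    if not closed_large:
        return 0, 0.0
    unassigned = [i for i in closed_large if not i.get("assignees")]
    return len(unassigned), 100.0 * len(unassigned) / len(closed_large)


# Hypothetical issues shaped like GitHub REST API issue objects.
sample = [
    {"number": 1, "state": "closed", "labels": [{"name": LARGE}], "assignees": []},
    {"number": 2, "state": "closed", "labels": [{"name": LARGE}],
     "assignees": [{"login": "alice"}]},
    {"number": 3, "state": "open", "labels": [{"name": LARGE}], "assignees": []},
    {"number": 4, "state": "closed",
     "labels": [{"name": "Complexity: Small"}, {"name": "Complexity: Medium"}],
     "assignees": [{"login": "bob"}]},
]
multi = multi_complexity_issues(sample)
count, pct = unassigned_large_share(sample)
```

Pull requests would need to be filtered out first, since the issue endpoints return them in the same list.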
Resources/Instructions