Create a column summary for legacy sprint project in python

mikeaintworkin commented 1 year ago

Create a column summary using pandas on data returned from the legacy sprint project. Display the data in the same way that we measure it for sprints.

Note: If we trim the closed issues column down from around 500 to 100 or so this will run fairly quickly. Right now it takes around 3 minutes to run. We can trim that column by removing older closed issues from the project.

context: This is part of the closeout where I get all the code into python. This functionality will be needed as a bridge between now and when we transition to projectV2 objects for the sprint.

mikeaintworkin commented 1 year ago

Before I started this issue.

The code retrieves all the issues and PRs from the legacy sprint project.
Can print out the data
Can save the data to a tabular form.

learned some pandas using ChatGPT and the docs
converted the code to use pandas
cleaned up sloppy code around forming the output file path and name
did a little experimenting with pandas - like how hard would it be to ... export to xml. wow. easy - but didn't work so commented it out for now
thought through the next step logic on paper.

Step 1

Create columns
- date
- sprint name
- InTheSprint
- ready for sprint
- This sprint
- WIP
- Ready for review
- Review
- Ready for QA
- QA

Assign sizes to PRs

If this is anytime after the initial kickoff count (which happens only once)
- Find the issues that are tied to any PRs that do not have a size and use that size

Create InTheSprint count

Sum up:
- Sum of these columns
  - This sprint
  - WIP
  - Ready for review
  - Review

Create the rest of the sums

ready for sprint
This sprint
WIP
Ready for review
Review
Ready for QA
QA

Output the results as a flat file
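
A minimal pandas sketch of the summing described in Step 1. It assumes one row per card with hypothetical `column` and `size` headers (the real frame's headers may differ), and the sample sizes are made up. The active columns are the four listed under "Create InTheSprint count".

```python
import pandas as pd

# Hypothetical card data: one row per card, with its board column and a numeric size.
cards = pd.DataFrame(
    {
        "column": ["This sprint", "WIP", "WIP", "Review", "QA", "ready for sprint"],
        "size": [3, 10, 5, 3, 33, 80],
    }
)

# Columns whose sizes count toward the InTheSprint total (taken from the plan above).
ACTIVE_COLUMNS = ["This sprint", "WIP", "Ready for review", "Review"]


def summarize(cards: pd.DataFrame) -> pd.Series:
    """Return the total size per board column plus an InTheSprint total."""
    per_column = cards.groupby("column")["size"].sum()
    # reindex with fill_value=0 so an active column with no cards contributes 0.
    in_the_sprint = per_column.reindex(ACTIVE_COLUMNS, fill_value=0).sum()
    per_column["InTheSprint"] = in_the_sprint
    return per_column


summary = summarize(cards)
print(summary)
```

Writing the result out as a flat file could then be a single `summary.to_csv(path, sep="\t")` call, where `path` is whatever output location the workflow already builds.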

mikeaintworkin commented 1 year ago

I already have the code for summarizing each column. It will not be hard to get from there to summarizing across a few columns. The new code will be the tracking down of the attached PRs.

This will have to be dealt with even when the transition is made to the GitHub projectV2 so it's not wasted work.

My thought is that in GraphQL I can get closingIssuesReferences from a pull request.
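
A rough sketch of what that lookup might look like, under stated assumptions: the query uses the `closingIssuesReferences` connection on GitHub's `PullRequest` object, while the helper names, the variable names, and the sample response are all hypothetical. No network call is made here; the payload would be POSTed to `https://api.github.com/graphql` with an auth token.

```python
import json

# GraphQL query for the issues a pull request closes.
# The variable names ($owner, $repo, $number) are our own choice.
QUERY = """
query($owner: String!, $repo: String!, $number: Int!) {
  repository(owner: $owner, name: $repo) {
    pullRequest(number: $number) {
      closingIssuesReferences(first: 10) {
        nodes {
          number
          labels(first: 20) { nodes { name } }
        }
      }
    }
  }
}
"""


def build_payload(owner: str, repo: str, number: int) -> str:
    """Build the JSON request body for the GraphQL endpoint."""
    variables = {"owner": owner, "repo": repo, "number": number}
    return json.dumps({"query": QUERY, "variables": variables})


def closing_issue_numbers(response: dict) -> list[int]:
    """Extract the linked issue numbers from a decoded GraphQL response."""
    pr = response["data"]["repository"]["pullRequest"]
    return [node["number"] for node in pr["closingIssuesReferences"]["nodes"]]


# Hand-written response in the shape the query asks for (not real data).
sample = {
    "data": {
        "repository": {
            "pullRequest": {
                "closingIssuesReferences": {
                    "nodes": [
                        {"number": 101, "labels": {"nodes": [{"name": "example-label"}]}},
                        {"number": 102, "labels": {"nodes": []}},
                    ]
                }
            }
        }
    }
}

print(closing_issue_numbers(sample))
```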

I am going to limit the scope of this issue to what is laid out so far.

It is really tempting to go and parse the labels, etc., but I've already done that, and if I did it for this code, which is formatted differently, it would eventually be thrown away anyway.

It would be easy to join this data with data pulled from the backlog but I can't assume every single item will be in the backlog. Can I?

I guess I could for issues. Any issue added to the sprint without going through the backlog process is worth flagging and looking at. Those are exceptions.
PRs without sizes are not exceptions though, as long as they can be associated back to an issue that is in the sprint.
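
A small pandas sketch of that association, assuming the PR-to-issue link has already been resolved into a hypothetical `closes` column. The frame layouts, header names, and numbers are illustrative only, not the project's actual data.

```python
import pandas as pd

# Hypothetical sized issues that are in the sprint.
issues = pd.DataFrame({"number": [101, 102], "size": [10.0, 3.0]})

# Hypothetical PRs; "closes" is the issue each PR is tied to, NaN size means unsized.
prs = pd.DataFrame(
    {
        "number": [201, 202],
        "size": [float("nan"), 5.0],
        "closes": [101, 102],
    }
)


def fill_pr_sizes(prs: pd.DataFrame, issues: pd.DataFrame) -> pd.DataFrame:
    """Give each unsized PR the size of the issue it closes."""
    issue_sizes = issues.set_index("number")["size"]
    # Look up the linked issue's size for every PR ...
    inherited = prs["closes"].map(issue_sizes)
    out = prs.copy()
    # ... but only use it where the PR has no size of its own.
    out["size"] = out["size"].fillna(inherited)
    return out


sized = fill_pr_sizes(prs, issues)
print(sized)
```

A PR whose linked issue is missing from `issues` would simply stay unsized, which fits the decision here to assume clean data rather than hunt for exceptions.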

It's outside the scope of this work to look for exceptions. Those checks can be done as other utilities.

For example: a report that looks across the repositories and identifies issues and PRs that were closed outside of the sprint. There's also the question of whether or not doing that check is worth the trouble.
Either way, out of scope for this. The data will be assumed to be clean.

mikeaintworkin commented 1 year ago

I ended up re-parsing the labels because I already had the code, so it was just cut/paste. I left off working on SprintSummaryFrame.
This creates a table "Column, size". It will get me back to where I was with the Java code we had. After that I will need to do the sizing for the collection of columns that identify active sprint columns. That's not hard. The part I haven't played with yet is chasing the PRs back to their issues for sizes.

I left off in _add_size_column, debugging. I'm using old code and I am running into dumb column header problems. I'm thinking maybe rather than keep hitting every reference I might define a tuple with all the column names and use them that way. Dunno. Getting late, heading home.

Code that is checked in works. This code is not yet checked in because - well - it doesn't.

mikeaintworkin commented 1 year ago

I won't be able to look at this again until tomorrow evening.

mikeaintworkin commented 1 year ago

I'm going to pass in the headers I care about as a list
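
One way this could look, as a sketch only: the header names and the `require_headers` helper are hypothetical, not the checked-in code. The idea is to name the headers once and fail loudly if the frame does not have them.

```python
import pandas as pd

# Hypothetical header names, defined once and passed around as a list.
SPRINT_HEADERS = ["column", "number", "size"]


def require_headers(df: pd.DataFrame, headers: list[str]) -> pd.DataFrame:
    """Return only the requested columns, raising if any header is missing."""
    missing = [h for h in headers if h not in df.columns]
    if missing:
        raise KeyError(f"DataFrame is missing expected headers: {missing}")
    return df[headers]


# Made-up frame with one extra column that the caller does not care about.
raw = pd.DataFrame({"column": ["QA"], "number": [1], "size": [3], "extra": ["x"]})
print(require_headers(raw, SPRINT_HEADERS))
```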

mikeaintworkin commented 1 year ago

Basic hacky code is checked in. Each of the sprint columns is summarized. The columns that represent activity included in the points score for the sprint are summarized under "Active Sprint"

370 ▶ SPRINT READY
109 This Sprint 🏃‍♀️ 🏃 
146 IQSS Team - In Progress  💻
56  Ready for Review ⏩
33  Review 🔎
12  Ready for QA ⏩
20  QA ✅
897 Done 🚀
344 ActiveSprint

Does not include the lookup to get the points associated with issues that are off the board, replaced by unsized PRs. The next step is to look for the PR data and add that. After that I need to step back and see where I'm at. I'm getting more and more into the swing of Python coding using ChatGPT as a reference for how or why to do things. I need to go back and make sure I'm not going off the workflow and that I can apply what I am developing directly to the new sprint project.

mikeaintworkin commented 1 year ago

Technically I think I've actually finished this issue and should open another.

mikeaintworkin commented 1 year ago

I realized that I have one more piece to do here. I need to output the results to a text file.

mikeaintworkin commented 1 year ago

I went far afield from simply getting the printing to work. Today:

I took a side trip into TDD and learned the basics.
I thought through the workflow a bit.
Realized that I don't want objects that do everything.
Experimented with how I would deal with the needed consistency in column names.
Played with querying the GitHub legacy project for only closed issues last touched less than 3 months ago. I still ended up with a loop. I don't want to change how I go into the API (org->project->cards->issue|pr), and it seems like the cards object has no way to query it using PyGithub that allows you to limit things at the server side.
I want to do more TDD but I don't want to spend all my time figuring out how to set up testing for the API-related stuff, so I will stick to things where I'm working with a local file.
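
For the closed-issue filtering mentioned above, a client-side sketch of the loop, assuming each card has already been fetched and reduced to a dict. The `state` and `updated_at` keys and the 90-day cutoff are assumptions for illustration; the filtering still happens after the fetch, not on the server.

```python
from datetime import datetime, timedelta, timezone


def drop_stale_closed(cards: list[dict], now: datetime, max_age_days: int = 90) -> list[dict]:
    """Keep open cards, plus closed cards updated within max_age_days of now."""
    cutoff = now - timedelta(days=max_age_days)
    return [c for c in cards if c["state"] == "open" or c["updated_at"] >= cutoff]


# Made-up cards with a fixed "now" so the result is reproducible.
now = datetime(2023, 6, 1, tzinfo=timezone.utc)
cards = [
    {"number": 1, "state": "open", "updated_at": datetime(2022, 1, 1, tzinfo=timezone.utc)},
    {"number": 2, "state": "closed", "updated_at": datetime(2023, 5, 1, tzinfo=timezone.utc)},
    {"number": 3, "state": "closed", "updated_at": datetime(2022, 12, 1, tzinfo=timezone.utc)},
]

kept = drop_stale_closed(cards, now)
print([c["number"] for c in kept])  # [1, 2]
```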

[x] Save the summary output in exactly the format required here.
[ ] Add the logic to track down the pull requests.

mikeaintworkin commented 1 year ago

Today:

I ran a mostly successful workflow using: mn-snapshot_sprint_from_api.py
Worked on SprintSummaryFrame so that it returns (dummy) data for the additional columns that I want.
Ran into issues that I'm trying to track down.
To make that easier I moved over to a new workflow, mn-snapshot_sprint_from_file.py
I'm reading in the results of the mn-snapshot_sprint_from_api.py workflow.
This way I can avoid calling the API repeatedly while working through this issue.
I haven't figured out how I want to replace the dummy data for date and sprint in the summary report. That can wait until after I get this bug fixed and the pull request logic figured out.
The whole concept of how the Python project is set up clicked today. I broke out my workflows into a separate Python package so that things are much neater now.
I may further break out the things I'm using as utility functions from the things I have as objects but that's really low priority.
Starting to feel at home in PyCharm and Python. Wahoo.

next steps:

[ ] step backwards & complete the debug of SprintSummaryFrame using mn-snapshot_sprint_from_file.py
[ ] Finish the previous next steps

The flow using the API right now.

    df = pdio.LegacyProjectCards(
        access_token=auth_token_val,
        organization_name=args.organization_name,
        project_name=args.proj_name)
    df.fetch_data()
    df.print_project_cards()
    pdio.write_dataframe(df=df.dataframe()) # raw data
    df.print_project_cards()
    dfsum = pdio.SprintSummaryFrame(df_in=df.dataframe())

The flow using an input file right now

    df = pdio.DFFromFile(
        in_dir=args.dir_name,
        file_name=args.filename)
    dfsum = pdio.SprintSummaryFrame(df_in=df.dataframe())

mikeaintworkin commented 1 year ago

I'm going to close this. I was able to run this workflow tonight:

    df = pdio.DFFromFile(
        in_dir=args.dir_name,
        file_name=args.file_name)
    dfSzer = pdio.SprintCardSizer(df_in=df.dataframe())
    df = dfSzer.dataframe()
    pdio.write_dataframe(df)
    spsumrzr = pdio.SprintSizeSummarizer(df_in=df)
    print(spsumrzr.sprint_summary_line())

and get this result:

Column  Size
▶ SPRINT READY  370
This Sprint 🏃‍♀️ 🏃  46
IQSS Team - In Progress  💻  159
Ready for Review ⏩  53
In Review 🔎 36
QA ✅    6
Done 🚀  623
ActiveSprint    294

▶ SPRINT READY This Sprint 🏃‍♀️ 🏃 IQSS Team - In Progress  💻 Ready for Review ⏩ In Review 🔎 QA ✅ Done 🚀 ActiveSprint      DateTime  SprintName
           370                 46                        159                 53          36    6    623          294 datetimestamp sprint_name
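
The second form of the output above (one wide row with DateTime and SprintName appended) can be derived from the first. A minimal pandas sketch, assuming a two-column "Column"/"Size" frame; the `to_summary_line` helper and the date and sprint-name arguments are hypothetical stand-ins, not the actual `sprint_summary_line()` implementation.

```python
import pandas as pd

# A few rows of the Column/Size table from the result above.
summary = pd.DataFrame(
    {
        "Column": ["SPRINT READY", "QA", "ActiveSprint"],
        "Size": [370, 6, 294],
    }
)


def to_summary_line(summary: pd.DataFrame, date_time: str, sprint_name: str) -> pd.DataFrame:
    """Pivot the Column/Size table into one wide row and tag it with date and sprint."""
    # One dict -> one row; dict order keeps the board columns in their original order.
    wide = pd.DataFrame([summary.set_index("Column")["Size"].to_dict()])
    wide["DateTime"] = date_time
    wide["SprintName"] = sprint_name
    return wide


# Placeholder values, standing in for the dummy date and sprint name.
line = to_summary_line(summary, "datetimestamp", "sprint_name")
print(line.to_string(index=False))
```

Keeping each snapshot as a single row like this would let successive runs be appended to one flat file, one line per measurement.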

mikeaintworkin commented 1 year ago

I'm breaking out the logic to add the pull requests to a new issue.

thisaintwork / iqss_gh_reporting

Create a column summary for legacy sprint project in python #5