pyOpenSci / peer-review-metrics

A repo where we collect peer review metrics
https://www.pyopensci.org/peer-review-metrics

Data structure for metrics #39

Closed by lwasser 3 weeks ago

lwasser commented 3 weeks ago

I'm creating a notebook that parses issue and PR history. It would be good to store the data somewhere to reduce API calls.

Something like this:

[
    {
        "issue_id": 12345,
        "created_at": "2024-01-15T10:34:23Z",
        "state": "open",
        "labels": ["bug", "high priority"],
        "title": "Example issue title",
        "user_id": 98765
    },
    {
        "issue_id": 12346,
        "created_at": "2024-02-10T08:22:10Z",
        "state": "closed",
        "labels": ["enhancement"],
        "title": "Another example issue",
        "user_id": 98766
    },
    ...
]
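As a rough illustration of how a notebook might read records in this shape, here is a minimal sketch. It assumes each file holds a JSON list of objects with exactly the fields shown above. The `IssueRecord` class and `parse_records` function are hypothetical names, not existing code in this repo.

```python
import json
from dataclasses import dataclass
from datetime import datetime


# Hypothetical typed record mirroring the fields in the example JSON above.
@dataclass
class IssueRecord:
    issue_id: int
    created_at: datetime
    state: str
    labels: list[str]
    title: str
    user_id: int


def parse_records(raw_json: str) -> list[IssueRecord]:
    """Parse a JSON list of issue dicts into typed records."""
    records = []
    for item in json.loads(raw_json):
        records.append(
            IssueRecord(
                issue_id=int(item["issue_id"]),
                # GitHub timestamps end in "Z"; datetime.fromisoformat only accepts
                # "Z" on Python 3.11+, so swap it for an explicit UTC offset.
                created_at=datetime.fromisoformat(item["created_at"].replace("Z", "+00:00")),
                state=item["state"],
                labels=list(item["labels"]),
                title=item["title"],
                user_id=int(item["user_id"]),
            )
        )
    return records
```

Parsing `created_at` up front makes it straightforward to group or filter records by year later on.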

Then store it like this:

data/
│
├── 2018/
│   ├── repo1_issues.json
│   └── repo2_issues.json
│
├── 2019/
│   ├── repo1_issues.json
│   └── repo2_issues.json
│
├── 2020/
│   ├── repo1_issues.json
│   └── repo2_issues.json
│
└── 2024/
    ├── repo1_issues.json
    └── repo2_issues.json
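A minimal sketch of writing records into this year-per-directory layout follows. It assumes each record is a dict with an ISO 8601 `created_at` string, as in the JSON example, and that the year a file belongs to is the year the item was created. `write_by_year` is a hypothetical helper.

```python
import json
from collections import defaultdict
from pathlib import Path


def write_by_year(records: list[dict], repo: str, root: Path = Path("data")) -> list[Path]:
    """Group issue dicts by creation year and write root/<year>/<repo>_issues.json."""
    by_year: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        # created_at looks like "2024-01-15T10:34:23Z", so the first 4 chars are the year.
        by_year[record["created_at"][:4]].append(record)

    written = []
    for year, items in sorted(by_year.items()):
        out_dir = root / year
        out_dir.mkdir(parents=True, exist_ok=True)  # create data/<year>/ if missing
        out_path = out_dir / f"{repo}_issues.json"
        out_path.write_text(json.dumps(items, indent=4))
        written.append(out_path)
    return written
```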

This could work well: I could process older years locally, and then CI could run once a month to update the current year.
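One way to express that split is to fetch each past year only once and always refresh the current year. The sketch below assumes a year counts as "done" when its `data/<year>/` directory already exists. `years_to_fetch` is a hypothetical name.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def years_to_fetch(first_year: int, root: Path, today: Optional[date] = None) -> list[int]:
    """Return the years whose data still needs to be pulled from the API.

    Past years are skipped once their directory exists; the current year is
    always included so a scheduled CI run can refresh it.
    """
    today = today or date.today()
    years = []
    for year in range(first_year, today.year + 1):
        if year == today.year or not (root / str(year)).exists():
            years.append(year)
    return years
```

Run locally, this would backfill every missing past year. In CI, once the backfill is committed, it would return only the current year.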

lwasser commented 3 weeks ago

I've changed my mind after running the workflow. 🙃

Instead, I created a CI workflow that captures issues and PRs across all of our repos for the current year. I also used that same workflow to parse all the issues from 2019 to the present (most repos were created in 2022). The file containing the data from 2019-2023 is in the _data/ directory.
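For context, GitHub's REST "list repository issues" endpoint (`GET /repos/{owner}/{repo}/issues`) returns both issues and pull requests, and PRs can be told apart by their `pull_request` key. The sketch below only builds the request URL and splits a page of results. It makes no network call, and `issues_url` and `split_issues_and_prs` are hypothetical helpers, not the workflow's actual code.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def issues_url(owner: str, repo: str, since: str, page: int = 1) -> str:
    """Build a REST URL listing issues *and* PRs updated after `since`."""
    params = {
        "state": "all",   # include open and closed items
        "since": since,   # ISO 8601 timestamp; GitHub compares it to updated_at
        "per_page": 100,  # largest page size the endpoint allows
        "page": page,
    }
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{urlencode(params)}"


def split_issues_and_prs(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate one page of results; PR entries carry a "pull_request" key."""
    issues = [item for item in items if "pull_request" not in item]
    prs = [item for item in items if "pull_request" in item]
    return issues, prs
```

A real run would page through results, incrementing `page` until a page returns fewer than 100 items.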

I've set up a monthly cron job, with the last run of each year on New Year's Eve so the file covers the full year. In 2025, it will start a new file to capture that year's data.
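A minimal sketch of how each scheduled run could decide what to query and where to write follows. It assumes every run in a year re-queries from January 1 and rewrites that year's file. The cron strings, the `run_plan` name, and the `review_issues_<year>.json` file name are all hypothetical. GitHub Actions `schedule` triggers do use cron syntax evaluated in UTC.

```python
from datetime import date
from pathlib import Path

# Hypothetical GitHub Actions cron strings (schedule triggers run in UTC):
MONTHLY_CRON = "0 0 1 * *"      # 00:00 UTC on the 1st of every month
YEAR_END_CRON = "0 23 31 12 *"  # 23:00 UTC on December 31


def run_plan(run_date: date, data_dir: Path = Path("_data")) -> tuple[str, Path]:
    """Return the `since` timestamp and output file for a scheduled run.

    Each run re-queries from January 1 of its own year, so the New Year's Eve
    run leaves a complete file and the January 1 run starts a new one.
    """
    since = f"{run_date.year}-01-01T00:00:00Z"
    # Hypothetical naming scheme: one file per calendar year.
    out_file = data_dir / f"review_issues_{run_date.year}.json"
    return since, out_file
```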

This generally minimizes API calls. The catch is that adding since= to the REST query returns every item updated after that date, not just newly opened ones. So I had to do a bit of extra processing to support this workflow, but it works now. It updates the same way the contributor workflow on our website does.
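The extra processing could look roughly like the sketch below. It assumes fetched items have already been flattened into the `issue_id`/`created_at` shape from the first comment. Items opened in earlier years but updated this year are dropped, and a freshly fetched copy replaces any stored copy with the same id. `merge_updates` is a hypothetical helper, not the script in this repo.

```python
from datetime import datetime


def merge_updates(existing: list[dict], fetched: list[dict], year: int) -> list[dict]:
    """Merge results of a `since=` query into the stored records for `year`."""
    merged = {item["issue_id"]: item for item in existing}
    for item in fetched:
        created = datetime.fromisoformat(item["created_at"].replace("Z", "+00:00"))
        if created.year != year:
            continue  # updated recently, but opened in a different year
        merged[item["issue_id"]] = item  # newer API data wins (e.g. open -> closed)
    # ISO 8601 UTC strings sort chronologically as plain strings.
    return sorted(merged.values(), key=lambda item: item["created_at"])
```

Keying on the id makes repeated monthly runs idempotent. Re-fetching an item overwrites it rather than duplicating it.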

The script running in this repo could later be moved into pyosMeta as an entry point, but that will be a separate issue once we know this works. We are closing this issue for now.