pyOpenSci / pyosMeta

A package that updates pyOpenSci contributor and package metadata on our website
BSD 3-Clause "New" or "Revised" License

[bug / feature] : challenges pulling all review issues from our pyos meta package #192

Closed: lwasser closed this issue 1 month ago

lwasser commented 1 month ago

Right now I'm working on our dashboard for peer review metrics.

If we run this code to collect all issues:

from pyosmeta import ProcessIssues
from pyosmeta.github_api import GitHubAPI

# Get all issues from GitHub software-submission repo, Return df with labels, title, date_opened and closed and total time open in days
github_api = GitHubAPI(
    org="pyopensci",
    repo="software-submission",
)

process_review = ProcessIssues(github_api)
issues = process_review.get_issues()
accepted_reviews, errors = process_review.parse_issues(issues)

It errors at a string/split step (I haven't had time to fully debug yet). We will often have issues with odd characters in them. In this case, because it's parsing ALL issues, it is likely getting stuck either on issues that are not related to peer review or on presubmissions, etc.

There are a few options here:

  1. We always use a label to grab issues or we provide a list of labels (reasonable).
  2. We allow the parser to fail gracefully on an issue, and report back the issue numbers of whatever it skips.
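
To make option 2 concrete, here is a minimal sketch of per-issue error handling. It is not pyosmeta's actual code: `parse_all`, `parse_header`, and the plain-dict issue shape (`number`, `body`) are all hypothetical stand-ins, and the "Key: value" body format is an assumption for illustration only.

```python
from typing import Callable


def parse_all(
    issues: list[dict], parse_one: Callable[[dict], dict]
) -> tuple[dict[str, dict], dict[int, str]]:
    """Parse each issue on its own; record failures instead of raising."""
    parsed: dict[str, dict] = {}
    skipped: dict[int, str] = {}
    for issue in issues:
        try:
            review = parse_one(issue)
            parsed[review["package_name"]] = review
        except (KeyError, ValueError, IndexError, AttributeError) as err:
            # Keep going, but remember which issue failed and why.
            skipped[issue["number"]] = f"{type(err).__name__}: {err}"
    return parsed, skipped


def parse_header(issue: dict) -> dict:
    """Hypothetical parser: expects 'Key: value' lines in the issue body."""
    fields: dict[str, str] = {}
    for line in issue["body"].splitlines():
        # Raises ValueError when a line has no ': ' separator.
        key, value = line.split(": ", 1)
        fields[key.strip().lower().replace(" ", "_")] = value.strip()
    # Raises KeyError when the template field is missing entirely.
    fields["package_name"]
    return fields
```

With this shape, one malformed or non-review issue ends up in `skipped` under its issue number rather than stopping the whole run, which is the reporting behavior described in option 2.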

Because I haven't looked into this more, I'm not sure what the best approach is at this point. But I wanted to log the issue as I work on peer-review-metrics!

lwasser commented 1 month ago

Another example, this time for presubmissions. This used to work, but now it returns only a partial set of values in the ReviewModel object:

github_api = GitHubAPI(
    org="pyopensci", repo="software-submission", labels=["presubmission"]
)

process_review = ProcessIssues(github_api)
issues = process_review.get_issues()
presubmissions, errors = process_review.parse_issues(issues)

I suspect the problem might be that we have fine-tuned the API to grab normal reviews, but we also have a second template for pre-submissions, as well as some issues that are not reviews at all.

lwasser commented 1 month ago

Thinking about this more: because our package should support lists of labels, perhaps the labels= parameter should always be set, with a default value? This would prevent someone from trying to pull all issues.

Then we would always specify at least one label. We can then update the docstring to say that if a label is not specified, it will grab accepted reviews only as the default. I think this might make the most sense.
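
A rough sketch of what that default could look like, written as a standalone helper rather than as a change to `GitHubAPI` itself. `resolve_labels` and `DEFAULT_LABELS` are hypothetical names, and `"accepted"` is only a placeholder for whatever label marks accepted reviews in the repo.

```python
from typing import Iterable, Optional

# Placeholder label name; the real "accepted review" label may differ.
DEFAULT_LABELS = ("accepted",)


def resolve_labels(labels: Optional[Iterable[str]] = None) -> list[str]:
    """Return the labels to filter on, falling back to the default."""
    if labels is None:
        return list(DEFAULT_LABELS)
    if isinstance(labels, str):
        # A bare string would otherwise be iterated character by character.
        labels = [labels]
    cleaned = [label.strip() for label in labels if label.strip()]
    if not cleaned:
        # An explicit empty list would mean "all issues", which we disallow.
        raise ValueError("At least one label is required.")
    return cleaned
```

Calling `resolve_labels()` with no argument returns the default list, while `resolve_labels(["presubmission"])` passes the caller's labels through. Raising on an explicitly empty list keeps "pull everything" from happening by accident.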