metrumresearchgroup / ghpm

BSD 3-Clause Clear License

Associated issues of a given pull request #13

Open Blackglade opened 4 years ago

Blackglade commented 4 years ago

It might be helpful to let a user know which issues are associated with a specific pull request. (i.e., I am scraping some data about pull request X and I want to know which issues were either included in this pull request or referenced by it.)

There are multiple ways of going about this, and several considerations around how people associate an issue with a pull request.

Here are the following ways I can think of:

Scraping this requires some forethought (the last one in particular).

For the first two we can drill down from a specific PR via CONNECTED_EVENT to an Issue node. The last one would require scraping through all the commits of the pull request for another CONNECTED_EVENT pointing to an issue.

The function was going to have the user specify a pull request to get all issue data from, but it might be equally important to incorporate this directly into the get_repo_issues function and add a column that is a sub-list of all associated pull requests.
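As a rough sketch of the "sub-list column" idea, assuming the PR-to-issue links have already been collected somehow: the mapping can be inverted so each issue carries the list of pull requests that point at it. The function name `invert_pr_links` and the numbers below are hypothetical, and this is Python for illustration rather than anything from ghpm itself.

```python
from collections import defaultdict

def invert_pr_links(pr_to_issues):
    """Turn {pr_number: [issue_number, ...]} into {issue_number: [pr_number, ...]}."""
    issue_to_prs = defaultdict(list)
    # Sort by PR number so each issue's list comes out in a stable order.
    for pr_number, issue_numbers in sorted(pr_to_issues.items()):
        for issue_number in issue_numbers:
            if pr_number not in issue_to_prs[issue_number]:
                issue_to_prs[issue_number].append(pr_number)
    return dict(issue_to_prs)

# Made-up link data, for illustration only.
example_links = {13: [7, 9], 15: [9], 16: []}
issue_column = invert_pr_links(example_links)
print(issue_column)
```

Issues with no associated pull request simply do not appear in the result, so a caller would need to fill in an empty list for them when building the column.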

Thoughts?

dpastoor commented 4 years ago

For first two we can drill down from a specific PR via CONNECTED_EVENT to an Issue node. The last one would require scraping through all the commits for another CONNECTED_EVENT of an issue of a pull request.

Have you actually checked the CONNECTED_EVENT conditions empirically? One thing I noticed is that it didn't differentiate between linking to an issue from a PR and linking from an issue to a PR - the CONNECTED_EVENT was the same. This was somewhat unexpected, but made life easier.

I wonder if maybe, wonderfully, those references in the comments would show up as well, making life easier.

The other thing to consider is that the context could be lost when connecting things inside the comments. E.g., the difference between "this PR addresses #999" and "please don't do it like we did in #666" - one is a reference to actionability, the other is just a casual linkage.
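One way to keep some of that context, sketched here under loose assumptions, is to separate references that follow an action word from plain mentions. The word list uses GitHub's closing keywords (close, fix, resolve and their variants) plus "address", which is an addition made here to match the example above and is not a GitHub keyword. The function name is hypothetical.

```python
import re

# GitHub closing keywords, plus "address(es/ed)" as an assumption for this sketch.
ACTION_WORDS = r"(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?|address(?:es|ed)?)"
ACTION_REF = re.compile(rf"\b{ACTION_WORDS}\s+#(\d+)", re.IGNORECASE)
# Any "#123" that is not glued to a preceding word character or slash.
ANY_REF = re.compile(r"(?<![\w/])#(\d+)")

def classify_references(text):
    """Split issue references into 'actionable' and 'casual' by nearby wording."""
    actionable = {int(n) for n in ACTION_REF.findall(text)}
    mentioned = {int(n) for n in ANY_REF.findall(text)}
    return {
        "actionable": sorted(actionable),
        "casual": sorted(mentioned - actionable),
    }

print(classify_references("this PR addresses #999"))
print(classify_references("please don't do it like we did in #666"))
```

This is only a heuristic: wording such as "does not fix #12" would still be classified as actionable, so the result is better treated as a hint than as ground truth.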

I think if we start with the formal CONNECTED_EVENT, then we might be able to provide a user-land wrapper or similar to help think through what it would take to create an entire connectivity graph.
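To make the connectivity-graph idea concrete, here is a minimal sketch. It assumes the connected pairs have already been extracted, and it treats links as undirected, since the CONNECTED_EVENT reportedly looks the same from either side. The function names and the pairs are made up for illustration.

```python
from collections import defaultdict, deque

def build_graph(pairs):
    """Build an undirected adjacency map from (number, number) link pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def component(graph, start):
    """Return every issue/PR number reachable from `start` (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return sorted(seen)

# Made-up pairs: items 1 and 2 both link to 10, item 3 links to 11.
links = build_graph([(1, 10), (2, 10), (3, 11)])
print(component(links, 1))
```

Plain numbers work as node keys because issues and pull requests share one number sequence within a repository.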

Blackglade commented 4 years ago

oh boy...

Okay, after some investigating, even ConnectedEvent seems to be inconsistent.

For reference I ran this query:

query test {
  repository(owner: "metrumresearchgroup", name: "rbabylon") {
    pullRequest(number: 104) {
      title
      timelineItems(itemTypes: [CONNECTED_EVENT], last: 100) {
        nodes {
          __typename
          ... on ConnectedEvent {
            subject {
              ... on Issue {
                title
                number
              }
            }
          }
        }
      }
    }
  }
}
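For anyone wanting to post-process the result of the query above, here is a small sketch of pulling the issue numbers out of the response. The nesting follows the fields the query selects. The function name is hypothetical, and the sample response is hand-written with placeholder titles, not real API output.

```python
def connected_issues(response):
    """Collect (number, title) pairs for Issue subjects of ConnectedEvent nodes."""
    pull_request = response["data"]["repository"]["pullRequest"]
    found = []
    for node in pull_request["timelineItems"]["nodes"]:
        if not node or node.get("__typename") != "ConnectedEvent":
            continue
        subject = node.get("subject") or {}
        # Assumption: a subject that is not an Issue comes back without these
        # fields, because the query only selects them inside `... on Issue`.
        if "number" in subject:
            found.append((subject["number"], subject.get("title")))
    return found

# Hand-written sample in the shape the query asks for; titles are placeholders.
sample = {
    "data": {
        "repository": {
            "pullRequest": {
                "title": "placeholder PR title",
                "timelineItems": {
                    "nodes": [
                        {
                            "__typename": "ConnectedEvent",
                            "subject": {"title": "placeholder issue title", "number": 92},
                        },
                        {"__typename": "ConnectedEvent", "subject": {}},
                    ]
                },
            }
        }
    }
}
print(connected_issues(sample))
```

A real response would come from POSTing the query to GitHub's GraphQL endpoint (https://api.github.com/graphql) with an authenticated token.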

I was looking at two PRs from rbabylon, PR 99 and PR 104:

104: https://github.com/metrumresearchgroup/rbabylon/pull/104

PR 104 mentions issue 91 in its body, but the issue does not appear in any connected event.

99: https://github.com/metrumresearchgroup/rbabylon/pull/99

PR 99 mentions issue 92 in its body and issue 94 in a comment on the PR.

For PR 99, only issue 92 shows up in the query result. Issue 92 is mentioned in the PR body, yet issue 91, which is also mentioned in a PR body, does not show up for PR 104.
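A quick way to flag this kind of mismatch is to diff the numbers mentioned in the text against the numbers the connected events return. The sketch below assumes plain `#N` mentions; the function name is hypothetical, and the strings are stand-ins because the actual PR wording is not quoted here.

```python
import re

# Any "#123" that is not glued to a preceding word character or slash.
ISSUE_REF = re.compile(r"(?<![\w/])#(\d+)")

def unconnected_mentions(texts, connected_numbers):
    """Return numbers mentioned in the texts that have no connected event."""
    mentioned = set()
    for text in texts:
        mentioned.update(int(n) for n in ISSUE_REF.findall(text))
    return sorted(mentioned - set(connected_numbers))

# Stand-in text; only the issue numbers and connected results follow the thread.
pr_104 = unconnected_mentions(["Body that mentions #91"], [])
pr_99 = unconnected_mentions(
    ["Body that mentions #92", "Comment that mentions #94"], [92]
)
print(pr_104)  # [91]
print(pr_99)   # [94]
```

Note that `#N` does not distinguish issues from pull requests, so a mention of another PR would be flagged as well.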

This doesn't make any sense... gonna have to contact GitHub about this again.

Blackglade commented 4 years ago

Created an issue for this: https://github.community/t/inconsistency-in-pull-request-timeline-events-connected-event/121749