igrigorik / gharchive.org

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
https://www.gharchive.org
MIT License

PullRequestEvent doesn't contain review_requested action entry #279

Open tisonkun opened 2 years ago

tisonkun commented 2 years ago

https://docs.github.com/en/developers/webhooks-and-events/events/github-event-types states that the PullRequestEvent action can be one of: opened, edited, closed, reopened, assigned, unassigned, review_requested, review_request_removed, labeled, unlabeled, and synchronize.

However, querying the GH Archive data gives:

SELECT
    action,
    COUNT(1)
FROM github_events
WHERE event_type = 'PullRequestEvent'
GROUP BY action;

┌─action──────┬───count()─┐
│ opened      │ 226861706 │
│ closed      │ 172356866 │
│ reopened    │   1924348 │
│ labeled     │        24 │
│ synchronize │    453125 │
│ merged      │         4 │
└─────────────┴───────────┘

There are no review_requested or review_request_removed actions.

I don't know whether the API never emits these actions or the crawler doesn't handle them.

tisonkun commented 2 years ago

Also, labeled and merged actions were captured at some point but don't occur later.

tisonkun commented 2 years ago

Actually, after 2015 there are only events with the actions 'opened', 'closed', and 'reopened'. The others occurred before 2012, and labeled occurred on some days in 2017.
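
The per-year breakdown can also be checked against raw GH Archive hourly dumps instead of the ClickHouse table. This is a minimal sketch, assuming each line of a dump is one JSON object with top-level `type` and `created_at` fields and the action stored under `payload.action`. The function name `tally_pr_actions` is hypothetical.

```python
import json
from collections import Counter


def tally_pr_actions(lines):
    """Count PullRequestEvent records per (year, action).

    `lines` is any iterable of JSON strings, one event per line.
    The field names used here are assumptions about the dump layout.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") != "PullRequestEvent":
            continue
        year = event.get("created_at", "")[:4]  # "2016-01-01T..." -> "2016"
        action = (event.get("payload") or {}).get("action")
        counts[(year, action)] += 1
    return counts
```

Passing it the lines of a decompressed hourly file (for example, a file object from `gzip.open(path, "rt")`) gives one counter entry per year and action, which should show where labeled and synchronize stop appearing.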

frank-zsy commented 1 year ago

@tisonkun This is not a GH Archive issue. GitHub webhook events and timeline events are different, although they share the same data schema. Possibly for cost reasons, labeled, assigned, and other actions are not included in timeline events, so that data never appears in the GitHub events log.
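
To make the gap concrete, here is a small sketch that compares the action list from the documentation quoted at the top of this issue with the actions seen in the query result. The names `DOCUMENTED_ACTIONS` and `missing_actions` are hypothetical, and the observed list is copied from the table above.

```python
# Actions listed for PullRequestEvent in the documentation quoted in this issue.
DOCUMENTED_ACTIONS = {
    "opened", "edited", "closed", "reopened", "assigned", "unassigned",
    "review_requested", "review_request_removed", "labeled", "unlabeled",
    "synchronize",
}


def missing_actions(observed):
    """Return documented actions that never appear in `observed`, sorted."""
    return sorted(DOCUMENTED_ACTIONS - set(observed))


# Actions that appear in the GROUP BY result from the first comment.
observed = ["opened", "closed", "reopened", "labeled", "synchronize", "merged"]
print(missing_actions(observed))
```

Note that merged appears in the data but not in the documented list, so a check in the other direction (`set(observed) - DOCUMENTED_ACTIONS`) is also worth running.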