igrigorik / gharchive.org

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.
https://www.gharchive.org
MIT License

Incoherent pull request events #229

Open marcelosousa opened 4 years ago

marcelosousa commented 4 years ago

Executing the following query on BigQuery:

SELECT
  name, login, prNum, created_at, action
FROM (
  SELECT
    repo.name,
    actor.login,
    JSON_EXTRACT(payload,'$.action') AS action,
    JSON_EXTRACT(payload,'$.number') AS prNum,
    JSON_EXTRACT(payload,'$.pull_request.created_at') AS created_at,
  FROM
    `githubarchive.month.202002`
  WHERE
    type = 'PullRequestEvent')
WHERE
  name = 'willycornelissen/willycornelissen.github.io' AND
  prNum = '1'

The result list is:

[
  {
    "name": "willycornelissen/willycornelissen.github.io",
    "login": "dependabot[bot]",
    "prNum": "1",
    "created_at": "\"2020-02-06T17:18:37Z\"",
    "action": "\"opened\""
  },
  {
    "name": "willycornelissen/willycornelissen.github.io",
    "login": "dependabot[bot]",
    "prNum": "1",
    "created_at": "\"2020-02-06T17:47:18Z\"",
    "action": "\"opened\""
  },
  {
    "name": "willycornelissen/willycornelissen.github.io",
    "login": "dependabot[bot]",
    "prNum": "1",
    "created_at": "\"2020-02-08T21:05:00Z\"",
    "action": "\"opened\""
  }
]

I don't see how this is possible: it would mean that the same pull request was opened in three separate events at different times. Calling the GitHub API (https://api.github.com/repos/willycornelissen/willycornelissen.github.io/pulls/1) shows that the latest event seems okay.
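One way to narrow this down might be to pull identifiers that distinguish "one pull request reported three times" from "three different pull requests that all carry number 1". The query below is only a sketch under assumptions: it presumes the table exposes top-level `id`, `created_at` and `repo.id` columns and that the payload contains `$.pull_request.id`; the output aliases (`event_id`, `pr_id`, and so on) are made up for illustration. It uses JSON_EXTRACT_SCALAR so the values come back without the extra quotes seen in the result above.

```sql
-- Sketch only: same table and filter as the query above, plus identifying columns.
-- Assumption: `id`, `created_at` and `repo.id` exist as columns in this table.
SELECT
  id AS event_id,                  -- identifier of the archived event
  created_at AS event_created_at,  -- timestamp of the event itself
  repo.id AS repo_id,              -- repository identifier behind the name
  JSON_EXTRACT_SCALAR(payload, '$.pull_request.id') AS pr_id,
  JSON_EXTRACT_SCALAR(payload, '$.pull_request.created_at') AS pr_created_at,
  JSON_EXTRACT_SCALAR(payload, '$.action') AS action
FROM
  `githubarchive.month.202002`
WHERE
  type = 'PullRequestEvent'
  AND repo.name = 'willycornelissen/willycornelissen.github.io'
  AND JSON_EXTRACT_SCALAR(payload, '$.number') = '1'
ORDER BY
  event_created_at
```

If `pr_id` or `repo_id` differs between rows, the rows would describe distinct pull requests that happen to share a number; if they are identical while `pr_created_at` differs, the archived payloads themselves would be inconsistent.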

Can someone shed some light on what could be happening?

asad-awadia commented 2 years ago

https://www.githubstatus.com/history?page=7

This might be because of an incident listed on that page, @marcelosousa.