open-sauced / app

🍕 Insights into your entire open source ecosystem.
https://pizza.new
Apache License 2.0

Bug: YOLO Coders is incorrect #3852

Open: zanieb opened this issue 3 months ago

zanieb commented 3 months ago

Describe the bug

For example, at https://app.opensauced.pizza/s/astral-sh/uv?hideBots=false

We have reports of people pushing to main:

[Screenshot 2024-08-05 at 12:07:28 PM]

But this is just a squash-merged pull request:

[Screenshot 2024-08-05 at 12:08:03 PM] [Screenshot 2024-08-05 at 12:09:05 PM]

Steps to reproduce

See description above.

github-actions[bot] commented 3 months ago

Thanks for the issue, our team will look into it as soon as possible! If you would like to work on this issue, please wait for us to decide if it's ready. The issue will be ready to work on once we remove the "needs triage" label.

To claim an issue that does not have the "needs triage" label, please leave a comment that says ".take". If you have any questions, please comment on this issue.

For full info on how to contribute, please check out our contributors guide.

jpmcb commented 3 months ago

👀 thanks for reporting this! I'll look into this right away!

jpmcb commented 3 months ago

It seems we're missing a bit of data in our events data lake. Here's how it works: we consume the GitHub events firehose from api.github.com/events at scale, drop it into a time-series database, and use that to correlate closed PRs with merge commit SHAs. The piece that seems to be missing is the closed event we'd use to correlate the 4bc36c0cb85ac2b5efbc9c2bfecad6137666e908 SHA on astral-sh/uv's main branch with the closed PR event's pr_merge_commit_sha in the data.
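The correlation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not OpenSauced's actual pipeline: the `PushCommit` and `ClosedPullRequest` records and the `find_yolo_commits` function are hypothetical, and only the `pr_merge_commit_sha` field name comes from the comment. It shows why a dropped closed event produces a false positive: a main-branch commit counts as a direct push whenever no merged PR's merge commit SHA matches it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PushCommit:
    """Hypothetical record: one commit that landed on a repo's main branch."""
    repo: str
    sha: str
    author: str


@dataclass(frozen=True)
class ClosedPullRequest:
    """Hypothetical record built from a closed PR event in the firehose."""
    repo: str
    number: int
    merged: bool
    pr_merge_commit_sha: Optional[str]


def find_yolo_commits(
    pushes: List[PushCommit], closed_prs: List[ClosedPullRequest]
) -> List[PushCommit]:
    """Return main-branch commits that no merged PR accounts for."""
    # Index (repo, merge commit SHA) pairs of merged PRs for O(1) lookups.
    merged = {
        (pr.repo, pr.pr_merge_commit_sha)
        for pr in closed_prs
        if pr.merged and pr.pr_merge_commit_sha
    }
    # Any commit without a matching merged PR is treated as a direct push.
    # If the closed PR event was never ingested, a squash merge lands here too.
    return [c for c in pushes if (c.repo, c.sha) not in merged]


if __name__ == "__main__":
    sha = "4bc36c0cb85ac2b5efbc9c2bfecad6137666e908"
    pushes = [PushCommit("astral-sh/uv", sha, "example-user")]

    # Closed event missing from the data lake: the squash merge is misreported.
    print(len(find_yolo_commits(pushes, [])))  # 1
    # Closed event present (PR number 0 is a placeholder): correctly excluded.
    pr = ClosedPullRequest("astral-sh/uv", 0, True, sha)
    print(len(find_yolo_commits(pushes, [pr])))  # 0
```

Keying the lookup on both repo and SHA avoids matching an identical SHA from a fork or mirror against the wrong repository.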

It seems we're missing data from ~July 8th, ~July 16th, and July 17th.

We didn't see any disruptions on our side while consuming the events firehose. My best guess is that GitHub had disruptions on those days, around the time those PRs were merged:

We've known that the GitHub API can be unstable at times, but this is the first time I've seen it drop this much data. We'll need to come up with some kind of fallback mechanism for this in our API and services.
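One possible fallback, sketched here purely as an assumption and not as a plan agreed on in this thread: before reporting a commit as a YOLO push, re-check it against GitHub's REST endpoint for listing pull requests associated with a commit. As we understand GitHub's docs, for a commit on the default branch this returns the merged PR that introduced it. The helper names (`commit_pulls_url`, `fetch_commit_pulls`, `confirm_direct_pushes`) are hypothetical, and rate limits and pagination are not handled. The fetcher is passed in so the check can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Iterable, List

GITHUB_API = "https://api.github.com"


def commit_pulls_url(owner: str, repo: str, sha: str) -> str:
    # REST endpoint: "List pull requests associated with a commit".
    return f"{GITHUB_API}/repos/{owner}/{repo}/commits/{sha}/pulls"


def fetch_commit_pulls(owner: str, repo: str, sha: str, token: str) -> list:
    """Fetch PRs GitHub associates with a commit (one REST call per SHA)."""
    req = urllib.request.Request(
        commit_pulls_url(owner, repo, sha),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def confirm_direct_pushes(
    repo_full_name: str,
    candidate_shas: Iterable[str],
    fetch_pulls: Callable[[str, str, str], list],
) -> List[str]:
    """Keep only candidate SHAs that GitHub does not tie to a merged PR."""
    owner, repo = repo_full_name.split("/", 1)
    confirmed = []
    for sha in candidate_shas:
        pulls = fetch_pulls(owner, repo, sha)
        # A non-null merged_at means a merged PR introduced this commit,
        # so it was not a direct push (e.g. a squash merge).
        if any(p.get("merged_at") for p in pulls):
            continue
        confirmed.append(sha)
    return confirmed
```

In production this might be called as `confirm_direct_pushes("astral-sh/uv", candidates, functools.partial(fetch_commit_pulls, token=token))`, and only on the small set of SHAs the firehose-based correlation flags, so the extra API calls stay bounded.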

This was super helpful @zanieb - thanks so much for surfacing this!


cc @isabensusan, until we narrow down a better fix for this, we should consider updating the copy in the YOLO Coders section to note that this data may need to be manually verified.