opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0

Discussion: GH Archive has missing data #816

Status: Open (opened by ravenac95 8 months ago)

ravenac95 commented 8 months ago

What is it?

As @ccerv1 discovered, GH Archive has stretches of dates with no data at all (this applies to every event type). On top of that, the commit data we derive from PushEvents seems to be off for some projects when compared with the data gathered by the old-style "Collector". We need to decide how to move forward given this. I have some thoughts, but first I'm trying to capture the scope of the problem in this issue.
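One way to scope the missing-data problem is to list every run of consecutive days that has no events at all. The sketch below is a minimal, hypothetical helper (`find_missing_ranges` is not part of the oso codebase); it assumes you already have the set of dates that contain at least one event, for example from a `DISTINCT` date query over the events table (the query itself is not shown), and it works at day granularity only.

```python
from datetime import date, timedelta


def find_missing_ranges(present_days, start, end):
    """Return (first_missing, last_missing) pairs for each run of days in
    [start, end] that does not appear in present_days.

    Hypothetical helper: present_days is any iterable of datetime.date
    values that had at least one event.
    """
    present = set(present_days)
    ranges = []
    run_start = None  # first day of the current gap, if we are inside one
    day = start
    while day <= end:
        if day not in present:
            if run_start is None:
                run_start = day  # a new gap begins
        elif run_start is not None:
            # The gap ended on the previous day.
            ranges.append((run_start, day - timedelta(days=1)))
            run_start = None
        day += timedelta(days=1)
    if run_start is not None:
        ranges.append((run_start, end))  # gap runs to the end of the window
    return ranges


if __name__ == "__main__":
    seen = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 5)]
    for first, last in find_missing_ranges(seen, date(2023, 1, 1), date(2023, 1, 6)):
        print(first, "to", last)
```

Since GH Archive publishes one file per hour, the same idea could be applied at hourly granularity by iterating over `datetime` values instead of `date` values.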

Background information:

Issues with PushEvent

Options

@ccerv1 and I discussed the options we have for moving forward. I'd also like to try a few things to make sure we've looked at all of the possible sources of commit data in GitHub.
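One check worth running when looking at PushEvent commit data: GitHub's Events API documentation has described `payload.commits` as capped (at 20 entries), while `payload.size` reports the total number of commits in the push. If that holds for the GH Archive data, counting commits from the `commits` array alone would undercount large pushes. The sketch below is a hypothetical helper, not the project's pipeline; it assumes newline-delimited JSON events with `type`, `repo.name`, `payload.size`, and `payload.commits` fields, and reports repos where the listed commits fall short of the reported size.

```python
import json
from collections import defaultdict


def compare_push_commit_counts(lines):
    """Per repo, compare sum(payload.size) with the number of commits actually
    listed in payload.commits across PushEvents.

    Hypothetical helper: `lines` is any iterable of newline-delimited JSON
    event strings. Returns {repo: (reported_size, listed_commits)} only for
    repos where the listed count is lower than the reported size.
    """
    reported = defaultdict(int)
    listed = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") != "PushEvent":
            continue
        repo = event.get("repo", {}).get("name", "<unknown>")
        payload = event.get("payload", {})
        reported[repo] += payload.get("size", 0)
        listed[repo] += len(payload.get("commits", []))
    return {
        repo: (reported[repo], listed[repo])
        for repo in reported
        if listed[repo] < reported[repo]
    }
```

For a downloaded hourly archive, the lines could come from `gzip.open(path, "rt")`, where `path` is a local copy of one GH Archive file.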

Things to explore

ravenac95 commented 6 months ago

This needs to be broken down into actionable tasks.