opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0

Code Metrics: star/fork counts are reported incorrectly for some projects #1781

Closed ravenac95 closed 2 months ago

ravenac95 commented 2 months ago

Which area(s) are affected? (leave empty if unsure)

No response

To Reproduce

This was reported by a user in the Discord (thanks, daehyun!).

This query:

SELECT *
FROM `oso_production.code_metrics_by_project_v1`
WHERE project_name = 'sonobe-privacy-scaling-explorations'

returns star and fork counts that do not match GitHub.

Both the raw data from GitHub (via the CloudQuery GitHub resolver) and the GitHub UI show that the star and fork counts for this project are 162 and 38, respectively.

Describe the Bug

See above

Expected Behavior

The numbers should match the GitHub data and add up correctly across all the repos in a project.

ravenac95 commented 2 months ago

The issue seems to be somewhere in int_repo_metrics_by_project:

SELECT * 
FROM `opensource-observer.oso.int_repo_metrics_by_project` 
WHERE project_id = 'YIX6II_0b1iDDkzlII1s4Nek-cFYoFCweXAnbjrN62w='

This returns 4 copies of the project. I suspect these duplicate rows get aggregated somewhere downstream, and that causes this issue.

The project_id above is the project id of the sonobe-privacy-scaling-explorations project.
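
To illustrate how duplicate rows in an intermediate table could produce wrong totals, here is a minimal sketch using Python's built-in sqlite3 module. The table name `repo_metrics`, its columns, the repo names, and the per-repo split are all hypothetical; only the expected totals of 162 stars and 38 forks come from this issue. It is not the actual OSO model, just a demonstration of the suspected mechanism.

```python
import sqlite3

# Hypothetical stand-in for an intermediate per-repo table. The table name,
# columns, and rows are illustrative and are NOT the real OSO schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repo_metrics (
        project_id TEXT, repo_name TEXT, stars INTEGER, forks INTEGER
    );
    INSERT INTO repo_metrics VALUES
        ('p1', 'repo-a', 100, 30),
        ('p1', 'repo-a', 100, 30),  -- accidental duplicate of the row above
        ('p1', 'repo-b', 62, 8);
""")

# Summing straight over the table counts the duplicated repo twice.
naive = conn.execute(
    "SELECT SUM(stars), SUM(forks) FROM repo_metrics WHERE project_id = 'p1'"
).fetchone()

# Collapsing identical rows before aggregating restores the expected totals.
deduped = conn.execute("""
    SELECT SUM(stars), SUM(forks)
    FROM (SELECT DISTINCT project_id, repo_name, stars, forks FROM repo_metrics)
    WHERE project_id = 'p1'
""").fetchone()

# Diagnostic: list the repos that appear more than once for the project.
dupes = conn.execute("""
    SELECT repo_name, COUNT(*) AS copies
    FROM repo_metrics
    WHERE project_id = 'p1'
    GROUP BY repo_name
    HAVING COUNT(*) > 1
""").fetchall()

print(naive)    # (262, 68)
print(deduped)  # (162, 38)
print(dupes)    # [('repo-a', 2)]
```

The `GROUP BY ... HAVING COUNT(*) > 1` query is the kind of check that could be run against the real intermediate table to find which rows are duplicated, assuming it has a comparable per-repo key.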

ravenac95 commented 2 months ago

This doesn't seem to affect every project. If you look up opensource-observer's own project (Erx9J64anc8oSeN-wDKm0sojJf8ONrFVYbQ7GFnqSyc=), it's just fine.

baumstern commented 2 months ago

zk-eigentrust has 110 stars and 10 forks, but the query against oso_production in BigQuery returns 220 stars and 20 forks. Its project_name is zk-eigentrust-privacy-scaling-explorations.
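
As a quick sanity check on the numbers above, the ratio of reported to actual counts can be computed directly. This is only a sketch; the helper name `inflation_factor` is made up for illustration. An identical whole-number factor for both metrics would be consistent with each underlying row being counted that many times, though it does not prove it.

```python
from fractions import Fraction

def inflation_factor(reported: int, actual: int) -> Fraction:
    """Return how many times larger the reported count is than the actual one."""
    return Fraction(reported, actual)

# Figures quoted in the comment above for zk-eigentrust.
stars = inflation_factor(220, 110)
forks = inflation_factor(20, 10)

print(stars, forks)  # 2 2
```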