jstrieb / github-stats

Better GitHub statistics images for your profile, with stats from private repos too
GNU General Public License v3.0

Stats don't account for "old" contributions #13

Open BitPatty opened 3 years ago

BitPatty commented 3 years ago

The query used to get the user's contributions apparently only includes repositories to which the user has recently contributed.

{
  viewer {
    repositoriesContributedTo(first: 100, includeUserRepositories: false, contributionTypes: [COMMIT, PULL_REQUEST, REPOSITORY, PULL_REQUEST_REVIEW]) {
      nodes {
        nameWithOwner
      }
    }
  }
}

Tested via https://docs.github.com/en/free-pro-team@latest/graphql/overview/explorer

It appears that older contributions have to be queried separately somehow, for example by scraping the user's profile, by abusing the Search API, or via third-party tools such as BigQuery: https://stackoverflow.com/a/63427144

jstrieb commented 3 years ago

Hi, thanks for using the project and taking the time to open this issue!

I'm afraid I don't completely understand the problem you mention. As far as I can tell, I have implemented the API query using pagination (via the after argument), so that if there are more than 100 results, it will keep looping until there are none left. It should do this using the GraphQL query here (in particular line 153):

https://github.com/jstrieb/github-stats/blob/a7478a4aea37643cd5d40026ea4bc7b39592f49c/github_stats.py#L140-L154
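For readers unfamiliar with cursor-based pagination, the pattern described above might look roughly like the following sketch. This is not the code at the linked lines: the function names and the `run_query` callable are hypothetical, and `run_query` is assumed to send a query string to the GitHub GraphQL API and return the decoded JSON response. The `pageInfo`, `hasNextPage`, and `endCursor` fields are the standard GraphQL connection fields.

```python
from typing import Callable, Dict, List, Optional


def contributed_repos_query(cursor: Optional[str] = None) -> str:
    """Build the query for one page; `cursor` is the previous page's endCursor."""
    after = "null" if cursor is None else f'"{cursor}"'
    return f"""
{{
  viewer {{
    repositoriesContributedTo(
      first: 100,
      includeUserRepositories: false,
      contributionTypes: [COMMIT, PULL_REQUEST, REPOSITORY, PULL_REQUEST_REVIEW],
      after: {after}
    ) {{
      pageInfo {{ hasNextPage endCursor }}
      nodes {{ nameWithOwner }}
    }}
  }}
}}
"""


def collect_repo_names(run_query: Callable[[str], Dict]) -> List[str]:
    """Follow the `after` cursor until the API reports no further pages."""
    names: List[str] = []
    cursor: Optional[str] = None
    while True:
        result = run_query(contributed_repos_query(cursor))
        connection = result["data"]["viewer"]["repositoriesContributedTo"]
        names.extend(node["nameWithOwner"] for node in connection["nodes"])
        if not connection["pageInfo"]["hasNextPage"]:
            return names
        cursor = connection["pageInfo"]["endCursor"]
```

Passing the HTTP call in as `run_query` keeps the loop independent of any particular client library. Pagination only walks through what the API is willing to return, so it cannot recover repositories that the API omits in the first place.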

Have you been finding that it is not working properly? Or has there been some other misunderstanding? I would appreciate more information so that I can better address this. Thanks!

BitPatty commented 3 years ago

It's not an issue with your code, but rather a limitation of the GitHub API itself. Your query generally works fine. However, if your last contribution to a repository you don't own was more than ~1 year ago, that repository won't show up in the response.

Sample Query for BigQuery:

SELECT distinct repo.name
FROM (
  SELECT * FROM `githubarchive.year.2019`
)
WHERE (type = 'PushEvent' 
  OR type = 'PullRequestEvent')
  AND actor.login = 'BitPatty'

In this case the following repository will show up: https://github.com/zenware/FizzBuzz which has some contributions from my side.

However, on the GraphQL API this repository doesn't show up, since my last contribution was back in 2019.

GitHub API response:

"nodes": [
  {
    "nameWithOwner": "vendure-ecommerce/vendure"
  },
  {
    "nameWithOwner": "kimeggler/spotifystatistics"
  },
  {
    "nameWithOwner": "HelveticSpeedrunners/speedrun.ch"
  },
  {
    "nameWithOwner": "swisscom/backman"
  },
  {
    "nameWithOwner": "dizzypenguins/Bonobo"
  }
]
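
The gap between the two sources can be made explicit with a small comparison. The sketch below is only illustrative: `missing_from_api` is a hypothetical helper, `archive_repos` holds just the one repository named above (the rest of the BigQuery result is not reproduced here), and `api_nodes` is the response shown above. Names are compared case-insensitively on the assumption that GitHub treats owner and repository names that way.

```python
from typing import Dict, Iterable, List


def missing_from_api(archive_repos: Iterable[str], api_nodes: List[Dict[str, str]]) -> List[str]:
    """Return repos seen in the archive data but absent from the GraphQL response."""
    api_names = {node["nameWithOwner"].lower() for node in api_nodes}
    return sorted(
        name for name in set(archive_repos) if name.lower() not in api_names
    )


# Only the repository mentioned in this thread; the full BigQuery result is not shown.
archive_repos = ["zenware/FizzBuzz"]

# The "nodes" list from the GraphQL response above.
api_nodes = [
    {"nameWithOwner": "vendure-ecommerce/vendure"},
    {"nameWithOwner": "kimeggler/spotifystatistics"},
    {"nameWithOwner": "HelveticSpeedrunners/speedrun.ch"},
    {"nameWithOwner": "swisscom/backman"},
    {"nameWithOwner": "dizzypenguins/Bonobo"},
]

print(missing_from_api(archive_repos, api_nodes))  # ['zenware/FizzBuzz']
```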

jstrieb commented 3 years ago

Thanks for the clarification! If I'm understanding correctly, there isn't much I can do about this without potentially making a lot of queries to the REST API. Even then, I am not sure that would totally address the problem, given that there are sometimes weird inaccuracies.

Do you think that adding a note to the second paragraph of the disclaimer referring to this specific issue is sufficient to make users aware of the problem? If not, how would you go about fixing it?

BitPatty commented 3 years ago

Yes, it's certainly a huge effort to adjust the logic for this specific issue. I might start working on it myself if I find enough time to do so, but not in the near future.

> Do you think that adding a note to the second paragraph of the disclaimer referring to this specific issue is sufficient to make users aware of the problem? If not, how would you go about fixing it?

Updating the docs would definitely help future users who might be as confused as I was at the beginning about the missing contributions.

In the end, this issue is more of a "FYI" than something I'd want to be "fixed" asap.

jstrieb commented 3 years ago

That makes sense, thanks! I've mentioned this in the appropriate place in the README and linked back to this issue, which I will leave open. Once again, I appreciate you bringing this to my attention.