rora00 / toy-dataset-ranking

Estimates dataset usage for common toy datasets in Python and R using Github Search API
MIT License

Differences between data collected by Github API and by Github Search UI #2

Open rora00 opened 22 hours ago

rora00 commented 22 hours ago

The GitHub UI search (language:python content:"sklearn.datasets" AND content:"{dataset}") and the GitHub API search (sklearn.datasets {dataset} extension:py) return vastly different results. The rankings are roughly similar, but the result counts differ by about an order of magnitude (around 10x).
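For reference, here is a minimal sketch of how the two query strings above could be built side by side for one dataset, so the counts can be compared directly. The helper names and the example dataset `load_iris` are hypothetical and not taken from the repository; the endpoint is the public code search endpoint of the GitHub REST API, which requires an authenticated request when actually called.

```python
from urllib.parse import urlencode

# Public GitHub REST API endpoint for code search.
API_URL = "https://api.github.com/search/code"


def api_query(dataset: str) -> str:
    """Query string used for the API search in this issue."""
    return f"sklearn.datasets {dataset} extension:py"


def ui_query(dataset: str) -> str:
    """Query string typed into the GitHub search UI in this issue."""
    return f'language:python content:"sklearn.datasets" AND content:"{dataset}"'


def api_request_url(dataset: str, per_page: int = 1) -> str:
    """Full request URL for the API search (hypothetical helper)."""
    params = {"q": api_query(dataset), "per_page": per_page}
    return f"{API_URL}?{urlencode(params)}"


def count_ratio(ui_count: int, api_count: int) -> float:
    """How many times larger the bigger count is than the smaller one."""
    low, high = sorted((ui_count, api_count))
    if low == 0:
        raise ValueError("cannot compare against a count of zero")
    return high / low


if __name__ == "__main__":
    # "load_iris" is only an illustrative dataset name.
    print(ui_query("load_iris"))
    print(api_request_url("load_iris"))
```

Note that the two queries are not equivalent on paper either: one filters by language and exact quoted content, the other by file extension and unquoted terms, so some gap between the counts would not be surprising.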

rora00 commented 7 hours ago

It could have something to do with the number of results per page, but I'm not sure.
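One way to test the page-size idea is to run the same query with several `per_page` values and compare the `total_count` field of each response. As far as I understand the Search API, `total_count` describes the whole query and `per_page` only controls how many items come back in each page, so the counts should match. The sketch below is an assumption-laden check, not the repository's code: the function names are hypothetical, and `fetch_search` needs a personal access token that you supply yourself.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable
from urllib.parse import urlencode


def fetch_search(query: str, per_page: int, token: str) -> dict:
    """Call the GitHub code search endpoint and return the parsed JSON."""
    url = "https://api.github.com/search/code?" + urlencode(
        {"q": query, "per_page": per_page}
    )
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def counts_by_page_size(
    query: str,
    page_sizes: Iterable[int],
    fetch: Callable[[str, int], dict],
) -> Dict[int, int]:
    """Map each per_page value to the total_count the API reports for it."""
    return {size: fetch(query, size)["total_count"] for size in page_sizes}


def page_size_matters(counts: Dict[int, int]) -> bool:
    """True if different per_page values produced different total counts."""
    return len(set(counts.values())) > 1
```

A possible usage, with `TOKEN` standing in for your own token: `counts_by_page_size(query, [1, 30, 100], lambda q, n: fetch_search(q, n, TOKEN))`. If `page_size_matters` returns `False` for the result, the page size can probably be ruled out as the cause of the 10x gap.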