simonw / package-stats

Download statistics for my PyPI packages

Grab stats by system (Linux/Mac/Windows) too #6

Closed: simonw closed this issue 2 years ago

simonw commented 2 years ago

Concerning the problem of CI systems inflating stats, https://twitter.com/minimaxir/status/1505598747971006465 suggests:

A workaround would be looking at trends for only Win/Mac downloads (which I think PyPI can segment) but that can be misleading too.

Available from https://pypistats.org/api/packages/datasette/system
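
A rough sketch of that workaround: the function below sums only the Mac and Windows rows for each day. It assumes the endpoint returns a top-level `data` array of objects with `category`, `date` and `downloads` keys, and that Mac and Windows show up as `Darwin` and `Windows`. The name `desktop_downloads_by_date` is made up for illustration.

```shell
# Sum Mac ("Darwin") and Windows downloads per day from a /system response.
# Assumed JSON shape: {"data": [{"category": ..., "date": ..., "downloads": ...}, ...]}
desktop_downloads_by_date() {
  jq -r '
    [.data[] | select(.category == "Darwin" or .category == "Windows")]
    | group_by(.date)
    | map("\(.[0].date) \(map(.downloads) | add)")
    | .[]
  ' "$@"
}
```

It reads JSON from stdin or from a file argument, so something like `curl -s https://pypistats.org/api/packages/datasette/system | desktop_downloads_by_date` should print one `date total` line per day.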

simonw commented 2 years ago

I'm going to grab the output and write it to packages-by-system/name-of-package.json

simonw commented 2 years ago

I think this will do it:

# Fetch per-system download stats for each package, pausing between requests
for package in $(echo 'datasette-query-history dogsheep-beta dogsheep-photos evernote-to-sqlite genome-to-sqlite')
do
  curl -s "https://pypistats.org/api/packages/$package/system" | jq . > "packages-by-system/$package.json"
  sleep 10
done

But with this instead of that echo:

curl -s 'https://datasette.io/content.json?sql=select%20group_concat(substr(nameWithOwner%2C%20instr(nameWithOwner%2C%20%27%2F%27)%20%2B%201)%2C%20%27%20%27)%20from%20datasette_repos%3B&_shape=arrayfirst' | jq '.[0]' -r

simonw commented 2 years ago

It needs to skip datasette-app as that isn't on PyPI and the 404 may break it.
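
One way this could be handled, as a sketch rather than what the workflow actually does: filter known non-PyPI names out of the list, and make curl fail on a 404 so an error page never gets written as stats. `SKIP_PACKAGES`, `filter_packages` and `fetch_system_stats` are hypothetical names, and the sketch assumes the `packages-by-system/` directory already exists.

```shell
# Hypothetical list of names to skip because they are not published on PyPI.
SKIP_PACKAGES="datasette-app"

# Print each argument on its own line unless it appears in SKIP_PACKAGES.
filter_packages() {
  local package skip keep
  for package in "$@"; do
    keep=yes
    for skip in $SKIP_PACKAGES; do
      if [ "$package" = "$skip" ]; then keep=no; fi
    done
    if [ "$keep" = yes ]; then printf '%s\n' "$package"; fi
  done
}

# Fetch one package. --fail makes curl exit non-zero on a 404,
# so the error body is never saved as a stats file.
fetch_system_stats() {
  local package="$1" tmp
  tmp="$(mktemp)"
  if curl -s --fail "https://pypistats.org/api/packages/$package/system" -o "$tmp"; then
    jq . "$tmp" > "packages-by-system/$package.json"
  else
    echo "Skipping $package: request failed" >&2
  fi
  rm -f "$tmp"
}
```

With `$names` holding the space-separated list, the loop would become `for package in $(filter_packages $names); do fetch_system_stats "$package"; sleep 10; done`.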

simonw commented 2 years ago

It's running now. It will probably take about 1hr20m to finish, since it was taking 40m previously and now it's grabbing an extra file for each package. https://github.com/simonw/package-stats/actions/runs/2012607365

simonw commented 2 years ago

Script now looks like this: https://github.com/simonw/package-stats/blob/6dd54ed7fcc4534dcb7f2085101315001bcf660a/.github/workflows/fetch_stats.yml#L30-L45

I updated the SQL query to this one:

select group_concat(substr(nameWithOwner, instr(nameWithOwner, '/') + 1), ' ') from datasette_repos
where nameWithOwner != 'simonw/datasette-app'

https://datasette.io/content?sql=select+group_concat(substr(nameWithOwner%2C+instr(nameWithOwner%2C+%27%2F%27)+%2B+1)%2C+%27+%27)+from+datasette_repos%0D%0Awhere+nameWithOwner+!%3D+%27simonw%2Fdatasette-app%27

As JSON: https://datasette.io/content.json?sql=select+group_concat(substr(nameWithOwner%2C+instr(nameWithOwner%2C+%27%2F%27)+%2B+1)%2C+%27+%27)+from+datasette_repos%0D%0Awhere+nameWithOwner+!%3D+%27simonw%2Fdatasette-app%27&_shape=arrayfirst
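
The hand-encoded URLs above are hard to read and edit. A possible alternative, sketched here with made-up function names (`package_list_sql`, `list_packages`), is to keep the SQL as plain text and let curl do the percent-encoding with `--get` and `--data-urlencode`. The repo to exclude is passed as an argument.

```shell
# Build the SQL that lists repo names without the owner prefix, skipping one repo.
# Any single quote in the argument is doubled so it stays a valid SQL string literal.
package_list_sql() {
  local excluded
  excluded="$(printf '%s' "$1" | sed "s/'/''/g")"
  printf "select group_concat(substr(nameWithOwner, instr(nameWithOwner, '/') + 1), ' ') from datasette_repos where nameWithOwner != '%s'" "$excluded"
}

# Ask datasette.io for the list; curl percent-encodes each parameter itself.
list_packages() {
  curl -s --get 'https://datasette.io/content.json' \
    --data-urlencode "sql=$(package_list_sql "$1")" \
    --data-urlencode '_shape=arrayfirst' | jq -r '.[0]'
}
```

Calling `list_packages 'simonw/datasette-app'` should then produce the same space-separated list as the JSON link, ready to feed into the `for package in ...` loop.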