Closed simonw closed 2 years ago
I'm going to grab the output and write it to packages-by-system/name-of-package.json
I think this will do it:
for package in $(echo 'datasette-query-history dogsheep-beta dogsheep-photos evernote-to-sqlite genome-to-sqlite')
do
curl -s "https://pypistats.org/api/packages/$package/system" | jq . > "packages-by-system/$package.json"
sleep 10
done
But with this instead of that echo:
curl -s 'https://datasette.io/content.json?sql=select%20group_concat(substr(nameWithOwner%2C%20instr(nameWithOwner%2C%20%27%2F%27)%20%2B%201)%2C%20%27%20%27)%20from%20datasette_repos%3B&_shape=arrayfirst' | jq '.[0]' -r
It needs to skip datasette-app, as that isn't on PyPI and the 404 may break it.
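A minimal sketch of another way to survive that 404, without relying on the package list being clean. The helper name `fetch_system_stats` is hypothetical and this is not the script from the workflow; it only assumes that `curl --fail` exits non-zero on an HTTP error such as a 404:

```shell
# Hypothetical helper: fetch per-system stats for one package,
# and skip it (with a message on stderr) if the request fails.
fetch_system_stats() {
  package="$1"
  out="packages-by-system/$package.json"
  # --fail makes curl return a non-zero exit code on HTTP errors (e.g. 404)
  if body="$(curl -s --fail "https://pypistats.org/api/packages/$package/system")"; then
    mkdir -p packages-by-system
    printf '%s\n' "$body" | jq . > "$out"
  else
    echo "skipping $package: request failed" >&2
  fi
}
```

In the loop this would be called as `fetch_system_stats "$package"` in place of the `curl ... | jq` line, keeping the `sleep 10` between requests.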
It's running now. It will probably take about 1hr20m to finish, since it was taking 40m previously and now it's grabbing an extra file for each package. https://github.com/simonw/package-stats/actions/runs/2012607365
Script now looks like this: https://github.com/simonw/package-stats/blob/6dd54ed7fcc4534dcb7f2085101315001bcf660a/.github/workflows/fetch_stats.yml#L30-L45
I updated the SQL query to this one:
select group_concat(substr(nameWithOwner, instr(nameWithOwner, '/') + 1), ' ') from datasette_repos
where nameWithOwner != 'simonw/datasette-app'
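For reference, a sketch of how that query could be sent without hand-encoding it into the URL. The function name `list_packages` and the `SQL` variable are hypothetical; the sketch relies on curl's `-G` and `--data-urlencode` options, which append URL-encoded parameters to the query string:

```shell
# The SQL from above, kept readable instead of percent-encoded.
SQL="select group_concat(substr(nameWithOwner, instr(nameWithOwner, '/') + 1), ' ')
from datasette_repos
where nameWithOwner != 'simonw/datasette-app'"

# Hypothetical helper: print the space-separated package names.
list_packages() {
  # -G turns the --data-urlencode values into query string parameters
  curl -s -G 'https://datasette.io/content.json' \
    --data-urlencode "sql=$SQL" \
    --data-urlencode '_shape=arrayfirst' | jq -r '.[0]'
}
```

The loop could then start with `for package in $(list_packages)`.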
Concerning the problem of CI systems inflating stats, https://twitter.com/minimaxir/status/1505598747971006465 suggests:
Available from https://pypistats.org/api/packages/datasette/system