scikit-hep / scikit-hep-orgstats

Stats gathering tools for SciKit-HEP PyPI releases
BSD 3-Clause "New" or "Revised" License

Statistics aggregator for the Scikit-HEP packages

This is an admin-focused repository collecting scripts and material for looking at statistics of the org packages. The present tools collect and display the PyPI download statistics of all org packages (plus a few related packages).

Rendered Jupyter notebooks for Python 2 vs. 3: Table and Plot.


Warning: downloading the last 2-3 years of data can cost about $50 in cloud credits. The Google BigQuery script is best run in a virtual environment:

python3 -m venv .env
. .env/bin/activate
pip install -r requirements.txt
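The cost comes from BigQuery billing on-demand queries by the amount of data they scan. As a minimal sketch (not the repository's actual script), the query builder below assumes the public `bigquery-public-data.pypi.file_downloads` table and its `file.project`, `details.python`, and `country_code` columns. Restricting the `timestamp` range is what keeps the scanned bytes, and therefore the bill, down. The helper name `build_download_query` is hypothetical.

```python
from datetime import date

# Assumed target table (public PyPI download logs); the repository's own
# script may query a different table or select different columns.
TABLE = "bigquery-public-data.pypi.file_downloads"


def build_download_query(projects, start: date, end: date) -> str:
    """Return SQL counting downloads per day, project, Python version, and country.

    Filtering on timestamp in [start, end) lets BigQuery skip data outside the
    range, which limits the bytes scanned and therefore the cost.
    """
    if start >= end:
        raise ValueError("start must be before end")
    # Names are inlined as string literals; only use this with a trusted, fixed list.
    names = ", ".join(f"'{p}'" for p in sorted(projects))
    return f"""
SELECT
  DATE(timestamp) AS day,
  file.project AS project,
  details.python AS python,
  country_code,
  COUNT(*) AS downloads
FROM `{TABLE}`
WHERE file.project IN ({names})
  AND timestamp >= TIMESTAMP('{start.isoformat()}')
  AND timestamp < TIMESTAMP('{end.isoformat()}')
GROUP BY day, project, python, country_code
ORDER BY day
""".strip()
```

Before spending credits, the string can be submitted with `google.cloud.bigquery.Client().query(sql, job_config=bigquery.QueryJobConfig(dry_run=True))`; the returned job's `total_bytes_processed` reports how much would be scanned, and dry runs are not billed.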

You will have to set up your credentials as described here or here.


Then, you can run the download script:

./ -c ~/google-api-key.json

(Either set GOOGLE_APPLICATION_CREDENTIALS or use the parameter shown above to set your API key file.)
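A minimal sketch of how the two ways of supplying the key file could be combined, assuming the -c value takes precedence over GOOGLE_APPLICATION_CREDENTIALS when both are set. The helper `resolve_key_file` is hypothetical and not part of the repository.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_key_file(
    cli_value: Optional[str], environ: Mapping[str, str] = os.environ
) -> Path:
    """Pick the service-account key: explicit -c value first, then the env var.

    The precedence order is an assumption for illustration.
    """
    raw = cli_value or environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise SystemExit(
            "No credentials: pass -c KEYFILE or set GOOGLE_APPLICATION_CREDENTIALS"
        )
    # Expand "~" so paths like ~/google-api-key.json work when passed literally.
    return Path(raw).expanduser()
```

The resulting path could be handed to `google.cloud.bigquery.Client.from_service_account_json`; when only the environment variable is set, a plain `bigquery.Client()` finds the key on its own.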

You can use ./ --help to see options.

Each release on the releases page has the latest data snapshots attached; these can be used to run the analyses described below.
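Once the snapshot CSVs are downloaded, they can be combined into one table of rows. A minimal sketch using only the standard library, mirroring the "defaults to all CSVs" behaviour of the plotting script's -f option; the function name `load_snapshots` is hypothetical, and the column names depend on the actual release files.

```python
import csv
import glob
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def load_snapshots(
    filenames: Optional[Iterable[str]] = None, directory: str = "."
) -> List[Dict[str, str]]:
    """Read all rows from the given CSV files (default: every *.csv in directory)."""
    if not filenames:
        # Sorted so the row order is reproducible across runs.
        filenames = sorted(glob.glob(str(Path(directory) / "*.csv")))
    rows: List[Dict[str, str]] = []
    for name in filenames:
        with open(name, newline="") as f:
            # Each row becomes a dict keyed by the CSV header; values stay strings.
            rows.extend(csv.DictReader(f))
    return rows
```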


You can run ./ to produce the final plots. Use --help to see usage instructions:

./ --help

  -f, --filename FILENAME    Files to read in (defaults to all CSVs)
  -n, --name TEXT            Add prefix to all plots
  -m, --minor                Use minor version too
  -p, --package TEXT         Select only one package instead of all
  -x, --filter-package TEXT  Remove package(s) from package list
  --unique                   Filter based on OS "uniqueness"
  --filter <TEXT TEXT>...    Filter based on KEY VALUE
  --help                     Show this message and exit.

  all   Make a comparison by projects
  freq  Make a frequency plot over a key
  main  Make a weekly or daily downloads plot
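The freq command plots how downloads split across the values of a key. The counting step behind such a plot can be sketched as follows, assuming rows carry a pre-aggregated `downloads` column (rows without one count once). The function `frequency` and the column names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def frequency(rows: Iterable[Mapping[str, str]], key: str) -> Dict[str, float]:
    """Return each value of `key` with its share of all downloads, largest first."""
    counts: Counter = Counter()
    for row in rows:
        # Assumed "downloads" column; a row without it counts as one download.
        counts[row[key]] += int(row.get("downloads", 1))
    total = sum(counts.values())
    # most_common() orders by count, so the result is sorted by share.
    return {value: n / total for value, n in counts.most_common()}
```

For example, rows with Python "3" at 90 downloads and "2" at 10 give `{"3": 0.9, "2": 0.1}`.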

You can run multiple commands in one invocation and give most options multiple times. main, freq, and all support --key KEY, though each picks a sensible default. The two "per-week" plots also support a --daily flag to switch from weekly to daily statistics.
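The difference between the weekly default and --daily comes down to how each day is binned. A sketch, assuming rows with hypothetical `day` (ISO date), `downloads`, and key columns, and assuming weeks start on Monday (the real script may bin differently). The names `bucket` and `downloads_over_time` are illustrative.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Mapping, Tuple


def bucket(day: date, daily: bool) -> date:
    """Return the day itself (daily) or the Monday that starts its week (weekly)."""
    return day if daily else day - timedelta(days=day.weekday())


def downloads_over_time(
    rows: Iterable[Mapping[str, str]], key: str = "python", daily: bool = False
) -> Dict[Tuple[date, str], int]:
    """Sum downloads per (time bucket, key value), sorted by bucket then value."""
    counts: Counter = Counter()
    for row in rows:
        day = date.fromisoformat(row["day"])
        counts[bucket(day, daily), row[key]] += int(row["downloads"])
    return dict(sorted(counts.items()))
```

With this binning, downloads on Monday 2021-01-04 and Wednesday 2021-01-06 land in the same weekly bucket but in separate daily ones.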

Most commands do not generate a distinct output name when you change options, so use the --name option to set a prefix. If you want minor versions kept separate from major ones, pass --minor. You can give --package NAME and --filename NAME multiple times; otherwise they default to all packages and all files.
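As an illustration of what --minor and --name change, assuming versions are stored as strings like "3.8.10" and that plot files are named prefix + command + extension (both assumptions; `version_label` and `output_name` are hypothetical names, not the script's API):

```python
def version_label(version: str, minor: bool = False) -> str:
    """Collapse '3.8.10' to '3' by default, or to '3.8' when minor=True."""
    parts = version.split(".")
    return ".".join(parts[:2] if minor else parts[:1])


def output_name(prefix: str, command: str, ext: str = "png") -> str:
    """Hypothetical naming scheme: without a distinct prefix, reruns would overwrite."""
    return f"{prefix}{command}.{ext}"
```

Here `version_label("3.8.10")` returns `"3"`, `version_label("3.8.10", minor=True)` returns `"3.8"`, and `output_name("20210101_", "main")` returns `"20210101_main.png"`.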


A simple comparison of projects with frequency plots:

./ --name 20210101_ -x scikit-optimize all main freq

Filtering to Switzerland only:

./ -n CH_ -x uproot -x awkward --filter country_code CH all
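The selection options in this command can be sketched as a row filter, assuming repeated --filter pairs must all match, that -x is applied after -p, and that each row has a `project` column. The function `select` and the column names are hypothetical.

```python
from typing import Iterable, Iterator, Mapping, Sequence, Tuple


def select(
    rows: Iterable[Mapping[str, str]],
    packages: Iterable[str] = (),
    exclude: Iterable[str] = (),
    filters: Sequence[Tuple[str, str]] = (),
) -> Iterator[Mapping[str, str]]:
    """Yield rows kept by -p (allow list), -x (deny list) and --filter KEY VALUE."""
    keep, drop = set(packages), set(exclude)
    for row in rows:
        project = row["project"]
        if keep and project not in keep:  # empty -p list means all packages
            continue
        if project in drop:
            continue
        # Every KEY VALUE pair must match (assumed AND semantics).
        if all(row.get(k) == v for k, v in filters):
            yield row
```

The Switzerland example corresponds to `select(rows, exclude={"uproot", "awkward"}, filters=[("country_code", "CH")])`.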

To look at packages pre-dating the Scikit-HEP project and compare with the uproot series:

./ \
  -p iminuit -p rootpy \
  -p root_numpy -p root_pandas \
  -p uproot \
  -p uproot4 \
./ \
  -p pyjet \
  -p particle \
  -p hepunits \
  -p numpythia \
  -p decaylanguage \