.. image:: https://img.shields.io/pypi/v/pypi-download-stats.svg
   :target: https://pypi.python.org/pypi/pypi-download-stats
   :alt: PyPi package version

.. image:: https://img.shields.io/github/forks/jantman/pypi-download-stats.svg
   :alt: GitHub Forks
   :target: https://github.com/jantman/pypi-download-stats/network

.. image:: https://img.shields.io/github/issues/jantman/pypi-download-stats.svg
   :alt: GitHub Open Issues
   :target: https://github.com/jantman/pypi-download-stats/issues

.. image:: https://secure.travis-ci.org/jantman/pypi-download-stats.png?branch=master
   :target: http://travis-ci.org/jantman/pypi-download-stats
   :alt: travis-ci for master branch

.. image:: https://readthedocs.org/projects/pypi-download-stats/badge/?version=latest
   :target: https://readthedocs.org/projects/pypi-download-stats/?badge=latest
   :alt: sphinx documentation for latest release

.. image:: https://www.repostatus.org/badges/latest/abandoned.svg
   :alt: Project Status: Abandoned – Initial development has started, but there has not yet been a stable, usable release; the project has been abandoned and the author(s) do not intend on continuing development.
   :target: https://www.repostatus.org/#abandoned
A quick Google search indicates that there are now multiple websites, such as `PyPiStats.org <https://pypistats.org/>`_, that provide this information. As such, I'm abandoning this project.
This package retrieves download statistics from Google BigQuery for one or more
`PyPI <https://pypi.python.org/pypi>`_ packages, caches them locally, and then
generates download count badges as well as an HTML page of raw data and graphs
(generated by `bokeh <http://bokeh.pydata.org/en/latest/>`_). It's intended to
be run on a schedule (e.g. daily) and have the results uploaded somewhere.
It would certainly be nice to make this into a real service (and some extension points for that have been included), but at the moment I have neither the time to dedicate to it, the money to cover hosting and bandwidth, nor the desire to figure out how to architect it for over 85,000 projects as opposed to my few.
Hopefully stats like these will eventually end up in the official PyPI; see
warehouse issues `#699 <https://github.com/pypa/warehouse/issues/699>`_,
`#188 <https://github.com/pypa/warehouse/issues/188>`_ and
`#787 <https://github.com/pypa/warehouse/issues/787>`_ for reference on that work.
For the time being, I want to (a) give myself a way to get simple download stats
and badges like the old PyPI legacy (downloads per day, week and month) as well
as (b) enable some higher-granularity analysis.
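For illustration only (this is not the badge code this package actually uses), a "downloads per day" badge of the kind described above can be composed as a static shields.io image URL; note that literal dashes in the label or value must be doubled to escape them:

```python
# Illustrative sketch: compose a shields.io static badge URL for a
# download count. This is NOT this project's badge-generation code,
# just an example of the general technique.
from urllib.parse import quote


def badge_url(label, value, color="brightgreen"):
    """Build a shields.io static badge URL.

    shields.io uses '-' to separate label/value/color, so literal
    dashes must be doubled; everything else is percent-encoded.
    """
    def esc(s):
        return quote(str(s).replace("-", "--"), safe="")
    return "https://img.shields.io/badge/%s-%s-%s.svg" % (esc(label), esc(value), color)


print(badge_url("downloads", "123/day"))
# -> https://img.shields.io/badge/downloads-123%2Fday-brightgreen.svg
```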
Note that this is a relatively heavyweight solution; it has many dependencies and is really intended for people whose main need is to generate detailed historical graphs and download count badges for their projects. If you really just want to perform some ad-hoc queries, counts, or simple data analysis on the PyPI downloads dataset, a project like Ofek's `pypinfo <https://github.com/ofek/pypinfo>`_ would be a simpler alternative.
Also note this package is very young; I wrote it as an evening/weekend project, hoping to only take a few days on it. Though writing this makes me want to bathe immediately, it has no tests. If people start using it, I'll change that.
For a live example of exactly how the output looks, you can see the download
stats page for my awslimitchecker project, generated by a cronjob on my desktop,
at: http://jantman-personal-public.s3-website-us-east-1.amazonaws.com/pypi-stats/awslimitchecker/index.html
Sometime in February 2016, `download stats <https://bitbucket.org/pypa/pypi/issues/396/download-stats-have-stopped-working-again>`_
stopped working on pypi.python.org. As I later learned, what we currently (August 2016)
know as PyPI is really the `pypi-legacy <https://github.com/pypa/pypi-legacy>`_ codebase,
and is far from a stable, hands-off service. The small team of `intrepid souls <https://caremad.io/2016/05/powering-pypi/>`_
who keep it running have their hands full simply keeping it online, while also working
on its replacement, `warehouse <https://github.com/pypa/warehouse>`_ (which as of August 2016 is available online
at https://pypi.io/). While the actual pypi.python.org web UI hasn't been
switched over to the warehouse code yet (it's still under development), the current Warehouse
service does provide full access to PyPI. It's completely understandable that, given all this
and the "life support" status of the legacy PyPI codebase, download stats in a legacy codebase
are their last concern.
However, current download statistics (actually the raw log information) since January 22, 2016
are available in a `Google BigQuery public dataset <https://mail.python.org/pipermail/distutils-sig/2016-May/028986.html>`_
and are being updated in near-real-time. There may be download statistics functionality
in Warehouse eventually.
pypi-download-stats relies on:

* `VirtualEnv <http://www.virtualenv.org/>`_ and pip (recommended installation method; your OS/distribution should have packages for these)
* `bokeh <http://bokeh.pydata.org/en/latest/>`_, to generate pretty SVG charts that work offline
* `google-api-python-client <https://github.com/google/google-api-python-client/>`_, for querying BigQuery

Each of those has additional dependencies.
It's recommended that you install into a virtual environment (virtualenv /
venv). See the `virtualenv usage documentation <http://www.virtualenv.org/en/latest/>`_
for information on how to create a venv.
This isn't on PyPI yet, ironically. Until it is:

.. code-block:: bash

    $ pip install pypi-download-stats
You'll need Google Cloud credentials for a project that has the BigQuery API
enabled. The recommended method is to generate service account credentials;
download the JSON file for the credentials and export the path to it as the
``GOOGLE_APPLICATION_CREDENTIALS`` environment variable. The service account
will need to be added as a Project Member.
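As a quick pre-flight sanity check (a hypothetical helper, not part of this package), you can confirm that the variable points at a readable service account JSON file containing the fields the Google client libraries expect:

```python
# Hypothetical pre-flight check: verify GOOGLE_APPLICATION_CREDENTIALS
# points at a service account JSON file with the expected fields.
# Not part of pypi-download-stats itself.
import json
import os


def check_credentials(path=None):
    """Return the service account's client_email, or raise if misconfigured."""
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not os.path.isfile(path):
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS is not set to a readable file")
    with open(path) as fh:
        creds = json.load(fh)
    for key in ("type", "client_email", "private_key"):
        if key not in creds:
            raise RuntimeError("credentials file is missing the %r field" % key)
    return creds["client_email"]
```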
Run with ``-h`` for command-line help::

    usage: pypi-download-stats [-h] [-V] [-v] [-Q | -G] [-o OUT_DIR]
                               [-p PROJECT_ID] [-c CACHE_DIR] [-B BACKFILL_DAYS]
                               [-P PROJECT | -U USER]

    pypi-download-stats - Calculate detailed download stats and generate HTML and
    badges for PyPI packages - <https://github.com/jantman/pypi-download-stats>

    optional arguments:
      -h, --help            show this help message and exit
      -V, --version         show program's version number and exit
      -v, --verbose         verbose output. specify twice for debug-level output.
      -Q, --no-query        do not query; just generate output from cached data
      -G, --no-generate     do not generate output; just query data and cache
                            results
      -o OUT_DIR, --out-dir OUT_DIR
                            output directory (default: ./pypi-stats)
      -p PROJECT_ID, --project-id PROJECT_ID
                            ProjectID for your Google Cloud user, if not using
                            service account credentials JSON file
      -c CACHE_DIR, --cache-dir CACHE_DIR
                            stats cache directory (default: ./pypi-stats-cache)
      -B BACKFILL_DAYS, --backfill-num-days BACKFILL_DAYS
                            number of days of historical data to backfill, if
                            missing (default: 7). Note this may incur BigQuery
                            charges. Set to -1 to backfill all available history.
      -P PROJECT, --project PROJECT
                            project name to query/generate stats for (can be
                            specified more than once; this will reduce query cost
                            for multiple projects)
      -U USER, --user USER  run for all PyPI projects owned by the specified user
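To give a feel for what a backfill like ``-B 7`` has to enumerate: the public dataset historically stored one download table per day. A sketch of generating the per-day table names follows; the ``the-psf:pypi.downloadsYYYYMMDD`` naming is an assumption about the dataset, not code from this tool:

```python
# Sketch: enumerate per-day BigQuery table names for a backfill window.
# Assumes the dataset's historical one-table-per-day naming
# (the-psf:pypi.downloadsYYYYMMDD); this is an illustration, not the
# query code this package uses.
from datetime import date, timedelta


def backfill_tables(end, num_days):
    """Return table names for num_days days ending at (and including) end."""
    return [
        "the-psf:pypi.downloads%s" % (end - timedelta(days=i)).strftime("%Y%m%d")
        for i in range(num_days)
    ]


print(backfill_tables(date(2016, 8, 3), 3))
# -> ['the-psf:pypi.downloads20160803', 'the-psf:pypi.downloads20160802',
#     'the-psf:pypi.downloads20160801']
```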
To run queries and generate reports for PyPI projects "foo" and "bar", using a
Google Cloud credentials JSON file at ``/foo.json``:
.. code-block:: bash

    $ export GOOGLE_APPLICATION_CREDENTIALS=/foo.json
    $ pypi-download-stats -P foo -P bar
To run queries but not generate reports for all PyPI projects owned by user "myname":
.. code-block:: bash

    $ export GOOGLE_APPLICATION_CREDENTIALS=/foo.json
    $ pypi-download-stats -G -U myname
To generate reports against cached query data for the project "foo":
.. code-block:: bash

    $ export GOOGLE_APPLICATION_CREDENTIALS=/foo.json
    $ pypi-download-stats -Q -P foo
To run nightly and upload results to a website-hosting S3 bucket, I use the
following script via cron (note the paths are specific to my purpose; also note
the two commands, as ``s3cmd`` does not seem to set the MIME type for the SVG
images correctly):
.. code-block:: bash

    #!/bin/bash -x
    export GOOGLE_APPLICATION_CREDENTIALS=/home/jantman/.ssh/pypi-bigquery.json
    cd /home/jantman/GIT/pypi-download-stats
    bin/pypi-download-stats -vv -U jantman
    # sync html files
    ~/venvs/foo/bin/s3cmd -r --delete-removed --stats --exclude='*.svg' sync pypi-stats s3://jantman-personal-public/
    # sync SVG and set mime-type, since s3cmd gets it wrong
    ~/venvs/foo/bin/s3cmd -r --delete-removed --stats --exclude='*.html' --mime-type='image/svg+xml' sync pypi-stats s3://jantman-personal-public/
Cost
++++
At this point... I have no idea. Some of the download tables are 3+ GB per day. I imagine that backfilling historical data from the beginning of what's currently there (20160122) might incur quite a bit of data cost.
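As a rough back-of-the-envelope sketch only (assuming BigQuery's on-demand rate of about $5 per TB scanned, a figure that may change, and that each daily table is read in full; real costs depend on the columns actually queried):

```python
# Back-of-the-envelope BigQuery cost sketch. Assumes an on-demand
# rate of roughly $5 per TB scanned and that each daily table is
# read in full; both are assumptions, not measured figures.
def estimate_cost(gb_per_day, num_days, usd_per_tb=5.0):
    """Estimated USD cost of scanning num_days daily tables in full."""
    return (gb_per_day * num_days / 1024.0) * usd_per_tb


# e.g. backfilling ~200 days of ~3 GB/day tables:
print("$%.2f" % estimate_cost(3.0, 200))
# -> $2.93
```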
Bug reports and feature requests are happily accepted via the `GitHub Issue Tracker <https://github.com/jantman/pypi-download-stats/issues>`_. Pull requests are
welcome. Issues that don't have an accompanying pull request will be worked on
as my time and priority allows.
To install for development, fork the `pypi-download-stats <https://github.com/jantman/pypi-download-stats>`_ repository on GitHub, then:

.. code-block:: bash

    $ virtualenv pypi-download-stats
    $ cd pypi-download-stats && source bin/activate
    $ pip install -e git+git@github.com:YOURNAME/pypi-download-stats.git@BRANCHNAME#egg=pypi-download-stats
    $ cd src/pypi-download-stats

The git clone you're now in will probably be checked out to a specific commit,
so you may want to ``git checkout BRANCHNAME``.
There isn't any right now. I'm bad. If people actually start using this, I'll refactor and add tests, but for now this started as a one-night project.
1. Open an issue for the release; cut a branch off master for that issue.
2. Confirm that there are ``CHANGES.rst`` entries for all major changes.
3. Ensure that the Travis tests are passing in all environments.
4. Ensure that test coverage is no less than the last release (ideally, 100%).
5. Increment the version number in ``pypi-download-stats/version.py`` and add the version and release date to ``CHANGES.rst``, then push to GitHub.
6. Confirm that ``README.rst`` renders correctly on GitHub.
7. Upload the package to testpypi (this assumes a repository section named ``test`` for https://testpypi.python.org/pypi in your pypi configuration):

   .. code-block:: bash

       rm -Rf dist
       python setup.py register -r https://testpypi.python.org/pypi
       python setup.py sdist bdist_wheel
       twine upload -r test dist/*

8. Create a pull request for the release to be merged into master. Upon a successful Travis build, merge it.
9. Tag the release in Git and push the tag to GitHub:

   .. code-block:: bash

       git tag -a X.Y.Z -m 'X.Y.Z released YYYY-MM-DD'
       git push origin X.Y.Z

10. Upload the package to live pypi:

    .. code-block:: bash

        twine upload dist/*

11. Make sure any GitHub issues fixed in the release were closed.