hugovk / top-pypi-packages

A regular dump of the most-downloaded packages from PyPI
https://hugovk.github.io/top-pypi-packages
223 stars 13 forks source link

Query for the data #8

Closed ChaiBapchya closed 4 years ago

ChaiBapchya commented 4 years ago

What's the query being used for retrieving the data about the PyPi packages?

hugovk commented 4 years ago

Data is queried using the https://github.com/ofek/pypinfo/ tool, with these commands:

https://github.com/hugovk/top-pypi-packages/blob/44e2c77aee59e630df77e28a5144c261bf31f7c9/generate.sh#L6-L8

Running the same commands with the --test option to prints the queries:

$ pypinfo --test --json --indent 0 --limit 4000 --days  30 "" project
SELECT
  file.project as project,
  COUNT(*) as download_count,
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -31, "day"),
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "day")
  )
WHERE
  details.installer.name = "pip"
GROUP BY
  project,
ORDER BY
  download_count DESC
LIMIT 4000
$ pypinfo --test --json --indent 0 --limit 4000 --days  365 "" project
SELECT
  file.project as project,
  COUNT(*) as download_count,
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -366, "day"),
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "day")
  )
WHERE
  details.installer.name = "pip"
GROUP BY
  project,
ORDER BY
  download_count DESC
LIMIT 4000
ChaiBapchya commented 4 years ago

Thanks!