pypi-data / data

Public datasets with per-file information about packages uploaded to PyPI.
MIT License

Parquet files not updated since 2024-07-06? #75

Closed: venthur closed this issue 1 month ago

venthur commented 1 month ago

As of today (2024-08-14), the contents of https://raw.githubusercontent.com/pypi-data/data/main/links/dataset.txt are:

https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-0.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-1.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-10.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-11.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-12.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-13.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-14.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-15.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-16.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-17.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-18.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-19.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-2.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-3.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-4.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-5.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-6.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-7.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-8.parquet
https://github.com/pypi-data/data/releases/download/2024-07-06-03-05/index-9.parquet
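A minimal sketch for checking this from a script. It assumes `dataset.txt` keeps one URL per line and that release tags follow the `YYYY-MM-DD-HH-MM` pattern seen in the links above. That pattern is inferred from these URLs and is not documented in this thread. `parse_dataset_links`, `release_time`, and `fetch_dataset_links` are hypothetical helper names.

```python
import re
from datetime import datetime
from urllib.request import urlopen

DATASET_LIST_URL = "https://raw.githubusercontent.com/pypi-data/data/main/links/dataset.txt"

# Assumed link shape, inferred from the list above (not a documented format):
#   .../releases/download/<release-tag>/index-<n>.parquet
_LINK_RE = re.compile(r"/releases/download/(?P<tag>[^/]+)/index-(?P<n>\d+)\.parquet$")


def parse_dataset_links(text):
    """Return (release_tag, index_number, url) tuples sorted by index number."""
    entries = []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        match = _LINK_RE.search(url)
        if match is None:
            raise ValueError(f"unexpected link format: {url!r}")
        entries.append((match["tag"], int(match["n"]), url))
    # The list is in lexicographic order (index-1, index-10, ..., index-2);
    # sort numerically by the index number instead.
    return sorted(entries, key=lambda entry: entry[1])


def release_time(tag):
    """Parse a tag like '2024-07-06-03-05' (assumed YYYY-MM-DD-HH-MM)."""
    return datetime.strptime(tag, "%Y-%m-%d-%H-%M")


def fetch_dataset_links(url=DATASET_LIST_URL):
    """Download dataset.txt and parse it (requires network access)."""
    with urlopen(url) as response:
        return parse_dataset_links(response.read().decode("utf-8"))
```

For example, `{tag for tag, _, _ in fetch_dataset_links()}` collapses the list to the release tag(s) it points at. Passing each tag through `release_time` then shows how old the linked release is.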

Is something broken in the pipeline that causes the parquet files not to be updated properly anymore?

orf commented 1 month ago

Hey! Sorry about this: the generated parquet files were over GitHub's 2 GB per-file limit for release assets.
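The thread does not say how the pipeline was changed. As a hypothetical illustration only, a guard like the one below could run before a release is created. It flags any parquet file that would exceed GitHub's documented rule that each release asset must be under 2 GiB. `oversized`, `parquet_sizes`, and `GITHUB_RELEASE_ASSET_LIMIT` are illustrative names, not part of the project.

```python
from pathlib import Path

# GitHub documents that each file attached to a release must be under 2 GiB.
GITHUB_RELEASE_ASSET_LIMIT = 2 * 1024**3  # bytes


def oversized(sizes, limit=GITHUB_RELEASE_ASSET_LIMIT):
    """Return the names (sorted) whose size in bytes is not strictly below `limit`."""
    return sorted(name for name, size in sizes.items() if size >= limit)


def parquet_sizes(directory):
    """Map each *.parquet file in `directory` to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(directory).glob("*.parquet")}
```

A hypothetical pipeline step could call `oversized(parquet_sizes(output_dir))` and fail loudly, or write smaller files, when the result is non-empty, instead of silently skipping the release upload.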

I've fixed this: https://github.com/pypi-data/data/releases/tag/2024-08-30-22-00

Thanks for the report!

venthur commented 4 weeks ago

I have a related question about the parquet files. Assuming I downloaded them a few months ago and now want to update to the newest data, do I have to download all of them again, or does index-15.parquet always stay the same while you just keep adding new ones?
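The thread does not answer whether older files keep the same contents. A minimal sketch, assuming you keep a copy of the previous `dataset.txt`, is to diff the old and new link lists by filename. Every URL contains the release tag, so a file republished under a newer release appears as "relinked". The link list alone cannot show whether its contents changed, so this sketch treats relinked files as candidates for re-download. `UpdatePlan` and `plan_update` are hypothetical names.

```python
from typing import NamedTuple


class UpdatePlan(NamedTuple):
    added: list[str]      # filenames only in the new list
    removed: list[str]    # filenames only in the old list
    relinked: list[str]   # same filename, different URL (e.g. a newer release tag)
    unchanged: list[str]  # same filename and identical URL


def _by_filename(links):
    """Map 'index-N.parquet' -> full URL, ignoring blank lines."""
    urls = (line.strip() for line in links)
    return {url.rsplit("/", 1)[-1]: url for url in urls if url}


def plan_update(old_links, new_links):
    """Compare two snapshots of dataset.txt (as iterables of lines)."""
    old, new = _by_filename(old_links), _by_filename(new_links)
    shared = old.keys() & new.keys()
    return UpdatePlan(
        added=sorted(new.keys() - old.keys()),
        removed=sorted(old.keys() - new.keys()),
        relinked=sorted(name for name in shared if old[name] != new[name]),
        unchanged=sorted(name for name in shared if old[name] == new[name]),
    )
```

A cautious updater downloads `added + relinked` and deletes `removed`. Skipping relinked files is only safe if the maintainers confirm that existing files never change between releases.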