paperswithcode / paperswithcode-data

The full dataset behind paperswithcode.com
Invalid gzips linked from README #17

Closed jmelot closed 2 months ago

jmelot commented 2 years ago

Hi, thanks for this project and for sharing your data! We pull updates regularly. Our pipeline had previously been running fine, but on September 30 we started getting OSError: Not a gzipped file from Python's gzip library when trying to read any of the .json.gz files linked in this repository's README. I manually inspected the methods.json.gz file linked in the README (https://production-media.paperswithcode.com/about/methods.json.gz), and running gunzip on it gives gunzip: methods.json.gz: not in gzip format. The file appears to be uncompressed. We can read it as JSON, so this is not a big issue for us, but I wanted to let you know.
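For anyone hitting the same error, a minimal Python sketch of a tolerant reader follows. It checks for the two-byte gzip magic number and only decompresses when it is present, so the same code handles both a real gzip and an uncompressed file served under a .json.gz name. The function name load_json_maybe_gzipped is hypothetical, and the sketch assumes the payload is UTF-8 JSON small enough to hold in memory.

```python
import gzip
import json

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def load_json_maybe_gzipped(raw: bytes):
    """Parse JSON from bytes that may or may not be gzip-compressed.

    Hypothetical helper: decompresses only when the gzip magic number
    is present, otherwise treats the bytes as plain UTF-8 JSON.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

To use it on a downloaded file, read the file in binary mode (for example, open(path, "rb").read()) and pass the bytes to the function.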

dhamaris commented 2 years ago

Hi, we had a similar problem when downloading https://paperswithcode.com/media/about/evaluation-tables.json.gz from Java. We got this error: java.lang.Exception: java.util.zip.ZipException: Not in GZIP format

In our case it doesn't happen every time. We have pulled data every day since September, and we got this error twice in November and again this morning. If I rerun the same job by hand, it works fine.
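Since a manual rerun succeeds, one possible workaround for the intermittent case is to validate the response and retry. The sketch below is in Python rather than Java and rests on assumptions: the cause of the bad responses is not known from this thread, the names fetch_gzip_with_retry and fetch are hypothetical, and fetch stands for whatever zero-argument function the job uses to download the file as bytes.

```python
import time
from typing import Callable

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def fetch_gzip_with_retry(
    fetch: Callable[[], bytes],
    attempts: int = 3,
    delay_seconds: float = 5.0,
) -> bytes:
    """Call fetch() until the body looks like gzip, up to `attempts` times.

    Hypothetical helper: `fetch` is a caller-supplied download function.
    Raises OSError if no attempt returns a gzip-looking body.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        body = fetch()
        if body[:2] == GZIP_MAGIC:
            return body
        # Wait before the next try, but not after the final one.
        if attempt < attempts:
            time.sleep(delay_seconds)
    raise OSError(f"response was not in gzip format after {attempts} attempts")
```

The magic-number check only confirms that the body starts like a gzip stream; a truncated download would still fail later during decompression.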

jmelot commented 2 years ago

@dhamaris fwiw, I noticed that they updated the URLs in the README in this commit. Since I switched my code over to the new URLs, I haven't had further issues.

dhamaris commented 2 years ago

Thank you for the info @jmelot