ruby / zlib

Ruby interface for the zlib compression/decompression library

`Zlib::GzipReader` doesn't read some large files #50

Open inkstak opened 1 year ago

inkstak commented 1 year ago

Hi. I need to inflate a .csv.gz file that should decompress to a 4 GB CSV with 25 million rows.

When I use an app or the gzip command-line tool, I get the full file without any issue. When I use Zlib::GzipReader, only the first row (the header) is returned.

```
> Zlib::GzipReader.open("adresses-france.csv.gz") { |gz| print gz.read }
id;id_fantoir;numero;rep;nom_voie;code_postal;code_insee;nom_commune;code_insee_ancienne_commune;nom_ancienne_commune;x;y;lon;lat;type_position;alias;nom_ld;libelle_acheminement;nom_afnor;source_position;source_nom_voie;certification_commune;cad_parcelles
 => nil
```
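The `gz.read` call above builds the whole decompressed output as a single String, which for a 4 GB CSV means a 4 GB allocation. A minimal streaming check, assuming only that the file is a gzip file `Zlib::GzipReader` can open, is to count the lines the reader yields and compare the result with `gzip -dc adresses-france.csv.gz | wc -l`. The helper name `count_gzip_lines` is hypothetical.

```ruby
require "zlib"

# Hypothetical helper: counts the lines Zlib::GzipReader yields for a .gz file,
# streaming line by line instead of loading the decompressed data into memory.
def count_gzip_lines(path)
  Zlib::GzipReader.open(path) do |gz|
    count = 0
    gz.each_line { count += 1 }
    count # GzipReader.open returns the block's value and closes the reader
  end
end

# Hypothetical usage with the file from the report:
# puts count_gzip_lines("adresses-france.csv.gz")
```

If this count is far below what `wc -l` reports, the gap lies in what `Zlib::GzipReader` yields rather than in how the result is printed.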

The file is provided by the French government:

The directory contains many other files (one per region), but I cannot reproduce the issue with any of them.

This service also provides a similar file in Addok format (https://adresse.data.gouv.fr/data/ban/adresses/latest/addok/adresses-addok-france.ndjson.gz), which should decompress to a 3 GB file with 2 million rows, but Zlib::GzipReader returns only the first 25k rows.

Is there a limit to what Zlib can handle (file size, number of rows, etc.)? Or does the problem come from the compressed file itself?
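One possible explanation, not confirmed in this report, is that these archives are multi-member gzip files (several gzip streams concatenated back to back). The `gzip` tool decompresses every member, whereas a single `Zlib::GzipReader` stops at the end of the first one. A sketch to test that hypothesis, assuming the file contains only gzip members with no trailing non-gzip data: start a new reader for each member, rewind the IO over the bytes the previous reader buffered past its member (`GzipReader#unused`), and use `finish` instead of `close` so the underlying file stays open. The helper name `each_gzip_member_chunk` and the chunk size are hypothetical.

```ruby
require "zlib"

# Hypothetical helper: yields decompressed chunks from every gzip member in
# `path` and returns the number of members read. Assumes the file holds only
# concatenated gzip members (anything else makes GzipReader.new raise).
def each_gzip_member_chunk(path, chunk_size = 64 * 1024)
  members = 0
  File.open(path, "rb") do |file|
    until file.eof?
      gz = Zlib::GzipReader.new(file)
      # read(n) returns nil once this member's data is exhausted
      while (chunk = gz.read(chunk_size))
        yield chunk
      end
      members += 1
      # GzipReader reads ahead in blocks; `unused` holds the bytes it consumed
      # from `file` beyond the end of this member (nil if there were none).
      unused = gz.unused
      gz.finish # ends the reader without closing `file`
      file.pos -= unused.bytesize if unused # move back to the next member
    end
  end
  members
end

# Hypothetical usage with the file from the report:
# n = each_gzip_member_chunk("adresses-france.csv.gz") { |c| $stdout.write(c) }
# warn "gzip members: #{n}"
```

A member count greater than 1 would support the multi-member explanation. Newer releases of the zlib gem also provide `Zlib::GzipReader.zcat`, which loops over members in a similar way.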