Open inkstak opened 1 year ago
Hi. I have to inflate a .csv.gz file which should return a 4 GB CSV with 25 million rows.
When I use an app or the gzip command line, I get the full file without issue. When I use Zlib::GzipReader, only the first row is returned.
> Zlib::GzipReader.open("adresses-france.csv.gz") { |gz| print gz.read }
id;id_fantoir;numero;rep;nom_voie;code_postal;code_insee;nom_commune;code_insee_ancienne_commune;nom_ancienne_commune;x;y;lon;lat;type_position;alias;nom_ld;libelle_acheminement;nom_afnor;source_position;source_nom_voie;certification_commune;cad_parcelles
=> nil
The file is provided by the French government:
There are many other files in the directory (for each region) but I cannot reproduce the issue with other files.
This service also provides a similar file in Addok format (https://adresse.data.gouv.fr/data/ban/adresses/latest/addok/adresses-addok-france.ndjson.gz), which should inflate to a 3 GB file with 2 million rows, but Zlib::GzipReader returns only the first 25k rows.
Is there any limit to what Zlib can support (size, rows, ...)? Or does the problem come from the compressed file itself?
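One possible explanation, which is an assumption and not something confirmed in this report, is that the archive is made of several concatenated gzip members. The gzip command line inflates all members, whereas a single Zlib::GzipReader stops at the end of the first one. The sketch below shows a way to test that hypothesis by looping over the members, following the pattern described in the Zlib::GzipReader documentation. The helper name `inflate_all_members` is hypothetical.

```ruby
require "zlib"
require "stringio"

# Hypothetical helper: inflates every gzip member found in `input`
# and appends the result to `output` (any object responding to <<).
def inflate_all_members(input, output)
  loop do
    gz = Zlib::GzipReader.new(input)
    output << gz.read        # reads up to the end of the current member only
    unused = gz.unused       # bytes buffered past this member, or nil
    gz.finish                # releases the reader, leaves `input` open
    # Rewind so the next reader starts at the following member's header.
    input.pos -= unused.length if unused
    break if input.eof?
  end
  output
end

# Small demonstration with two concatenated members (made-up data).
data = Zlib.gzip("header\n") + Zlib.gzip("row 1\nrow 2\n")

first_only = Zlib::GzipReader.new(StringIO.new(data)).read
everything = inflate_all_members(StringIO.new(data), StringIO.new).string

puts first_only.lines.count
puts everything.lines.count
```

For the real file, `input` would be a File opened in binary mode and `output` a File opened for writing. If the second count is larger than the first on the real data, the file is multi-member. Ruby 2.7 and later also ship `Zlib::GzipReader.zcat`, which may be a simpler option if that version is available.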