Closed dipstef closed 2 years ago
I believe for wikipedia dumps you need to use MultiGzDecoder
Cheers, that did it!
My follow-up question seems to be already addressed in this issue:
https://github.com/rust-lang/flate2-rs/issues/178
So I will rely on MultiGzDecoder instead when reading arbitrary files.
Cheers,
Hi all!
Apologies if I missed something here and the error is on my side (although this is no different from any standard usage of this library). I am reading a Wikidata dump line by line, and since the dump is one giant JSON array, only the first line, which contains the opening square bracket, is returned:
The dump is the following: https://dumps.wikimedia.org/wikidatawiki/entities/20220606/wikidata-20220606-all.json.gz
Switching to a loop-based version, the second call to read_line returns 0 bytes, which is consistent with the behaviour of the lines iterator.
There are no issues when reading the above file with gzcat or a Python script.
Any idea on how to troubleshoot this?
Thanks in advance for your help!