sacundim/covid-19-puerto-rico
COVID-19 data and graphs for Puerto Rico
13
stars
6
forks
source link
Big rewrite of downloader code to make it much faster #41
Closed
sacundim closed 2 years ago
sacundim commented 2 years ago
Rewrite the downloader scripts to:

- Stop using `jq` for JSON array to JSON Lines conversion. It was very slow and memory intensive. Now we use a very simple Rust binary that I wrote.
- Stop using the Pyarrow-based `json2parquet` and `csv2parquet` converters, in favor of new Rust-based ones.

Since the new converters do a better job of type inference, this has necessitated handling and cleansing a new version of the source input files.
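For readers unfamiliar with the JSON array to JSON Lines step, here is a minimal sketch of how such a conversion can be done in constant memory. It is not the binary referenced above; the function name `array_to_json_lines` is hypothetical, it uses only the Rust standard library, and it assumes the input is a single well-formed JSON array (it does no validation). The idea is to track nesting depth and string state byte by byte, drop whitespace outside strings, and end a line whenever a top-level element ends.

```rust
use std::io::{self, BufReader, BufWriter, Read, Write};

/// Writes each top-level element of a JSON array to `out`, one per line.
/// Hypothetical helper: assumes well-formed input and performs no validation.
fn array_to_json_lines<R: Read, W: Write>(input: R, mut out: W) -> io::Result<()> {
    let mut depth: usize = 0; // 0 = outside the array, 1 = directly inside it
    let mut in_string = false; // inside a JSON string literal
    let mut escaped = false; // previous string byte was a backslash
    let mut in_element = false; // a top-level element has been started

    for byte in input.bytes() {
        let b = byte?;
        if in_string {
            // String contents are copied verbatim, including commas and brackets.
            out.write_all(&[b])?;
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match (depth, b) {
            // The outer array's opening bracket is consumed, not emitted.
            (0, b'[') => depth = 1,
            (0, _) => {}
            // Whitespace outside strings is dropped so each element fits on one line.
            (_, b' ' | b'\t' | b'\n' | b'\r') => {}
            // A comma or closing bracket at depth 1 ends the current element.
            (1, b',') | (1, b']') => {
                if in_element {
                    out.write_all(b"\n")?;
                    in_element = false;
                }
                if b == b']' {
                    depth = 0;
                }
            }
            // Everything else belongs to the current element.
            _ => {
                match b {
                    b'"' => in_string = true,
                    b'[' | b'{' => depth += 1,
                    b']' | b'}' => depth -= 1,
                    _ => {}
                }
                in_element = true;
                out.write_all(&[b])?;
            }
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Acts as a filter: JSON array on stdin, JSON Lines on stdout.
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    array_to_json_lines(BufReader::new(stdin.lock()), &mut out)?;
    out.flush()
}
```

Because nothing is parsed into an in-memory tree, memory use stays flat regardless of input size, which is the property a full JSON processor applied to one giant array generally lacks. The trade-off is that malformed input is passed through rather than rejected.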