sacundim / covid-19-puerto-rico

COVID-19 data and graphs for Puerto Rico
13 stars 6 forks

Big rewrite of downloader code to make it much faster #41

Closed sacundim closed 2 years ago

sacundim commented 2 years ago

Rewrite the downloader scripts to:

  1. Stop using jq for JSON array to JSON Lines conversion. It was very slow and memory-intensive. Now we use a very simple Rust binary that I wrote.
  2. Stop using the PyArrow-based json2parquet and csv2parquet converters, in favor of new Rust-based ones.
  3. Handle and cleanse a new version of the source input files. This became necessary because the new converters do a better job of type inference.
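
For readers unfamiliar with the conversion in item 1, here is a minimal sketch of how a JSON array can be streamed into JSON Lines without loading the whole document into memory. This is not the binary from this PR. The function name `array_to_json_lines` is hypothetical, and the sketch assumes the input is a single well-formed top-level JSON array (it does no validation).

```rust
use std::io::{self, BufReader, Read, Write};

/// Hypothetical helper: reads one top-level JSON array from `input` and
/// writes each element on its own line to `output`.
/// Assumes well-formed JSON; malformed input is not detected.
fn array_to_json_lines<R: Read, W: Write>(input: R, mut output: W) -> io::Result<()> {
    let mut depth: usize = 0; // nesting depth of [] and {}
    let mut in_string = false; // inside a JSON string literal
    let mut escaped = false; // previous byte in a string was a backslash
    let mut in_element = false; // some bytes of the current element were written

    for byte in BufReader::new(input).bytes() {
        let b = byte?;
        if in_string {
            // Copy string contents verbatim, tracking escapes so that an
            // escaped quote does not end the string.
            output.write_all(&[b])?;
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            // Whitespace outside strings is insignificant in JSON; dropping
            // it keeps every element on a single line.
            b' ' | b'\t' | b'\n' | b'\r' => {}
            b'"' => {
                in_string = true;
                in_element = true;
                output.write_all(&[b])?;
            }
            b'[' | b'{' => {
                depth += 1;
                // The outermost '[' is the array wrapper and is not emitted.
                if depth > 1 {
                    in_element = true;
                    output.write_all(&[b])?;
                }
            }
            b']' | b'}' => {
                depth = depth.saturating_sub(1);
                if depth >= 1 {
                    output.write_all(&[b])?;
                } else if in_element {
                    // Closing the outer array: terminate the last element.
                    output.write_all(b"\n")?;
                    in_element = false;
                }
            }
            // A comma directly inside the outer array separates elements.
            b',' if depth == 1 => {
                output.write_all(b"\n")?;
                in_element = false;
            }
            _ => {
                in_element = true;
                output.write_all(&[b])?;
            }
        }
    }
    output.flush()
}
```

To use it as a command-line filter, one could call it with `io::stdin().lock()` as the input and a `BufWriter` around `io::stdout().lock()` as the output. Memory use stays constant regardless of input size because only a few state flags are kept between bytes.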