Background
By default, we generate the data catalog in feather format, which we benchmarked to be the fastest and most compact columnar format, slightly better than parquet. However, parquet has become much more widely supported in the community, so we would like to generate it by default.
Task
Turn on saving to parquet by default in prod
(Add parquet to the DEFAULT_FORMATS in lib/catalog/owid/catalog/datasets.py)
(Ideally) Avoid generating both feather and parquet in local dev unless explicitly requested
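One possible shape for the prod/dev split in the tasks above is sketched below. This is only an illustration under assumptions: the environment variables CATALOG_ENV and CATALOG_FORMATS, the helper resolve_default_formats, and the constants PROD_FORMATS, DEV_FORMATS and SUPPORTED_FORMATS are all hypothetical, and the actual structure of DEFAULT_FORMATS in datasets.py may differ.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical constants for illustration; the real DEFAULT_FORMATS in
# lib/catalog/owid/catalog/datasets.py may be defined differently.
PROD_FORMATS: List[str] = ["feather", "parquet"]
DEV_FORMATS: List[str] = ["feather"]
SUPPORTED_FORMATS = {"feather", "parquet"}


def resolve_default_formats(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Pick the formats to write, based on (assumed) environment variables."""
    env = os.environ if env is None else env

    # An explicit request always wins, e.g. CATALOG_FORMATS="feather,parquet".
    explicit = env.get("CATALOG_FORMATS", "").strip()
    if explicit:
        formats = [f.strip().lower() for f in explicit.split(",") if f.strip()]
        unknown = [f for f in formats if f not in SUPPORTED_FORMATS]
        if unknown:
            raise ValueError(f"Unsupported formats: {unknown}")
        return formats

    # Otherwise: both formats in prod, feather only in local dev.
    if env.get("CATALOG_ENV", "dev").lower() == "prod":
        return list(PROD_FORMATS)
    return list(DEV_FORMATS)
```

With this approach, a local build would keep writing only feather, while setting the (assumed) CATALOG_FORMATS variable to "feather,parquet" would opt in to both without touching the prod configuration.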