Closed Marigold closed 2 weeks ago
Quick links (staging server): Site | Admin | Wizard | Docs
Login: ssh owid@staging-site-generate-parquet
Edited: 2024-11-12 05:12:27 UTC Execution time: 12.96 seconds
This is very nice, @Marigold, thanks! Now I can finally just

    duckdb
    > from 'data/garden/energy/2024-06-20/primary_energy_consumption/primary_energy_consumption.parquet' limit 10
I don't think we need this on the staging servers for now; we can switch it on once we are asked for it or a need arises.
If we switch this on in production, will this cause any issues with anything in the existing catalog that might rely on feather files?
That's very unlikely. We used to publish both for a long time and never ran into any issues.
I'm not going to rebuild the ETL catalog yet; I'll wait for nullable types, which should be ready soon.
Implements https://github.com/owid/etl/issues/3490

Allow loading `DEFAULT_FORMATS` from env variable. This will let us generate both feather and parquet in production and push it into our data catalog in R2. Local development would still use only `feather` as default. @danyx23 do you see any value in generating parquet on staging servers?

We used to add all our metadata directly into parquet, but that made it inefficient and no one was using it, so we removed it. Metadata is still available in a sidecar `[table].meta.json` in the same folder as `[table].parquet`.

TODO after merging
- `DEFAULT_FORMATS=feather,parquet` env to production
- `ETL_EPOCH` or pandas version)
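
For readers skimming the thread, here is a rough sketch of the two ideas in the description: reading `DEFAULT_FORMATS` from the environment with feather as the fallback, and locating the `[table].meta.json` sidecar next to `[table].parquet`. This is not the code from the PR. The helper names (`load_default_formats`, `sidecar_path`, `read_sidecar_metadata`) and the comma-separated parsing are assumptions made for illustration, based only on the `DEFAULT_FORMATS=feather,parquet` value shown in the TODO.

```python
import json
import os
from pathlib import Path

# Assumption: with no env variable set, only feather is written (local development).
FALLBACK_FORMATS = ["feather"]


def load_default_formats(env=None):
    """Parse a comma-separated DEFAULT_FORMATS value such as "feather,parquet".

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    raw = env.get("DEFAULT_FORMATS", "")
    # Drop empty entries and surrounding whitespace, normalise case.
    formats = [part.strip().lower() for part in raw.split(",") if part.strip()]
    return formats or list(FALLBACK_FORMATS)


def sidecar_path(table_path):
    """Return the [table].meta.json path in the same folder as the table file."""
    table_path = Path(table_path)
    return table_path.with_name(table_path.stem + ".meta.json")


def read_sidecar_metadata(table_path):
    """Load the JSON sidecar that sits next to a parquet (or feather) table."""
    with sidecar_path(table_path).open() as f:
        return json.load(f)
```

With `DEFAULT_FORMATS=feather,parquet` set in production, `load_default_formats()` would return both formats, while an unset variable falls back to feather only. Keeping metadata in a sidecar means a consumer can read the parquet file with any tool and only open the JSON when it needs the metadata.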