Open qq23840 opened 3 months ago
Quick fix of dropping the extra variables also throws an error when using the `get_footprint` function; without `start_date` and `end_date` arguments the function looks like it works fine, but passing these gives a `KeyError` for the Timestamp which I can't decipher.
Update - @gareth-j made a branch (benFix1) in which the time slicing doesn't happen in the openghg framework. It works OK for using the `get_footprint` functionality, but needs some tweaks to the openghg_inversions code to time slice the data appropriately when running an inversion (`openghg_inversions.get_data.py`, currently on a local branch).
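For illustration, the kind of time slicing that would need to move into the inversion code might look like the sketch below. The function name and call pattern here are assumptions for this example, not the actual openghg_inversions API:

```python
import numpy as np
import pandas as pd
import xarray as xr

def slice_to_period(ds: xr.Dataset, start_date: str, end_date: str) -> xr.Dataset:
    """Hypothetical helper: restrict footprint data to the inversion period.

    xarray's label-based slice is inclusive at both ends, so end_date
    should be the last timestamp you want to keep.
    """
    return ds.sel(time=slice(start_date, end_date))

# Example with dummy monthly data:
times = pd.date_range("2019-01-01", periods=24, freq="MS")
ds = xr.Dataset({"fp": ("time", np.arange(24.0))}, coords={"time": times})
subset = slice_to_period(ds, "2019-06-01", "2019-12-31")
```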
I'm not sure there's an easy way to do this, since the difference is in the data variables rather than the metadata. My first thought is that these should probably be stored as two different datasources, or at least in two separate zarr stores. In the old system, every year of data was stored independently, but with zarr we're assuming that the data variables and coordinates are known in advance. It's also possible that the new variables aren't being chunked properly, since the chunk sizes are based on the data already in the zarr store.
Are these new variables, or just different variable names? (I thought NAME footprints typically have mean particle age variables. If the species is inert, you don't need them though.)
They are new variables, I think - unless I'm looking at an old set of footprints, which is possible. I've taken them all from the shared area on bp1 but I'm not 100% sure of their status. In this set, pre-2020 doesn't have the variables at all, whereas they do exist from 2020 onwards. It's true that for inert species they're not needed, though, and that's the route I've gone down in doing a rough-and-ready fix.
Check the current Dataset in the zarr store for the existing variables. If they don't exist then fill in with NaNs.
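One way to implement that suggestion, sketched below. The helper name `harmonise_variables` and the example variable names are assumptions for illustration, not existing openghg functions:

```python
import numpy as np
import xarray as xr

def harmonise_variables(ds: xr.Dataset, existing: xr.Dataset) -> xr.Dataset:
    """Add any data variables present in `existing` but missing from `ds`,
    filled with NaN, so both carry the same variable set before appending
    to the zarr store. Sketch only: assumes all shared dims exist in `ds`."""
    out = ds.copy()
    for name, var in existing.data_vars.items():
        if name not in out.data_vars:
            shape = tuple(out.sizes[d] for d in var.dims)
            out[name] = (var.dims, np.full(shape, np.nan))
    return out

# Example: the store already holds "fp" and "mean_age_particles_n",
# but a pre-2020 file only has "fp".
existing = xr.Dataset({
    "fp": ("time", np.zeros(3)),
    "mean_age_particles_n": ("time", np.zeros(3)),
})
new = xr.Dataset({"fp": ("time", np.ones(2))})
filled = harmonise_variables(new, existing)
```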
What happened?
Trying to standardise some GSN footprints for 2008-2022 into a zarr object store. The raw netCDF files have a slightly different set of variables for 2020-2022 (they contain `mean_age_particles_n`-type variables, which the pre-2020 footprints I have don't). If I try to standardise these two types of footprint into the same zarr object store, I get an xarray ValueError about dimension sizes when trying to store the two footprints in the object store.
What did you expect to happen?
In the old object stores, I could standardise these files alongside each other, but this doesn't work in the current setup. I can get around it by first dropping the offending variables, but obviously this is just a workaround.
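For reference, the rough workaround looks something like the sketch below (the pattern-matching on variable names is illustrative; `mean_age_particles_n` is the only name confirmed from the files):

```python
import numpy as np
import xarray as xr

def drop_age_vars(ds: xr.Dataset) -> xr.Dataset:
    """Drop the mean-particle-age variables (if present) so every yearly
    file carries the same variable set before standardising."""
    age_vars = [v for v in ds.data_vars if v.startswith("mean_age_particles")]
    return ds.drop_vars(age_vars)

# Example with dummy data:
ds = xr.Dataset({
    "fp": ("time", np.zeros(3)),
    "mean_age_particles_n": ("time", np.zeros(3)),
})
trimmed = drop_age_vars(ds)
```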
Minimal Complete Verifiable Example
No response
Relevant log output
No response
Anything else we need to know?
No response
Environment
Python 3.11.7 OpenGHG v0.8.0 (devel branch)