Open qq23840 opened 2 months ago
Each "chunk" in a zarr store is saved as a separate file. If you use xr.open_zarr("*_merged-data.pickle.zarr")
you can see how many chunks there are (without loading the data into memory). I guess the xarray defaults might be creating a lot of chunks. I can change that.
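A rough way to see where the file count comes from, assuming the store is a plain directory store with one file per chunk: the sketch below estimates the chunk count of a single array from its shape and chunk shape, and counts the files actually on disk. The function names and the example shapes are hypothetical, not part of openghg_inversions.

```python
import math
import os


def expected_chunk_count(shape, chunks):
    """Number of chunks (one file each in a directory store) for one array."""
    # Each dimension is split into ceil(size / chunk_size) pieces.
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


def count_files(store_dir):
    """Count every file under store_dir, including zarr metadata files."""
    return sum(len(files) for _, _, files in os.walk(store_dir))
```

For example, a hypothetical array of shape (8760, 10, 500) chunked as (24, 10, 500) gives 365 chunk files, while storing it as a single chunk gives 1. Multiply by the number of variables in the dataset and the totals grow quickly.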
I can make it possible to pass the output type of file used, so it could be saved as netcdf.
@gareth-j any suggestions? Is this something the nested directory store helps with?
Is this cache something that needs to be updated or is it a one time thing? If it's just cached and then read you could maybe use a ZipStore instead.
(In reply to Brendan Murphy's comment above, sent Wed, 1 May 2024, 17:18.)
@qq23840 for now, maybe set "save_merged_data" to false, and if you want to hold onto the old data, make it into a zip file.
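A minimal sketch of the "make it into a zip file" step, using only the Python standard library. The function name `zip_store` is hypothetical. The archive is written uncompressed, with paths relative to the store root, on the assumption that this layout is what a zarr ZipStore would expect; that has not been checked against openghg_inversions. `zip_path` should point outside `store_dir`.

```python
import os
import zipfile


def zip_store(store_dir, zip_path):
    """Pack every file under store_dir into one uncompressed zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for root, _, files in os.walk(store_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the store root, so the archive
                # has no extra top-level folder.
                zf.write(full, arcname=os.path.relpath(full, store_dir))
    return zip_path
```

After checking the archive, the original directory can be deleted, which turns thousands of small files into one file for the purposes of a file-number quota.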
@gareth-j's suggestion should fix this problem, but I might not have time to add this fix before next Thursday (I'm away until then, although I'm trying to fix some other inversions problems at the moment...)
Perhaps a silly issue exposing my lack of understanding of zarr, but when running an inversion with save_merged_data = True, the resulting *_merged-data.pickle.zarr directory contains a huge number of files (upwards of 25k, sometimes). It's not big in terms of space, but it's meant I fairly quickly run into my file number limit on bp1. I don't actually need to save the merged data, so I will just set it to False for now, but more generally, is there a way of making these output folders contain fewer files? I know #92 allowed for more options in terms of the merged data, but it seems to default to this zarr format.