mrschweizer / PyThat

This is a community package that helps with reading .h5 files created by ThatecOS and converting them to xarray objects and netCDF files. This software is not maintained by, and is not affiliated with, THATec Innovation GmbH.
MIT License

Missing data #8

Closed by timvgl 1 year ago

timvgl commented 2 years ago

Hi, when I try to load a 13 GB .h5 file into Python using PyThat, one of two things happens:
1. The program crashes because it consumes too much memory, so an option to load the data chunkwise would be helpful.
2. The *.nc file is incomplete: only about 1/3 of the data is available and the rest is missing. The .nc file then has to be regenerated each time, which costs a lot of time because the first case can occur again.

mrschweizer commented 2 years ago

So, apparently this error occurs only when saving the netCDF file. At the moment I'm hoping that xarray.DataArray.chunk / xarray.Dataset.chunk will do the trick. However, I'm a bit worried about possible bugs, which is why I will try to estimate beforehand whether the available memory will be sufficient for the task at hand. I'm working on it, but it may take some time.
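For readers following along, the "estimate beforehand" idea could look roughly like the sketch below. This is not PyThat code: the function names, the `safety_factor` parameter, and the assumption that the caller supplies the available memory are all hypothetical and only illustrate the check.

```python
import math


def estimate_nbytes(shape, itemsize):
    """Bytes needed to hold a dense array of the given shape in memory."""
    return math.prod(shape) * itemsize


def fits_in_memory(shape, itemsize, available_bytes, safety_factor=2.0):
    """Return True if the array, plus headroom, fits into available_bytes.

    safety_factor is an assumed margin for temporary copies made while
    converting and writing; the right value depends on the workload.
    """
    return estimate_nbytes(shape, itemsize) * safety_factor <= available_bytes


# Example: a 1000 x 1000 float64 array against 1 GB of free memory.
print(fits_in_memory((1000, 1000), 8, 10**9))
```

The available memory has to come from somewhere else, for example `psutil.virtual_memory().available` if psutil is installed. If the check fails, the data would need to be processed in chunks instead of being loaded at once.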

mrschweizer commented 1 year ago

I will add an option to the construct_measurement_tree function. At the moment I imagine simply calling https://docs.xarray.dev/en/stable/generated/xarray.Dataset.chunk.html on the dataset before saving it to the netCDF file. I think this should work.
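A minimal sketch of that approach, assuming an existing xarray Dataset `ds` and an installed dask (which `Dataset.chunk` relies on). The helper names `pick_chunks` and `chunk_and_save`, the 128 MiB target, and the single `itemsize` for all variables are assumptions made for illustration; this is not necessarily what the referenced commit does.

```python
import math


def pick_chunks(sizes, itemsize, target_bytes=128 * 2**20):
    """Halve the largest dimension until one chunk fits into target_bytes.

    sizes maps dimension names to lengths, e.g. dict(ds.sizes).
    """
    chunks = dict(sizes)
    while math.prod(chunks.values()) * itemsize > target_bytes:
        largest = max(chunks, key=chunks.get)
        if chunks[largest] == 1:
            break  # every dimension is already 1; cannot shrink further
        chunks[largest] = math.ceil(chunks[largest] / 2)
    return chunks


def chunk_and_save(ds, path, chunks):
    """Rechunk the dataset, then write it to a netCDF file."""
    chunked = ds.chunk(chunks)  # xarray.Dataset.chunk, backed by dask
    chunked.to_netcdf(path)     # xarray.Dataset.to_netcdf
    return chunked
```

Usage would be along the lines of `chunk_and_save(ds, "measurement.nc", pick_chunks(ds.sizes, itemsize=8))`, where the file name is a placeholder. Whether this actually lowers peak memory depends on the data being loaded lazily in the first place.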

mrschweizer commented 1 year ago

See aef969cf33ba3686afe05917c549762a3f7f111c

mrschweizer commented 1 year ago

@timvgl Did you have the chance to check whether it works?

timvgl commented 1 year ago

It seems to work. Thank you!