nannau / nc2pt

Serializing NetCDF files for efficient use in deep learning pipelines.
GNU General Public License v3.0

preprocess memory and data loading #13

Open nadiyashore opened 5 months ago

nadiyashore commented 5 months ago

Running preprocess.py caused it to shut down partway through every time, at exactly the same point, after this output:

[2024-05-23 16:29:50,840][root][INFO] - Normalizing tas...
[2024-05-23 16:29:50,840][root][INFO] - Computing min and max...
[2024-05-23 16:29:50,840][root][INFO] - Calculation min...

It then failed with errors related to timeouts and workers.

To solve this issue, this line in nc2pt/io.py (line 20):

with xr.open_mfdataset(path, engine=engine, parallel=True, chunks="auto") as ds:

was changed to:

with xr.open_mfdataset(path, engine=engine, parallel=True, chunks=275) as ds:

Why does this change fix the problem?
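One possible explanation, offered as a sketch and not a confirmed diagnosis: passing an integer to chunks makes xarray split every dimension into blocks of that many elements, so the size of each chunk (and the memory each worker task needs) is fixed in advance instead of being chosen automatically. The short calculation below shows what chunks=275 would imply for a hypothetical (time, lat, lon) float32 variable. The shape, the dtype, and the helper name chunk_summary are assumptions made for illustration; none of them come from the issue.

```python
import math


def chunk_summary(shape, chunk, itemsize=4):
    """Return (number of chunks, bytes in the largest chunk) for a uniform chunk size.

    `itemsize` is the size of one element in bytes (4 for float32).
    """
    # A chunk cannot be larger than the dimension it splits.
    per_dim = [min(chunk, n) for n in shape]
    # Chunks per dimension, rounded up to cover a partial block at the edge.
    n_chunks = math.prod(math.ceil(n / c) for n, c in zip(shape, per_dim))
    chunk_bytes = math.prod(per_dim) * itemsize
    return n_chunks, chunk_bytes


# Hypothetical variable shape (time, lat, lon); not taken from the issue.
shape = (8760, 550, 550)
n_chunks, chunk_bytes = chunk_summary(shape, chunk=275)
print(f"{n_chunks} chunks, {chunk_bytes / 2**20:.1f} MiB in the largest chunk")
```

If the real variables are much larger or smaller than this hypothetical shape, the numbers change accordingly. The point is only that an explicit chunk size puts an upper bound on per-chunk memory, which may be why the min/max computation no longer times out.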