juseg / hyoga

Paleoglacier modelling framework
https://hyoga.io
GNU General Public License v3.0

Opening cached data drastically sped up with dask. #74

Closed juseg closed 1 month ago

juseg commented 1 year ago

Dask can be used to read only the necessary chunks of global data, which leads to a huge performance boost. I think this should at least be an option, and probably the default behaviour if dask is installed.

Preparing a Cocuy-1km atmosphere file with CHELSA data, I get a speed-up from 2m16s to 4s. The optimal chunk on my machine is a {'y': 120} horizontal stripe. I wonder if performance could be improved even further by storing global data in a tiled format instead of the original striped one.
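
To get a rough feel for the striped-versus-tiled question, here is a minimal back-of-the-envelope sketch in plain Python. It only counts how many chunks (and pixels) overlap a small window under each layout. The grid size, window position and 120x120 tile shape are made-up illustration values, not taken from CHELSA or hyoga, and the helper names are hypothetical. Pixel counts are only a proxy: real read times also depend on the on-disk layout, compression and dask scheduling overhead.

```python
def chunks_touched(window, chunk_shape):
    """Count the chunks of a regular grid that overlap a pixel window.

    window: (row_start, row_stop, col_start, col_stop), stops exclusive.
    chunk_shape: (rows_per_chunk, cols_per_chunk).
    """
    row_start, row_stop, col_start, col_stop = window
    chunk_rows, chunk_cols = chunk_shape
    # index of the last overlapping chunk minus index of the first, plus one
    n_rows = (row_stop - 1) // chunk_rows - row_start // chunk_rows + 1
    n_cols = (col_stop - 1) // chunk_cols - col_start // chunk_cols + 1
    return n_rows * n_cols


def pixels_read(window, chunk_shape):
    """Pixels loaded if every overlapping chunk is read in full."""
    return chunks_touched(window, chunk_shape) * chunk_shape[0] * chunk_shape[1]


# Hypothetical global grid (rows, cols) and a small 300x300 target window.
grid = (21600, 43200)
window = (9000, 9300, 12000, 12300)

# Assumed layouts: 120-row stripes spanning the full width, and 120x120 tiles.
layouts = {
    "single chunk": grid,
    "stripes": (120, grid[1]),
    "tiles": (120, 120),
}

for name, shape in layouts.items():
    print(name, chunks_touched(window, shape), pixels_read(window, shape))
```

Under these assumptions the striped layout already avoids reading most of the grid, and a tiled layout would cut the pixel count further, but whether that translates into a measurable speed-up would have to be benchmarked.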

juseg commented 1 month ago

I can only reproduce the v0.3.0 slowness by passing an explicit chunks=-1 to open_mfdataset. I tested this on a few different versions of xarray, starting with v2022.06.0, and simply omitting the chunks argument consistently performs well. The dask dependency is missing, which will be fixed with #76. However, I can no longer reproduce this issue.