xarray-contrib / xarray-regrid

Regridding utility for xarray
https://xarray-regrid.readthedocs.org/
Apache License 2.0

Sparse weights in conservative method #49

Closed by slevang 2 months ago

slevang commented 2 months ago

Potential improvement for #42.

This package's focus on rectilinear grids, and the factorization of regridding along dimensions, make generating and using dense weights feasible. However, the weight matrix is still extremely sparse for any reasonably sized grid. I did some experiments converting the weights to a sparse matrix after creation, and am seeing nice improvements in both compute time and memory footprint.

On the example in https://github.com/xarray-contrib/xarray-regrid/issues/42#issuecomment-2363771715 I get close to a 4x speedup (and better than xesmf):

```
CPU times: user 42.5 s, sys: 6.01 s, total: 48.5 s
Wall time: 11.6 s
CPU times: user 6min 9s, sys: 41.6 s, total: 6min 51s
Wall time: 59.2 s
```
BSchilperoort commented 2 months ago

> get close to a 4x speedup (and better than xesmf)

Awesome! I did not bother with this originally for the reasons you mentioned. But it's great to see that it's a (relatively easy) way to gain a lot of performance.

Edit: I don't see any significant performance gain compared to the benchmark I ran from #42...

slevang commented 2 months ago

That was a VM with a pretty old CPU architecture, and I guess it's just really slow on these particular calculations. I'm also seeing much faster results on my 8-core M1 Mac: about 12s for skipna=False on current main, dropping to 5s with sparse weights. The improvement with skipna=True is even bigger, I think because the sparsity limits the size of the weight array as we track NaNs over each dim.
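To make the skipna=True case concrete, here is a toy sketch (dense numpy, illustrative names only) of the usual renormalization trick for conservative regridding with missing data: zero-fill the NaNs, apply the weights, then divide by the regridded valid-data fraction. Both products run over the weight matrix, so storing it sparsely shrinks exactly the arrays that skipna has to carry along.

```python
import numpy as np

# Hypothetical 2x4 weight matrix: each output cell averages two inputs.
weights = np.array([
    [0.5, 0.5, 0.0, 0.0],  # output cell 0 <- input cells 0, 1
    [0.0, 0.0, 0.5, 0.5],  # output cell 1 <- input cells 2, 3
])

data = np.array([1.0, np.nan, 3.0, 5.0])
valid = ~np.isnan(data)

num = weights @ np.where(valid, data, 0.0)  # weighted sum of valid data
den = weights @ valid.astype(float)         # total weight of valid data

# Cells with no valid input stay NaN; others are renormalized so the
# missing value doesn't drag the average down.
out = np.where(den > 0, num / np.where(den > 0, den, 1.0), np.nan)
print(out)  # [1. 4.]
```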

Mixed results switching between the threaded and distributed schedulers: sometimes a bit faster, sometimes a bit slower.
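For anyone wanting to reproduce this kind of comparison, switching schedulers in dask is a one-liner each way; the workload below is a small stand-in (not the regridding benchmark itself) and the cluster sizes are arbitrary:

```python
import time

import dask.array as da
from dask.distributed import Client


def timed(compute):
    """Return the wall-clock seconds a compute call takes."""
    t0 = time.perf_counter()
    compute()
    return time.perf_counter() - t0


# Illustrative chunked workload standing in for a regrid operation.
x = da.random.random((1000, 1000), chunks=(100, 100))
work = (x @ x.T).mean()

# Threaded scheduler: the default for dask.array, shared memory, no setup.
t_threads = timed(lambda: work.compute(scheduler="threads"))

# Distributed scheduler: a local cluster of worker processes; compute()
# routes through it automatically while the client is active.
with Client(n_workers=2, threads_per_worker=2) as client:
    t_dist = timed(lambda: work.compute())

print(f"threads: {t_threads:.2f}s  distributed: {t_dist:.2f}s")
```

The distributed scheduler adds serialization and inter-process transfer costs, which plausibly explains why it only sometimes wins here.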

slevang commented 2 months ago

I ran the benchmarking test in #42 across several configurations on my 8-core i7 Linux desktop. Runtimes to the nearest second:

**chunks={"time": 1}, ~4MB**

| skipna=False | threads | distributed |
|--------------|---------|-------------|
| sparse       | 8       | 14          |
| dense        | 28      | 17          |
| xesmf        | 30      | 37          |

| skipna=True | threads | distributed |
|-------------|---------|-------------|
| sparse      | 67      | 82          |
| dense       | 327     | 335         |
| xesmf       | 55      | 71          |

**chunks={"time": 10}, ~40MB**

| skipna=False | threads | distributed |
|--------------|---------|-------------|
| sparse       | 6       | 7           |
| dense        | 13      | 12          |
| xesmf        | 7       | 6           |

| skipna=True | threads | distributed |
|-------------|---------|-------------|
| sparse      | 59      | 72          |
| dense       | OOM     | OOM         |
| xesmf       | 10      | 12          |

Lots of interesting variation. My takeaways: