Awesome! I did not bother with this originally for the reasons you mentioned. But it's great to see that it's a (relatively easy) way to gain a lot of performance.
Edit: I don't see any significant performance gain compared to the benchmark I ran from #42...
That was a VM with a pretty old CPU architecture, and I guess it's just really slow on these particular calculations. I'm also seeing much faster results on my 8-core M1 Mac: about 12s for skipna=False on current main, which drops to 5s with the sparse weights. I'm seeing an even bigger improvement with skipna=True, I think because the sparsity limits the size of the weight array as we track NaNs over each dim.
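For context on the skipna=True case, one common way to make weight-based regridding NaN-aware is to zero out the NaNs, apply the weights, and renormalize each output cell by the weight that actually came from valid source cells. This is a generic sketch assuming NumPy and scipy.sparse weights applied along a single dimension; `regrid_skipna_1d` is a hypothetical helper, and xarray-regrid's actual implementation may differ.

```python
import numpy as np
import scipy.sparse as sp


def regrid_skipna_1d(data, weights):
    """NaN-aware regridding along axis 0 (hypothetical helper, generic sketch).

    NaNs are replaced by 0 before applying the weights, and each output cell is
    renormalized by the total weight that came from valid (non-NaN) source cells.
    """
    valid = ~np.isnan(data)
    filled = np.where(valid, data, 0.0)
    num = np.asarray(weights @ filled)               # weighted sum of valid values
    den = np.asarray(weights @ valid.astype(float))  # weight carried by valid cells
    out = np.full(num.shape, np.nan)
    # Divide only where some valid source weight exists; elsewhere stay NaN
    np.divide(num, den, out=out, where=den > 0)
    return out


# Each target cell averages two adjacent source cells
weights = sp.csr_matrix(np.array([[0.5, 0.5, 0.0, 0.0],
                                  [0.0, 0.0, 0.5, 0.5]]))
data = np.array([[1.0, np.nan],
                 [np.nan, np.nan],
                 [3.0, 2.0],
                 [5.0, 4.0]])

result = regrid_skipna_1d(data, weights)
# column 0 -> [1.0, 4.0]; column 1 -> [nan, 3.0]
```

In this sketch the denominator is a second weighted product whose value depends on the NaN pattern, so it costs as much as the regridding itself; with sparse weights, both products only touch the nonzero overlaps.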
Results were mixed when switching between the threaded and distributed schedulers: sometimes a bit faster, sometimes slower.
I ran the benchmarking test from #42 across several configurations on my 8-core i7 Linux desktop. Runtimes in seconds, rounded to the nearest second:
chunks={"time": 1} (~4MB chunks)

| skipna=False | threads (s) | distributed (s) |
|---|---|---|
| sparse | 8 | 14 |
| dense | 28 | 17 |
| xesmf | 30 | 37 |

| skipna=True | threads (s) | distributed (s) |
|---|---|---|
| sparse | 67 | 82 |
| dense | 327 | 335 |
| xesmf | 55 | 71 |

chunks={"time": 10} (~40MB chunks)

| skipna=False | threads (s) | distributed (s) |
|---|---|---|
| sparse | 6 | 7 |
| dense | 13 | 12 |
| xesmf | 7 | 6 |

| skipna=True | threads (s) | distributed (s) |
|---|---|---|
| sparse | 59 | 72 |
| dense | OOM | OOM |
| xesmf | 10 | 12 |
Lots of interesting variation. My takeaways:
Potential improvement for #42.
The focus on rectilinear grids for this package, and factorization of regridding along dimensions, makes generating and using dense weights feasible. However, the weights matrix is still extremely sparse for any reasonably sized grid. I did some experiments converting the weights to a sparse matrix after creation, and am seeing nice improvements in both compute time and memory footprint.
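To illustrate building dense weights per dimension and converting them to a sparse matrix after creation, here is a minimal sketch assuming NumPy and scipy.sparse. `overlap_weights_1d` and `regrid_factorized` are hypothetical helpers, not xarray-regrid functions, and the weights use plain interval overlaps with no spherical area correction, so this is not a full conservative scheme.

```python
import numpy as np
import scipy.sparse as sp


def overlap_weights_1d(src_edges, dst_edges):
    """Dense 1D overlap weights (hypothetical helper, for illustration only).

    Row i holds the fraction of target cell i covered by each source cell.
    Assumes increasing edges and plain interval lengths (no spherical area factor).
    """
    overlap = np.minimum(dst_edges[1:, None], src_edges[None, 1:]) - np.maximum(
        dst_edges[:-1, None], src_edges[None, :-1]
    )
    overlap = np.clip(overlap, 0.0, None)  # non-overlapping cells -> exact 0
    return overlap / np.diff(dst_edges)[:, None]


def regrid_factorized(data, w_lat, w_lon):
    """Apply the regridding one dimension at a time.

    data has shape (n_lat_src, n_lon_src); the weights may be dense ndarrays or
    scipy.sparse matrices, since both support ``@`` against a dense array.
    """
    tmp = w_lat @ data  # (n_lat_dst, n_lon_src)
    return (w_lon @ tmp.T).T  # (n_lat_dst, n_lon_dst)


# 1-degree source grid -> 2-degree target grid
lat_src, lat_dst = np.linspace(-90, 90, 181), np.linspace(-90, 90, 91)
lon_src, lon_dst = np.linspace(0, 360, 361), np.linspace(0, 360, 181)

w_lat = overlap_weights_1d(lat_src, lat_dst)  # dense, shape (90, 180)
w_lon = overlap_weights_1d(lon_src, lon_dst)  # dense, shape (180, 360)

# Convert after creation: only the nonzero overlaps are stored
w_lat_sp, w_lon_sp = sp.csr_matrix(w_lat), sp.csr_matrix(w_lon)

data = np.random.default_rng(0).standard_normal((180, 360))
dense_out = regrid_factorized(data, w_lat, w_lon)
sparse_out = regrid_factorized(data, w_lat_sp, w_lon_sp)

print(w_lon.size, w_lon_sp.nnz)  # prints: 64800 360
```

Both versions give the same result; on this 1° to 2° example each target cell overlaps only two source cells per dimension, so the longitude weights shrink from 64800 stored values to 360 nonzeros.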
On the example in https://github.com/xarray-contrib/xarray-regrid/issues/42#issuecomment-2363771715 I get close to a 4x speedup (and better than xesmf):