Closed slevang closed 2 months ago
This probably is quite unintuitive to users, was the reason for removing this tracking? mostly just simplifying the code? If it does allow for a much lower memory footprint (without users needing to improve their chunking) it could be interesting to leave the non-grid-dims tracking in.
It does simplify the code. But primarily, getting rid of this aggregation of the valid frac over non grid dims allows us to truly track isolated NaN fractions, so that you can have NaN in just a single time slice for example and get the correct output. I think what I have now is the most sensible baseline option, but I have some other ideas for improving conservative performance so I'll follow up in a different branch to avoid adding too much to this PR.
I made a few small changes/refactors which I guess cancelled out your previous review.
Spherical padding
This makes an initial attempt at building in some automatic logic for better handling the boundaries of a spherical domain with appropriate padding, as well as shifting the longitude source grid to match the target as needed.
Faster tests
Made some modifications and general cleanup of the existing tests. Mainly making the datasets dask-backed so the regridding functions are lazy for all the attribute checks. Also got rid of some unneeded copies, compute calls, etc. Tests run in about 17 seconds now.
With the addition of the spherical padding logic I was also able to remove some aspects of the tests where we avoided checking for good matches at the boundaries.
Conservative NaN tracking
I did some benchmarking and the version of NaN tracking where we track independently across non-grid dimensions as well doesn't actually have much of a run-time penalty. Therefore I went ahead and made this small change. The only issue is that it can induce a very large memory penalty if the input data isn't chunked, because the weights arrays can become huge if they are broadcast with all the other input dimensions. I added a note to the docstring about this but perhaps we want a more formal warning, or to explicitly add some chunking in certain cases?
TODO / open questions