ocean-eddy-cpt / gcm-filters

Diffusion-based Spatial Filtering of Gridded Data
https://gcm-filters.readthedocs.io/
Other
37 stars 24 forks source link

[BUG] Inconsistent dimensionality of discrete Laplacians #136

Closed iangrooms closed 2 years ago

iangrooms commented 2 years ago

Prompted by #106, I started to add some information about units to the code comments and documentation. In doing so I realized that there is a potential problem for users using either REGULAR or REGULAR_WITH_LAND grid types. Suppose a user has a regular grid with spacing 5 km, and wants to filter to a scale of 40 km. It would be completely reasonable for them to input dx_min = 5000 and filter_scale = 40000, but that would produce wrong results.

The problem is a dimensional inconsistency in the filter steps. A Laplacian filter step has the form field_bar += (1 / s_l) * Laplacian(field_bar). In order for this to be dimensionally consistent, the dimensions of the Laplacian operator have to be the same as the dimensions of s_l. (Similar for biharmonic steps and s_b.) The way the code works right now, the dimensions of s_l are the same as the dimensions of dx_min ** (-2), but the REGULAR Laplacians (including the AREA_WEIGHTED ones) are nondimensional because they implicitly use a grid size of 1.

I can think of a few ways to fix this to avoid potential future problems.

  1. Force the user to set dx_min = 1 for all the REGULAR grid types. The AREA_WEIGHTED grid types already enforce dimensional consistency by throwing an error if dx_min is not 1. While it would work, it seems inelegant, and I'm not sure how to implement it anyways.
  2. Make all the REGULAR Laplacians dimensional by simply dividing the result produced by the existing code by dx_min ** 2. I like this because the variable names dx_min and filter_scale mean exactly what they say, and you can use any units you want (cm, m, km, whatever) as long as they're consistent. With this approach you could even set dx_min = 1 and then let filter_scale be the nondimensional filter factor and it would still work. I'm not sure how to effectively implement this though.
  3. Make everything nondimensional. To accomplish this we would set dx_min=1 (no longer a user-defined input), re-name filter_scale to filter_factor, and then nondimensionalize all of the IRREGULAR grids by requiring their grid_vars to be nondimensional using the dimensional length of the smallest grid cell. I don't particularly like this. Seems like it requires extra and unnecessary work from the user.

Mainly I'd like some feedback on how to implement proposed change number 2, but I'd also welcome other ideas or general feedback about this issue.

NoraLoose commented 2 years ago

One option to implement your 2nd idea is the following:

NoraLoose commented 2 years ago

Question: Do we want to deal with this issue before or after publication of the JOSS paper? The reviewers have given their thumbs-ups, and JOSS is ready whenever we are. Everything we do before publication will make it into the newest package release (associated with the JOSS paper).

@iangrooms, were you planning to put in a PR? I won't have much time this week. If we are missing the human resources to work on this, we could also delay this issue until the next release.

iangrooms commented 2 years ago

It's going to take me a bit to figure out the implementation, so maybe just go ahead with the JOSS release. All I'm trying to fix is a way that someone could get the wrong answer, but if someone uses the current code correctly they should get the right answers.

rabernat commented 2 years ago

IMO we definitely don't want to pause the JOSS process. Software is never finished. The more we use this software, the more bugs we will find. We will be maintaining gcm filters forever. The JOSS review is to establish that we have a process in place to handle these bugs as they are discovered. We clearly do. 🚀

iangrooms commented 2 years ago

The nondimensional Laplacians don't have grid sizes as a required grid_var, because they assume the grid size is 1. We could check if the Laplacian has any grid_vars besides area and wet_mask. If it does not, then it must be nondimensional and we can insert the line tendency /= dx_min ** 2. Or in the same vein, if any of the required grid vars starts with d (as in dxw, dyw) then it must be a dimensional Laplacian so we don't have to scale tendency. If possible, it seems like it would be simpler than overhauling kernels.py. Thoughts?

NoraLoose commented 2 years ago

Seems risky to me. It would work for the existing set of Laplacians. But if more are added, and we (or other users) forget about this rule based on naming conventions, we could run into problems.

Being more intentional about it by doing something to the classes in kernels.py will force contributors to think about whether their new Laplacian is non-dimensional or dimensional.