ocean-eddy-cpt / gcm-filters

Diffusion-based Spatial Filtering of Gridded Data
https://gcm-filters.readthedocs.io/
Other
37 stars 24 forks source link

Fix dimensionality #146

Closed iangrooms closed 2 years ago

iangrooms commented 2 years ago

This PR addresses the problem of dimensionality described in #136. A quick summary: If a user tries any of the REGULAR Laplacians and uses a value of dx_min other than 1, they will get wrong answers. This PR will allow users to put in dimensional values of dx_min and filter_scale (in any system of units) and still get the right answers.

I am attempting to be minimally invasive. I've added a boolean class attribute is_nondimensional to all of the Laplacians. If the Laplacian is nondimensional, then at each step of the filter (in filter.py) it re-dimensionalizes by dividing by dx_min ** 2. The filter_func's don't have access to dx_min so I added dx_min_sq = dx_min ** 2 to the filter spec.

I initially started with @NoraLoose's suggestion (from #136) of defining new Laplacian classes, but as I worked on it it seemed like it would require bigger changes. For example, _create_filter_func accepts a BaseScalarLaplacian and I didn't want to have to define four new versions of _create_filter_func (scalar vs vector; dimensional vs non).

I also added a Note in the Basic Filtering section of the docs, saying that any grid_vars should have units that are consistent with filter_scale and dx_min, but that the specific system of units doesn't matter.

NoraLoose commented 2 years ago

Great work, Ian!

If we stick to the is_nondimensional attribute, we should add some tests to this PR. In an ideal scenario, we want to test every newly added line to the code. Based on this, I can think of 2 tests:

  1. A test that checks that the is_nondimensional attribute is there for each Laplacian, and has the expected value (e.g., to make sure a future PR does not accidentally change is_nondimenional from True to False for a Laplacian). Such a test could be added to test_kernels.py and could follow this structure:
    
    @pytest.mark.parametrize(Laplacian, expected)
    ...

def test_nondimensional(Laplacian, expected):

laplacian = Laplacian()
assert laplacian.is_nondimensional == expected

2. A test for the added lines in `filter.py`. For example, we could test if the filter gives the same thing if we put in values for `dx_min`, `area` etc. in cm vs. km.

Let me know if you would like help with these tests. But obviously, before implementing these test cases in `test_*`, we should agree on how we want to implement the code changes in `kernels.py` and `filter.py`.
iangrooms commented 2 years ago

Replying to @rabernat:

Would it be possible to simply define a nondimensional Laplacian as a Laplacian for which dx_min==1?

The nondimensional Laplacians don't just have minimum grid spacing 1; they have every cell being a unit square. Identifying regular Laplacians with dx_min==1 could break if, for example, there was a variable grid whose minimum was coincidentally 1 km (or 1 m or 1 cm, etc). The user would set dx_min=1 and then the code would mistakenly think it was using a regular/nondimensional Laplacian.

perhaps we want to attach dx_min to the Laplacian / grid, rather than to the Filter itself

The Filter does need to know about dx_min. The Filter object obtains a polynomial approximation that is accurate over a range of wavenumbers, with the highest wavenumber in the range set by dx_min.

perhaps we want to attach dx_min to the Laplacian / grid, rather than to the Filter itself

As noted above, the filter needs dx_min. But maybe your suggestion points to a different approach. My proposed fix has the same effect as changing the grid size in the regular Laplacians from 1 to dx_min, but without having to change the actual Laplacian kernel code. Instead of having the Laplacian kernel know about the grid size (which would be natural), I hacked _create_filter_func to fix the output of the nondimensional Laplacians. Not sure this is the best way, but it does seem to work.

An more-natural alternative would be to add dx_min (or maybe just dx) to the grid_vars for the REGULAR Laplacians, and then change the code of the Laplacian kernels so that they divide by dx ** 2 right before returning the result. I didn't choose this path because it would break all of the code that currently uses the REGULAR Laplacians, in addition to requiring a fairly major overhaul of kernels.py. If there's a strong argument for doing it this way even if it breaks backwards compatibility, then I'm happy to code it up. Curious to hear what you think.

iangrooms commented 2 years ago

I added some tests as suggested by @NoraLoose. In test_kernels.py I add a test that just checks if the is_nondimensional attribute has the correct value. In test_filters.py I run the REGULAR filter on test data with dx_min=1, filter_scale=4 and again with dx_min=2 and filter_scale=8, and make sure the results are the same.

codecov-commenter commented 2 years ago

Codecov Report

Merging #146 (85f38a8) into master (766f624) will decrease coverage by 0.50%. The diff coverage is 88.00%.

@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
- Coverage   98.49%   97.98%   -0.51%     
==========================================
  Files           9        9              
  Lines         997     1044      +47     
==========================================
+ Hits          982     1023      +41     
- Misses         15       21       +6     
Flag Coverage Δ
unittests 97.98% <88.00%> (-0.51%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
gcm_filters/filter.py 96.46% <70.00%> (-2.58%) :arrow_down:
gcm_filters/kernels.py 98.61% <100.00%> (+0.04%) :arrow_up:
tests/test_filter.py 99.56% <100.00%> (+0.01%) :arrow_up:
tests/test_kernels.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 766f624...85f38a8. Read the comment docs.

iangrooms commented 2 years ago

The reason code coverage decreased is that I implemented the fix for non-dimensional vector Laplacians, but we don't have any non-dimensional vector Laplacians, so those lines don't get touched by the tests.

rabernat commented 2 years ago

Hi Ian! Thanks for continuing to work on this PR. I have been a bit overwhelmed with getting ready for / participating in OSM for the past two weeks. Next week I intend to spend some time reviewing this PR in detail. (I am still confused about some parts of this and need to take some time to really understand what is happening.) Thanks for your patience.

iangrooms commented 2 years ago

Had some problems related to this https://github.com/psf/black/issues/2964. My changes to the pre-commit configuration can be rolled back once this is fixed.