pytroll / pyresample

Geospatial image resampling in Python
http://pyresample.readthedocs.org
GNU Lesser General Public License v3.0
346 stars 94 forks source link

Fix intermittent EWA test failures #482

Closed djhoese closed 1 year ago

djhoese commented 1 year ago

Hopefully ends up resolving #481 or at least parts of it.

djhoese commented 1 year ago

Wow I didn't even notice that I finally got it to fail. Here's the bottom of the test log:

 FAILED pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_xarray_basic_ewa[10-True-float32-input_shape0-input_dims0-LegacyDaskEWAResampler-pyresample.ewa._legacy_dask_ewa] - AssertionError: assert 100 == (51 + 50)
 +  where 100 = <MagicMock name='ll2cr' id='1459486493952'>.call_count
==== 1 failed, 753 passed, 22 skipped, 1004 warnings in 257.70s (0:04:17) =====

And a slightly larger version of that traceback which is further up in the log:


            # ll2cr will be called once more because of the computation
>           assert ll2cr.call_count == ll2cr_calls + num_chunks
E           AssertionError: assert 100 == (51 + 50)
E            +  where 100 = <MagicMock name='ll2cr' id='1459486493952'>.call_count
djhoese commented 1 year ago

I added some print statements and here's what I get:

 ll2cr call count before resample call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=0
ll2cr call count after resample call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=1
ll2cr call count after first compute call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=51
ll2cr call count before second resample call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=51
ll2cr call count after second resample call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=51
ll2cr call count after second compute call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=100

One thing to note is this is the other shape variant from the failure I posted above (the 2D array case). I guess that makes sense the failure is in the ll2cr portion which is based on the 2D lon/lat arrays, not the 2D or 3D data.

The other thing to note here is that the number of calls from the second time the compute call is done there should be 50 more calls but there are only 49. This is very very odd. This makes me think either dask is deciding that two of the calls are the same (equivalent inputs so just reuse the outputs). For example ll2cr on chunks (0, 0) and chunks (2, 2) are hashing to the same inputs...wait dask shouldn't care per-chunk as it organizes/hashes things for the entire array and then assigns chunk indexes (ex. (my-example-func-ABCDEF12345, 0, 0)).

My other guess is some weird thing like a race condition if mock isn't thread-safe, especially on Windows apparently, and we're missing an increment of the mock object because two executions are trying to increment at the same time.

djhoese commented 1 year ago

I think I'm at 5 or so restarts so far and it hasn't failed yet. I'll try for 10 or 15 total.

djhoese commented 1 year ago

Great...still failing:

 FAILED pyresample/test/test_dask_ewa.py::TestDaskEWAResampler::test_xarray_basic_ewa[100-True-float64-input_shape1-input_dims1-LegacyDaskEWAResampler-pyresample.ewa._legacy_dask_ewa] - AssertionError: assert 99 == (50 + 50)
 +  where 99 = <MagicMock name='ll2cr' id='2646658360336'>.call_count

Debug prints:

 ll2cr call count before resample call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=0
ll2cr call count after resample call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=0
ll2cr call count after first compute call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=50
ll2cr call count before second resample call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=50
ll2cr call count after second resample call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=50
ll2cr call count after second compute call (<class 'pyresample.ewa._legacy_dask_ewa.LegacyDaskEWAResampler'> | input_shape=(3, 100, 50)): ll2cr.call_count=99

So for some reason, again mostly on Windows, one of the chunks is not being called.

djhoese commented 1 year ago

It took almost 50 restarts to get it to fail again:

 >           assert ll2cr.call_count == ll2cr_calls + num_chunks
E           AssertionError: assert 99 == (50 + 50)
E            +  where 99 = <MagicMock name='ll2cr' id='2367904042528'>.call_count

I added some print statements so lets see if I can decipher some more information from them.

djhoese commented 1 year ago

Ok so in my most recent commit I had the ll2cr map_blocks function print out chunk information. I think took that output from the failed test, put it in python, matched a regular expression against the pattern chunk_location': (\d, \d, \d), and got 50 results during the final (second) compute of the test. That tells me the function is being called the proper number of times but something about the multi-threaded nature is making mock unhappy.

djhoese commented 1 year ago

Ok so mocks are not technically thread safe. The way we use them is generally OK for basic "replace this functionality with this functionality", but in these EWA-specific tests I'm checking call count and that is not thread-safe. I'm changed to a synchronous dask scheduler for these tests which should fix this. I'll rerun the tests 10 or so times and try to break it then remove any remaining changes/prints in this PR and merge it.

codecov[bot] commented 1 year ago

Codecov Report

Merging #482 (0745ddd) into main (53ae8a9) will decrease coverage by 0.00%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #482      +/-   ##
==========================================
- Coverage   94.32%   94.32%   -0.01%     
==========================================
  Files          74       74              
  Lines       12890    12889       -1     
==========================================
- Hits        12159    12158       -1     
  Misses        731      731              
Flag Coverage Δ
unittests 94.32% <100.00%> (-0.01%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pyresample/ewa/_legacy_dask_ewa.py 84.69% <100.00%> (ø)
pyresample/ewa/dask_ewa.py 90.08% <100.00%> (-0.09%) :arrow_down:
pyresample/test/test_dask_ewa.py 98.95% <100.00%> (+<0.01%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

coveralls commented 1 year ago

Coverage Status

Coverage decreased (-0.0005%) to 93.85% when pulling 0745ddd9ab6dbe5fa7aa20b5b5b54b127704653d on djhoese:debug-ewa-ci-failure into 53ae8a9ab0747e5d4291ad9dadd9d5cc942b9560 on pytroll:main.