Attempt to fix `threadsafe` on `skin`

jbusecke commented 2 years ago

[ ] Closes #38

As a first step I wanted to verify that the problem is actually related to threadsafe. And indeed when I remove the threadsafe from the 'skin' wrapper only, all tests pass locally.

So there must be something specific to the skin correction that is not threadsafe?

jbusecke commented 2 years ago

Ok so running the tests from a script in the tests directory

from test_flux_xr import Test_xarray

test_class = Test_xarray()

shape = (10, 13, 12)
chunks = {"dim_0": 1}
skin_correction = True
test_class.test_chunked(shape, chunks, skin_correction)

gives me these sort of errors:

 ===================================================================
                    ----- AeroBulk_init -----

     *** Bulk parameterization to be used => "coare3p0"
        ==> will use the Cool-skin & Warm-ayer scheme of `coare3p0` !
     *** Computational domain shape: Ni x Nj = 00001 x 00013
     *** Number of time records that will be treated:          12
     *** Number of iterations in bulk algos: nb_iter  =    6
     *** Filling the `mask` array...
         ==> no points need to be masked! :)
     *** Type of prescribed air humidity  `relative humidity [%]`
 ===================================================================
 *** E R R O R :
  COARE3P0_INIT => allocation of Tau_ac, Qnt_ac, dT_wl & Hz_wl failed!

switching the algo shows a similar thing for ecmwf:

 *** E R R O R :
  ECMWF_INIT => allocation of dT_wl & Hz_wl failed!

These errors go back to the aerobulk code here and here, both parts of the codebase seem to allocate some internal optional output variable.

Does anyone have a hint to why this would not sit well with threadsafe (cc @rabernat)?

jbusecke commented 2 years ago

This might be relevant: https://stackoverflow.com/questions/34579769/f2py-error-with-allocatable-arrays

jbusecke commented 2 years ago

Quick update. This only applies when using dask arrays, interestingly. The tests with the numpy only wrappers in tests/test_fortran.py all pass!

rabernat commented 2 years ago

Is that because Dask uses threads?

Also, what is different about skin that makes it not thread safe vs. noskin.

jbusecke commented 2 years ago

Sorry the previous comment was circular: As far as I can tell the difference is in the allocation of optional output which is only done when chosing the skin correction option (there are slight differences between the actual algos but the failure happens at the same ALLOCATE call in both, see below:

These errors go back to the aerobulk code here and here, both parts of the codebase seem to allocate some internal optional output variable.

jbusecke commented 2 years ago

Is that because Dask uses threads?

Any of my dask test cases has used the threaded scheduler (default). Before setting threadsafe, this worked for either skin/noskin but was very slow (basically running in serial). After setting threadsafe the noskin wrapper parallelizes as expected but the skin version actually FAILS.

paigem commented 2 years ago

Note that once #40 is merged, the skin code is no longer "threadsafe". The "threadsafe" option was removed in that PR to allow for tests to pass.

jbusecke commented 2 years ago

Ill close this for now, since we have decided to move on without threadsafe for the skin option for now. We can reopen if needed

xgcm / aerobulk-python

Attempt to fix `threadsafe` on `skin` #39