xgcm / aerobulk-python

A python wrapper for aerobulk (https://github.com/brodeau/aerobulk)
GNU General Public License v3.0
14 stars 4 forks source link

Data array input for `zt` and `zu` works, but does it do the right thing? #81

Open jbusecke opened 3 months ago

jbusecke commented 3 months ago

I just ran this example:

import xarray as xr
from aerobulk.flux import noskin_np, skin_np, noskin, skin

def wrap_da(input:np.ndarray) -> xr.DataArray:
    return xr.DataArray([np.float64(i) for i in input], dims={'time':np.arange(len(input))})

sst = wrap_da([270, 270])
t_zt = wrap_da([190, 190])                           
hum_zt = wrap_da([0.05, 0.05])
u_zu = wrap_da([50.0, 50.0])                      
v_zu = wrap_da([50.0, 50.0])
slp = wrap_da([110000, 110000])
zt = wrap_da([10, 1])                              
zu = wrap_da([1, 10])

algo = 'coare3p6'
ql, qh, taux, tauy, evap = noskin(
    sst=sst,
    t_zt=t_zt, 
    hum_zt=hum_zt, 
    u_zu=u_zu, 
    v_zu=v_zu, 
    slp=slp, 
    algo=algo, 
    zt=zt, 
    zu=zu, 
)

Which worked, but I am suspicious of how multiple values for zt and zu are used. This behavior is not documented since we say the only allowed int types for those inputs. From the above we get two output values that are identical (here for the latent heatflux as example)

image

This seems fishy to start with, as I would expect the vastly different values for the height to produce different outputs (when all other inputs are kept identical).

Indeed when running some further tests:

# evaluate each point separately with integer input for height values
ql_0, qh_0, tau_0, tauy_0, evap_0 = noskin(
    sst=sst,
    t_zt=t_zt, 
    hum_zt=hum_zt, 
    u_zu=u_zu, 
    v_zu=v_zu, 
    slp=slp, 
    algo=algo, 
    zt=10, 
    zu=1, 
    niter=10, 
    input_range_check=True
)

ql_1, qh_1, tau_1, tauy_1, evap_1 = noskin(
    sst=sst,
    t_zt=t_zt, 
    hum_zt=hum_zt, 
    u_zu=u_zu, 
    v_zu=v_zu, 
    slp=slp, 
    algo=algo, 
    zt=1, 
    zu=10, 
    niter=10, 
    input_range_check=True
)

We actually get different results for the different height inputs

image

What seems to happen is that in the original case something in our code is silently taking the first value of zt and zu and applying that to all data.

For context this was brought up in the context of #79 by @juliasimpson97 who wants to use aerobulk-python for observational data. This data can be described as a timeseries of inputs where the height of measurement is also varying in time (hence is provided as a dataarray of equal length).

I think that for that specific use case, we might possibly run up agains the way aerobulk (the fortran code) is set up - it is very much designed for model datasets where the height for a large 2/3 d array is going to be a constant.

A short solution for @juliasimpson97 could be to loop over all the values, calculate them one by one, and concatenate the results. This will be slow, but the other way of doing this seems to produce results that are incorrect.

In the longer run we need to see if there is a way to replicate this solution as part of the .apply_ufunc call and thus allow xr.Dataarray input for zt/zu. If that is not possible/desired we might consider at least warning the user about this and adding an explicit check for those inputs.

julialsimpson commented 3 months ago

Looking at the above, are you sure that you're calling different values to create ql_0, qh_0, tau_0, tauy_0, evap_0 as opposed to ql_1, qh_1, tau_1, tauy_1, evap_1? it looks like sst/all the inputs are unchanged?

jbusecke commented 3 months ago

Looking at the above, are you sure that you're calling different values to create ql_0, qh_0, tau_0, tauy_0, evap_0 as opposed to ql_1, qh_1, tau_1, tauy_1, evap_1? it looks like sst/all the inputs are unchanged?

Yes that is intentional. In the first example both datapoints only differ in the height input, and return the exact same value.

The other test cases show that these different height values should actually produce different values (compare q1 to q0). I admit, its a bit confusing that both return two values each, but basically just compare the first value of q0 with the first value of q1, those are the two (distinct) values I would have expected from the output of the initial code.