xgcm / aerobulk-python

A python wrapper for aerobulk (https://github.com/brodeau/aerobulk)
GNU General Public License v3.0
14 stars 4 forks source link

Handle land mask #40

Closed paigem closed 2 years ago

paigem commented 2 years ago

This PR does the following:

Additionally:

Closes #31

jbusecke commented 2 years ago

It might be helpful if @rabernat could look at the 'shrink'/'unshrink' logic and see if there are performance improvements that we have not considered?

Maybe we should profile the memory use of this?

paigem commented 2 years ago

I did some quick profiling to compare runtime comparing with and without the mask. This preliminary comparison shows a speed up of about 20% by masking land values.

This is not a complete comparison. The following assumptions are made:

Snakeviz visualizations

No mask, CM2.6-sized test data:

data = create_data((3600, 2700, 2), chunks=None)
%snakeviz out_data = noskin_nomask(*data, 'ecmwf', 2, 10, 6)
image

We can see that the numpy wrapper noskin_np() takes essentially the entire runtime.

With mask, CM2.6-sized test data

data = create_data((3600, 2700, 2), chunks=None, land_mask=True)
%snakeviz out_data = noskin(*data, 'ecmwf', 2, 10, 6)
image

The small rectangles in the bottom right are the extra work needed to shrink and unshrink the arrays. Even with these extra computations, the runtime is noticeably shorter.

Runtime comparison

# Take average runtime across 3 runs

# No mask
nomask_avg = np.mean((60.467731952667236,61.46433997154236,61.295777797698975))

# Mask
mask_avg = np.mean((49.637826919555664,46.95607113838196,48.179039001464844))

total_percent_faster = (nomask_avg - mask_avg)/nomask_avg
total_percent_faster

Result: 0.2098748237283278 --> ~20% faster!

As @jbusecke stated above, we may also want to profile memory usage.

jbusecke commented 2 years ago

Thats amazing @paigem! Thanks for the nice comparison.

jbusecke commented 2 years ago

I had one more thing, that might be useful to change here. I believe now that we are converting the input to a 1d array in any case, we can get rid of this wrapper entirely.

To check that this is correct, I would write a test that puts arrays of various dimensions (1-4D?) through the xarray wrappers and make sure that this does not lead to crashes.

paigem commented 2 years ago

@jbusecke Take a look at the new test I wrote in this commit to verify that our xarray wrapper takes arrays of size 2d and greater (i.e. no longer just 3d!). It seems to work fine, but I wasn't sure how to write a test for that...

jbusecke commented 2 years ago

I made a few minor suggestions. If you agree with those, you can commit and merge once the tests pass (make sure to double check that the % line goes to 100 in the CI log).