xgcm / aerobulk-python

A Python wrapper for aerobulk (https://github.com/brodeau/aerobulk)
GNU General Public License v3.0

aerobulk-python is SLOWWW #32

Open jbusecke opened 2 years ago

jbusecke commented 2 years ago

I am currently struggling with what I perceive as very slow performance from aerobulk-python.

For some of the work we are doing over at https://github.com/ocean-transport/scale-aware-air-sea, it seems that a single timestep of CM2.6 data takes about 45-50s to execute.

This does not seem to be a peculiarity of the data we are using, since I get similar times for synthetic data:

```python
from typing import Dict, Optional, Tuple
import time
import numpy as np
import xarray as xr
from aerobulk import noskin

def create_data(
    shape: Tuple[int, ...],
    chunks: Optional[Dict[str, int]] = None,
    skin_correction: bool = False,
    order: str = "F",
):
    def _arr(value, chunks, order):
        arr = xr.DataArray(np.full(shape, value, order=order))

        # add random noise scaled to a small percentage of the value
        randomize_factor = 0.001
        randomize_range = value * randomize_factor
        arr = arr + np.random.rand(*shape) * randomize_range
        if chunks:
            arr = arr.chunk(chunks)
        return arr

    sst = _arr(290.0, chunks, order)
    t_zt = _arr(280.0, chunks, order)
    hum_zt = _arr(0.001, chunks, order)
    u_zu = _arr(1.0, chunks, order)
    v_zu = _arr(-1.0, chunks, order)
    slp = _arr(101000.0, chunks, order)
    rad_sw = _arr(0.000001, chunks, order)
    rad_lw = _arr(350.0, chunks, order)
    if skin_correction:
        return sst, t_zt, hum_zt, u_zu, v_zu, rad_sw, rad_lw, slp
    else:
        return sst, t_zt, hum_zt, u_zu, v_zu, slp

data = create_data((3600, 2700, 2), chunks=None)
```

When I time this execution:

```python
tic = time.time()
out_data = noskin(*data)
toc = time.time() - tic
```

I get a runtime of ~100 s, i.e. about 50 s per timestep.

Am I expecting too much of the Fortran code, or is this genuinely slow for computing on ~1e7 data points per timestep?

I wonder if there is some obvious issue with our compiler flags?

rabernat commented 2 years ago

I would plot execution time as a function of domain size. Just 4-5 points should be enough to see whether it scales linearly. If so, we can focus on optimizing with just a small piece of data.
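A minimal sketch of that scaling experiment, using a dummy NumPy workload as a hypothetical stand-in (in practice you would call `noskin(*create_data((n, n, 1)))` from the snippet above instead):

```python
import time
import numpy as np

def scaling_experiment(sizes, workload):
    """Time `workload` on square domains of each size; return (size, seconds) pairs."""
    results = []
    for n in sizes:
        data = np.random.rand(n, n)
        tic = time.perf_counter()
        workload(data)  # swap in the real flux computation here
        results.append((n, time.perf_counter() - tic))
    return results

# Hypothetical stand-in workload, NOT the aerobulk call itself.
dummy = lambda a: np.exp(a).sum()

for n, t in scaling_experiment([100, 200, 400, 800, 1600], dummy):
    print(f"{n}x{n}: {t:.4f}s")
```

Plotting size against seconds (or eyeballing the printed numbers) should make it obvious whether the cost grows linearly with the number of points.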

jbusecke commented 2 years ago

Good idea! I'll do that in a bit.

jbusecke commented 2 years ago

For results see here: https://github.com/ocean-transport/scale-aware-air-sea/issues/28#issuecomment-1158136723

So from this I conclude that we can optimize on a small domain, e.g. 100x100.
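For timing a small 100x100 domain reliably, a best-of-several `timeit` run smooths out one-off overheads that dominate at this size. The `flux_stub` below is a hypothetical placeholder; in practice it would be the `noskin(*create_data((100, 100, 1)))` call:

```python
import timeit
import numpy as np

# Hypothetical stand-in for the flux computation on a 100x100 domain.
def flux_stub():
    a = np.random.rand(100, 100)
    return np.exp(a) / (1.0 + a)

# Best of 5 repeats, 20 calls each: the minimum is the least noisy estimate.
best = min(timeit.repeat(flux_stub, repeat=5, number=20)) / 20
print(f"best per-call time: {best * 1e3:.3f} ms")
```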

jbusecke commented 2 years ago

Some relief was brought by #40 (see benchmarks there).