Performance with cf-units

pelson commented 1 month ago

A quick glance suggests that the converter generated by pyudunits2 (which uses sympy lambdify) is significantly faster than the one generated by udunits2 (used within cf-units). However, this advantage only shows once there is a lot of data to convert (e.g. 5000*2500*12 data points). Below that, the up-front cost of reading the XML, parsing, etc. is much higher in pyudunits2. It would be good to micro-benchmark this so that we can focus on speeding things up at the pinch points.
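The trade-off above can be framed as a simple cost model: each library pays a fixed setup cost s (loading the database, parsing units, building the converter) plus a per-point cost p, so the higher-setup option B starts winning once n > (s_B - s_A) / (p_A - p_B). A minimal sketch, assuming that linear model holds; `break_even_points` is a hypothetical helper and the numbers in the usage line are placeholders, not measurements:

```python
def break_even_points(setup_a: float, per_point_a: float,
                      setup_b: float, per_point_b: float) -> float:
    """Return the data size n at which option B's total cost equals A's.

    Assumes a linear cost model: total(n) = setup + n * per_point,
    where B has the higher setup cost but the cheaper per-point cost.
    Returns infinity if B is never cheaper per point.
    """
    if per_point_b >= per_point_a:
        return float("inf")
    return (setup_b - setup_a) / (per_point_a - per_point_b)


# Hypothetical costs in seconds, purely for illustration.
n = break_even_points(setup_a=1e-4, per_point_a=2e-9,
                      setup_b=2e-2, per_point_b=1e-9)
print(f"B becomes cheaper beyond ~{n:,.0f} points")
```

Measuring s and p separately for each library is exactly what a micro-benchmark would provide; the model then tells you which of the two is worth attacking for a given data size.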

I hacked together the following script to roughly compare the two:

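```python
import timeit

import numpy as np


def prepare_cf_units():
    import cf_units

    # Units are parsed once here, in setup, so they are not part of the timing.
    u_from = cf_units.Unit('degC')
    u_to = cf_units.Unit('K')

    def convert_w_cf_units(data):
        return u_from.convert(data, u_to)

    return convert_w_cf_units


def prepare_pyudunits():
    # Private pyudunits2 modules, as used at the time of writing.
    from pyudunits2._udunits2_xml_parser import read_all
    from pyudunits2._unit import Converter

    # Reading the UDUNITS-2 XML happens once, in setup.
    unit_system = read_all()

    def convert_w_pyudunits(data):
        # Unlike the cf-units version, unit parsing and Converter
        # construction happen on every call, so they are included in the timing.
        u_from = unit_system.unit('degC')
        u_to = unit_system.unit('K')
        converter = Converter(u_from, u_to)
        return converter.convert(data)

    return convert_w_pyudunits


def prepare_data():
    # 50 * 25 * 20 = 25,000 points, far below the size at which
    # pyudunits2's generated converter would be expected to pay off.
    data = np.arange(50 * 25 * 20)
    return data


if __name__ == "__main__":
    # Each timeit.repeat call returns 3 totals (in seconds), each covering 2 calls.
    print('cf-units:', timeit.repeat(
        "convert_w_cf_units(data)",
        "from __main__ import prepare_cf_units, prepare_data; "
        "convert_w_cf_units = prepare_cf_units(); data = prepare_data()",
        repeat=3, number=2,
    ))

    print('pyudunits2:', timeit.repeat(
        "convert_w_pyudunits(data)",
        "from __main__ import prepare_pyudunits, prepare_data; "
        "convert_w_pyudunits = prepare_pyudunits(); data = prepare_data()",
        repeat=3, number=2,
    ))
```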

With results along the lines of (total seconds for 2 calls, one value per repeat):

```
cf-units: [0.0002636730205267668, 0.0002245799987576902, 0.00021556997671723366]
pyudunits2: [0.34791191801195964, 0.04072356101823971, 0.04109554295428097]
```
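Since `timeit.repeat` returns the total time in seconds for `number` calls per repeat, the best-case per-call cost is the minimum over repeats divided by `number`. Applying that to the numbers above (copied verbatim) gives a rough per-call comparison. Keep in mind that the pyudunits2 function in the script also parses both units and builds a `Converter` on every call, whereas the cf-units units are built once in setup, so this is not a like-for-like comparison of the conversion step alone. Reading the slow first pyudunits2 repeat as a one-off warm-up cost is an interpretation, not something the thread establishes:

```python
# Timings reported above: timeit.repeat(..., repeat=3, number=2), in seconds.
cf_units_times = [0.0002636730205267668, 0.0002245799987576902, 0.00021556997671723366]
pyudunits2_times = [0.34791191801195964, 0.04072356101823971, 0.04109554295428097]
NUMBER = 2  # calls per repeat


def per_call(times, number=NUMBER):
    """Best-case seconds per call: min over repeats divided by calls per repeat."""
    return min(times) / number


cf = per_call(cf_units_times)
py = per_call(pyudunits2_times)
print(f"cf-units:   {cf * 1e6:8.1f} µs per call")
print(f"pyudunits2: {py * 1e3:8.1f} ms per call")
print(f"ratio:      {py / cf:8.0f}x")

# How much slower the first pyudunits2 repeat was than the best one;
# a large value hints at a cost paid only on the first call(s).
warmup_ratio = pyudunits2_times[0] / min(pyudunits2_times)
print(f"first-repeat slowdown: {warmup_ratio:.1f}x")
```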

pelson commented 2 weeks ago

@ocefpaf highlighted some performance issues in https://github.com/ioos/compliance-checker/pull/1118#issuecomment-2462095271. I don't think the magnitude there is representative of pyudunits2's performance (the entire pyudunits2 test suite runs in under 2 s). Still, it would be good to get the metrics mentioned above, and to track down where the performance penalties in the checker come from, including whether they are a result of workarounds for features missing from pyudunits2.
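To get those metrics without mixing one-off and per-call costs, each stage (loading the unit database, parsing units, building the converter, converting) can be timed on its own. A minimal, library-agnostic sketch using only the standard library; `time_stages` is a hypothetical helper, and the stand-in stages are placeholders to swap for the real pyudunits2 or cf-units calls:

```python
import timeit
from typing import Callable, Dict


def time_stages(stages: Dict[str, Callable[[], object]],
                repeat: int = 5, number: int = 10) -> Dict[str, float]:
    """Return best-case seconds per call for each named stage.

    Each stage is a zero-argument callable; anything it depends on should be
    prepared beforehand so that only the stage itself is timed.
    """
    results = {}
    for name, fn in stages.items():
        times = timeit.repeat(fn, repeat=repeat, number=number)
        results[name] = min(times) / number
    return results


if __name__ == "__main__":
    # Stand-in stages for illustration only. For pyudunits2 these could wrap
    # read_all(), unit_system.unit('degC'), Converter(u_from, u_to) and
    # converter.convert(data), using the names from the script above.
    data = list(range(25_000))
    stages = {
        "parse": lambda: float("273.15"),
        "convert": lambda: [x + 273.15 for x in data],
    }
    for name, secs in time_stages(stages, repeat=3, number=5).items():
        print(f"{name:>8}: {secs * 1e6:10.1f} µs")
```

Reporting the minimum per stage follows the usual `timeit` advice that higher values are mostly noise from other processes rather than variation in the code itself.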

ocefpaf commented 2 weeks ago

I'm pretty sure we do things in the least optimized way possible in compliance-checker, so please take those numbers with a grain of salt. Still, when using cf-units, the tests run super fast. I'll try to debug this further to figure out where the hiccups are.
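For finding where the time goes in a larger code base such as the checker, the standard-library profiler is a reasonable first step, either from the command line (`python -m cProfile -s cumulative your_script.py`) or programmatically. A sketch of the latter; `profile_top` and `slow_check` are hypothetical names, and `slow_check` only stands in for whichever check turns out to be slow:

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=10, sort="cumulative", **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    # Sort by cumulative time so callers that spend long in callees show up first.
    pstats.Stats(profiler, stream=stream).sort_stats(sort).print_stats(limit)
    return result, stream.getvalue()


def slow_check():
    # Hypothetical stand-in for a slow compliance check.
    return sum(float(str(i)) for i in range(50_000))


result, report = profile_top(slow_check, limit=5)
print(report)
```

Sorting by cumulative time makes it easy to see whether, for example, unit parsing or converter construction dominates a check, which is the kind of pinch point the benchmark above is meant to expose.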

pelson/pyudunits2, issue #2: Performance with cf-units