pelson / pyudunits2

A pure Python library designed for handling units of physical quantities, fully based on the UDUNITS2 grammar and XML database
Apache License 2.0

Performance with cf-units #2

Open pelson opened 1 week ago

pelson commented 1 week ago

A quick glance suggests that the converter generated by pyudunits2 (which uses sympy lambdify) is significantly quicker than the one generated by udunits2 (used within cf-units). However, this advantage only shows once you have a lot of data to convert (e.g. 5000 * 2500 * 12 data points). Below that size, the cost of reading the XML, parsing, etc. is much higher in pyudunits2. It would be good to micro-benchmark this so that we can focus on speeding things up at the pinch points.
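As a rough idea of what such a micro-benchmark could look like, here is a minimal sketch that times each stage on its own rather than the whole pipeline. The helper name `time_stages` and the stand-in stages are hypothetical and only there so the snippet runs without pyudunits2 installed; the commented-out stage names reuse the calls from the script in this issue and are not a confirmed breakdown of where the time goes.

```python
import timeit


def time_stages(stages, repeat=3, number=1):
    """Time each zero-argument callable and return its best seconds per call."""
    best = {}
    for name, func in stages.items():
        # timeit.repeat returns one total (for `number` calls) per repeat.
        totals = timeit.repeat(func, repeat=repeat, number=number)
        best[name] = min(totals) / number
    return best


# Stand-in stages (hypothetical) so the sketch runs anywhere.
# With pyudunits2 available, the stages could instead wrap the calls used
# in this issue's script, for example:
#   "read_xml":        read_all
#   "parse_units":     lambda: (unit_system.unit('degC'), unit_system.unit('K'))
#   "build_converter": lambda: Converter(u_from, u_to)
#   "convert":         lambda: converter.convert(data)
data = list(range(50 * 25 * 20))
stages = {
    "parse (stand-in)": lambda: [str(i) for i in range(1000)],
    "convert (stand-in)": lambda: [x + 273.15 for x in data],
}

for name, seconds in time_stages(stages).items():
    print(f"{name:>20}: {seconds * 1e6:10.1f} us per call")
```

Taking the minimum over repeats is the usual way to reduce noise from other processes; a first-call cost would need to be measured separately with `repeat=1, number=1` in a fresh interpreter.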

I hacked together the following script to roughly compare the two:

import numpy as np
import timeit

def prepare_cf_units():
    import cf_units

    u_from = cf_units.Unit('degC')
    u_to = cf_units.Unit('K')

    def convert(data):
        return u_from.convert(data, u_to)

    return convert

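def prepare_pyudunits():
    from pyudunits2._udunits2_xml_parser import read_all
    from pyudunits2._unit import Converter

    # Reading the XML database happens once, outside the timed call.
    unit_system = read_all()

    def convert_w_pyudunits(data):
        # Unit parsing and converter construction are inside the timed call,
        # so they are included in the pyudunits2 numbers.
        u_from = unit_system.unit('degC')
        u_to = unit_system.unit('K')
        converter = Converter(u_from, u_to)
        return converter.convert(data)

    return convert_w_pyudunits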

def prepare_data():
    data = np.arange(50 * 25 * 20)
    return data

if __name__ == "__main__":

    print('cf-units:', timeit.repeat(
        "convert_w_cf_units(data)",
        "from __main__ import prepare_cf_units, prepare_data; convert_w_cf_units = prepare_cf_units(); data = prepare_data()",
        repeat=3, number=2,
    ))

    print('pyudunits2:', timeit.repeat(
        "convert_w_pyudunits(data)",
        "from __main__ import prepare_pyudunits, prepare_data; convert_w_pyudunits = prepare_pyudunits(); data = prepare_data()",
        repeat=3, number=2,
    ))

With results along the lines of (total seconds per repeat, each repeat being two calls):

cf-units: [0.0002636730205267668, 0.0002245799987576902, 0.00021556997671723366]
pyudunits2: [0.34791191801195964, 0.04072356101823971, 0.04109554295428097]
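
To put those numbers on a per-call footing, the following sketch only does arithmetic on the timings posted above; nothing is re-measured. It assumes each list entry is the total for `number=2` calls, as in the script, and uses the minimum of the repeats as the representative value.

```python
# Timings copied from the results above; each entry covers NUMBER calls.
cf_units = [0.0002636730205267668, 0.0002245799987576902, 0.00021556997671723366]
pyudunits2 = [0.34791191801195964, 0.04072356101823971, 0.04109554295428097]
NUMBER = 2


def per_call(timings, number=NUMBER):
    """Best observed seconds for a single call."""
    return min(timings) / number


cf_best = per_call(cf_units)
py_best = per_call(pyudunits2)
slowdown = py_best / cf_best
# How much slower the first pyudunits2 repeat was than the best one.
first_repeat_penalty = pyudunits2[0] / min(pyudunits2)

print(f"cf-units:   {cf_best * 1e3:.3f} ms per call")
print(f"pyudunits2: {py_best * 1e3:.3f} ms per call")
print(f"slowdown:   {slowdown:.0f}x")
print(f"first pyudunits2 repeat vs best: {first_repeat_penalty:.1f}x")
```

For this 25,000-element array, pyudunits2 comes out at roughly 190 times slower per call than cf-units, and its first repeat is about 8.5 times slower than the later ones. The latter hints at some one-off cost on first use, though where exactly it comes from is a guess until the stages are timed individually.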