inaos / iron-array

Scalar UDF functions do not work correctly in a number of situations #541

Closed FrancescAlted closed 2 years ago

FrancescAlted commented 2 years ago

Here is a snippet showing how scalar UDFs fail (mainly for float32, but also for some Python idioms):

import numpy as np
import iarray as ia
from iarray import udf

@udf.scalar(lib="lib")
# def fcond(a: udf.float32, b: udf.float32) -> float:  # does not work for arrays with float32
def fcond(a: udf.float64, b: udf.float64) -> float:

    # The next does not work:
    if (a + b) > 3:
        return 1
    else:
        return 0
    # The one below works (for float64)
    # c = 0.
    # if (a + b) > 3:
    #     c = 1.
    # return c
    #
    # return 1 if (a + b) > 3 else 0  # this also works (for float64)

N = 10_000_000
dtype = np.float64  # change here for testing the float32

print("** scalar udf evaluation ...")
a1 = ia.arange([N], dtype=dtype)
a2 = ia.ones([N], dtype=dtype)
expr = ia.expr_from_string("lib.fcond(a, b)", {"a": a1, "b": a2})
b1 = expr.eval()

print("** numpy evaluation ...")
b2 = (a1.data + a2.data) > 3
print(b2)
np.testing.assert_array_equal(b1.data, b2)

It turns out that float32 is an important data type for us, so fixing it has priority. Also, it would be nice if the snippet above could be run for a number of int types (int8, int16, int32 and int64); however, this is not as important for now.
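
To make the dtype coverage concrete, here is a minimal NumPy-only sketch of the expected results for each dtype mentioned above. It does not call iarray at all; `DTYPES`, `expected_fcond` and `reference_cases` are hypothetical names introduced only for illustration, and `n` is kept small so that `a + b` stays within the int8 range.

```python
import numpy as np

# Dtypes mentioned in this issue: float32 (priority), float64, and the int types.
DTYPES = [np.float32, np.float64, np.int8, np.int16, np.int32, np.int64]


def expected_fcond(a, b):
    """NumPy reference for fcond: 1.0 where (a + b) > 3, else 0.0 (float64)."""
    return np.where((a + b) > 3, 1.0, 0.0)


def reference_cases(n=100):
    """Yield (dtype, a, b, expected) for every dtype in DTYPES.

    n defaults to 100 so that a + b never exceeds the int8 maximum (127).
    """
    for dtype in DTYPES:
        a = np.arange(n, dtype=dtype)
        b = np.ones(n, dtype=dtype)
        yield dtype, a, b, expected_fcond(a, b)
```

Each `expected` array could then be compared against the result of evaluating `lib.fcond(a, b)` on iarray inputs of the same dtype, for example with `np.testing.assert_array_equal` as in the snippet above.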