serge-sans-paille / pythran

Ahead of Time compiler for numeric kernels
https://pythran.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Using Pythran to implement the compiled kernel of a NumPy ufunc #2141

Closed by steppi 5 months ago

steppi commented 1 year ago

Hi,

I've been investigating the possibility of using Pythran to implement the compiled kernels for ufuncs in SciPy's special submodule. SciPy vendors several compiled special function libraries in special. One in particular, specfun.f, an F77 library written for the book "Computation of Special Functions" (John Wiley & Sons, Inc., 1996) by Shanjie Zhang and Jianming Jin, is problematic due to a high number of correctness bugs. Rather than trying to debug and fix the F77 code, I think it would be better to rewrite the offending functions in a modern language. I've already begun this effort for the Gaussian hypergeometric function hyp2f1 using Cython.

Being able to write the ufunc kernel in Pythran seems like it could be a good option, since there's a shortage of bandwidth among maintainers and contributors with simultaneous experience in both the relevant mathematics and programming in compiled languages. At the moment Pythran is an optional dependency, but there is talk of making it required (https://github.com/scipy/scipy/issues/18683), after which it might become viable to start using Pythran to generate compiled code for use in other compiled code in SciPy.

I've created a toy example of a ufunc whose compiled kernel is implemented with Pythran here: https://github.com/steppi/pythran_ufunc_example. I'd be curious if you have any feedback on whether the approach I've used is viable, or whether you anticipate it might be prone to breaking upon updates to Pythran.

In this example I create a capsule function for the exponential integral E1, by directly translating the Fortran code in specfun.f.

The below snippet is from https://github.com/steppi/pythran_ufunc_example/blob/main/ufunc_example/_exp1.py


import numpy as np

# pythran export capsule _exp1(float64)
def _exp1(x):
    # Exponential integral E1(x), translated from the corresponding routine in specfun.f
    EPS = np.finfo('float').eps  # convergence tolerance for the series
    GA = 0.5772156649015328      # Euler-Mascheroni constant
    if x == 0.0:
        return float("inf")
    if x <= 1:
        # Power series about x = 0: E1(x) = -gamma - log(x) + x * sum(...)
        result = 1.
        term = 1.
        for k in range(1, 25):
            term = -term*k*x/(k + 1)**2
            result += term
            if abs(term) <= abs(result) * EPS:
                break
        return -GA - np.log(x) + x * result
    # Continued fraction expansion, evaluated by backward recurrence, for x > 1
    M = int(20 + np.trunc(80 / x))
    t0 = 0.0
    for k in range(M, 0, -1):
        t0 = k / (1 + k/(x + t0))
    t = 1 / (x + t0)
    return np.exp(-x) * t

I compiled this using the pythran -e option to generate a C++ file, and then wrapped the resulting function with the NumPy ufunc machinery in a small Cython file. (There was no real reason to use Cython here; the ufunc wrapper could just as easily have been written in C++. I had simply copied the recipe from the autogenerated Cython that SciPy uses for ufuncs in special.)
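To make the goal concrete, here is a minimal sketch of how the resulting ufunc is exercised from Python; the import path and name (ufunc_example, exp1) are assumptions for illustration rather than the exact names in the linked repo:

import numpy as np
from ufunc_example import exp1  # assumed import path; see the linked repo

# A ufunc accepts scalars ...
exp1(1.0)
# ... and arrays of any shape, applying the compiled kernel elementwise
# and broadcasting its arguments like any other NumPy ufunc.
exp1(np.linspace(0.5, 5.0, 100))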

This Pythran-generated implementation seems to work well, with performance virtually identical to the original Fortran implementation. What I've done seems like it could be very brittle though. Am I relying on implementation details of the generated C++ code? Is there a supported way to generate a C++ function that can be wrapped in the ufunc machinery like this and can be trusted to keep working as Pythran updates?

I've seen https://github.com/serge-sans-paille/pythran/issues/1953, and the recommended blog post https://serge-sans-paille.github.io/pythran-stories/pythran-as-a-bridge-between-fast-prototyping-and-code-deployment.html, but I wasn't able to get my example to work without making the Pythran function a capsule, which is a little different from what's in the blog post.

Also if there would be any interest in supporting ufunc generation machinery directly in Pythran, so that something like

# pythran export ufunc _exp1(float64)

could just work out of the box, I'd be happy to help with this, although that would be further out on the time horizon. At the moment I'm just interested in seeing if there's a supported way to incorporate C++ functions generated with Pythran into the ufunc machinery.

serge-sans-paille commented 1 year ago

Hey @steppi and thanks for the very detailed post. I'm very excited by your experiment :-)

@jeanlaroche is already using the output of -e to integrate Pythran-generated C++ code into a native codebase without any Python involved, but I have to say that there is no strong guarantee over some of the details. SciPy being a first-class user of Pythran, I'm all in for the export ufunc syntax. Could you explain in more detail what that would imply?

steppi commented 1 year ago

SciPy being a first-class user of Pythran, I'm all in for the export ufunc syntax. Could you explain in more detail what that would imply?

Sure. So it's going to involve automatically generating the ufunc boilerplate. This tutorial explains how to write the boilerplate manually.

The ufunc syntax in Pythran could be applied to any function which takes one or more scalar numeric arguments and returns a scalar numeric output. One could then automatically generate vectorized ufuncs, which can operate on both scalars and arbitrarily sized arrays, and which support broadcasting.

Technically, there are also ufuncs which can return multiple values, but those aren't used in scipy.special, so I think it would be fine to not support those, at least at first, since it seems like it would introduce a lot of complexity.

Other projects already generate this boilerplate automatically, so I think it's a solved problem, but it may be a little tedious to get all of the details right.

In scipy.special, the ufunc boilerplate is generated by https://github.com/scipy/scipy/blob/main/scipy/special/_generate_pyx.py, from compiled kernel functions specified in the config file https://github.com/scipy/scipy/blob/main/scipy/special/functions.json.

As of version 3.x, Cython offers a @ufunc decorator which automatically generates the ufunc boilerplate: https://github.com/cython/cython/blob/master/Cython/Compiler/UFuncs.py.

Numba offers a @vectorize decorator which also automatically generates the ufunc boilerplate: https://github.com/numba/numba/tree/main/numba/np/ufunc.
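For comparison, a rough sketch of Numba's approach; @vectorize takes signature strings that select the supported dtypes, and the decorated scalar kernel (a placeholder body here, not the real E1) becomes a broadcasting ufunc:

import numpy as np
from numba import vectorize

@vectorize(["float32(float32)", "float64(float64)"])
def exp1_numba(x):
    # placeholder scalar kernel; a real version would mirror _exp1 above
    return np.exp(-x) / (1.0 + x)

exp1_numba(np.array([0.5, 1.0, 2.0]))  # dtype-dispatched, compiled elementwise loop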

One thing to consider is that ufuncs can employ dtype dispatching to find the appropriate implementation based on the input types, so one can have separate implementations for different types. We'd have to think of a good syntax to express this in Pythran. Would it make sense to allow multiple definitions of the same function in the same file, all with the ufunc syntax, but differing in the input types?
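Dtype dispatching as it already exists in NumPy ufuncs, sketched with an existing SciPy ufunc (the list of registered signatures shown in the comment is illustrative):

import numpy as np
from scipy import special

# Every ufunc exposes its registered inner loops; the dispatcher picks one
# based on the input dtypes.
print(special.hyp2f1.types)              # e.g. ['ffff->f', 'dddd->d', 'fffF->F', 'dddD->D']
special.hyp2f1(1.0, 2.0, 3.0, 0.5)       # real-valued loop
special.hyp2f1(1.0, 2.0, 3.0, 0.5 + 1j)  # complex-valued loop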

Let me know if you have any questions. My background is in scientific and statistical computing, not compilers, but I'd be willing to do whatever I can to help push this through, and my colleague @tirthasheshpatel would likely be able to help as well.

serge-sans-paille commented 1 year ago

The tutorial is crystal clear; there shouldn't be any difficulty integrating that into Pythran. Let's discuss the export syntax, then.

This is fine if we only need one dtype:

# pythran export ufunc _exp1(float64)

But what about multiple dtype? One could try:

# pythran export ufunc _exp1(float64)
# pythran export ufunc _exp1(float32)
# pythran export ufunc _exp1(complex128)

but that's a bit awkward, as it looks like we're exporting several symbols while we're only exporting one. Maybe:

# pythran export ufunc _exp1 from _exp1(float64), exp1(float32), _expc1(complex128)

That way, if we have a different implementation for a class of data types, we can still put everything together (in the example above, I've used _expc1 for the complex implementation).
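A sketch of what a module might look like under this proposal; this is purely hypothetical since the syntax does not exist yet, and the kernel names are illustrative:

# pythran export ufunc exp1 from _exp1(float64), _exp1f(float32), _expc1(complex128)

def _exp1(x):
    ...  # double-precision implementation

def _exp1f(x):
    ...  # single-precision implementation

def _expc1(z):
    ...  # complex implementation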

@rgommers , @tirthasheshpatel , any thoughts on this?

steppi commented 1 year ago

I like

# pythran export ufunc _exp1 from _exp1(float64), exp1(float32), _expc1(complex128)

That's a really good point about the other option making it look like we're exporting several symbols when it's actually just one.

serge-sans-paille commented 1 year ago

An alternative would be to support numpy.vectorize from Pythran; that way we would not introduce new export syntax, and we would support one more NumPy construct.
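For reference, the plain NumPy behaviour this would mirror; np.vectorize wraps a scalar function so that it broadcasts over arrays (though in pure NumPy it is a Python-level loop rather than a compiled ufunc):

import numpy as np

def _exp1_scalar(x):
    # stand-in for the scalar kernel above
    return np.exp(-x) / (1.0 + x)

exp1 = np.vectorize(_exp1_scalar, otypes=[np.float64])
exp1(np.array([0.5, 1.0, 2.0]))  # elementwise application with broadcasting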

steppi commented 1 year ago

An alternative would be to support numpy.vectorize from Pythran; that way we would not introduce new export syntax, and we would support one more NumPy construct.

And then the dtype dispatching logic would go into the function itself? It seems like that could get tricky, but I might be wrong.

serge-sans-paille commented 1 year ago

Well, Pythran turns a Python function into a generic C++ function, so we just have to instantiate it for every dtype required by the user.

This is a bit convoluted, but Pythran already supports this kind of code:

import numpy as np

def fooi(n):
    return n

def foof(n):
    return int(n + .5)

# pythran export foo(int)
def foo(n):
    s = n
    for dtype in (np.int32, np.float32, np.float32):
        if type(dtype()) is np.int32:  # I should add support for if dtype is np.int32
            s += fooi(n)
        else:
            s += foof(dtype(n))
    return s

And the type checks are done at compile time.

steppi commented 1 year ago

Well, Pythran turns a Python function into a generic C++ function, so we just have to instantiate it for every dtype required by the user. [...] And the type checks are done at compile time.

That's really cool that that example works, with the type checks done at compile time. It seems like the type casting could get convoluted when there are multiple arguments though. What would this look like for the case of, say, the Gaussian hypergeometric function hyp2f1, where the signatures I want to support are:

hyp2f1(float32, float32, float32, float32)
hyp2f1(float32, float32, float32, complex64)
hyp2f1(float64, float64, float64, float64)
hyp2f1(float64, float64, float64, complex128)

Supporting np.vectorize would be awesome, but it would be nice if I could also have a way to let the ufunc machinery take care of the type casting, and only have to supply a handful of implementations for the desired signatures.

serge-sans-paille commented 1 year ago

For the record: I'm slowly crafting a generic C++ header to automatically turn a scalar function into a ufunc. On my way!

serge-sans-paille commented 5 months ago

@steppi : once #2199 is finished, this would look like

#pythran export ufunc hyp2f1(float32, float32, float32, float32)
#pythran export ufunc hyp2f1(float32, float32, float32, complex64)
#pythran export ufunc hyp2f1(float64, float64, float64, float64)
#pythran export ufunc hyp2f1(float64, float64, float64, complex128)