rlabbe / filterpy

Python Kalman filtering and optimal estimation library. Implements Kalman filter, particle filter, Extended Kalman filter, Unscented Kalman filter, g-h (alpha-beta), least squares, H Infinity, smoothers, and more. Has companion book 'Kalman and Bayesian Filters in Python'.
MIT License
3.31k stars 617 forks

Cupy and other backends #215

Closed SebastianSemper closed 3 years ago

SebastianSemper commented 3 years ago

While browsing through the code, I found that it would be possible to port the whole package to cupy [1] by allowing numpy to be swapped for cupy. This would let the computations run on CUDA-capable GPUs, since cupy aims to reimplement the numpy interface exactly on the GPU.

Almost all that is needed would be something like

import os

# select the backend via an environment variable; defaults to numpy
_BACKEND = os.environ.get("FILTERPY_BACKEND", "numpy")

def _importCPU():
    import numpy as xp
    import scipy as xpx
    import scipy.linalg as xpxl
    import scipy.sparse as xpxs

    return (xp, xpx, xpxl, xpxs)

def _importGPU():
    import cupy as xp
    import cupyx.scipy as xpx
    import cupyx.scipy.linalg as xpxl
    import cupyx.scipy.sparse as xpxs

    return (xp, xpx, xpxl, xpxs)

in a backend submodule. Then in the code one would simply write from filterpy.backend import xp and use xp exactly as numpy was used before.
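To illustrate the idea, here is a hypothetical sketch of how filter code could be written against such a backend. The filterpy.backend module is only assumed to exist as sketched above, so numpy stands in for it here; any backend that matches the numpy interface would run the same code unchanged.

```python
# Stand-in for the proposed: from filterpy.backend import xp
import numpy as xp

def predict(x, P, F, Q):
    # standard Kalman predict step, written only against the
    # numpy-compatible interface, so it would run unchanged under cupy
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P

# tiny constant-velocity example
x = xp.array([0.0, 1.0])                    # state: position, velocity
P = xp.eye(2)                               # state covariance
F = xp.array([[1.0, 1.0], [0.0, 1.0]])      # transition matrix, dt = 1
Q = xp.eye(2) * 0.01                        # process noise
x, P = predict(x, P, F, Q)
```

Nothing in the function body names numpy directly, which is the whole point of routing every array operation through xp.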

Since I have no complete overview of the codebase, one could consult [2] to check that all used and needed functions are available in cupy. If this is of interest, I would be happy to provide the necessary code in the form of a PR.

[1] https://cupy.dev/ [2] https://docs.cupy.dev/en/stable/reference/comparison.html

EDIT:

Addition to the backend code:

import warnings

def _asNumpyCPU(arr):
    # numpy arrays are already on the host
    return arr

def _asNumpyGPU(arr):
    # copy from device to host memory
    return arr.get()

if _BACKEND == "cupy":
    try:
        xp, xpx, xpxl, xpxs = _importGPU()
        asNumpy = _asNumpyGPU
    except ImportError:
        warnings.warn("cupy not importable, using numpy.")
        xp, xpx, xpxl, xpxs = _importCPU()
        asNumpy = _asNumpyCPU
elif _BACKEND == "numpy":
    xp, xpx, xpxl, xpxs = _importCPU()
    asNumpy = _asNumpyCPU
else:
    warnings.warn("unknown backend, using numpy.")
    xp, xpx, xpxl, xpxs = _importCPU()
    asNumpy = _asNumpyCPU
rlabbe commented 3 years ago

I'm pretty against this, though I'll listen to counter arguments.

First of all, what problem are we trying to solve? FilterPy is meant to be mostly a pedagogical resource. People who are trying to learn about Kalman filtering but are not Python programmers already complain about the barrier the language imposes. I can't do much about that; my conceit is that it is (or can be) easier for programmers to learn math via programming. I haven't heard from one person that FilterPy was too slow. In general, we turn to GPUs for large arrays - cupy benchmarks use 800 MB arrays. In contrast, a 9x9 array is 'big' for FilterPy.

If the switch could be made truly seamless it might make sense. But it turns out you have to keep track of which device you are on, and they don't mix. This page describes how to write device-agnostic code: https://docs.cupy.dev/en/stable/tutorial/basic.html. I don't want to force that on myself, on anyone who submits a patch, or on anyone using the code.
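For reference, the agnostic pattern that tutorial describes centers on cupy.get_array_module, which dispatches to numpy or cupy depending on where the array lives. A minimal sketch, with a fallback so it also runs where cupy is not installed:

```python
import numpy as np

try:
    import cupy
    get_array_module = cupy.get_array_module
except ImportError:
    # without cupy, every array is a numpy array on the host
    def get_array_module(*args):
        return np

def log_sum(x):
    # pick numpy or cupy based on the argument's type; this is the
    # per-function bookkeeping device-agnostic code requires
    xp = get_array_module(x)
    return xp.log(x).sum()

host_result = log_sum(np.array([1.0, np.e]))
```

This works, but every function that touches arrays has to repeat the dispatch, which is exactly the burden being objected to above.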

Also, cupy is not complete. Maybe I am currently not using anything unsupported. But what about tomorrow? I don't want to be constrained to what cupy supports, which is always going to lag NumPy.

And pragmatically, I am not keeping up with feature requests and bug reports as it is. I have zero interest in supporting something that almost certainly will entail a slowdown of the code for almost everyone who uses it (I assume, without proof, that cupy is only performant on large matrices).

Plus, I'll note, that tutorial is tiny. The documentation is extremely sparse. Sample question that pops into my head: what happens when I try

plt.plot([1,2,3,4], y_gpu)

The list is on the CPU; the y is on the GPU. I'm going to guess that is going to fail, because I doubt matplotlib was written agnostically. I don't need an answer to this specific question; it's a more general point. People do stuff with this data, and they almost certainly are going to require that data to be on the CPU, and so will incur the cost of that copy. I'd wager that, given the small size of the arrays, any potential speedup in the computation will be lost via the conversion. So hardly anyone would use this anyway.

Apples and oranges - but I've played around with incorporating Numba into FilterPy. It's always slower. We are using tiny arrays that fit well within the AVX/SSE architecture. I don't know; maybe if you were running 10 years of financial data through a 1000-element state array, the Numba or cupy version would be better, but I haven't heard from one person doing that. If they are, their best bet is to take the 5 equations needed to implement a KF and slap them into Julia or something.
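For what it's worth, the tiny-arrays point is easy to sanity-check on the CPU side. A hypothetical micro-benchmark of the covariance predict step on a 9x9 matrix (the sizes and iteration count here are illustrative assumptions, not from the thread):

```python
import timeit
import numpy as np

# rough timing of F @ P @ F.T + Q on a 9x9 covariance, a size that
# counts as 'big' for FilterPy; results will vary by machine
n = 9
rng = np.random.default_rng(0)
F = np.eye(n) + 0.01 * rng.standard_normal((n, n))
P = np.eye(n)
Q = 0.01 * np.eye(n)

t = timeit.timeit(lambda: F @ P @ F.T + Q, number=10000)
print(f"~{t / 10000 * 1e6:.2f} microseconds per predict on this machine")
```

At these sizes the whole step is a handful of microseconds in plain numpy, so any per-call host-to-device transfer overhead would dominate.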

In any case, I'm not going to look at this. A successful argument would have to be a complete implementation along with tests and benchmarks showing that all of this is worth even thinking about.