Closed SebastianSemper closed 3 years ago
I'm pretty against this, though I'll listen to counter arguments.
First of all, what problem are we trying to solve? FilterPy is meant to be mostly a pedagogical resource. People who are trying to learn about Kalman filtering but are not Python programmers already complain about the barrier the language imposes. I can't do much about that; my conceit is that it is (or can be) easier for programmers to learn math via programming. I haven't heard from a single person that FilterPy was too slow. In general, we turn to the GPU for large arrays - the cupy benchmarks use 800 MB arrays. In contrast, a 9x9 array is 'big' for FilterPy.
If the switch could be made truly seamless it might make sense. But it turns out you have to keep track of which device you are on, and the two don't mix. This page describes how to write device-agnostic code: https://docs.cupy.dev/en/stable/tutorial/basic.html I don't want to force that on myself, on anyone who submits a patch, or on anyone using the code.
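For concreteness, the agnostic pattern that page recommends looks roughly like the sketch below. The `normalize` helper is a made-up example, and the plain-NumPy fallback is my addition so it runs on machines without a GPU:

```python
import numpy as np

try:
    import cupy
    # cupy.get_array_module returns cupy or numpy depending on where x lives
    get_array_module = cupy.get_array_module
except ImportError:
    def get_array_module(*arrays):
        return np  # CPU-only fallback (my addition, not part of cupy)

def normalize(x):
    # Hypothetical helper: the same source works for numpy and cupy arrays,
    # but every function must be written this way for the switch to hold.
    xp = get_array_module(x)
    return x / xp.linalg.norm(x)

print(normalize(np.array([3.0, 4.0])))  # -> [0.6 0.8]
```

Every function in the package would need that `xp = get_array_module(...)` preamble, which is exactly the bookkeeping I don't want to impose.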
Also, cupy is not complete. Maybe I am currently not using anything unsupported. But what about tomorrow? I don't want to be constrained to what cupy supports, which is always going to lag NumPy.
And pragmatically, I am not keeping up with feature requests and bug reports as it is. I have zero interest in supporting something that will almost certainly entail a slowdown for almost everyone who uses it (I assume, w/o proof, that cupy is only performant on large matrices).
Plus, I'll note, that tutorial is tiny. The documentation is extremely sparse. A sample question that pops into my head - what happens when I try
plt.plot([1,2,3,4], y_gpu)
The list is on the CPU, y_gpu is on the GPU. I'm going to guess that this will fail, because I doubt matplotlib was written agnostically. I don't need an answer to this specific question; it's a more general point. People do stuff with this data, and they almost certainly are going to require that data to be on the CPU, and so will incur the cost of that copy. I'd wager that, given the small size of the arrays, any potential speedup in the computation will be lost to the conversion. So hardly anyone would use this anyway.
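If I'm reading the cupy docs right, the user ends up paying for an explicit device-to-host copy before handing data to matplotlib or anything else CPU-bound. A sketch (`cupy.asnumpy` is cupy's documented copy function; the CPU-only fallback is mine so this runs anywhere):

```python
import numpy as np

try:
    import cupy as cp
    to_host = cp.asnumpy           # explicit device-to-host copy
except ImportError:
    cp, to_host = np, np.asarray   # CPU-only fallback (my addition)

y_gpu = cp.asarray([1.0, 4.0, 9.0, 16.0])  # on the GPU when cupy is present
y_cpu = to_host(y_gpu)                     # the copy every downstream user pays for
# plt.plot([1, 2, 3, 4], y_cpu)            # safe now: data is back in host memory
```

For FilterPy-sized arrays that round trip over PCIe is plausibly more expensive than the filter update itself.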
Apples and oranges - but I've played around with incorporating Numba into FilterPy. It's always slower. We are using tiny arrays that fit well within the AVX/SSE registers. I don't know, maybe if you were running 10 years of financial data through a 1000-element state array the Numba or cupy version would be better, but I haven't heard from a single person doing that. If they are, their best bet is to take the 5 equations needed to implement a KF and slap them into Julia or something.
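Those five equations really are all there is to port. A textbook sketch of a single predict/update step in plain NumPy (this is not FilterPy's actual API, just the standard equations):

```python
import numpy as np

def kf_step(x, P, z, F, H, Q, R):
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update - the classic five equations
    y = z - H @ x                    # innovation
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# Toy 2-state example: identity dynamics, direct measurement of the state.
x, P = np.zeros(2), np.eye(2)
F = H = np.eye(2)
Q, R = np.zeros((2, 2)), np.eye(2)
x, P = kf_step(x, P, np.array([1.0, 1.0]), F, H, Q, R)
print(x)  # -> [0.5 0.5]
```

Anyone with a genuinely huge state would do better translating this handful of lines to their platform of choice than bolting a GPU backend onto FilterPy.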
In any case, I'm not going to look at this. A successful argument would have to be a complete implementation along with tests and benchmarks showing that all of this is worth even thinking about.
While browsing through the code I have found that it would be possible to port the whole package to cupy [1] by allowing numpy to be swapped for cupy. This would result in the computations being run on CUDA-capable GPUs, since cupy aims to reimplement the numpy interface exactly on the GPU.
Almost all that is needed would be something like
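(The original snippet is not preserved in this thread; the following is my reconstruction of the standard pattern being described, with the module path taken from the import on the next line:)

```python
# filterpy/backend.py (sketch)
try:
    import cupy as xp   # run on the GPU when cupy is installed
except ImportError:
    import numpy as xp  # otherwise fall back to plain numpy

print(xp.__name__)
```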
in a backend submodule. Then in code one would simply write
from filterpy.backend import xp
and one could use xp the same as numpy before. Since I have no complete overview of the codebase, one could consult [2] to check whether all the functions that are used and needed are available in cupy. If this is of interest, I would be happy to provide the necessary code in the form of a PR.
[1] https://cupy.dev/
[2] https://docs.cupy.dev/en/stable/reference/comparison.html
EDIT:
Addition to the backend code:
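(The edited snippet itself is not shown in this thread, so what follows is only a guess at the kind of addition meant: an explicit opt-out switch. The `FILTERPY_BACKEND` environment variable is hypothetical:)

```python
import os

# Hypothetical extension: let users force the CPU backend even when cupy
# is installed, e.g. FILTERPY_BACKEND=numpy python my_filter.py
if os.environ.get("FILTERPY_BACKEND") == "numpy":
    import numpy as xp
else:
    try:
        import cupy as xp
    except ImportError:
        import numpy as xp
```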