itamarst opened this issue 2 months ago.
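Cython supports this with fused types: a single function body is compiled into one specialization per type, so a contiguous (`double[::1]`) and a strided (`double[:]`) memoryview variant can share the same code with minimal duplication. See the example here: https://pythonspeed.com/articles/faster-cython-simd/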
It might be useful to document this pattern in the Cython documentation, at minimum. And it could in theory be added as a language feature, to minimize boilerplate.
The most commonly used Rust crate for arrays is ndarray. It's unclear to me whether it can even generate code that's specialized for contiguous arrays.
In many cases it will also be fine to support only contiguous arrays, and to make a copy first when given a non-contiguous array (possibly in Python code, before passing it to a function in an extension module). This is a common pattern when using Pythran. The end result is usually better performance in the common case, while still supporting the uncommon case.
I'm a little wary of copying as a solution. High memory usage can have a significant impact on computation costs (RAM isn't cheap), and there's the risk of hitting the swapping performance cliff. And it's already super-easy to end up with way-too-high memory usage even with explicit APIs.
So adding intermittent, hidden copying of large arrays seems like a bad idea in generic library APIs, at least. In the context of applications rather than libraries, where the author has better understanding of inputs and run time environment, it might be a good solution though.
NumPy views can point at non-contiguous chunks of memory. This means general-purpose code that accepts NumPy arrays has to handle both contiguous and non-contiguous memory, and so must assume the general, strided case. That loses out on potential optimizations, in particular automatic use of SIMD: if the compiler knows the array is contiguous, it can skip per-element stride computations and optimize more aggressively.
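A short illustration of how ordinary slicing produces non-contiguous views. The helper `byte_offsets` is hypothetical and only shows where each element of a 1-D view sits relative to the first: generic code has to compute `i * strides[0]` for every element, while code that knows the stride equals the item size can treat the data as a plain packed array.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)  # C order, strides (32, 8)

row = a[1]             # a row of a C-ordered array: contiguous
col = a[:, 1]          # a column: consecutive elements are 4 * 8 = 32 bytes apart
every_other = a[0, ::2]

def byte_offsets(view):
    # Byte offset of each element of a 1-D view, relative to its first element.
    return [i * view.strides[0] for i in range(view.shape[0])]

print(row.flags.c_contiguous, row.strides)  # True (8,)
print(col.flags.c_contiguous, col.strides)  # False (32,)
print(every_other.strides)                  # (16,)
print(np.shares_memory(a, col))             # True: a view, not a copy
print(byte_offsets(col))                    # [0, 32, 64]
```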
Contiguous inputs are going to be very common; how common depends on the domain and function. So it would be good to get maximum speed for those.
That means compiling two versions of expensive functions, one for contiguous arrays and one for non-contiguous arrays, and choosing the appropriate one based on inputs. And as a library author I would like to do this with minimum code duplication!
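A minimal Python sketch of that dispatch structure, using hypothetical names (`sum_of_squares_kernel`, `make_variant`, `select_variant`). It only models the shape of the solution: the kernel body is written once, two layout-specific variants are derived from it, and each call picks one based on `arr.flags.c_contiguous`. Actual speedups would require a compiler to specialize each variant differently, e.g. Cython fused types or Numba; here both variants run the same Python code.

```python
import numpy as np

def sum_of_squares_kernel(values):
    # The kernel body, written once.
    total = 0.0
    for x in values:
        total += float(x) * float(x)
    return total

def make_variant(kernel, layout):
    # Stand-in for compiling `kernel` for one memory layout. A real compiler
    # would emit different machine code per layout; this just wraps it.
    def variant(arr):
        return kernel(arr)
    variant.__name__ = f"{kernel.__name__}_{layout}"
    return variant

VARIANTS = {
    "contiguous": make_variant(sum_of_squares_kernel, "contiguous"),
    "strided": make_variant(sum_of_squares_kernel, "strided"),
}

def select_variant(arr):
    # Choose the specialization from the input's actual layout at run time.
    return VARIANTS["contiguous" if arr.flags.c_contiguous else "strided"]

def sum_of_squares(arr):
    arr = np.asarray(arr, dtype=np.float64)
    if arr.ndim != 1:
        raise ValueError("expected a 1-D array")
    return select_variant(arr)(arr)

data = np.arange(8.0)
print(sum_of_squares(data))       # 140.0, contiguous variant
print(sum_of_squares(data[::2]))  # 56.0, strided variant (0, 2, 4, 6)
```

The layout check happens once per call, outside the hot loop, so its cost is small relative to an expensive kernel.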
Numba does this automatically (it compiles a separate specialization for each array layout it is called with), but in most other languages this requires explicit changes to the code.