flexatone closed this issue 4 months ago
This is the post I mentioned referencing this issue:
https://wesmckinney.com/blog/performance-quirk-making-a-1d-object-ndarray-of-tuples/
I think in many cases we are going from a 2D array to a 1D array of tuples: that explicit case would likely benefit from a C implementation.
Separately, if map(tuple, array) is faster, that is an easy win!
@chaburkland : this might be a productive issue to take on.
It seems that the only additional benefit of a C implementation is that we can know the size of each tuple in advance, and that size is constant. Not sure if it is worth a C implementation.
Current implementation:
import numpy as np

def array2d_to_array1d(array: np.ndarray) -> np.ndarray:
    # Pre-allocate a 1D object array with one slot per row.
    post: np.ndarray = np.empty(array.shape[0], dtype=object)
    for i, row in enumerate(array):
        post[i] = tuple(row)
    post.flags.writeable = False
    return post
A C implementation can out-perform the Python version by not creating integer PyObjects, as well as by being able to pre-size each tuple.
This one looks like it could be quite a lot of work (iterators in general require a lot of boilerplate).
Perhaps it could be sped up by just changing the Python implementation to
return map(tuple, array)
? That covers essentially 90% of what we would be doing (moving the loop from Python to C).
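A rough sketch of what that change might look like, not the library's actual code. One caveat: map(tuple, array) on its own returns a lazy iterator rather than an ndarray, so the tuples still have to be written into a pre-allocated object array. The name array2d_to_array1d_map is hypothetical, and the element-by-element assignment is an assumption chosen to avoid NumPy interpreting a list of equal-length tuples as a 2D array.

```python
import numpy as np

def array2d_to_array1d_map(array: np.ndarray) -> np.ndarray:
    """Hypothetical variant: tuple construction is driven by map()."""
    post = np.empty(array.shape[0], dtype=object)
    # map() calls tuple on each row from C; the assignment below remains
    # a Python-level loop, so only part of the work moves out of Python.
    for i, row in enumerate(map(tuple, array)):
        post[i] = row
    post.flags.writeable = False
    return post

a = np.arange(6).reshape(3, 2)
result = array2d_to_array1d_map(a)
```

Whether this is actually faster than the current loop would need to be measured, for example with timeit on arrays of representative shape.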