static-frame / arraykit

Python C Extensions for StaticFrame

Implement array2d_to_array1d #22

Closed flexatone closed 4 months ago

brandtbucher commented 3 years ago

This one looks like it could be quite a lot of work (iterators in general require a lot of boilerplate).

Perhaps it could be sped up by just changing the Python implementation to return map(tuple, array)? That covers essentially 90% of what we would be doing (moving the loop from Python to C).
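A minimal sketch of that suggestion, assuming callers can accept a lazy iterator of tuples instead of an array. The name `iter_rows_as_tuples` is hypothetical and not part of arraykit.

```python
import numpy as np


def iter_rows_as_tuples(array: np.ndarray):
    # Hypothetical helper: map() applies tuple() to each row, so the
    # per-row loop is driven from C rather than by Python bytecode.
    return map(tuple, array)


# Usage: the iterator is lazy, so rows are converted only when consumed.
sample = np.arange(6).reshape(3, 2)
rows = list(iter_rows_as_tuples(sample))
```

Note that this returns an iterator, not a 1D object array; a caller that needs the array still has to fill one from the iterator.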

flexatone commented 3 years ago

This is the post I mentioned referencing this issue:

https://wesmckinney.com/blog/performance-quirk-making-a-1d-object-ndarray-of-tuples/

I think in many cases we are going from a 2D array to a 1D array of tuples: that explicit case would likely benefit from a C implementation.
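A small illustration of why this case needs explicit handling, using only standard NumPy behaviour: handing a list of equal-length tuples to `np.array` produces a 2D array of scalars, so a 1D array of tuples is usually built by pre-allocating and assigning element by element. The variable names are illustrative only.

```python
import numpy as np

rows = [(1, 2), (3, 4), (5, 6)]

# NumPy unpacks the tuples as a second dimension, even with dtype=object.
direct = np.array(rows, dtype=object)

# Pre-allocating a 1D object array and assigning per element keeps
# each tuple intact as a single stored object.
post = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    post[i] = row
```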

Separately, if map(tuple, array) is faster, that is an easy win!

flexatone commented 3 years ago

@chaburkland: this might be a productive issue to take on.

flexatone commented 1 year ago

It seems that the only additional benefit of a C implementation is that we can know the size of each tuple in advance, and that size is constant. Not sure if it is worth a C implementation.

flexatone commented 1 year ago

Current implementation:

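import numpy as np

def array2d_to_array1d(array: np.ndarray) -> np.ndarray:
    # Pre-allocate a 1D object array with one slot per row.
    post: np.ndarray = np.empty(array.shape[0], dtype=object)
    for i, row in enumerate(array):
        # Assign per element so each tuple is stored as a single object.
        post[i] = tuple(row)
    post.flags.writeable = False  # mark the result as immutable
    return post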

A C implementation can out-perform this Python version by not creating integer PyObjects and by pre-sizing each tuple.
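Before committing to C, it may be worth measuring whether the `map(tuple, array)` idea raised earlier in the thread helps at all. The following is a rough timing sketch, not a rigorous benchmark; `loop_version`, `map_version`, and `compare` are hypothetical names, and results will vary with array shape, dtype, and NumPy version.

```python
import timeit

import numpy as np


def loop_version(array: np.ndarray) -> np.ndarray:
    # Mirrors the implementation shown in the thread.
    post = np.empty(array.shape[0], dtype=object)
    for i, row in enumerate(array):
        post[i] = tuple(row)
    post.flags.writeable = False
    return post


def map_version(array: np.ndarray) -> np.ndarray:
    # Same result, but tuple() is applied via map().
    post = np.empty(array.shape[0], dtype=object)
    for i, row in enumerate(map(tuple, array)):
        post[i] = row
    post.flags.writeable = False
    return post


def compare(array: np.ndarray, number: int = 20) -> dict:
    # Total seconds for `number` calls of each variant.
    return {
        "loop": timeit.timeit(lambda: loop_version(array), number=number),
        "map": timeit.timeit(lambda: map_version(array), number=number),
    }


sample = np.arange(2000).reshape(1000, 2)
timings = compare(sample)
```

Printing `timings` for a few representative shapes gives a first indication of whether the pure-Python change is enough or a C implementation is still warranted.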