pymc-devs / pytensor

PyTensor allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.
https://pytensor.readthedocs.io

Add numpy-like helper `hstack` #585

Open ricardoV94 opened 10 months ago

ricardoV94 commented 10 months ago

Description

https://numpy.org/doc/stable/reference/generated/numpy.hstack.html

Others like vstack, column_stack, and so on are probably missing as well. The first step is to list all of them and confirm they are missing from pytensor.tensor.

This should not require a new Op; just repurpose the existing concatenate and sprinkle in expand_dims / atleast_nd as needed.
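As a rough illustration of that idea, here is a minimal sketch written against NumPy rather than PyTensor. It shows how hstack, vstack, and dstack reduce to a single concatenate call once the inputs are promoted to a minimum number of dimensions. The `*_sketch` names are hypothetical, and the assumption that the same composition carries over to pytensor.tensor is not verified here.

```python
import numpy as np


def hstack_sketch(arrays):
    # Promote scalars to 1D so every input has at least one axis.
    arrays = [np.atleast_1d(a) for a in arrays]
    # NumPy's rule: 1D inputs are joined along axis 0, all others along axis 1.
    axis = 0 if arrays[0].ndim == 1 else 1
    return np.concatenate(arrays, axis=axis)


def vstack_sketch(arrays):
    # 1D inputs of shape (N,) become rows of shape (1, N) before joining.
    arrays = [np.atleast_2d(a) for a in arrays]
    return np.concatenate(arrays, axis=0)


def dstack_sketch(arrays):
    # 1D inputs of shape (N,) become (1, N, 1); 2D (M, N) become (M, N, 1).
    arrays = [np.atleast_3d(a) for a in arrays]
    return np.concatenate(arrays, axis=2)
```

Note that the axis choice in `hstack_sketch` depends only on the number of dimensions of the first input, not on any concrete shape values.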

HarshvirSandhu commented 10 months ago

I would like to work on this issue

ricardoV94 commented 10 months ago

Thanks. Let us know if you have any questions

HarshvirSandhu commented 10 months ago

@ricardoV94 I noticed that functions such as pt.horizontal_stack and pt.vertical_stack are already present. I will also implement column_stack and dstack.

Also, just to confirm: do the arguments of these functions still need to have at least 2 dimensions? (See the code comments linked here for reference.) https://github.com/pymc-devs/pytensor/blob/082081ae5f864e622551a70fe164822b4bef064c/pytensor/tensor/basic.py#L2761-L2768

ricardoV94 commented 10 months ago

That comment is 15+ years old. Maybe numpy is less crazy these days? Unless there is some technical reason why we can't replicate numpy's behavior (e.g., it depends on static shapes), we should do exactly what numpy does.

HarshvirSandhu commented 10 months ago

@ricardoV94 I observed unexpected behaviour while using np.column_stack with 1-D arrays.

Consider the code below:

import numpy as np

a = np.array([0.1, 0.2, 0.3], dtype="float32")  # 1D array
b = np.array([0.7, 0.8, 0.9], dtype="float32")  # 1D array
print(np.column_stack([a, b]).shape)  # prints (3, 2)

For these 1D inputs, np.column_stack gives a shape of (3, 2), whereas np.hstack gives a shape of (6,). For 2D arrays it behaves the same as hstack:

a = np.array([[0.1, 0.2, 0.3]], dtype="float32")  # 2D array
b = np.array([[0.7, 0.8, 0.9]], dtype="float32")  # 2D array
print(np.column_stack([a, b]).shape)  # prints (1, 6)

np.column_stack first turns each 1D array into a column of shape (N, 1) and then stacks the columns horizontally.
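To make the 1D special case concrete, here is a minimal NumPy sketch of the rule as described in the NumPy documentation. The helper name `column_stack_sketch` is hypothetical and is not part of NumPy or PyTensor; it only illustrates the "turn 1D into columns, then concatenate along axis 1" behaviour.

```python
import numpy as np


def column_stack_sketch(arrays):
    columns = []
    for a in arrays:
        a = np.asarray(a)
        if a.ndim < 2:
            # Scalars and 1D arrays become columns: (N,) -> (N, 1).
            a = np.atleast_1d(a)[:, None]
        columns.append(a)
    # Inputs with 2 or more dimensions are joined as-is, exactly like hstack.
    return np.concatenate(columns, axis=1)


a = np.array([0.1, 0.2, 0.3], dtype="float32")
b = np.array([0.7, 0.8, 0.9], dtype="float32")
print(column_stack_sketch([a, b]).shape)              # (3, 2)
print(column_stack_sketch([a[None], b[None]]).shape)  # (1, 6)
```

The only difference from hstack is the promotion step: hstack leaves 1D inputs as they are and joins them end to end, while this version gives each 1D input its own column.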

Should we still stick to what numpy does?

ricardoV94 commented 10 months ago

That seems well documented, so we should stick with what numpy does. The advantage is that we offload design choices to numpy.