Closed chris-b1 closed 5 years ago
For xarray, probably the right choice is for @ to be an alias for .dot():
http://xarray.pydata.org/en/stable/generated/xarray.DataArray.dot.html
The broadcasting semantics of np.matmul don't quite make sense because it broadcasts based on axis position, not name.
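To make the position-versus-name distinction concrete, here is a minimal sketch. The array shapes and the dimension names "x", "y" and "z" are made up for illustration; only DataArray.dot and the @ operator on plain ndarrays are taken from the discussion.

```python
import numpy as np
import xarray as xr

# Two labeled arrays that share the dimension "y", but in a layout
# where positional matmul cannot line them up.
a = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))
b = xr.DataArray(np.arange(12.0).reshape(4, 3), dims=("z", "y"))

# .dot() contracts over the shared name "y", wherever that axis sits.
by_name = a.dot(b)

# Positional matmul on the raw values fails: (2, 3) @ (4, 3) does not align.
try:
    a.values @ b.values
    positional_failed = False
except ValueError:
    positional_failed = True

# With numpy, the caller has to transpose by hand to get the same numbers.
by_position = a.values @ b.values.T
```

The name-based result carries the dimensions "x" and "z", while the numpy result is an unlabeled (2, 4) array.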
Would love support for PEP 465 @ notation.
Recently, @ came in handy when multiplying numpy.ndarray with scipy.sparse matrices. We're considering xarray for our project and compatibility with this unified operator would be a real plus!
More specifically, I'd like to be able to do matrix multiplication between numpy ndarrays / matrices, scipy sparse matrices, and xarray DataArrays. @ seems like the most natural operator to enable this cross-package compatibility.
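For reference, a small sketch of the numpy / scipy.sparse interplay mentioned above. The matrices are invented for illustration, and the results are passed through np.asarray because the exact return type is not something this example relies on.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
# A sparse permutation matrix that swaps the two rows/columns.
swap = sparse.csr_matrix(np.array([[0.0, 1.0],
                                   [1.0, 0.0]]))

# @ works with the sparse matrix on either side of the dense array.
rows_swapped = np.asarray(swap @ dense)
cols_swapped = np.asarray(dense @ swap)
```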
More specifically, I'd like to be able to do matrix multiplication between numpy ndarrays / matrices, scipy sparse matrices, and xarray DataArrays.
I'm intrigued, but how would this work? data_array + numpy_array yields a result with well-defined labels as long as numpy_array broadcasts against data_array.data, but data_array @ numpy_array does not if numpy_array has 2 or more dimensions.
I guess we could prohibit @ with non-vector other arguments, but I am still concerned that the suggested meaning of @ per PEP 465 and numpy depends on the order of array dimensions. Basically, the last dimension of the left-hand-side argument should be matched against the second-to-last (or last, for 1D) dimension of the right-hand-side for the tensor contraction. In xarray terms, we could match the last dimension of the left-hand-side with any matching dimensions (by name) of the right-hand-side, but it's still messily inconsistent with other xarray operations, which are generally agnostic to dimension order.
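The positional rule described here can be checked directly in numpy. This is only a sketch with arbitrary shapes: the left operand's last axis is matched against the right operand's second-to-last axis (or its only axis, for 1D), so reordering axes changes or breaks the result.

```python
import numpy as np

lhs = np.ones((5, 2, 3))
rhs_2d = np.ones((3, 4))
rhs_1d = np.ones(3)

# Last axis of lhs (size 3) contracts with the second-to-last axis of rhs_2d.
out_2d = lhs @ rhs_2d

# For a 1D right-hand side, the last axis of lhs contracts with its only axis.
out_1d = lhs @ rhs_1d

# Same data with the last two axes swapped: the contraction no longer lines up.
try:
    np.transpose(lhs, (0, 2, 1)) @ rhs_2d
    order_matters = False
except ValueError:
    order_matters = True
```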
It also gets messy on Dataset objects, because the order of dimensions now becomes a bit more ambiguous: there's the order of dimensions on the Dataset itself, and the order on each DataArray in the dataset.
For these reasons, I'm leaning towards thinking that @ should be defined differently for xarray, and work like tensordot over all matching dimensions.
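A rough sketch of what "tensordot over all matching dimensions" means, using xarray.dot with its default behavior of summing over every dimension the operands share. The arrays and dimension names are illustrative only.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))
b = xr.DataArray(np.arange(6.0).reshape(3, 2), dims=("y", "x"))

# Both "x" and "y" are shared, so both are contracted and a 0-d result remains.
contracted = xr.dot(a, b)

# Reordering the axes of one operand does not change the answer.
reordered = xr.dot(a.transpose("y", "x"), b)

# The same contraction written positionally, for comparison.
expected = np.einsum("xy,yx->", a.values, b.values)
```

Because the contraction is keyed on names, the result is independent of dimension order, which is the consistency with other xarray operations argued for above.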
First let me say, I know Python, but I don't know linear algebra (I rely on @kkloste for algebra). I'm also new to xarray and recently used it for the first time to represent a hetnet (network with multiple node and relationship types) as an xarray.Dataset where each DataArray is an adjacency matrix (0 or 1 for whether an edge exists) for a specific edge type. I was drawn to xarray because it allows us to:
The operations that we're using for our project are dot-product multiplying 2D arrays by 2D arrays and 1D arrays by 2D arrays. Currently, our arrays are numpy.ndarrays, but we may switch some of our 2D arrays to scipy.sparse matrices.
I'm intrigued, but how would this work? data_array + numpy_array yields a result with well-defined labels as long as numpy_array broadcasts against data_array.data, but data_array @ numpy_array does not if numpy_array has 2 or more dimensions.
My intuition was that we use @ on a DataArray in cases where DataArray.values @ numpy.ndarray or numpy.ndarray @ DataArray.values would work. In these situations, the user would be responsible for ensuring numpy.ndarray had the correct coordinates and dimensions. We're also interested in DataArray.values @ scipy.sparse.
However, it appears that xarray may do some inference based on aligning dimensions/coordinates... and that I need to understand this process a bit more. Sorry if this reply doesn't help you move forward with this issue. I hopefully will be able to be more helpful as I become more familiar with xarray.
It also gets messy on Dataset objects
For clarity, I wasn't thinking of using @ on Datasets.
My intuition was that we use @ on a DataArray in cases where DataArray.values @ numpy.ndarray or numpy.ndarray @ DataArray.values would work.
Suppose data_array is a DataArray with dimensions ['x', 'y'] and numpy_array is a numpy.ndarray with a compatible shape. What should data_array @ numpy_array look like? The first dimension should be labeled x, but the second dimension doesn't have a name, so we'd need to come up with one somehow (every dimension in a DataArray must have a name).
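One way to sidestep the naming problem, sketched here under the assumption that the caller knows what the unlabeled axes mean, is to wrap the numpy array in a DataArray with explicit dimension names before contracting. The name "z" is hypothetical and chosen only for this example.

```python
import numpy as np
import xarray as xr

data_array = xr.DataArray(np.ones((2, 3)), dims=("x", "y"))
numpy_array = np.full((3, 4), 2.0)

# Dropping to raw values works, but the result has no labels at all.
unlabeled = data_array.values @ numpy_array

# Naming the axes explicitly ("z" is a made-up name) gives a labeled result.
wrapped = xr.DataArray(numpy_array, dims=("y", "z"))
labeled = data_array.dot(wrapped)
```

The explicit wrapper makes the caller, rather than xarray, responsible for deciding what the second dimension is called.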
However, it appears that xarray may do some inference based aligning dimensions/coordinates... and that I need to understand this process a bit more.
Indeed, see http://xarray.pydata.org/en/stable/computation.html#broadcasting-by-dimension-name
How about just keeping the current behavior? Currently a @ b just returns a new numpy array if either a or b is not an xr.DataArray. This makes perfect sense to me.
If both arrays are xr.DataArrays, I get an error which was rather unexpected. Here, xarray could simply stick to xr.DataArray.dot().
Yes, we could definitely make @ between two xarray objects equivalent to xarray.dot().
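As a sketch of that proposal, and assuming an xarray release in which @ between two DataArrays has been implemented this way, the operator and the explicit dot call should agree. The arrays and dimension names below are illustrative.

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))
b = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=("y", "z"))

# Operator form: contracts over the shared dimension "y" by name.
via_operator = a @ b

# Explicit function form, for comparison.
via_function = xr.dot(a, b)
```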
Closed by #2987
xref https://github.com/pandas-dev/pandas/issues/10259
Presumably deferring to the semantics of np.matmul - not sure if that API is stable yet?