OriolAbril opened 5 years ago
Thanks for the issue & code sample @OriolAbril
IIUC, func needs to do this broadcasting itself; from the apply_ufunc docstring:
func : callable
Function to call like ``func(*args, **kwargs)`` on unlabeled arrays
(``.data``) that returns an array or tuple of arrays. If multiple
arguments with non-matching dimensions are supplied, this function is
expected to vectorize (broadcast) over axes of positional arguments in
the style of NumPy universal functions [1]_ (if this is not the case,
set ``vectorize=True``). If this function returns multiple outputs, you
must set ``output_core_dims`` as well.
Then shouldn't a in the first example keep its original shape?
Because func receives unlabelled arrays, apply_ufunc aligns the axes order. So if they share a dimension:
In [8]: import xarray as xr
   ...: import numpy as np
   ...:
   ...: a = xr.DataArray(data=np.random.normal(size=(7, 3)), dims=["dim1", "dim2"])
   ...: c = xr.DataArray(data=np.random.normal(size=(7, 6)), dims=["dim1", "dim4"])  # <- change here
   ...:
   ...: def func(x, y):
   ...:     print(x.shape)
   ...:     print(y.shape)
   ...:     return x
   ...:
In [9]: xr.apply_ufunc(func, a, c)
(7, 3, 1)
(7, 1, 6)
...otherwise func wouldn't know how to align.
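For instance, once the size-1 axes are in place, plain NumPy broadcasting can align the two arrays on its own (a quick check with dummy arrays):
(np.zeros((7, 3, 1)) + np.zeros((7, 1, 6))).shape
# Out
# (7, 3, 6)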
Another option would be, for your original example, to put lengths of 1 in all axes rather than only 'forward filling', e.g.
xr.apply_ufunc(func, a, c)
# Out
# (7, 3, 1, 1)
# (1, 1, 5, 6) # <- change here
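That variant would also broadcast cleanly under plain NumPy rules, e.g. with dummy arrays:
(np.zeros((7, 3, 1, 1)) + np.zeros((1, 1, 5, 6))).shape
# Out
# (7, 3, 5, 6)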
I think it operates without that step because functions 'in the wild' will generally handle that themselves, but that's a guess and needs someone who knows this better to weigh in.
For what it's worth, I agree that this behavior is a little surprising. We should probably make apply_ufunc() explicitly broadcast arrays first.
To confirm, so that we have something like this?
xr.apply_ufunc(func, a, c)
# Out
# (7, 3, 5, 6)
# (7, 3, 5, 6)
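If so, I guess the same result can already be obtained by broadcasting manually before the call; a sketch using the a and c from my original example (disjoint dims, shapes (7, 3) and (5, 6)):
xr.apply_ufunc(func, *xr.broadcast(a, c))
# Out
# (7, 3, 5, 6)
# (7, 3, 5, 6)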
Yes, exactly.
I'm trying to think whether there would be any performance cost there - i.e. are there any arrays where preemptive broadcasting would be both expensive and unnecessary?
Even if there were a performance cost (compared to the current behaviour), it could easily be avoided by passing all dims as input_core_dims, couldn't it? IIUC, all dims are broadcast unless they are listed in input_core_dims, so broadcasting could still be avoided without problems.
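Something like the following sketch, if I understand the core-dims machinery correctly (again with the original disjoint-dims a and c; output_core_dims is added here only so that returning x matches the expected output dims):
out = xr.apply_ufunc(
    func, a, c,
    input_core_dims=[["dim1", "dim2"], ["dim3", "dim4"]],
    output_core_dims=[["dim1", "dim2"]],
)
# Out
# (7, 3)
# (5, 6)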
With NumPy arrays at least, there is no cost for broadcasting, because it can always be done with views. But even for other array types, inserting size-1 dimensions in the correct locations should be basically free, and would be more helpful than what we currently do.
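For instance, np.broadcast_to returns a zero-stride view over the original buffer rather than copying any data:
x = np.zeros((7, 3))
b = np.broadcast_to(x, (5, 6, 7, 3))
b.strides, np.shares_memory(b, x)
# Out
# ((0, 0, 24, 8), True)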
@shoyer thanks for the clarity
@OriolAbril would you mind if we changed this issue's title to "apply_ufunc should preemptively broadcast"?
@max-sixty Not at all, whatever is best. I actually opened the issue without being 100% sure it was one.
Code Sample
I am having some trouble understanding apply_ufunc's broadcasting rules. As I also had trouble understanding the docs, I am not 100% sure it is a bug, but I strongly suspect it is. I will try to explain why with the following really simple example. The function defined always raises an error when apply_ufunc is called, but this is intended: the shapes have already been printed by then, and it keeps the example as simple as possible.
Problem description
Here, a has been kind of broadcast, but I would expect the shapes of a and c to be the same as when calling xr.broadcast, as there are no input core dims and therefore all dimensions should be broadcast. However:
Using different input core dims does not get rid of the problem; instead, I believe it shows some more issues:
Is the current behaviour what should be expected?
Output of xr.show_versions()