Closed keflavich closed 2 years ago
This PR is pretty wrong because u.Quantity(x, unit)
could sometimes mean "Attach unit unit
to unitless x
" and sometimes mean "Convert quantity x
to unit unit
"
The actual bottleneck is here:
https://github.com/astropy/astropy/blob/17e281e/astropy/units/quantity.py#L504-L505
i.e., np.array(dask_data)
https://github.com/astropy/astropy/issues/8227#issuecomment-951431146 is the same issue
i.e.,
np.array(dask_data)
This PR is pretty wrong because
u.Quantity(x, unit)
could sometimes mean "Attach unitunit
to unitlessx
" and sometimes mean "Convert quantityx
to unitunit
"
Apart from this and (data * unit)
not having a .unit
method, does the workaround fail anyway to avoid the read?
I don't know! I think it does, but I didn't get to checking.
Well, OK, I do know that multiplying a dask array by a unit does not trigger the read, so I'm pretty sure this would work within spectral-cube. I tested by loading a spectral-cube and multiplying cube._data
(which was a dask array) by u.K
- that returns a dask array, no problem. u.Quantity(cube._data, u.K)
triggers the read.
isintance(data, u.Quantity)
would be easy enough to check for separate treatment for a true Quantity, but what if it is already a (dask_arr * unit)
...?
yes exactly. I started going down the wrapper route a few hours ago, but I think we need python's formal wrapping tools for that, and then we're getting dangerously close to trying to implement the thing Martin said was very hard.
....I don't know how I managed to flub a merge here.... but I tried to revert the original changeset and go with a wrapper approach, but it's totally WIP and nothing works yet, it's just ideas I started putting on paper
what if it is already a
(dask_arr * unit)
...?
Looks like isinstance(data._meta, u.Quantity)
would work in that case (and also allow to extract the unit), but that would indeed go some way to writing a full wrapper.
superceded by #783
It appears that
u.Quantity(data, unit)
triggers a full file read, which is disastrous as the whole point of spectral-cube is not to trigger unnecessary IO.My preference for using that construct came from some very old insight gained when performance testing code for
pyradex
. If you're adding units a lot of time in a loop,data * unit
is expensive. But, we're not expecting to add the units many times - that's not a likely bottleneck. Instead, IO is a bottleneck. It makes more sense to have:than
for our use cases; we often want to stitch units onto dask arrays, which doesn't play well with
u.Quantity
's constructor.(caveat: I haven't tested this yet, so it's possible there's some astropy version dependence)