Open ghiggi opened 2 years ago
Satpy does not support numpy arrays or DataArrays backed by numpy arrays; it supports only DataArrays backed by dask arrays. If you'd like an almost-equivalent operation to what you are doing, use `.persist()` instead of `.compute()`.
Would it not make sense in https://github.com/pytroll/satpy/blob/367d51d849fa117b1b2241d883ffc61286ee31ba/satpy/scene.py#L756 to check that the value is an `xr.DataArray` backed by a `da.Array`? Like adding the following lines of code:

```python
if not isinstance(value.data, da.Array):
    value.data = da.from_array(value.data)
```
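A minimal, self-contained sketch of that guard (the variable names are illustrative, not Satpy's actual code; `copy(data=...)` is used here to avoid mutating the caller's object):

```python
import dask.array as da
import numpy as np
import xarray as xr

# A DataArray backed by a plain numpy array, as a user might create it.
value = xr.DataArray(np.ones((4, 4)), dims=("y", "x"))

# Wrap the in-memory data in a dask array if it is not one already.
if not isinstance(value.data, da.Array):
    value = value.copy(data=da.from_array(value.data, chunks="auto"))

print(isinstance(value.data, da.Array))  # True
```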
For dask newbies and xarray users, the `compute` method is better known than `persist` ...
Sure. Yes. It could be added. Is it there now? No. Was it something we were concerned about when we were building Satpy originally? Not at the time. Would I accept a pull request to work around this? YES!
However, I'm wondering if there are other options besides using `da.from_array` that we should consider? An error message? Or is `da.from_array` "good enough"? Any thoughts @pytroll/core?
> For dask newbies and xarray users, the `compute` method is better known than `persist` ...
I don't disagree with that, and I wasn't trying to say "you should have known about this". I was saying that, as a workaround for the issue you brought up, you could use `persist` instead of `compute`.
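The practical difference between the two is easy to show: `compute` replaces the dask array with an in-memory numpy array, while `persist` materializes the chunks but keeps the data dask-backed, which is what Satpy expects.

```python
import dask.array as da
import numpy as np
import xarray as xr

arr = xr.DataArray(da.ones((4, 4), chunks=2), dims=("y", "x"))

computed = arr.compute()   # data is now a plain numpy.ndarray
persisted = arr.persist()  # data stays a dask array, chunks held in memory

print(type(computed.data).__name__)   # ndarray
print(type(persisted.data).__name__)  # Array
```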
Ah! Exploiting the possible discussion ... another possible small improvement I noticed for the getter/setter methods is a small change to the `Scene.__getitem__` or `DatasetDict.__getitem__` method, so that the returned `DataArray` has a `name` equal to the `key` (if `key` is a string) or to `key.name` (if `key` is a `DataID`).
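A sketch of the proposed behavior using a hypothetical dict subclass (`NamedDatasetDict` is illustrative only, not Satpy's `DatasetDict`):

```python
import dask.array as da
import xarray as xr

class NamedDatasetDict(dict):
    """Hypothetical dict that names returned DataArrays after their key."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        # Use the key directly for strings; a DataID-like key would use key.name.
        name = key if isinstance(key, str) else getattr(key, "name", None)
        return value.rename(name)

d = NamedDatasetDict()
d["brightness"] = xr.DataArray(da.zeros((2, 2), chunks=1), dims=("y", "x"))
print(d["brightness"].name)  # brightness
```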
I may be misunderstanding what you're going for, but the reason DataIDs are used is that string names alone are not enough to uniquely describe products in Satpy (ex. Band 1 at 250m, 500m, or 1km spatial resolution; each one is a separate DataID). Additionally, modifying inputs in `__setitem__`/`__getitem__` is going to make debugging problems very difficult. It would also require copying `.attrs` so as not to modify the original, which will be another source of confusion for users.
I'm starting to lean towards either:

- erroring when the `__setitem__` value is not a DataArray[dask] object. This could optionally suggest a utility function for converting to DataArray[dask].
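The error-raising option could look roughly like this (`StrictDatasetDict` and the message wording are assumptions for illustration, not Satpy's actual `DatasetDict`):

```python
import dask.array as da
import numpy as np
import xarray as xr

class StrictDatasetDict(dict):
    """Hypothetical dict that rejects values not backed by dask arrays."""

    def __setitem__(self, key, value):
        if not (isinstance(value, xr.DataArray) and isinstance(value.data, da.Array)):
            raise TypeError(
                f"{key!r} must be an xr.DataArray backed by a dask array; "
                "consider wrapping the data with dask.array.from_array."
            )
        super().__setitem__(key, value)

d = StrictDatasetDict()
d["ok"] = xr.DataArray(da.zeros((2, 2), chunks=1), dims=("y", "x"))
try:
    d["bad"] = xr.DataArray(np.zeros((2, 2)), dims=("y", "x"))
except TypeError as err:
    print(err)
```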
**Describe the bug**

`/modifiers/angles.py` functions currently fail if the input `xr.DataArray` data are `numpy.ndarray` instead of `dask.Array`. This is because in many `/modifiers/angles.py` functions, chunks are often retrieved from the inputs with `data.chunks` / `data_arr.chunks` / `data_arr.data.chunks`. If the low-level direct call of these functions is fine to be restricted to expect only `dask.Array` (missing checks ...), then conversion to `dask.Array` should at least be imposed at high level, i.e. when assigning a `DataArray` to a `Scene`, to ensure that downstream computations (i.e. composite generation) do not fail when `Scene` DataArrays are in memory. See a classical failing example here below:

**To Reproduce**
**Expected behavior**

Generation of composites should work also if the `Scene` `DataArray`s have data in memory.

**Actual results**

`AttributeError: 'numpy.ndarray' object has no attribute 'chunks'`
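The root cause can be reproduced outside Satpy in a few lines: plain numpy arrays simply have no `.chunks` attribute, so any code that unconditionally reads it fails on in-memory data.

```python
import dask.array as da
import numpy as np

np_arr = np.zeros((4, 4))
dask_arr = da.zeros((4, 4), chunks=2)

print(dask_arr.chunks)            # ((2, 2), (2, 2))
print(hasattr(np_arr, "chunks"))  # False: accessing it raises AttributeError
```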
**Environment Info:**