hdsingh closed this 4 years ago.
This is a terrible idea!
The whole idea of a dask-backed xarray is that the data may be too big to fit into memory, and may need to be processed on a cluster. .compute() brings the whole thing into memory. So it will be OK when you are using only one slice, but not for the whole array. For the latter case at least, I would certainly use min/max instead of percentiles.
(Note that the code should be tested anyway; I'm actually not sure whether .compute(), the dask method, is the right thing here, instead of xarray's .values.)
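To make the min/max versus percentiles point concrete, here is a minimal sketch in plain NumPy (no dask). It assumes the data arrives as an iterable of chunks standing in for dask blocks, and the helper names are hypothetical. The global min and max can be combined exactly from per-chunk results while holding one chunk at a time. Combining per-chunk medians, by contrast, does not give the true median, so an exact percentile needs more than a simple chunk-wise reduction.

```python
import numpy as np


def chunked_min_max(chunks):
    """Hypothetical helper: exact global (min, max) from an iterable of chunks.

    If the iterable yields chunks lazily, only one chunk is in memory at a
    time; the running state is just two scalars.
    """
    lo, hi = np.inf, -np.inf
    for chunk in chunks:
        chunk = np.asarray(chunk)
        lo = min(lo, chunk.min())
        hi = max(hi, chunk.max())
    return float(lo), float(hi)


def median_of_chunk_medians(chunks):
    """Hypothetical naive chunk-wise 'median'; NOT the true median in general."""
    return float(np.median([np.median(np.asarray(c)) for c in chunks]))


chunks = [np.array([1, 2, 3]), np.array([10, 20, 30, 40, 50])]
print(chunked_min_max(chunks))                    # (1.0, 50.0), exact
print(median_of_chunk_medians(chunks))            # 16.0, wrong
print(float(np.median(np.concatenate(chunks))))   # 15.0, the true median
```

The min/max path never needs the full array, which is why it scales to the whole dask array; the percentile path either needs all the data at once or an approximation.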
I will go through the dask docs before attempting to solve this again, to get a better understanding.
OK, but ask questions sooner rather than later: dask is a pretty big project, and you are looking specifically at the xarray interface, which hides or augments some of the dask.array functionality.
Using method=tdigest in dask.array.percentile would require crick and cython as dependencies. Should it be used?
OK, we can do this for now. Maybe it will change later.
Note that we can test for the existence of crick.TDigest and use it if it is available. We should not require it. Also, crick does not depend on cython, only on numpy (https://github.com/conda-forge/crick-feedstock/blob/master/recipe/meta.yaml#L26). Cython is only used when building the package within conda-forge (or by pip, if you install from source). If you are not familiar with it, I can give a brief introduction to what cython does at our next meeting.
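A minimal sketch of the "test for crick, but don't require it" pattern, assuming a hypothetical helper that returns the name of the percentile method to hand to dask. "tdigest" and "default" are the method names discussed in this thread; the helper names themselves are illustrative only.

```python
def has_tdigest() -> bool:
    """Return True if the optional crick package provides TDigest."""
    try:
        from crick import TDigest  # noqa: F401  (optional dependency)
    except ImportError:
        return False
    return True


def choose_percentile_method() -> str:
    """Hypothetical helper: prefer the t-digest method only when crick is installed."""
    if has_tdigest():
        return "tdigest"
    # Fall back to dask's built-in algorithm so crick is never a hard requirement.
    return "default"


print(choose_percentile_method())  # "tdigest" with crick installed, else "default"
```

The thread refers to this choice as method=tdigest. More recent dask releases appear to expose it under a keyword named internal_method, so check the signature of dask.array.percentile in the installed version before wiring this in.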
Please test for crick and use it if possible, and then we can merge this.
At some point, I would like you to justify the rounding to 5 decimal places.
I have made the relevant changes. Please have a look.
OK, going in when it turns green.
Now we can create plots for dask arrays.
The above code now runs without error (on clicking PLOT). In the master branch it gives:
TypeError: quantile does not work for arrays stored as dask arrays. Load the data via .compute() or .load() prior to calling this method.