jbusecke / easy_ipcc

We are trying to reproduce IPCC analysis plots from raw data in the cloud
Apache License 2.0

Efficient loading of many datasets #2

Open jbusecke opened 2 years ago

jbusecke commented 2 years ago

When I used a for loop with .load() in my SciPy presentation, I noticed discernible gaps in the dask task stream, which suggests to me that the graph setup happens separately for each dataset.

I wonder if datatree could just 'append' each computation to a graph and then execute that large graph to be more efficient?
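
To make the idea concrete, here is a minimal sketch that contrasts the per-dataset loop with a single batched call. It does not use datatree. It relies on `dask.compute`, which accepts several lazy collections and merges their graphs before executing them. The `make_dataset` helper, the `model_*` keys, and the `tas` variable are made up for illustration and stand in for datasets opened lazily from cloud storage. Whether this closes the gaps seen in the task stream for real workloads is untested here.

```python
import dask
import dask.array as da
import xarray as xr


def make_dataset(value: float) -> xr.Dataset:
    """Hypothetical stand-in for a lazily opened, dask-backed cloud dataset."""
    data = da.full((4, 20, 20), value, chunks=(1, 20, 20))
    return xr.Dataset({"tas": (("time", "y", "x"), data)})


# Made-up collection of lazy datasets, keyed by an illustrative model name.
datasets = {f"model_{i}": make_dataset(float(i)) for i in range(3)}

# Pattern from the issue: each .load() builds and submits its own graph,
# so the scheduler sees one computation per dataset.
loaded_loop = {name: ds.mean("time").load() for name, ds in datasets.items()}

# Batched alternative: keep every result lazy, then hand all of them to
# dask.compute in one call so the graphs are merged and run together.
lazy = {name: ds.mean("time") for name, ds in datasets.items()}
computed = dask.compute(*lazy.values())
loaded_batch = dict(zip(lazy.keys(), computed))
```

Both dictionaries hold the same in-memory results. The difference is only in how many times a graph is handed to the scheduler.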

Related to https://github.com/xarray-contrib/datatree/issues/97

jbusecke commented 10 months ago

Relevant comment here: https://discourse.pangeo.io/t/collecting-problematic-workloads/3683/7