Chunking and Dask notebooks review

tinaok commented on 2022-08-26T11:41:30Z ----------------------------------------------------------------

I think we can just mention 'compute' at this stage, so as not to complicate things?

guillaumeeb commented on 2022-08-26T13:48:11Z ----------------------------------------------------------------

I was under the impression that load() was often used (at least in the Pangeo gallery), so I wanted to be complete about the methods that can be used. But this is not a strong opinion.
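To make the compute() versus load() distinction concrete, here is a minimal sketch, assuming xarray and dask are installed; the helper `make_lazy_array` is hypothetical. With xarray, `compute()` returns a new in-memory object and leaves the original lazy, while `load()` pulls the data into memory in place and returns the same object.

```python
import numpy as np
import xarray as xr


def make_lazy_array() -> xr.DataArray:
    """Hypothetical helper: a small DataArray backed by a chunked Dask array (needs dask)."""
    return xr.DataArray(np.arange(12).reshape(3, 4), dims=("x", "y")).chunk({"x": 1})


# compute(): returns a NEW in-memory DataArray; the original stays lazy (Dask-backed).
lazy_a = make_lazy_array()
computed = lazy_a.compute()

# load(): loads the data into memory IN PLACE and returns the same object.
lazy_b = make_lazy_array()
loaded = lazy_b.load()

print(lazy_a.chunks is not None)  # True: still Dask-backed
print(computed.chunks is None)    # True: NumPy-backed
print(loaded is lazy_b)           # True: same object, now in memory
```

Either call triggers the computation; the choice mostly matters when the lazy object is reused afterwards.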

review-notebook-app[bot] commented 2 years ago

tinaok commented on 2022-08-26T11:41:30Z ----------------------------------------------------------------

What about this?

What is a Dask Distributed?

As shown in the figure, to use Dask Distributed, a user usually needs a 'Client' and a 'Cluster' object (or: a user usually needs a Dask Distributed client and cluster).

A Dask Distributed cluster is made of two main components:

a Scheduler, responsible for handling the computation graph and distributing tasks to Workers;
one or several (up to thousands of) Workers, computing individual tasks and storing results and data in distributed memory (RAM and/or the workers' local disks).
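A minimal sketch of how the Client, Cluster, Scheduler and Workers described in this suggestion fit together, assuming `dask.distributed` is installed and using a `LocalCluster` as a stand-in for an HPC or cloud deployment; the function name `sum_on_local_cluster` is hypothetical.

```python
import dask.array as da
from dask.distributed import Client, LocalCluster


def sum_on_local_cluster(n_workers: int = 2) -> float:
    """Hypothetical demo: sum a chunked array on a small local Dask Distributed cluster."""
    # The Cluster object starts one Scheduler plus n_workers Workers.
    # processes=False keeps the Workers as threads in this process (convenient for a demo).
    with LocalCluster(n_workers=n_workers, threads_per_worker=1, processes=False) as cluster:
        # The Client connects to the Scheduler and submits task graphs to it.
        with Client(cluster) as client:
            x = da.ones((1000, 1000), chunks=(250, 250))  # 16 chunks, so several tasks
            # The Scheduler dispatches the graph's tasks to the Workers; result() waits for the value.
            return float(client.compute(x.sum()).result())


if __name__ == "__main__":
    print(sum_on_local_cluster())  # 1000000.0
```

On a real platform, only the Cluster line would change (for example a job-queue or Kubernetes cluster class); the Client side stays the same.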

guillaumeeb commented on 2022-08-26T13:51:43Z ----------------------------------------------------------------

I'd stick with 'a Dask Distributed cluster', 'a Dask Distributed' alone sounds weird.

So I'd also keep the description of the cluster first, followed by the tools used to deploy a Cluster and connect to it.

review-notebook-app[bot] commented 2 years ago

tinaok commented on 2022-08-26T11:41:31Z ----------------------------------------------------------------

I do not see the point of adding 'Hadoop' here

guillaumeeb commented on 2022-08-26T13:54:29Z ----------------------------------------------------------------

Initially I just wanted to replace the form 'or ..' with ', etc.', and I added another example of a common (and, some years ago, popular) distributed computing infrastructure that can host Dask clusters.

But I'm happy to remove Hadoop if you think it is too much.

review-notebook-app[bot] commented 2 years ago

tinaok commented on 2022-08-26T11:41:32Z ----------------------------------------------------------------

You might need to clear the outputs here to make the Jupyter Book build work?

guillaumeeb commented on 2022-08-26T11:53:47Z ----------------------------------------------------------------

Crap, I really thought I had removed them. I had to go get my kids; I will look at your comments later!

guillaumeeb commented on 2022-08-26T13:54:45Z ----------------------------------------------------------------

Outputs cleared.

review-notebook-app[bot] commented 2 years ago

tinaok commented on 2022-08-26T11:41:32Z ----------------------------------------------------------------

Dask tasks graph => Dask task graph

review-notebook-app[bot] commented 2 years ago

tinaok commented on 2022-08-26T11:41:33Z ----------------------------------------------------------------

into another ta. > into another tab.

annefou commented 2 years ago

Ok. What should we do with this PR? Shall we merge it?

I will set up a new repo for the clover bootcamp, and I am wondering whether I should start from the existing repo or wait until this PR is merged. Thanks!

guillaumeeb commented 2 years ago

What should we do with this PR? Shall we merge it?

I would say yes, but I'm a little biased 😄. Seriously, it needs some replies from @tinaok to see if we need to modify a few things.

pangeo-data / foss4g-2022

Chunking and Dask notebooks review #93