Closed guillaumeeb closed 2 years ago
View / edit / reply to this conversation on ReviewNB
tinaok commented on 2022-08-26T11:41:30Z ----------------------------------------------------------------
I think we can just state 'compute' at this stage, so as not to complicate things?
guillaumeeb commented on 2022-08-26T13:48:11Z ----------------------------------------------------------------
I was under the impression that load() was often used (at least in the Pangeo gallery), so I wanted to be complete about the methods that can be used. But I don't feel strongly about this.
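For readers comparing the two methods discussed in this thread, here is a minimal sketch of how compute() and load() differ on a Dask-backed xarray object. The toy array and the variable names are assumptions made for illustration; they do not come from the notebook under review.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data, chunked so that it is backed by a lazy Dask array.
data = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=("x", "y")).chunk({"x": 1})

# Still lazy: this only builds a task graph, nothing is computed yet.
lazy_mean = data.mean("x")

# compute() returns a new in-memory object and leaves lazy_mean lazy.
result = lazy_mean.compute()

# load() computes in place and returns the same object, now backed by NumPy.
loaded = lazy_mean.load()
```

Both calls trigger the computation; the practical difference is whether the original variable keeps its lazy Dask array (compute) or is replaced by in-memory values (load).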
tinaok commented on 2022-08-26T11:41:30Z ----------------------------------------------------------------
What about this?
What is a Dask Distributed? As shown in the figure, to use Dask Distributed, a user usually needs 'Client' and 'Cluster' objects (or: a user usually needs a Dask Distributed client and cluster). A Dask Distributed cluster is made of two main components: a Scheduler, responsible for handling the computation graph and distributing tasks to Workers; and one or several (up to 1000s) Workers, computing individual tasks and storing results and data in distributed memory (RAM and/or the worker's local disk).
I'd stick with 'a Dask Distributed cluster', 'a Dask Distributed' alone sounds weird.
So I'd also stick with describing the cluster first, and then the tools that allow deploying a Cluster and connecting to it.
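The order proposed here (describe the cluster, then the tools to deploy and connect to it) can be illustrated with a minimal sketch. It assumes a local, in-process cluster created with LocalCluster; the worker count, array sizes, and variable names are arbitrary choices for the example, not taken from the notebook.

```python
import dask.array as da
from dask.distributed import Client, LocalCluster

# Deploy a small cluster: one Scheduler plus two Workers.
# processes=False keeps everything in the current process (assumption made
# to keep the example lightweight); dashboard_address=None disables the dashboard.
with LocalCluster(
    n_workers=2,
    threads_per_worker=1,
    processes=False,
    dashboard_address=None,
) as cluster, Client(cluster) as client:
    # The Client sends the task graph to the Scheduler,
    # which distributes the individual tasks to the Workers.
    x = da.ones((1000, 1000), chunks=(250, 250))
    total = float(x.sum().compute())
    n_workers = len(cluster.workers)
```

On an HPC or cloud deployment, the LocalCluster line would be swapped for another cluster class, while the Client usage stays the same.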
tinaok commented on 2022-08-26T11:41:31Z ----------------------------------------------------------------
I do not see the point of adding 'Hadoop' here.
guillaumeeb commented on 2022-08-26T13:54:29Z ----------------------------------------------------------------
Initially I just wanted to replace the form 'or ..' with ', etc.', and I added another example of a common (and, some years ago, popular) distributed computing infrastructure that can host Dask clusters.
But I'm happy to remove Hadoop if you think it is too much.
tinaok commented on 2022-08-26T11:41:32Z ----------------------------------------------------------------
You might need to clear the outputs here to make the Jupyter Book build work?
guillaumeeb commented on 2022-08-26T11:53:47Z ----------------------------------------------------------------
Crap, I really thought I had removed them. I have to go get my kids, I will look at your comments later!
guillaumeeb commented on 2022-08-26T13:54:45Z ----------------------------------------------------------------
Outputs cleared.
tinaok commented on 2022-08-26T11:41:32Z ----------------------------------------------------------------
Dask tasks graph => Dask task graph
tinaok commented on 2022-08-26T11:41:33Z ----------------------------------------------------------------
into another ta. => into another tab.
Ok. What should we do with this PR? Shall we merge it?
I will set up a new repo for the clover bootcamp, and I am wondering whether I should start from the existing repo now or after merging this PR. Thanks!
What should we do with this PR? Shall we merge it?
I would say yes, but I'm a little biased 😄. Seriously, it needs some replies from @tinaok to see if we need to modify a few things.
Here are the corrections corresponding to the review I made on https://github.com/pangeo-data/foss4g-2022/pull/87.