Closed sneakers-the-rat closed 3 weeks ago
Thanks for these @sneakers-the-rat!
I agreed with all of the changes you suggested, and made those in the text. These suggestions definitely cleared up a few things, thanks! Just to answer a few questions (even though I think you got the gist of what was going on):
L66-68: list is a bit awkward, maybe something like "map them across available resources, from a laptop to a distributed HPC cluster." - if i'm reading your purpose there correctly re: emphasizing how dask lets you scale from small compute to big compute?
That's correct. I just added "cloud" to your suggestion in order to indicate that it can work in the cloud too.
L79: unclear to me if this is just showing another similar package, or if this is the one that you build xesn on top of as mentioned in L62?
Yeah it was just to list another implementation, so I went with your wording suggestion. Thanks!
Thanks for all of the suggestions regarding the Fig 1 caption, I'll add some answers here and then copy/paste the new text below (which I'll merge once you agree with the changes).
Here's the new figure and caption
On the CPU resource, a GCP `c2-standard-60` instance, the default
`dask.distributed.LocalCluster` has 6 workers, each with 5 threads.
On the GPU resource, a GCP `a2-highgpu-8g` instance, the default
`dask_cuda.LocalCUDACluster` has 8 workers, each with 1 thread.
FYI @sneakers-the-rat the changes I made to the text can be seen in #80. The paper draft workflow is there, so a pdf can be found, e.g. here
They are just there to provide the user some guidance on how much memory they might need.
ah! yes that clears things up :) if they're just to give a sense of what parameters influence memory usage and to give me an estimate, that's very helpful! I might even suggest moving that to the main text of the paper or even to the docs - it's rare to get something like that for a package, for it to let me know this accurately how many resources it requires!
I changed this figure so that the measured values are just dots (CPU) or diamonds (GPU) and the derived lines are solid lines that lay on top.
Getting there! I truly don't mean to be a stickler on this, and you can feel free to disagree, but now the different plotting styles of left and right panel make it harder to parse for me - dot shapes are in general less salient markers of difference than color or line style, but the difference between GPU and CPU is arguably the most important contrast. Asking the question to myself "what is it that the gray/black estimate lines are doing for the figure," and I think they don't add a lot to the figure per se as much as I think they would be helpful as a heuristic for potential users. Their semantic content in the figure I think is to demonstrate that the empirical heuristic formula for memory use matches the measured samples, but since the parameters are empirical (rather than derived from something like {dtype_scalar_size}*{array_shape}*...
where showing close match between theory and measurement is informative, I think they might serve you better outside of the figure entirely. If you take them of the figure and move them to the main text, then I think you might simplify the figure, caption, and also give a clearer indication to the reader that this formula might be useful to them rather than them thinking of it as a result that they are to read but not necessarily use :) (it is a result, but hopefully my meaning is clear).
I modified the dash frequency
the legend on the left is definitely clearer now! thanks! I am neutral on the dot shapes, up to you whether you like them or not :).
It's a fairly commonly used term, but let me know if you'd like more description there
If this is a term of art that you would expect your readers to know, that's good enough for me. (I'm also one of these newfangled anti-ivory-tower types who thinks it's not only fine but a favor to your reader to shortcut them to wikipedia articles for basic concepts they might need a refresher on, but that's definitely not for everyone so i only mention it as an aside, the added reference is great and i'm sure will be a meaningful indicator of what you mean to your readership!)
The problem size is most definitely fixed. ... "more or less" was referring to the fact that the way dask interacts with the problem can be less intuitive
I know this tension well. I think taking it out lets me see through to what you were saying more easily without getting distracted.
Unfortunately I don't have enough GCP allocation right now to afford additional tests, even though you're right, this would show off the code more.
that is totally fine. So fine that if "ran out of compute credits" was in a methods section as a limitation i wouldn't bat an eye. As a curiosity question, have you profiled to see if my hunch was right? that once the code starts running it finishes extremely quickly, and most of that time is spent in cuda compilation? or is there some other reason it's flat?
rest of the text looks great!
Hi @sneakers-the-rat thanks for all of the comments and feedback! Here's my reply for the last remaining questions/issues.
On Figure 1, I really appreciate your input, but I respectfully disagree. I think that having the black and gray lines in the plot gives the user an intuitive understanding for how well they can expect the equations to map to their application. I could make the two plots similar by deriving comparable curves for walltime, but I expect this to be less machine independent than memory.
I am having a change of heart, and will flip the colors to indicate CPU vs GPU, and the markers/linestyle to indicate system state size. But I do strongly prefer having the gray/black lines in the plot. Here's the latest version.
If this is a term of art that you would expect your readers to know, that's good enough for me.
I went in and added a more intuitive description, and please note that my previous edit included a citation to Amdahl's law (still there)
We then fix the problem size such that $N_r*N_g = 16,000$, so that
the timing results reflect strong scaling.
That is, the results show how the code performs with increasing resources on a fixed problem
size, which in theory correspond to Amdahl's Law [@amdahl_1967].
As a curiosity question, have you profiled to see if my hunch was right? that once the code starts running it finishes extremely quickly, and most of that time is spent in cuda compilation? or is there some other reason it's flat?
I think that having the black and gray lines in the plot gives the user an intuitive understanding for how well they can expect the equations to map to their application.
Fair enough! It is your figure :). I like the swap of the colors/shapes, that makes more sense to me. Also putting the dots on top of the lines helps a lot.
What about using solid/dashed lines instead of black/gray? They're visually distinct enough that it wouldnt get confused with the solid/dotted on the left. Black/gray is just a bit hard to tell apart.
Also since the gray lines follow the CPU, and the black follows GPU, what do you think about replacing the legend labels for those lines with some readable shorthand for the equation like "calculated CPU memory" (or whatever you think is accurate, just an example) and then putting the equation in the caption? The equation in the legend already needs someone to read the caption to know what the terms are, so might as well give the eqs a name so people get a semantic handle for them?
Edit: one more thing - it's obvious if you think about it, but it may be worth saying in the caption that the dots/triangles overlap on the right plot but there are still two curves there. On the left the dashed line makes it clear there are two GPU samples, but on the right it just looks like one when I am zoomed out to full paper view. Up to you, I think it could be inferred from the legend, im just a fan of captions that anticipate visual confusion.
I could make the two plots similar by deriving comparable curves for walltime, but I expect this to be less machine independent than memory.
Agreed. This would probably be too variable to be useful.
I went in and added a more intuitive description, and please note that my previous edit included a citation to Amdahl's law
This looks great. I saw and went and did a bit of reading yesterday, the power of a good reference ❤️.
Hi @sneakers-the-rat, thank you for all of these comments!
What about using solid/dashed lines instead of black/gray? They're visually distinct enough that it wouldnt get confused with the solid/dotted on the left. Black/gray is just a bit hard to tell apart.
I appreciate the suggestion, but I will keep the lines solid, since the difference between dashed and solid lines indicates the difference between the two system sizes.
Also since the gray lines follow the CPU, and the black follows GPU, what do you think about replacing the legend labels for those lines with some readable shorthand for the equation like "calculated CPU memory" (or whatever you think is accurate, just an example) and then putting the equation in the caption?
Thanks for the suggestion! I would prefer to keep it how it is, with the equation clearly stated in the legend. I did however modify the following phrase in the caption to be more specific about which line goes to CPU and which to GPU:
"The gray and black lines indicate the general trend in memory usage for the CPU and GPU simulations, respectively.
Also, I added subtitles to the legend so that it's clear which line goes with CPUs and which goes with GPUs.
it may be worth saying in the caption that the dots/triangles overlap on the right plot but there are still two curves there
Thanks, I added a white markeredgeborder to the plot to make this more apparent.
Here is the latest figure and caption. I hope we have converged, but let me know if you have additional comments.
great, looks good to me :)
Thanks @sneakers-the-rat!
Part of: https://github.com/openjournals/joss-reviews/issues/7286
Some very small proofreading changes from reading the paper. Overall it's great, just some suggestions that might improve the strength of the paper. The minor suggestions are up to you, take them or leave them, but I would like some clarification on the medium/general suggestions/questions for the purposes of making the paper as strong as it can be :)
Minor/grammatical suggestions:
xesn
on top of as mentioned in L62? If it's the former, the wording is a bit awkward - "finally" in a list of two - could be something like "Another implementation is that of Arcomano et. al (2020, 2022, 2023), available at..." If it's the latter, it would be nice to make that more specific :)Medium questions/suggestions:
a
andb
constants were empirically derived - would love to see some more description there.LocalCluster
configuration end up meaning for your resources?General suggestions:
scaling
directory, and I think a permalink to that would be great for readers who might not have the repo in front of them/if you choose to change repo structure in the future.Nr
to show how large the system can scale on GPU. That would be for the purposes of showing off your work, so not required, just thought i'd let you know that's what i assume when i'm reading it and it may make for a stronger conclusion.