Closed whedon closed 3 years ago
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @1313e, @csadorf it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Software report (experimental):
github.com/AlDanial/cloc v 1.88 T=0.16 s (451.6 files/s, 64757.4 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
Python 44 1620 1510 4475
Cython 5 385 80 1032
YAML 9 72 10 258
reStructuredText 8 217 246 153
Bourne Shell 2 28 85 147
TOML 1 5 2 38
DOS Batch 1 8 1 26
Markdown 1 5 0 17
diff 1 3 13 13
make 1 4 7 9
-------------------------------------------------------------------------------
SUM: 73 2347 1954 6168
-------------------------------------------------------------------------------
Statistical information for the repository '31e8fc2678107bcd087fcbc0' was
gathered on 2021/06/25.
The following historical commit information, by author, was found:
Author Commits Insertions Deletions % of changes
Clemens Giuliani 2 562 49 4.34
Dion Häfner 57 8326 2750 78.69
Filippo Vicentini 34 1961 427 16.97
Below are the number of rows from each author that have survived and are still
intact in the current revision:
Author Rows Stability Age % in comments
Clemens Giuliani 247 44.0 5.3 4.45
Dion Häfner 6576 79.0 3.5 9.76
Filippo Vicentini 782 39.9 2.0 6.52
PDF failed to compile for issue #3419 with the following error:
Can't find any papers to compile :-(
@whedon generate pdf from branch joss-paper
Attempting PDF compilation from custom branch joss-paper. Reticulating splines etc...
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
:wave: @1313e, please update us on how your review is going (this is an automated reminder).
:wave: @csadorf, please update us on how your review is going (this is an automated reminder).
Yup, I will be on it next week.
@dionhaefner I have a few questions regarding authorship and the paper itself. I am posting them here such that it is easier to track for others and to allow for feedback.
I wanted to ask why only one of the two of you is stated as being an author of the software in the setup.py
file?
Shouldn't that be both?
The order of authors on the paper is not the same as the order of the contributors (yes, I can see how made more contributions in terms of lines). Given how important first author usually is, I want to make sure it is done correctly. So, can you confirm that the order is correct?
There is a third person named inailuig
, who despite only having 2 contributions, does seem to have made a proper impact on the package.
Is there a reason this person is not included as an author on the paper?
The paper itself seems to be a bit too long in my opinion. Please correct me if I am wrong @kellyrowland, but the point of a JOSS paper is to basically give a summary or abstract to your software, just to make software easily citable. The documentation should actually provide all the details and information required for using the package, whereas the package more gives a general description to get others interested in it. In this particular case, there are quite a few code snippets with explanations in the paper that are more well-suited for the online documentation. Furthermore, I also cannot find these code snippets with explanations in the online documentation, so it might be a good idea to move them there.
Finally, the paper mentions in the Outlook that there are quite a few more functionalities that must be implemented for the package. Isn't it maybe a bit too early to already publish it then, even if it is already useful to others (given the number of stars, I can see that it is being used)?
👋 @csadorf, please update us on how your review is going (this is an automated reminder).
I should be able to complete it by the end of the week.
Thank you very much for your comments @1313e. Answers inline, and I will address your issues on the mpi4jax
repo soon.
I wanted to ask why only one of the two of you is stated as being an author of the software in the setup.py file? Shouldn't that be both?
The order of authors on the paper is not the same as the order of the contributors (yes, I can see how made more contributions in terms of lines). Given how important first author usually is, I want to make sure it is done correctly. So, can you confirm that the order is correct?
We flipped a coin for first authorship of the paper, as also indicated in the author list. We feel that both have contributed equally. So the order is correct, yes.
In case of the README and setup.py
, we think that full attribution and first-authorship as not nearly as important as on an academic paper. The README just lists contributors in chronological order. But I actually think this list is redundant, as it's already shown in the GitHub UI in the sidebar. So I will remove the contributors section from the README.
In case of setup.py
, this is often just the original author or main contact for questions. In my experience it is not meant to give an exhaustive list.
There is a third person named inailuig, who despite only having 2 contributions, does seem to have made a proper impact on the package. Is there a reason this person is not included as an author on the paper?
We value every contribution, but we feel that the bar for academic authorship should be a little bit higher than a one-time code contribution. If I interpret the JOSS guidelines correctly, authorship questions should be settled through "common sense", and we do feel that the author list is an accurate reflection of the work that went into the package.
The paper itself seems to be a bit too long in my opinion. Please correct me if I am wrong @kellyrowland, but the point of a JOSS paper is to basically give a summary or abstract to your software, just to make software easily citable. The documentation should actually provide all the details and information required for using the package, whereas the package more gives a general description to get others interested in it. In this particular case, there are quite a few code snippets with explanations in the paper that are more well-suited for the online documentation. Furthermore, I also cannot find these code snippets with explanations in the online documentation, so it might be a good idea to move them there.
I take it you refer to the second half of the "Implementation" section? We use these code snippets as a device to explain why we need to go out of our way with token management, but if you think they are distracting we could remove them. There are similar examples in the corresponding part of the docs.
If you mean the shallow water application, that one is also part of the docs. Same applies here, if you think the code is distracting we can remove it and refer to the docs, and just include the benchmark timings and the figure.
Finally, the paper mentions in the Outlook that there are quite a few more functionalities that must be implemented for the package. Isn't it maybe a bit too early to already publish it then, even if it is already useful to others (given the number of stars, I can see that it is being used)?
This seems to be a misunderstanding; there is nothing that must be implemented for the package to be useful. There are 2 major use cases for mpi4jax
:
Use case 1 is mature and fully supported, and this is what we describe in the paper (and, as you noted, is already being used in the wild). Use case 2 is not really supported, but might be in the future (but it could also turn out to be technically impossible). I think it is customary to extrapolate a bit into the future in the last section, also to inspire potential contributors. If you prefer that we stick to what is already supported we can remove it.
@dionhaefner
We value every contribution, but we feel that the bar for academic authorship should be a little bit higher than a one-time code contribution. If I interpret the JOSS guidelines correctly, authorship questions should be settled through "common sense", and we do feel that the author list is an accurate reflection of the work that went into the package.
Okay, good. I just wanted to make sure they weren't simply forgotten about.
I take it you refer to the second half of the "Implementation" section? We use these code snippets as a device to explain why we need to go out of our way with token management, but if you think they are distracting we could remove them. There are similar examples in the corresponding part of the docs.
Yes, I am referring to that part. I am not saying it isn't useful, because it most definitely is. It is just a lot of information that I feel is a bit too extensive for the paper itself. But again, if the editor is happy with it, then I am as well.
This seems to be a misunderstanding; there is nothing that must be implemented for the package to be useful. There are 2 major use cases for
mpi4jax
:
- To use MPI primitives in JITed code without overhead on CPU and GPU.
- To do distributed autograd, where you differentiate through communication calls.
Use case 1 is mature and fully supported, and this is what we describe in the paper (and, as you noted, is already being used in the wild). Use case 2 is not really supported, but might be in the future (but it could also turn out to be technically impossible). I think it is customary to extrapolate a bit into the future in the last section, also to inspire potential contributors. If you prefer that we stick to what is already supported we can remove it.
Please note that I am not saying that the package isn't useful or feature complete. On the contrary actually. I am more pointing out that more functionality seems to be planned for the future, as you mention as well, and to me it seemed to be quite a bit. I was therefore simply wondering whether it would be a good idea to publish now instead of later, when the paper can also introduce those added functionalities.
Hope my comments didn't feel too direct. I actually really like the package and made me interested in JAX and mpi4jax as well. :)
Thanks for the clarification! Sounds like we are on the same page. I think your comments were perfectly appropriate.
I was therefore simply wondering whether it would be a good idea to publish now instead of later, when the paper can also introduce those added functionalities.
We have considered that, but given that we don't know whether autgrad-through-MPI will ever be part of mpi4jax
(either because we don't have the resources or because of fundamental technical obstacles, perhaps both), we decided that it was a good time to publish now.
We flipped a coin for first authorship of the paper, as also indicated in the author list. We feel that both have contributed equally. So the order is correct, yes.
Bt the way, @dionhaefner , I noticed just now, but in my field it is customary to put the asterisk of "both authors contributed equally" on both authors (not only the second). So maybe we can address that?
Finally, the paper mentions in the Outlook that there are quite a few more functionalities that must be implemented for the package. Isn't it maybe a bit too early to already publish it then, even if it is already useful to others (given the number of stars, I can see that it is being used)?
I am writing an article for netket to be submitted in a few months and i would like to cite mpi4jax
because NetKet could not work without it. I believe mpi4jax
is now mature and battle tested by netket users enough to warrant a paper.
(In fact, mpi4jax
was born as a subpackage in netket
and was soon moved out once i started discussing with @dionhaefner because it looked like it could gain more traction.)
About not being complete: me and dion see different value in mpi4jax
.
He sees part 1, the collective communication, as the most important part, while i am mostly interested in part 2, that is autodiff through mpi calls, which is what I need in NetKet.
As he said, part 1 is feature-complete (bar a few exotic mpi calls, which i don't think warrant stalling the article) while part 2 is not. Many things do work, even if we don't mention them in the article (mainly collective communication), but some (send/recv) do with very peculiar limitations (like only in reverse mode, or only in forward mode) due to limitations in jax that even I am not 100% sure if they can be overcome or not.
For that reason we agreed to focus the JOSS paper on part 1 (MPI), and mention part 2 (AD) as something that can be fully fleshed out in the future, even if we do already support AD in the more common use-cases.
@whedon generate pdf from branch joss-paper
Attempting PDF compilation from custom branch joss-paper. Reticulating splines etc...
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Bt the way, @dionhaefner , I noticed just now, but in my field it is customary to put the asterisk of "both authors contributed equally" on both authors (not only the second). So maybe we can address that?
We had that in the first draft, but at some point it disappeared, so I assumed you removed it :) Anyhow, it's back.
@kellyrowland I did not realize that one of the authors is affiliated with institution I am employed by: École Polytechnique Fédérale de Lausanne (EPFL). According to the COI Policy, this constitutes a COI. I have not personally interacted or collaborated with Filippo Vicentini or their host laboratory. I am going to continue with the review process, but will wait on confirmation on whether the COI is waived or not before submitting any comments.
Hi @csadorf - thanks for bringing this up. I will try to find another reviewer in the interest of conservatism with respect to the COI policy. Having said that, if we're unable to find another reviewer, it does sound like there is enough separation in this case that we could waive it.
Hello @adam-m-jcbs @gonsie 👋 would either of you be available and willing to review this paper for JOSS?
Sorry, I’m not available to provide a review.
No worries, thanks for the response!
Hello @mikaem @Himscipy - would either of you be available and willing to review this for JOSS? It looks like both of you may have ongoing JOSS reviews, so no worries about declining this.
👋🏼 @kellyrowland, I am available to review this JOSS submission. Thank you for reaching out.
Wonderful, thank you! I will assign you as a reviewer on the issue shortly.
@whedon remove @csadorf as reviewer
OK, @csadorf is no longer a reviewer
Hi @csadorf - I've removed you as a reviewer here to avoid the COI. Thank you again for bringing that to my attention.
@whedon add @Himscipy as reviewer
OK, @Himscipy is now a reviewer
Hi @csadorf - I've removed you as a reviewer here to avoid the COI. Thank you again for bringing that to my attention.
Got it. Happy to review other submissions in the future!
@kellyrowland I am done with my review and happy for this paper+software to be accepted. I think it is a great addition to the open-source community. :)
Thanks for the review @1313e!
Hi @Himscipy just checking in - please let me know about the status of your review or if you need to step away from it.
Hi @kellyrowland , Thanks for checking. I plan to finish it over the weekend, if that okay.
@Himscipy that sounds great, thank you!
Hi @dionhaefner, Following are some comments regarding the submission :
Thanks for the review @Himscipy.
- The software tools wraps mpi4py library with jax, is a very interesting idea and tool such as this will be really beneficial for scientific community. With that said, I find that the paper doesn't describe findings the finding from the second example listed in the introduction section of the paper(NetKet).
I'm not sure I follow. What kind of extra information would you expect? I'm sure @PhilipVinc would be happy to provide.
- The paper does describe the scalability for the a 2D shallow water on ~6M grid, but doesn't clearly mention about the workload handled by each worker, and the mesh partition strategy a user can directly leverage from the tool. For the general application perspective, it's important to clearly describe this in the paper.
I have no problem with mentioning the mesh partitioning and grid points per worker. I'll add it to the manuscript. I'm not sure though what you mean by "a user can directly leverage from the tool". All interaction with communicators is handled on the user side via mpi4py, so there's nothing we add or take away.
- In addition if you also provide a 3D example case and the working mesh partitioning strategies, the usability of the tool can increase multifold. I am not aware of existing JAX based Numerical scientific solvers, if there are such existing codes, a impactful use case would be to show how your software tool can integrate with it seamlessly. The paper does mention about a Jax based code "Veros" but did not reports the scalability and performance. In addition the "Veros " documentation does list mpi4Jax for distributed computing.
I agree that a 3D example would be great, but it's not exactly something I have "lying around", and since JAX is relatively new there's not a huge ecosystem to fall back to. If you have a simple 3D toy example I would be happy to see if we can make it work - that would definitely make a good addition to the docs.
I am the maintainer of Veros, and yes we do use mpi4jax
to scale to small clusters and dozens of GPUs. I could add a link to the Veros benchmarks page to the paper?
- As the work talks about communication and distributed computing, for HPC community it is important to have code profiling and timing results from the profiler to demonstrate the applicability of the tool.
We could try and quantify the overhead incurred by using mpi4jax by benchmarking it against mpi4py. Would that sound like a useful addition? Otherwise you'll need to be more specific what we should profile, since all actual communication happens inside mpi4py / MPI.
- While the reported benchmarks are on small machine, I recommend to test the wrapper performance on large clusters as well and report their results and findings in the paper.
We benchmark Veros for up to 256 cores (see link above). Would it be sufficient to link to that, given that we also measure overhead vs mpi4py (4)? (Besides that, I don't have access to a large cluster.)
- The "zero copy " mentioned in the paper title is never explained or elaborated in the paper. My recommendation would be to provide explanation about it in the paper.
Sure, I'll add a sentence.
If I can add something...
- As the work talks about communication and distributed computing, for HPC community it is important to have code profiling and timing results from the profiler to demonstrate the applicability of the tool.
While I do agree that benchmarks in general are a usefull tool, i don't think they are really relevant here.
mpi4jax
is literally just a wrapper+autodiff rules, as you recognise yourself in your question (5).
It just turns out that writing a wrapper for jax is not super-simple and we would like to be able to cite our work for it.
Properly benchmarking mpi4jax
will, in the end, tell us what is the cost of a function call in XLA and how performant is the particular MPI implementation of the (super)computer used in the benchmark. Moreover, in any reasonably-sized workload this cost will be negligible compared to the MPI operation itself.
I don't think that this information is particularly useful. (Addendum: It is also a moving target: customcall runtime cost has been halved in recent versions of jaxlib/XLA and depends on the library version the user has installed).
FWIW, I did some quick benchmarks and found that mpi4jax
incurs an overhead of about 800ns (so way beyond noise level in any application I can think of).
I think that could be a nice result to add to the paper, to show that we actually achieve zero-copy communications with minimal overhead.
I'm not sure I follow. What kind of extra information would you expect? I'm sure @PhilipVinc would be happy to provide.
I don't expect extra information, here I mean that in the introduction section you elucidate the NetKet example, but in the result section you only show the results of the shallow water equation. Providing the details of all the examples you list in the introduction section, will provide completeness to the paper. In addition, I didn't find any details on the NetKet in the documentation page also.
I have no problem with mentioning the mesh partitioning and grid points per worker. I'll add it to the manuscript. I'm not sure though what you mean by "a user can directly leverage from the tool". All interaction with communicators is handled on the user side via mpi4py, so there's nothing we add or take away.
I think that could be a nice result to add to the paper, to show that we actually achieve zero-copy communications with minimal overhead.
Please add that in the paper, this will further strengthen the tool usability.
I don't expect extra information, here I mean that in the introduction section you elucidate the NetKet example, but in the result section you only show the results of the shallow water equation. Providing the details of all the examples you list in the introduction section, will provide completeness to the paper. In addition, I didn't find any details on the NetKet in the documentation page also.
Sounds like there is a misunderstanding. NetKet and Veros are users of mpi4jax
and not directly connected to the project. We just include a brief description of them in the paper to show what is possible with the library.
The shallow water example code has nothing to do with either of them, it's just that - an example.
I will update the paper with the other suggestions ASAP.
@kellyrowland I am done with my review and I wish good luck to the authors for this paper+software and hope that they continue developing the tool further. Please update the paper and I will be happy to take another look if needed. Thank you.
@whedon generate pdf from branch joss-paper
Attempting PDF compilation from custom branch joss-paper. Reticulating splines etc...
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
I'm done with incorporating all the suggestions. Thanks again for the review @Himscipy!
Thanks @Himscipy !
@dionhaefner can you please create an archive for this release version (on Zenodo, figshare, or elsewhere) and then post the archive DOI here?
Submitting author: @dionhaefner (Dion Häfner) Repository: https://github.com/mpi4jax/mpi4jax Version: v0.3.2 Editor: @kellyrowland Reviewers: @1313e, @Himscipy Archive: 10.5281/zenodo.5337529
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@1313e & @Himscipy, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kellyrowland know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Review checklist for @1313e
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @Himscipy
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper