ianfhunter closed this issue 1 year ago
@ianfhunter Thanks for the feedback! We have made some changes and address these concerns point by point below. Let us know if you have further recommendations.
Comment: line 27: "~10^6+" - The claim is unclear. This should be either "~10^6" OR "10^6+". The training routine can scale to approximately that many points, or it can scale to more than that amount?
Solution: Removed "+", so it now appears as "~10^6". In theory it is possible to use more points, but we have not tested beyond a few million, so this seems like the correct way to write it.
Comment: line 50: You should probably cite lmdb
Solution: The LMDB package doesn't have a specific DOI or article to reference, so we added the original author and another contributor (http://www.lmdb.tech/doc/). Hope this will suffice.
Comment: line 54: the acronym UQ is not expanded.
Reply: It is expanded in Line #71 "... statistically-rigorous uncertainty quantification (UQ) during ..."
Comment: You make claims that AMPTorch can support higher amounts of datapoints than the base AMP and other existing codes (lines 37-38). It would be helpful to name some other examples than AMP and state their limits for comparison, to highlight the impact of this work versus the SOTA.
Solution: Added atom-centered symmetry functions as a comparison for how the number of fingerprinting dimensions scales with the number of chemical elements in the training data. Due to the word limit in JOSS, it's a bit hard for us to expand on this point in greater detail.
Just so you have it all in one place, I'll also add my comments on the paper here. The paper is well-written and the package is well-motivated, so my suggestions are mostly minor. (Feel free to tick the boxes yourself)
[x] 38: boiler plate -> boilerplate
[x] 47: SingleN -> SingleNN
[x] 50: Btree-based -> B-tree-based
[x] 54: I think the UQ section should cite your own paper on UQ (https://arxiv.org/abs/2208.08337), at the very least!
[x] 58: statistically-rigorous -> statistically rigorous
[x] 84: the URL reference is mangled, presumably should be just https://doi.org/10.1021/acscatal.0c04525
[x] I would suggest adding URLs for the LMDB and Skorch references
[x] Minor stylistic thing but it seems like most other JOSS papers include a space before the reference, e.g., blah blah [@Evans2023] rather than blah blah[@Evans2023]
[x] This sentence irks me slightly:
Thus, AMPTorch is currently the only feature-based ML force field code capable of training on an arbitrary number of elements and training points
as unfortunately by the time of publication these claims are almost always out of date (I take at least some of the blame for that!). I would suggest rephrasing it to something like "AMPTorch fills a gap in the ecosystem for feature-based ML force fields with capability for handling arbitrary numbers of elements and training points". I'm aware this is bordering on personal preference (maybe it's just the italics...) though, so I'll leave it up to you.
@ml-evs I think we have addressed all the issues here, thanks for the suggestions! Please take a look and let us know if you have any more suggestions.
Feedback from JOSS review
Overall, the paper is well written, I appreciate being able to understand it without too much prior knowledge. I have a few comments though to be addressed:
line 27: "~10^6+" - The claim is unclear. This should be either "~10^6" OR "10^6+". The training routine can scale to approximately that many points, or it can scale to more than that amount?
line 50: You should probably cite lmdb
line 54: the acronym UQ is not expanded.
You make claims that AMPTorch can support higher amounts of datapoints than the base AMP and other existing codes (lines 37-38). It would be helpful to name some other examples than AMP and state their limits for comparison, to highlight the impact of this work versus the SOTA.