[JOSS review] Code feedback

ml-evs commented 1 year ago

Hi @ajmedford and other authors! Apologies for the slightly slow uptake on my review (https://github.com/openjournals/joss-reviews/issues/5035) -- I have now been able to install the package and try the examples and have enough to give a first round of feedback.

[ ] I think it would be very helpful if the config passed to AtomsTrainer used in many of the examples could be wrapped into a class (perhaps a dataclass). The structure of the class is basically already provided in https://amptorch.readthedocs.io/en/latest/usage.html but it is currently very easy to make a mistake or get lost amongst default values. This should also aid development going forwards.
[ ] The usage documentation should explain the relevant classes involved and requirements for e2e training and evaluating a potential. Currently much of this data is stuffed into comments in the large config code snippet.
[ ] Overall the examples could be better motivated (why you might want to do such a thing, step by step), both in the online documentation and in the code itself. Currently they showcase the novel features of Amptorch (GMP, lmdb etc.) but I felt a basic tutorial for just training the "default" network was missing. This is kinda provided on the "usage" page but it takes some work to piece it together in its current state. It would be nice if the examples generated some figures to actually see what is happening, executing the code snippets which provide little feedback.
[x] If this package was used in the several linked preprints about GMP, singleNN etc. then perhaps code examples to generate some of the (simpler?) results from these papers would be appropriate, if at all possible.
[x] The code itself has no docstrings describing the functionality of parameters of any of the classes or functions (that I could find). API documentation is somewhat of a hard requirement for JOSS
[x] The GMP example should be made simpler by specifying the path to the already-provided pseudodensity files: https://github.com/ulissigroup/amptorch/blob/78cb2f19c068884889883bea05faf1189f245698/examples/GMP/GMP_example.py#L29-L31 as e.g., Path(__file__).parent / "pseudodensity_psp".
[x] With a default installation I was not able to get all of the tests to pass (consistency_test.py fails, as well as several in test_script.py, in both CPU and GPU mode). Could you perhaps provide some documentation for executing the tests, in case it is my set up that is wrong? (test failures hidden below)
[x] The last code release was in July 2021 with the name "initial release". The code should be released in its complete state (which it seems to be!) before publication in JOSS. Any of the suggestions above could then be implemented in a future released version.

Test failures

``` =============================================================== test session starts ================================================================platform linux -- Python 3.9.16, pytest-7.2.1, pluggy-1.0.0 -- /home/mevans/.local/conda/envs/amptorch/bin/python cachedir: .pytest_cache rootdir: /home/mevans/src/amptorch collected 23 items amptorch/tests/consistency_test.py::test_energy_force_consistency FAILED [ 4%] amptorch/tests/cp_uncertainty_calibration_test.py::test_cp_uncertainty_calibration PASSED [ 8%] amptorch/tests/cutoff_funcs_test.py::test_cutoff_funcs PASSED [ 13%] amptorch/tests/gaussian_descriptor_set_test.py::test_gaussian_descriptor_set PASSED [ 17%] amptorch/tests/pretrained_test.py::test_pretrained PASSED [ 21%] amptorch/tests/pretrained_test.py::test_pretrained_no_config PASSED [ 26%] amptorch/tests/test_script.py::test_cutoff_funcs PASSED [ 30%] amptorch/tests/test_script.py::test_gaussian_descriptor_set PASSED [ 34%] amptorch/tests/test_script.py::test_pretrained PASSED [ 39%] amptorch/tests/test_script.py::test_pretrained_no_config PASSED [ 43%] amptorch/tests/test_script.py::test_lmdb_pretrained PASSED [ 47%] amptorch/tests/test_script.py::test_lmdb_pretrained_no_config PASSED [ 52%] amptorch/tests/test_script.py::test_training PASSED [ 56%] amptorch/tests/test_script.py::test_training_gmp PASSED [ 60%] amptorch/tests/test_script.py::test_cp_uncertainty_calibration FAILED [ 65%] amptorch/tests/test_script.py::TestMethods::test_cosine_and_polynomial_cutoff_funcs PASSED [ 69%] amptorch/tests/test_script.py::TestMethods::test_gds PASSED [ 73%] amptorch/tests/test_script.py::TestMethods::test_load_retrain PASSED [ 78%] amptorch/tests/test_script.py::TestMethods::test_load_retrain_lmdb PASSED [ 82%] amptorch/tests/test_script.py::TestMethods::test_training_scenarios PASSED [ 86%] amptorch/tests/test_script.py::TestMethods::test_training_scenarios_gmp PASSED [ 91%] amptorch/tests/test_script.py::TestMethods::test_uncertainty_cp FAILED [ 95%] amptorch/tests/training_test.py::test_training PASSED [100%] ===================================================================== FAILURES ===================================================================== ```

ajmedford commented 1 year ago

@ml-evs - I think we addressed these comments in https://github.com/openjournals/joss-reviews/issues/5035 by mistake. Let us know if any of these issues still need to be addressed before publication.

ml-evs commented 1 year ago

Hi @ajmedford, thanks for pointing this out, and thanks to you and @nicoleyghu for the work implementing my suggestions. I've ticked off the things that are fixed above, and although I think some basic examples would be helpful (e.g., showing plots that describe the output of the sweep example), I'm satisfied that the examples cover your advanced features well.

I've just been going through my initial testing scripts and I've run into a couple of minor issues.

The default fp_scheme seems to have changed since my last attempts, so the docs are now out of date here:

-     "fp_scheme": str,             # Fingerprinting scheme to feature dataset, "gaussian" or "gmp" (default: "gaussian")
+     "fp_scheme": str,             # Fingerprinting scheme to feature dataset, "gaussian", "gmp" or "gmpordernorm" (default: "gmpordernorm")

My old training script didn't set fp_scheme and now fails with the error

  File "/home/mevans/src/amptorch/amptorch/dataset.py", line 49, in __init__
    self.descriptor = construct_descriptor(descriptor_setup)
  File "/home/mevans/src/amptorch/amptorch/dataset.py", line 131, in construct_descriptor
    descriptor = GMPOrderNorm(MCSHs=fp_params, elements=elements)
  File "/home/mevans/src/amptorch/amptorch/descriptor/GMPOrderNorm/__init__.py", line 35, in __init__
    self.default_cutoff()
  File "/home/mevans/src/amptorch/amptorch/descriptor/GMPOrderNorm/__init__.py", line 59, in default_cutoff
    sigmas = self.MCSHs["MCSHs"]["sigmas"]

There is of course no requirement that your API stays fixed during review, but just thought I'd mention it in case it was unintentional! Setting fp_scheme to gaussian allows me to train potentials once again.

ajmedford commented 1 year ago

@ml-evs Thanks for these suggestions and pointing out the conflicts in the documentation. I think everything is now taken care of, except for the config data class, which is a work in progress but will hopefully be finished soon. I have opened a separate issue for that here: https://github.com/ulissigroup/amptorch/issues/123, and, for logistical reasons, I'm hoping we can close this issue and finalize the paper while we finish this up.

Regarding the other suggestions:

usage - we added a short demo of training on a simple dataset with exactly 2 degrees of freedom, and visualized the results with respect to one degree of freedom (the bond length).
examples - we reorganized the /examples folder upon the initial submission, and hope now the example scripts resonate better with usage documentation.

Let us know if you would like to see any additional changes before getting this officially published.

ml-evs commented 1 year ago

Thanks @ajmedford and @nicoleyghu for the hard work on this, I think they really help (my own) understanding of the code -- I hope it wasn't too arduous. I'm happy to close this issue and will give my recommendation over in the JOSS issue.

ajmedford commented 1 year ago

Thanks @ml-evs for all the detailed feedback! While the changes were a lot of work, I think they made the package much stronger, and were worth the effort. We also appreciate your patience as we navigated this process and learned a lot about open-source software development along the way.

ulissigroup / amptorch

[JOSS review] Code feedback #112