openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
697 stars 36 forks source link

[REVIEW]: polypy - Analysis Tools for Solid State Molecular Dynamics and Monte Carlo Trajectories #2824

Closed whedon closed 3 years ago

whedon commented 3 years ago

Submitting author: @symmy596 (Adam Symington) Repository: https://github.com/symmy596/Polypy Version: 0.8 Editor: @richardjgowers Reviewer: @hmacdope, @lscalfi Archive: 10.5281/zenodo.4568493

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/e17ff370f6ef5fa95bea0fea24cb856c"><img src="https://joss.theoj.org/papers/e17ff370f6ef5fa95bea0fea24cb856c/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/e17ff370f6ef5fa95bea0fea24cb856c/status.svg)](https://joss.theoj.org/papers/e17ff370f6ef5fa95bea0fea24cb856c)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@hmacdope & @lscalfi, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @richardjgowers know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Review checklist for @hmacdope

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @lscalfi

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @hmacdope, @lscalfi it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1088/2515-7655/ab28b5 is OK
- 10.1098/rsta.2019.0026 is OK
- 10.1039/D0TA05343K is OK
- 10.1002/bbpc.198400007 is OK
- 10.1016/0022-3697(85)90172-6 is OK
- 10.1016/S0167-2738(02)00229-1 is OK
- 10.1063/1.117366 is OK
- 10.1149/1.1507597 is OK
- 10.1103/PhysRevB.58.13901 is OK
- 10.1016/S0263-7855(96)00043-4 is OK
- 10.1080/08927022.2013.839871 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

hmacdope commented 3 years ago

Hi @symmy596, I'm one of your friendly JOSS reviewers.

Just a few things I thought I would point out as I go through, ranging in importance.

I will wait for you to respond here first, then raise the more pressing of these as issues on the central repo.

Onto the paper, it's looking good overall, just a few things.

Tracking well so far.

symmy596 commented 3 years ago

Hi @hmacdope Firstly, thank you for reviewing the code, I really appreciate you taking the time to look through it.

When developing the code I had several large example trajectories stored in the examples folder and added that folder to the .gitignore to stop them all being written to github until I decided which ones I would use for the examples. I have rectified this and the files should now be in the repository. Apolgoies for the mishap.

With regards to the rest of the issues that you have raised, I will wait for you to complete your review and address all remaining issues in one go.

Once again, thank you for reviewing the code and I look forward to hearing how it can be improved.

richardjgowers commented 3 years ago

Thanks for checking in @hmacdope , is everything going ok with your review @lscalfi ?

lscalfi commented 3 years ago

yes @richardjgowers, everything ok I'm going through the documentation, I should be done in a few days

hmacdope commented 3 years ago

Hey @symmy596. Functionality and usage examples look good to me. One minor thing is that the DLMONTE trajectory at the end of Tutorial 4 (MSD) gives an error (for the right reason which is that MC trajectories can't be used to calculate MSDs). I'm not sure if this was intentional? If it is no worries.

Other than that the only other queries I had have been raised in my previous comment.

lscalfi commented 3 years ago

Hi @symmy596, This is a useful package for doing analysis on MD or MC trajectories, the documentation is particularly detailed and I find the plotting functions especially aesthetic! Here are some points you may want to improve.

hmacdope commented 3 years ago

@symmy596 I will also add to @lscalfi's comment on the MSD. The MSD is normally defined as an ensemble average over all possible lag times. Do you average over all possible lag times and is this related to the sweeps parameter? Just for clarification. 👍

Thanks also for the letting me know the latex builds fine for you @lscalfi, must be a problem with my env.

symmy596 commented 3 years ago

Happy new year to you all.

I have attempted to address all of your comments. I have made a new version of polypy, version 0.8.1 which you will need to download if you wish to check out any of the functionality that has been tweaked.

For simplicity / to keep mysefl right I have copied your comments below and provided an answer beneath.

Q. I managed installing Polypy without problems although the required package section of the README should be updated with the list in requirements.txt. It could maybe be useful to specify if some packages are optional, and also to have the list of commands for those who use conda instead of pip. A. The requirements have been updated to include all packages and information regarding conda has been added under installation in the readme.

Q. The documentation is well written and going in detail through different examples which makes it easy to get started with Polypy. This is why I think it would be useful if the Documentation section of the README was a bit more detailed and highlighted the existence of such documentation (so that unexperienced users doesn't miss it). By the way, @hmacdope it worked fine for me both with html and latexpdf. However I ran into some trouble while going through the documentation. Overall I think there are discrepancies between the docs and the Jupyter notebooks (which seem to work). So here is a non-ordered list of remarks about the documentation: A. The readme has been updated to include information about the documentation. This can be found under documentation in the readme. “An online version of the documentation can be found here (https://polypy.readthedocs.io/en/latest/index.html). The documentation contains an extensive explanation of the underlying theory, function documentation and tutorials. “

Q. I am not used to DLPOLY or DL_MONTE files such as HISTORY, CONFIG or ARCHIVE, but I would be interested to use these analysis tools all the same. Adding either a small description of the files or a link to a description could be very useful to adapt this to other trajectories format (such as .xyz, .pdb...). A. A small description has been added to the documentation to provide a method for converting other file types to those used by polypy and also a description for how a user could add a new reading method. “The code has been developed to analyse DL_POLY and DL_MONTE calculations however other codes can be incorporated if there is user demand. Other formats, such as pdb or xyz can be converted to DL_POLY format with codes such as atomsk (https://atomsk.univ-lille.fr/) and then analysed with polypy. Users are welcome to increase the file coverage by adding a reading function for a different format. This can be accomplished by adding to the read module which has a class for each unique file type that converts it to a polypy.read.trajectory object. “

Q. In the tutorials section, several times the files ../example_data/HISTORY and ../example_data/ARCHIVE are used but they don't exist, I guessed I should use the other versions but this should be corrected. The tutorials in the documentation have been edited and are now in line with those in the examples notebooks. The description of the polypy.read.Trajectory is lacking some elements: the atom_labels are instead atom_list and atom_name, and the is additional fields: atoms_at_timestep, record_number, simulation_timestep, time. Could the description of all the fields be provided? (I don't believe they are present in the API) A. The tutorial has been updated to include more information.

Q. The example on the Cerium oxide also contains several errors, I think there has been a mix of the CaF2 system and the CeO2 (you read ca_density but then use ce_density, same for f_density instead of o_density, and later the fx_2d is not defined). Important: when trying to do the density profiles of HISTORY_GB which correspond to the CeO2, I had a Memory Error and could not do the analysis: does this happen often? Are you storing several large arrays ? A. The tutorial has been updated and should no longer contain errors. Generally, trajectory files are incredibly large (2-10 Gb) and analysing them locally in a jupyter notebook is not encouraged or indeed, practical. Juypter notebooks provide a useful way to introduce users to the functionality however in reality users would write python scripts and run them where the trajectories have been generated. I have written several example python scripts and included them in the repository. These can be used or edited to allow analysis without jupyter notebooks. They are included in the examples/python_scripts folder.

Q. The labels of the plots could be improved: the densities have no unit, the electric field should not be in V and I think the X or Y labels do not correspond to the x, y or z directions of the cell. I guess this is because of how the calculation is done but it would be best if the axis label corresponded to the required cell direction. A. The labels are set by default. I have fixed the obvious errors and added a section to the density tutorial explaining how plots can be customised, labels included.

Q. Regarding the two-dimensional density plots, the plots are very nice. And the possibility of overlaying multiple plots is very useful for visualization purposes. Is the dimension you specify the normal vector to the plane? It would also be very useful to only have the density profile only in a slice (for example only for atoms between z=0 and z=5). A. This can be accomplished by adding a xlim or ylim to the plot. This has been used in the density tutorials to remove the bulk part.

Q. Would it be easy to do more complex geometries?. For example, is it possible to do an oblique cut (i.e. do a projection on a plane which is not the xy, xz or yx plane) to look at another crystal plane? It could also be nice to have projection on curved surfaces if analysing nanotubes (as done here for example https://pubs.acs.org/doi/10.1021/acs.langmuir.8b01115) A. I think this would be quite challenging and thus beyond the scope of this first version. I have added a section to the readme outlining additional functionality that we plan to add including this and an RDF. Users can use this as a guide for making their own contributions or suggest other things that they would like added. This can be found under the Future section of the readme.

Q. Several functions have missing arguments (charge_density in the Two-dimensional... section, combined_density_plot_multiple_species in All together, and later conductivity in the ionic conductivity section) which trigger errors. A. The tutorials included in the documentation have been updated to be in line with those in the notebooks.

Q. Regarding the MSD: this functionality is also very useful. Can you define what are the sweeps in the documentation? By reading the API, the values are obtained by only the initial frame as reference by default, but it would probably be best to average over all starting timesteps. It is also usually useful to discard the initial ballistic regime and the last points where you have less statistics when computing the diffusion coefficient, is it possible in this case or the fit is done on the whole time range? A. You are correct, by default (for speed reasons) the msd will do a single sweep. The sweeps parameter increases the number of starting frames that are used. I have updated the documentation to make this parameter clearer. “MSD calculations require a large number of statistics to be considered representative. A full msd will use every single frame of the trajectory as a starting point and effectively do a seperate msd from each starting point, these are then averaged to give the final result. The sweeps paramter is used to control the number of frames that are used as starting points in the calculation. For simulations with lots of diffusion events, a smaller number will be sufficient whereas simulations with a small number of diffusion events will require a larger number.”

Q. For the regional MSD, it is not clear to me how to specify the region in space? Can it only be a slice or also only a cube by specifying xlow, ylow, zlow, xhi, yhi and zhi?

A. It can only be a slice. Dimension specifies the lattice vector normal to the slice and the parameters lower_boundary and upper_boundary specify the two boundary’s that define the slice. Again, I have added this to the additional functionality section of the readme.

Q. Usually there is a first equilibration phase in MD or MC, is it easily possible to specify from which timestep on the trajectory should be analyzed? This could be an additional point to add to the tutorials, especially for the density calculations.

A. A method has been added to allow users to remove initial and final timesteps. This has been included in the documentation and shown in the reading_data tutorial

Q. Another widely used functionality is the radial distribution function which should be easily computed using this package. It would be interesting to add it.

A. RDF would certainly be useful however its addition would represent a significant addition to the codebase that I think is beyond this version. I have added a new section to the readme listing useful new additions to the codebase that can be added in the future by me or by new users.

Q. I didn't see any documentation about how to run the tests in the /tests folder. A. The readme has been updated with a guide.

Q. The tests for the regional MSD seemed to be incomplete. A. Tests have been added

Q. Important: The files to run the tutorial notebooks out of the box are missing from examples/example_data. I'm guessing this is a problem with packaging (I cloned from master). I can't assess functionality until this is fixed. A. Files are now included.

Q. Suggestion: It would be great if there was an easy way to use only a section of the MSD (the "middle"). Long story short, we want the middle portion of the MSD, after the initial ballistic motion is over and before the poor averaging regime takes over. See the discussion in this paper. This kind of behaviour is clear in your fluorine diffusion in CaF_2 figure. This is not a blocking requirement. A. Two new functions have been added to the read.trajectory class to remove timesteps from the start and end of the trajectory. The trajectory, once clipped can be analysed with the MSD function as before.

Q. Nitpick: Would be fantastic if you could update the text describing the installation requirements in 'README.md' to include coveralls, coverage and sphinx etc as its in requirements.txt. A. Readme has been updated.

Onto the paper, it's looking good overall, just a few things. Q. You are missing a state of the field section. There are a several packages aimed at analysing MD trajectories (MC less so). How do you differ from these? A. The paper has been updated

Q. I think the second paragraph starting with A molecular dynamics trajectory is a snapshot... could be clearer and more accurate. A more detailed explanation around what MD and MC are is required for a general audience to follow what kind of data we are dealing with here. A. This paragraph has been altered slightly to include a description of MD and MC

Q. A description of the theory used to calculate some of the properties I think would be helpful to many. Currently it is just stated that you can compute them. I know it's in the docs, but it is central to the package. A. I think the variety of functionality and the depth of the theory would be too much information for the paper. I am happy to add additional information if both the reviewers and editor think it is necessary.

Q. About the paper itself, I noticed the acknowledgments are different from those in the README, should it be updated? A. The acknowledgments have been updated.

hmacdope commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

hmacdope commented 3 years ago

@symmy596 Thanks for addressing all of my concerns and several of my suggestions.

I take your point about including all the theory being too much information to include in the article, I am happy for it to be in the documentation as it is currently. All of my other nitpicks appear to have been fixed up. Just one more thing:

You have addressed my comments on the paper including a state of the field section and mention of other packages. You have also highlighted the unique advantages of Polypy and demonstrated its effectiveness in several of your publications. Great work!

Following resolution of the above I would be happy to recommend your article for publication in JOSS. 👍

richardjgowers commented 3 years ago

Hi @lscalfi when you've got a moment can you see if your comments have been addressed or if further changes are required, thanks!

lscalfi commented 3 years ago

Hi @symmy596, thank you for addressing our concerns. I just have a few remarks left:

Apart from these comments, you have addressed all the other issues I raised and I am happy to recommend for publication.

symmy596 commented 3 years ago

Q. Would you be able to add a discussion of the sweeps parameter to the theory section of the documentation? The MSD is technically an ensemble average over the number of sweeps and the number of particles. This should be reflected in a brief comment in the theory section. A. Further description has been added to the documentation and tutorials. "MSD calculations require a large number of statistics to be considered representative. A full msd will use every single frame of the trajectory as a starting point and effectively do a seperate msd from each starting point, these are then averaged to give the final result. An MSD is technically an ensemble average over all sweeps and number of particles. The sweeps paramter is used to control the number of frames that are used as starting points in the calculation. For simulations with lots of diffusion events, a smaller number will be sufficient whereas simulations with a small number of diffusion events will require a larger number."

Q. for the MSD, you say "Two new functions have been added to the read.trajectory class to remove timesteps from the start and end of the trajectory. The trajectory, once clipped can be analysed with the MSD function as before.". I am not sure to understand. The 'problem' is not in the equilibration time of the trajectory but in the MSD itself. It is supposed to be linear only after a ballistic regime and it usually lacks statistics for longer times, so that the linear fit to extract the slope and thus the diffusion coefficient should be done on a portion of the MSD only. Is this what is done? A. I see and I apologise for not getting the point the first time. No those functions allow a user to index specific "sections" of the trajectory for analysis, this is particularly useful when studying events introduced during the simulation. I have added some extra functionality that allows a user to exclude timesteps at the start and end of a trajectory when calculating the diffusion coefficient. This has been added to the tutorial and documentation. "Note: An MSD is supposed to be linear only after a ballistic regime and it usually lacks statistics for longer times. Thus the linear fit to extract the slope and thus the diffusion coefficient should be done on a portion of the MSD only. This can be accomplished using the exclude_initial and exclude_final parameters"

Q. section to describe more MD and MC has been added (there are some typing errors in the paper: postiion, simulaton...) and a section on the state of the art which describes well the advantages of polypy. You say polypy works "for simulations ensembles, not just NPT.": does it handle grand canonical simulations with variable N too? If it's the case this can be highlighted. A. Polypy works for NPT, NVT, semi grand and grand canonical simulations. I have updated the documentation to make this clearer. Typos have also been corrected.

A new version has been pushed to pypi and a new commit to the git repo has been made adressing these changes.

Thanks!

Adam

lscalfi commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

lscalfi commented 3 years ago

Hi @symmy596, Your answers look good to me, however your repository seems to have some troubles (the CI failed at your last commit). Could you fix this?

symmy596 commented 3 years ago

@lscalfi Thanks for pointing that out, I had missed it. Build is now fixed.

hmacdope commented 3 years ago

@symmy596 All good on my end also :+1:

hmacdope commented 3 years ago

@richardjgowers pending confirmation from @lscalfi this review is possibly finished?

lscalfi commented 3 years ago

Oh sorry I am happy with the modifications yes

hmacdope commented 3 years ago

Sorry wasn't meaning to be a pain :)

richardjgowers commented 3 years ago

Hey @symmy596 just reading through the paper and saw a couple things in the statement of need section:

symmy596 commented 3 years ago

@whedon generate pdf

symmy596 commented 3 years ago

@richardjgowers

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

symmy596 commented 3 years ago

@whedon generate pdf

symmy596 commented 3 years ago

I have also added another reference, citing work just published using polypy as the analysis tool.

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

hmacdope commented 3 years ago

Fine by me!

richardjgowers commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

richardjgowers commented 3 years ago

@symmy596 not sure if you forgot to push up the changes but I'm not seeing the changes made in the repo

symmy596 commented 3 years ago

@richardjgowers I think I need more clarification on your comments. Q. the references to DL_ANALYSER and Chemfiles probably want to go up to the other references of existing packages. A. I thought this referred to the references for different packages under the statement of need ad hence "There are a number of codes designed to analyse molecular dynamics trajectories that currently exist. MDAnalysis [@mdanalysis]" was altered to include the relavant citations and now reads, "There are a number of codes designed to analyse molecular dynamics trajectories that currently exist (Fraux et al., 2020;Michaud-Agrawal et al., 2011;Yong & Todorov, 2018)." (line 68).

Q. I'm pretty sure most packages will handle non-orthogonal unit cells unless there's a particular geometry I've missed? A. I thought this was referring to, "polypy is designed to handle all cell types and will work for several simulation ensembles including, NPT, NVT, semi grand and grand canonical.". This sentence was changed and the reference to different cell types was removed (line74).

richardjgowers commented 3 years ago

@symmy596 re references, in the above pdf, lines 68 & 82 should get merged. re orthogonal cells line 88

symmy596 commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

symmy596 commented 3 years ago

@richardjgowers Thanks. Lines now merged and bullet point removed.

symmy596 commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

richardjgowers commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1088/2515-7655/ab28b5 is OK
- 10.1098/rsta.2019.0026 is OK
- 10.1039/D0TA05343K is OK
- 10.1002/bbpc.198400007 is OK
- 10.1016/0022-3697(85)90172-6 is OK
- 10.1016/S0167-2738(02)00229-1 is OK
- 10.1063/1.117366 is OK
- 10.1149/1.1507597 is OK
- 10.1103/PhysRevB.58.13901 is OK
- 10.1016/S0263-7855(96)00043-4 is OK
- 10.1080/08927022.2013.839871 is OK
- 10.5281/zenodo.3653157 is OK
- 10.3390/molecules23010036 is OK
- 10.1039/D0TA11539H is OK

MISSING DOIs

- None

INVALID DOIs

- https://doi.org/10.1002/jcc.21787 is INVALID because of 'https://doi.org/' prefix
richardjgowers commented 3 years ago

@whedon accept

whedon commented 3 years ago

No archive DOI set. Exiting...

richardjgowers commented 3 years ago

@symmy596 ok I think we're good to go. Couple of things, the DOI for the mdanalysis reference seems broken, it looks like there's two https:// pieces. And you'll also have to create a zenodo DOI :

At this point could you:

I can then move forward with accepting the submission.

symmy596 commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1088/2515-7655/ab28b5 is OK
- 10.1098/rsta.2019.0026 is OK
- 10.1039/D0TA05343K is OK
- 10.1002/bbpc.198400007 is OK
- 10.1016/0022-3697(85)90172-6 is OK
- 10.1016/S0167-2738(02)00229-1 is OK
- 10.1063/1.117366 is OK
- 10.1149/1.1507597 is OK
- 10.1103/PhysRevB.58.13901 is OK
- 10.1016/S0263-7855(96)00043-4 is OK
- 10.1080/08927022.2013.839871 is OK
- 10.5281/zenodo.3653157 is OK
- 10.1002/jcc.21787 is OK
- 10.3390/molecules23010036 is OK
- 10.1039/D0TA11539H is OK

MISSING DOIs

- None

INVALID DOIs

- None