openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
697 stars 36 forks source link

[REVIEW]: MD-SAPT: Python Based Toolkit for Running Symmetry Adapted Perturbation Theory Calculations on Molecular Dynamics Trajectories #5931

Open editorialbot opened 9 months ago

editorialbot commented 9 months ago

Submitting author: !--author-handle-->@armcdona<!--end-author-handle-- (Ashley Ringer McDonald) Repository: https://github.com/calpolyccg/MDSAPT Branch with paper.md (empty if default branch): JOSS_Pub Version: v2.0 Editor: !--editor-->@arfon<!--end-editor-- Reviewers: @wiederm, @CarvFS Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/baf524e08bec6dcbd6f57a0e9b6236b9"><img src="https://joss.theoj.org/papers/baf524e08bec6dcbd6f57a0e9b6236b9/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/baf524e08bec6dcbd6f57a0e9b6236b9/status.svg)](https://joss.theoj.org/papers/baf524e08bec6dcbd6f57a0e9b6236b9)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@wiederm & @CarvFS, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Checklists

📝 Checklist for @CarvFS

📝 Checklist for @wiederm

editorialbot commented 9 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 9 months ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.12 s (539.1 files/s, 60084.7 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          21            876           1707           2433
XML                              1              0              0            463
YAML                            14             59            120            380
Markdown                        10             99              1            297
TeX                              1             13              0            201
reStructuredText                 8             57             75             57
Bourne Shell                     3              8             10             43
JSON                             2              5              0             42
DOS Batch                        1              8              1             27
make                             1              4              6             10
Dockerfile                       1              4             12              3
-------------------------------------------------------------------------------
SUM:                            63           1133           1932           3956
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 9 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1021/acs.jcim.8b00989 is OK
- 10.1093/bioinformatics/btx789 is OK
- 10.1063/5.0006002 is OK
- 10.1002/jcc.21787 is OK
- 10.25080/Majora-629e541a-00e is OK
- 10.1002/jcc.23753 is OK
- 10.1002/wcms.1452 is OK
- 10.6084/M9.FIGSHARE.17156018.V1 is OK
- 10.1002/wcms.84 is OK
- 10.1021/bk-2007-0958 is OK
- 10.1002/9780470399545.ch1 is OK
- 10.1016/j.bpj.2015.08.015 is OK
- 10.6084/m9.figshare.20520726.v1 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 9 months ago

Wordcount for paper.md is 1255

editorialbot commented 9 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

arfon commented 9 months ago

@wiederm, @CarvFS – This is the review thread for the paper. All of our communications will happen here from now on.

Please read the "Reviewer instructions & questions" in the first comment above. Please create your checklist typing:

@editorialbot generate my checklist

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/5931 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.

CarvFS commented 9 months ago

Review checklist for @CarvFS

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

CarvFS commented 9 months ago

Review: observations on the paper and functionality

First of all: thank you very much for developing this toolkit! It is important to make the non-convalent bond analysis easier for everyonw working with molecular dynamics. Here I will post some comments on the paper and toolkit. Any new observations I will post as a new comment.

About installation:

I think the authors should update the documentation to avoid users running into issues while trying to install their toolkit. A few things I have observed:

Maybe using some specific version of pydantic would avoid my last error. However, it should be listed in the documentation which versions the user must have.

About important links

About the paper and toolkit

General comments

arfon commented 8 months ago

:wave: @wiederm – just checking in here, when do you think you might be able to complete your review?

wiederm commented 8 months ago

Hi @arfon , sorry for the delay! I will add my review on Monday!

wiederm commented 8 months ago

Review checklist for @wiederm

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

wiederm commented 8 months ago

Review

Thank you for providing this software.

Software

Manuscript

l40: comment needs to be removed l53: comment needs to be resolved

CarvFS commented 8 months ago

Additional review:

Generating input file:

system_limits:
  ncpus: 
  memory: 

and this cause an error. For example, by running:

mdsapt generate test.yaml

and then,

import mdsapt

config = mdsapt.load_from_yaml_file('test.yaml')

I have got the error:

2023-11-21 12:00:58,888 mdsapt.config ERROR    Error while loading config from 'test.yaml'
Traceback (most recent call last):
  File "/home/felipe/miniconda3/envs/mdsapt/lib/python3.9/site-packages/mdsapt/config.py", line 309, in load_from_yaml_file
    return Config.model_validate(yaml.safe_load(file))
  File "/home/felipe/miniconda3/envs/mdsapt/lib/python3.9/site-packages/pydantic/main.py", line 503, in model_validate
    return cls.__pydantic_validator__.validate_python(
pydantic_core._pydantic_core.ValidationError: 2 validation errors for Config
system_limits.ncpus
  Input should be a valid integer [type=int_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.5/v/int_type
system_limits.memory
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.5/v/string_type
Traceback (most recent call last):
  File "/mnt/d/OneDrive - Universidade Federal de Minas Gerais/Pós doc/paper-review/MD-SAPT/testing/run_mdsapt.py", line 3, in <module>
    config = mdsapt.load_from_yaml_file('test.yaml')
  File "/home/felipe/miniconda3/envs/mdsapt/lib/python3.9/site-packages/mdsapt/config.py", line 312, in load_from_yaml_file
    raise err
  File "/home/felipe/miniconda3/envs/mdsapt/lib/python3.9/site-packages/mdsapt/config.py", line 309, in load_from_yaml_file
    return Config.model_validate(yaml.safe_load(file))
  File "/home/felipe/miniconda3/envs/mdsapt/lib/python3.9/site-packages/pydantic/main.py", line 503, in model_validate
    return cls.__pydantic_validator__.validate_python(
pydantic_core._pydantic_core.ValidationError: 2 validation errors for Config
system_limits.ncpus
  Input should be a valid integer [type=int_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.5/v/int_type
system_limits.memory
  Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.5/v/string_type

By filling ncpus and memory it ran without any issue. Since those parameters will vary from one user to another, it could be filled with a general case (ncpus = 1 and memory = '1Gb', for example) or explicit tell the user to fill those.

The way it is described in the paper, the reader will be lead to think that one just need to run mdsapt generate [filename] and not add/modify anything else before running the calculations.

Running SAPT calculations

 raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'Config' object has no attribute 'start'

To make it run I have changed to

sapt_run.run(config.analysis.frames.start, config.analysis.frames.stop, config.analysis.frames.step)

This should be fixed in the paper example.

arfon commented 7 months ago

@armcdona – looks like there's a fair amount of feedback to start working on from the reviewers here. Could you get started and let us know here how you're getting on?

Kevin-Mattheus-Moerman commented 6 months ago

@armcdona :wave: we hope you are doing well. Could you please provide an update to the editor here? Thanks!

ifd3f commented 6 months ago

I'm currently working on packaging it with conda-forge (https://github.com/conda-forge/staged-recipes/pull/24782). We were previously packaged with psi4's channel but since they're distributing with conda-forge now, we're moving our package there too. I've been very busy with work recently, and had to deal with a cold and strep, but I will get back to packaging this soon.

arfon commented 5 months ago

:wave: @ifd3f – just checking in again here to see how you're getting along?

ifd3f commented 5 months ago

I got builds to work, now I need to make it pass the conda-forge CI. Hopefully I should have it published on conda-forge by end of week. Sorry about the wait.

Kevin-Mattheus-Moerman commented 4 months ago

@ifd3f thanks for that update. Were you able to get that done? Let us know if we can proceed. It would be good to proceed, to avoid loosing track of the reviewers.

ifd3f commented 3 months ago

It was finally merged into conda-forge after a lengthy process. I'll update the instructions in the paper once I get a chance.

ifd3f commented 2 months ago

Hi @arfon @Kevin-Mattheus-Moerman, sorry for the late update, but we are ready to proceed again.

arfon commented 1 month ago

@ifd3f - are you suggesting you're ready for the reviewers to pick up their reviews again? If so, could you please give a brief summary of the updates you've made in response to their feedback thus far?

ALescoulie commented 1 month ago

I wanted to apologize for the delays, I've been dealing with some mental and physical health issues, but am committed to seeing this process through. I started my response to the reviews today, and will be putting out updates to the software and paper throughout the week. So It's easier to read, and I don't blow up everyone's notifications, I'll give the responses as a few larger comments on the thread.

ALescoulie commented 1 month ago

Response to First @CarvFS Review

This is a summary of my response to the feedback in the first review in updates both to the paper and MD-SAPT its self. I'll check with @ifd3f today to make sure the 2.0.4 release goes out for MD-SAPT soon with the input files updated based on the feedback. I'm looking forward to hearing your feedback on my response, and working to improve this project further.

Addressing Instillation issues

@ifd3f has worked on resolving the installation issues, and getting an official Conda-Forge release. I also have updated the README and docs to reflect the following updates to installation.

Python version 3.7 or older gives the error: "cannot import name 'Literal' from 'typing'";

We dropped 3.7 support, but I think we must have forgotten to remove that from out list of supported versions. It was originals supported but in our 2.0.0 update, we used features from Python's Typing library that broken MD-SAPT for that release.

Trying to install as conda install -c psi4/label/dev -c conda-forge mdsapt is not working

The Conda-Forge release should have resolved, you can install with MDSAPT using conda install -c conda-forge mdsapt. The root validator deprication issue was also fixed by @ifd3f.

Addressing Important Links

Link to readthedocs page at the paper is not redirecting to the correct page;

I fixed the links in the paper.

The binder demo are returning some error messages.

I updated the demo's repo to use conda release of MD-SAPT, but for some reason it installs a borken version of psi4. Later today I'll work on debugging that more, but for the time being its still broken. The demo was first made for a confrence prior to MD-SAPT having a conda release so it was a bit hacked together, but now that we have a conda release it was easy to get up and running again.

Addressing Paper and Software Concerns

When it says "including" in line 35 of page 1, it gives the idea that there are other tasks not described. Is this the case? Maybe describe the all workflow tasks as bullet points in the paper and/or add it to the documentation;

I updated the information about the steps out workflow includes to be more clear. It now reads:

It combines the tasks required for this workflow:
  - selecting the specific residues for the QM calculation, adding protons where the residues were bound to the polypeptide chain
  - determining the formal charge
  - passing the charge and coordinates into psi4 (the QM calculation program)
  - saving the results into an easily queriable csv file or Pandas DataFrame

Do you think that it describes the process more clearly?

It is said that a number of python based analysis tools exist for MD data. However, it is not clear if any of those also performs the same tasks implemented in MD-SAPT. For example: psi4 is used to run the quantum calculations, MDAnalysis is used to handle the MD simulation data. However, is not said running SAPT for each frame of a MD simulation is only possible with MD-SAPT toolkit;

I think that paragraph was unclear, we were indeding to discuss existing tools for working with MD data, and how MD-SAPT fits in with those. None of them from my research support more involved analysis methods like what we implemented in MDSAPT. In that section I was trying to convey what the MD ecosystem consists of, and how our project fits into that. I rewrote that paragraph to make that more clear, and emphasize that we fit in as a package built on top of an existing analysis system. Is the following paragraph more clear?

Currently there are a number of libraries that simplify the process of studying MD data in software, such as MDtraj, pyLOOS, and MDAnalysis [@mcgibbon; @romo; @michaud; @gowers].
They provide a clean, common interface for reading, quering, and writing MD varrious file formats, that other analysis software can be built upon.
The Python library MDAnalysis is a core dependency of MD-SAPT.
By using MDAnalysis as a starting point for MD-SAPT, building our own system for reading MD data and select specified residues was not required.

The next paragraph elaborates on how MD-SAPT is an MDAKit, which is a part of the MDAnalysis ecosystem.

Psi4 has a wide range of options for running SAPT calculations (https://psicode.org/psi4manual/master/sapt.html#sapt0): for example, the default level of theory is SAPT0, but we also can run using SAPT2, SAPT2+ or SAPT2+3. Is it possible to change those options through MD-SAPT?

Yes, it is by modifying the psi4:method: field of the input file.

If so, maybe the preformatted file should contain all possible settings with the standard values (since it is said on the paper that "This provides users with a preformatted file, such that they only need to fill in their parameters without any additional formatting").

That is a good idea for making it more clear how to use MD-SAPT so I added some information to the input files generated using mdsapt generate.

This is the new format of the trajectory analysis file:

psi4:
  # The sapt theory level eg: SAPT0, SAPT2, SAPT2+, SAPT2+3
  # See https://psicode.org/psi4manual/master/sapt.html#sapt-level for more information
  method: 'sapt0'
  # Basis set for Psi4 SAPT calculation
  # See https://psicode.org/psi4manual/master/basissets_byelement.html#apdx-basiselement for complete list
  basis: 'jun-cc-pvdz'
  # set to true to save Psi4 output files
  save_output: true
  # Additional Psi4 settings see https://psicode.org for more psi4 specific documentation
  settings:
    # The reference wave function type eg: RHF, ROHF, UHF, CUHF, RKS, UKS
    # See https://psicode.org/psi4manual/master/autodir_options_c/scf__reference.html for more information
    reference: 'rhf'

simulation:
  ph: 7.0
  # pH of simulated system, it informs how protons are replaced when balancing spin state before Psi4 calculations
  charge_guesser: 'standard'
  # charge_guesser: 'rdkit'  # to use rdkit. Make sure it is installed first.

system_limits:
  # Integer number of cpu cores to use in Psi4 calculation
  ncpus:
  # Given memory form `"{number_of_gigabytes}GB"`
  memory:

analysis:
  # This section is for running TrajectorySAPT. To run other types of analyses, see below.
  type: 'trajectory'

  topology: 'your/file/path.pdb'
  # If you want to override the charges of specific, you may specify it this way.
  # topology:
  #    path: 'your/file/path.pdb'
  #    charge_overrides:
  #      132: -2

  trajectories:
    - 'your/trajectory/path.dcd'
  pairs:
    # List the pairs of residues to calculate interaction energy between based on residue ids in topology file
    - [132, 152]
    - [34, 152]
  frames:
    # Starting frame
    start: 10

    # Stop frame
    stop: 30

    # frames per calculation step
    step: 1

  # Name of outputed csv file
  output: 'output.csv'

This is what the default docking input file gives you:

psi4:
  # The sapt theory level eg: SAPT0, SAPT2, SAPT2+, SAPT2+3
  # See https://psicode.org/psi4manual/master/sapt.html#sapt-level for more information
  method: 'sapt0'
  # Basis set for Psi4 SAPT calculation
  # See https://psicode.org/psi4manual/master/basissets_byelement.html#apdx-basiselement for complete list
  basis: 'jun-cc-pvdz'
  # set to true to save Psi4 output files
  save_output: true
  # Additional Psi4 settings see https://psicode.org for more psi4 specific documentation
  settings:
    # The reference wave function type eg: RHF, ROHF, UHF, CUHF, RKS, UKS
    # See https://psicode.org/psi4manual/master/autodir_options_c/scf__reference.html for more information
    reference: 'rhf'

simulation:
  ph: 7.0
  # pH of simulated system, it informs how protons are replaced when balancing spin state before Psi4 calculations
  charge_guesser: 'standard'
  # charge_guesser: 'rdkit'  # to use rdkit. Make sure it is installed first.

system_limits:
  # Integer number of cpu cores to use in Psi4 calculation
  ncpus:
  # Given memory form `"{number_of_gigabytes}GB"`
  memory:

analysis:
  type: 'docking'
  # MD-SAPT analysis type (in this case docking)
  topology_directory: 'mek/docking'
  # The set of topology files for docking analysis
  pairs:
  # Place pair names defined above in list in a list
  - [15, 132]

system_limits:
  # Integer number of cpu cores to use in Psi4 calculation
  ncpus:
  # Given memory form `"{number_of_gigabytes}GB"`
  memory:

The example given in the paper is about the interactions between residuals in the Adenylate kinase. However, it seems that those kind of computations should be carried out using fisapt module

I think this is a question that is could be better anwsered by @armcdona, as she is a quantum chemist and far more knowledgeable on psi4 than I am. I'll get her input on that question.

Having this in mind: how MD-SAPT prepares the input for running Psi4? Is it already using fisapt? If not, how the user should change this?

I'll run some tests but looking at the Psi4 documentation, I see no reason why providing method: fisapt in an input file wouldn't work. From looking at the docs it seems like this Psi4 input files are the same.

In the line 29 of page 1, it is said: "offering detailed insight into the strength and type of noncovalent interactions interactions". It should be discussed why this understanding is important and how it could be applied for a non-specialist audience.

Thank you for the papers you attached with this feedback, they were very helpful for demonstrating the utility of MD-SAPT's methodology. I added to the second paragraph of "Statment of Need" discussing the ACS-JCIM paper, and how they used SAPT calculations, to identify more residue interactions and identify interaction energies that differed substantially from those predicted by MD.

Here is what I wrote verbatim:

"The molecular mechanics methods used in MD simulations do not characterize noncovalent interactions at the same level of detail as quantum mechanical methods, but allow simulations of large biomolecular systems [@lipkowitz]. In this work, we present a workflow tool to utilize ab initio methods as a post-processing technique on MD data, offering detailed insight into the strength and type of noncovalent interactions. Using classical methods, such as those applied in MD, there are some noncovalent interactions that cannot be accurately modeled. One paper that applied SAPT calculations to MD simulated proteins found that the SAPT calculations highlighted more residue interactions that shown in MD energy calculations, and that SAPT identified binding energies that differed from those predicted classically [@ding]. Their work illustrates the utility of SAPT in identifying critical active site interactions, and how that information can guide the search for ligands. "

I hope that this better describes why MD-SAPT may be useful to an audience outside of quantum chemistry.

In line "This result was generated as a demonstration of the kinds of data MD-SAPT can generate". This phrasing can lead the reader to think there are other types of data which can be generated by MD-SATP. If it is the case, the authors should explain all the outputs that can be acquired.

I argee that was vague wording, and I made it clear that the data in that figure is what MD-SAPT generates.

This is the replacement sentence: "This result illustrates the non-covalent interaction energy decomposition data generated by MD-SAPT."

Is there any plans to add support to other topology formats, e.g. amber topology formats (.prmtop)?

I have not specifically tested it but we can use any format that MDAnalysis supports, so amber's .prmtop should work with current versions of MDAnalysis. Here is their current format overview https://userguide.mdanalysis.org/stable/formats/index.html.

Other Questions

The last commit to the MD-SAPT repository was made Oct 15, 2022. Is there any updates made after this?

We have since then made some changes to the repository based on reviews and fixing releases.

arfon commented 1 month ago

Thanks for the update and long response @ALescoulie. @wiederm, @CarvFS – when you have a moment, would you both be able to update your reviews based on this feedback?

CarvFS commented 3 weeks ago

Hi @arfon and @ALescoulie!

I hope you are better now, @ALescoulie :) and thank you very much for your reply!

I will try the install process later to see if I get any issues, but everything else covered in the reply sounds excellent to me. After getting the feedback on my second review I will update the review checklist.