Closed whedon closed 4 years ago
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @dotsdl, @rmeli, @amandadumi it looks like you're currently assigned to review this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
PDF failed to compile for issue #2157 with the following error:
Error reading bibliography ./paper.bib (line 75, column 3): unexpected "d" expecting space, ",", white space or "}" Error running filter pandoc-citeproc: Filter returned error status 1 Looks like we failed to compile the PDF
@thesketh looks like three's an issue with the bibtex
@whedon generate pdf
PDF failed to compile for issue #2157 with the following error:
Error reading bibliography ./paper.bib (line 75, column 3): unexpected "d" expecting space, ",", white space or "}" Error running filter pandoc-citeproc: Filter returned error status 1 Looks like we failed to compile the PDF
@whedon generate pdf
I had a first look at the package and have a few comments, none of which I deemed worth of its own issue for now (besides, some issues were already open). I will have a deeper look in the upcoming days, but I thought it might be useful to have some initial comments earlier.
(@richardjgowers if you prefer to see the comments in bulk do let me know and I'll adjust for next time)
Installation is easy and described in the README
. A few comments:
RDKit
can only be installed via conda
I would remove the requirements.txt
file.yml
defining a full environment for testing (so that the user can avoid searching for openmm
installation instructions)README
could be beneficialSee comments above.
Detailed installation instructions could be added to the main documentation as well (the website is already in place but only the API documentation is present).
Automated tests can be easily run with pytest
and all pass. However, there is no test coverage displayed. Using pytest-cov
I can see that some modules have rather low coverage (convenience_functions.py
is at 41%). I think the test coverage metric should be added and improved for some modules.
(See also thesketh/pygen-structures#3)
Documentation for the objects returned by different functions is often missing.
There is a single usage example. I think more usage examples should be added.
It would be useful to describe the different input in more details within the documentation with the help of usage examples.
(See also thesketh/pygen-structures#2; as noted there, the documentation skeleton is already in place and should be relatively straightforward to add an "Usage Examples" section)
I noticed that the CHARMM parameter files are included under pygen_structures/toppar
. Is there an associated license for such files?
Hey Rocco, thanks for the feedback, I think these points are probably good to be raised as issues too
On Tue, Mar 10, 2020 at 22:34, Rocco Meli notifications@github.com wrote:
I had a first look at the package and have a few comments, none of which I deemed worth of its own issue for now (besides, some issues were already open). I will have a deeper look in the upcoming days, but I thought it might be useful to have some initial comments earlier.
(@richardjgowers https://github.com/richardjgowers if you prefer to see the comments in bulk do let me know and I'll adjust for next time) Functionality Installation
Installation is easy and described in the README. A few comments:
- Since RDKit can only be installed via conda I would remove the requirements.txt file
- I would add a second .yml defining a full environment for testing (so that the user can avoid searching for openmm installation instructions)
- Adding a dedicated "Installation" section to the README could be beneficial
Documentation Installation
See comments above.
Detailed installation instructions could be added to the main documentation as well (the website is already in place but only the API documentation is present). Automated Tests
Automated tests can be easily run with pytest and all pass. However, there is no test coverage displayed. Using pytest-cov I can see that some modules have rather low coverage (convenience_functions.py is at 41%). I think the test coverage metric should be added and improved for some modules.
(See also thesketh/pygen-structures#3 https://github.com/thesketh/pygen-structures/issues/3) Functionality Documentation
Documentation for the objects returned by different functions is often missing. Example Usage
There is a single usage example. I think more usage examples should be added.
It would be useful to describe the different input in more details within the documentation with the help of usage examples.
(See also thesketh/pygen-structures#2 https://github.com/thesketh/pygen-structures/issues/2; as noted there, the documentation skeleton is already in place and should be relativelystraightforward to add an "Usage Examples" section) Misc
I noticed that the CHARMM parameter files are included under pygen_structures/toppar. Is there an associated license for such files?
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/2157?email_source=notifications&email_token=ACGSGBZHAHOCB5EP5JVL4E3RG2565A5CNFSM4LFDODLKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEONOIBA#issuecomment-597353476, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGB4HGDUH3T6QDFDL2OTRG2565ANCNFSM4LFDODLA .
Thanks for the preliminary comments, Rocco. I'll address these as soon as I can. I think I can fix the poor coverage of convenience_functions
by comparing code_to_mol
and sequence_to_mol
with a manually created Molecule
.
The requirements.txt
file is necessary for PyPI distribution. I could remove RDKit from the requirements and catch the ImportError
at runtime to produce a more helpful error, but I thought this was a dirtier solution.
The toppar
distribution is in the public domain, but I'll add another README
in this folder explaining where the files are from, where new versions can be obtained, and what changes have been made from the standard distribution (I've noticed some issues with the D-amino acid parameters, which are fixed in the 0.2.3 branch).
Documentation for return types is missing in some cases because from __future__ import annotations
is unavailable in Python 3.6, but I'll add :rtype:
tags where this is the case.
Thanks for clarifying the situation with toppar
; the lack of license on their end is a bit strange... In the light of the forum discussion you linked, what you suggest to do sounds very reasonable.
I'm not well versed with PyPI distribution, but I have a package on PyPI without any requirements.txt
. I think PyPI deployment only uses install_requires
in setup.py
(see here):
install_requires
should be used to specify what dependencies a project minimally needs to run. When the project is installed by pip, this is the specification that is used to install its dependencies.
If I pip
-install pygen_structures
on a clean environment, the package is installed but neither numpy
nor rdkit
are (since install_requires
is not specified). I think you can remove requirements.txt
, add numpy
to install_requires
and leave rdkit
installation to the user as described in the documentation (I think the error ModuleNotFoundError: No module named 'rdkit'
is clear enough). Other reviewers might have different opinions and are probably more experienced with PyPI deployment, so I'd wait for their comments before making any change.
Thanks for the tip about the install_requires
, it seems that there's even a page on "install_requires vs requirements files". I was under the impression that requirements.txt
was necessary. I think you're probably right and that deleting it is the right thing to do, but I'll hold off for now.
I think I've addressed most of your initial comments now (with the extra information added to README), but I still need to add the information to the main documentation.
Dear authors and reviewers
We wanted to notify you that in light of the current COVID-19 pandemic, JOSS has decided to suspend submission of new manuscripts and to handle existing manuscripts (such as this one) on a "best efforts basis". We understand that you may need to attend to more pressing issues than completing a review or updating a repository in response to a review. If this is the case, a quick note indicating that you need to put a "pause" on your involvement with a review would be appreciated but is not required.
Thanks in advance for your understanding.
Arfon Smith, Editor in Chief, on behalf of the JOSS editorial team.
Thank you for your comment, @arfon . It is refreshing to see a journal which cares about its authors, editorial staff, and reviewers.
In Scotland, we are relatively unaffected (as of yet) but I implore the editor and the reviewers to take the time they need in light of the current crisis. This submission can wait, your health can not. If you have comments to make, they are appreciated, but please focus on your health first and foremost.
I am still making my way through the code, but I had another thought about the handling of the additional testing dependency. I agree that avoiding having users search for openmm installation instructions would be good. @RMeli provided this solution, which was nice!
* I would add a second `.yml` defining a full environment for testing (so that the user can avoid searching for `openmm` installation instructions)
Another way that this can be done is to define these as extras in the setup.py as described here. I just wanted to mention this since this would avoid an extra file and could then be installed from pypi as described here. What do you think, @RMeli? I haven't dealt with this first hand so I'm not sure if your approach has a benefit I am overlooking.
@amandadumi I think that would be a good idea, but it seems to me that in order to install OpenMM conda
is needed (with the -c omnia -c conda-forge
channels).
Iirc there was an effort to get a minimal openmm working on conda-forge... the hang up was cuda dependencies
On Sun, Mar 15, 2020 at 20:03, Rocco Meli notifications@github.com wrote:
@amandadumi https://github.com/amandadumi I think that would be a good idea, but it seems to me that in order to install OpenMM http://docs.openmm.org/latest/userguide/application.html#installing-openmm conda is needed (with the -c omnia -c conda-forge channels).
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/2157#issuecomment-599257354, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGB4X4PQU3WJD7KE5PLDRHUYCDANCNFSM4LFDODLA .
By @jaimergp
On Sun, Mar 15, 2020 at 20:06, Richard Gowers richardjgowers@gmail.com wrote:
Iirc there was an effort to get a minimal openmm working on conda-forge... the hang up was cuda dependencies
On Sun, Mar 15, 2020 at 20:03, Rocco Meli notifications@github.com wrote:
@amandadumi https://github.com/amandadumi I think that would be a good idea, but it seems to me that in order to install OpenMM http://docs.openmm.org/latest/userguide/application.html#installing-openmm conda is needed (with the -c omnia -c conda-forge channels).
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/2157#issuecomment-599257354, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGB4X4PQU3WJD7KE5PLDRHUYCDANCNFSM4LFDODLA .
@thesketh I didn't have enough time to dig deep into the code, but I have some comments about the paper below.
A big chunk of Fig. 1 details "planned future work". This gives the impression that pygen-structures
is incomplete and there is still a lot of work to do. I'm wondering if it would be better to focus on what the program does instead of outlining a lot of planned things that could be far away in the future. Generating coordinates for missing residues is a big project in itself and you already underline this limitation in the conclusions. @richardjgowers might have some advice on this.
I'm not sure about the term "arbitrary molecules", since you can only generate molecules from CHARMM residues and patches.
I think the problems with current workflows are very well illustrated. Despite only two packages are discussed (RDKit and MarvinSketch), I think similar problems will arise with other packages.
I think it's great that you outline the limitation of pygen-structures
compared to psfgen
and possible big additions to the functionality of the software. However, I think that the documentation of the current features should be complete and not left as future work.
psfgen
is inconsistently written in italicHi! Yes, we have a functional openmm
recipe on conda-forge
, but still not publicly available (it's on the testing
label). We have to sort out how to deal with potentially unsupported OpenCL implementations (check https://github.com/conda-forge/cfep/issues/17). In the meantime, the omnia
channel is needed, but as you can see in the issue this is subject to change in the near-term future.
@whedon generate pdf
Thank you both for your comments! I'm aware of the attempts to get OpenMM onto conda-forge, but I use OpenMM quite extensively in my research so I'm following this as well.
@amandadumi I've addressed this now in the installation instructions (and fixed the issue you opened about the RDKit installation instructions), but sadly I don't think there's an easy way to get around using conda for the dependencies. It's a shame there isn't a tie-in between pip and conda.
@RMeli I think I've fixed some of the issues you noted with the paper now.
The "Planned future work" was intended to note that there's another workflow (working with large molecules from the PDB) which currently isn't supported. I've updated the figure to state "Other software" instead. For perfectly formed PDB structures, filling missing hydrogen atoms (and even missing heavy atoms if the surrounding atoms are present) should in principle be easy, but the RDKit currently gives a segmentation fault on larger protein structures (I tried HEWL without success). I'm going to investigate using OpenBabel as a backend as well to address this, but I think it's outside the scope of this paper.
'arbitrary' has been removed.
I'd like to fit the contents of the PDB inside the page margins as well, would it be best to use a line continuation character such as \ and elipses on the following line? This is the main reason why I haven't updated the PSF examples to the current format (which uses the EXT format) - they're much wider.
I've noted similar behaviour from Avogadro (correctly identifies residue, but calls both the L and D forms the L-amino name) and GaussView (behaves similarly to MarvinSketch), but have omitted these examples to avoid making the paper too long.
Yes, you're right of course. I've updated this section to reflect this. I still think that further documentation of the polymeric structures included in CHARMM would be useful (e.g. is it possible to make triglycerides from glycerol/fatty acids? Are there glycolipid linkages defined?) but I think this may be better placed in the CHARMM documentation - CHARMM-GUI may already have some information about this.
These have been updated, I've added a citation for PyTest as well.
I'm a postgraduate student funded by the University of Strathclyde, but the work on this project has been done in my own time so I don't think that this is applicable here.
Is there a particular YAML field that this extra information should go under? The example papers only contain university name and department name.
Regarding italicisation, I've italicised all software packages on first invocation, but not afterwards. I couldn't see any guidelines about how best to format this. I can italicise all references to software packages if that would be more consistent
It would help if I'd pushed the changes to the flowchart.
@whedon generate pdf
@thesketh and @Rmeli, this is clear now. I do not use openmm and was didn't realize the openmm pypi project was simply for a user-friendly interface.
I was also able to go through the paper and had some general comments, many of mine echo what had previously been said, but I had a few specifics that applied to the most recent article proof. I don't think the call to whedon to include the updated flowchart worked though.
Software paper
A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
pygen-structures
can do is referred to as current work, which is fine. However, when current workflows is used in the section title 'Problems with current workflows'. Additionally, this sections main takeaway isn't the problems of others, but more so is highlighting where pygen-structures
out-performs previous workflow. The section is really demonstrative and beneficial, but the section naming is a confusing. Some suggestions of section title that reflects the contents and is avoid the confusing use of 'current' is 'Demonstrating the benefit of pygen-structures
' or simply 'Comparison of workflows'. State of the field: Do the authors describe how this software compares to other commonly-used packages?
Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
Small recommendations:
pygen-structures
is an open source (3-clause BSD license) Python library to generating 3 dimensional structures for molecules from the CHARMM forcefield based on empirical data (Riniker & Landrum (2015), as implemented in the RDKit, (“RDKit,”2020)).References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)?
@whedon generate pdf
@amandadumi - the Whedon calls need to be in their own comment ☝️
Hi @thesketh, here are my review notes. Great work on this submission; I believe this to be a good fit for JOSS. I am satisfied that much of the criteria for submission are met, and so the notes below focus heavily on items that I think should be addressed yet.
@whedon generate pdf
@whedon generate pdf
Thank you both for your comments!
@amandadumi I think I've addressed most of these now. I haven't changed the sentence containing "the RDKit" as this is how the package is referred to in its documentation.. I read it as "the Rational Discovery Kit" in my head.
Yes, in the past I've encountered similar behaviour from different 2D and 3D chemical drawing packages. I don't have access to a Windows/Mac computer to check how ChemDraw behaves, and I haven't tried to use PyMol before, but I'll try that and add the citations for these packages and Avogadro.
@dotsdl I submitted the paper at this stage because I believe the introduced functionality - in particular the ability to generate annotated structures for polysaccharides, lipids or protein-sugar linkages - is unique. Other additions would increase usability, but are not limiting progress towards an automated workflow. Much of the planned functionality is "would be nice to have", but is covered by existing software packages (e.g. Modeller to assign positions for missing residues, psfgen to handle protein structures from the protein databank and fill missing atoms, VMD's solvate plugin to handle creation of simulation boxes, and the martinize.py script to coarse grain protein structures for the MARTINI forcefield).
The project has been on GitHub for around a month, but I've been working on it piecemeal for a few months. I am still the only contributor so far, but I'd like to encourage further contribution. You are right that the package is tied in to the CHARMM ecosystem, do you think I should make this more clear?
I'm working on addressing the lack of documentation for library usage, and I've updated the paper to be more explicit about the requirement for structures to be small.
@thesketh ha! I have never noticed that in RDKit documentation! Thanks for explaining, that makes sense then.
@thesketh, your README is very well done and clear. I am also very happy to see the use of typing in your code! I found it very useful. Additionally, I found that your function commenting was thorough!
Two small issues:
[ ] Echoing the point made above, I would like clearer examples of using this package as a Python library, especially since the capability is stated in the README. Which relates to issue thesketh/pygen-structures/issues/2. I realize you're working on this, but I wanted to explain why I am waiting to check that box is all.
[ ] I like the inclusion of module descriptions as a bird eye view of the modules purpose, but I find it strange these appear at the bottom of the documentation page. Is this according to a specific standard? Otherwise, the descriptions would be more helpful at the top of the specific module. Specifically:
charmm_containers.py
, line 44 don't has a " instead of '.@thesketh Thanks for addressing our comments on the paper (I think the limitation on the number of residues, outlined by @dotsdl, is still missing?). The only "issue" I have left with the paper is the overflow of the code with respect to the text margins. @richardjgowers will there be any editing after the review, @thesketh has to take care of it, or it is OK as it is?
@whedon generate pdf
@whedon generate pdf
Thank you both for your comments.
@amandadumi Well spotted with " instead of '! That will teach me not to use find and replace to fix inconsistent string quotes. As for the module descriptions appearing at the bottom, I'm unsure why they're in the wrong place but I'll try to adjust it. These are autogenerated from the code comments, so should be fixed when I add the documentation for use as a library function. I've added citations for the other drawing packages as you requested.
@RMeli I've added an explicit reference to the maximum number of residues, and I fixed the long lines in the code examples by putting in line continuation characters (and elipses on the following line). I think this should be clear enough, but I can update this if you think there's a better way to show that the lines are wrapped or if you think it should be mentioned somewhere in the text.
I think I've now made all the recommended changes to the paper itself, I'll try to get the library documentation posted early this week.
@thesketh All you will need to do is to go into each of the pages content description within the doc/source/ folder. The order of the content for each page is dictated by these files. Moving the "Module Contents" section to the top should do the trick!
Thanks for the tip, @amandadumi! I ended up re-generating the RST files using sphinx-apidoc
, the -M
flag put them in the correct order.
@thesketh the documentation is looking great now, well done!
A few comments below:
--name and --segid control the names given in the COMPND record and the segment ID (4 character max) respectively.
When I try to use a --segid
longer than 4 characters, the code executes normally without warning. I think this is ok (since user should know this limitation), but maybe it is worth adding a warning that the name will be truncated just in case?
Under the Library Usage - Basic Usage section there is an error in the 4th code block:
patches = {'RAFF', [0, 1, 2]} # Should be a dictionary: {'RAFF': [0, 1, 2]}
I get True
here:
>>> mol = ps.code_to_mol('AdP', rtf)
>>> # This will pass verification in 0.2.3
>>> mol.check_parameters(prm)
False
I tried to see which version of pygen_structures
I installed by printing ps.__version__
(I installed 0.2.3
of course, as hinted by the comment) but AttributeError: module 'pygen_structures' has no attribute '__version__'
; I think it's worth adding this attribute to the package.
Thanks for your comments, @RMeli. I've added a warning for segids greater than four characters on the Molecule class now.
Well spotted on the wrong type in the doccs - I fixed it in the example before but missed it in that one!
The version string was in pygen_structures.version.version
before, but I've added it to pygen_structures.__version__
as well.
@whedon generate pdf
@richardjgowers I've released another version (v0.2.4) on PyPI to reflect @RMeli's recommended changes. I've updated the examples in the article to reflect the new version, could you update the version with whedon? I believe it's an editor-only command
@whedon set v0.2.4 as version
OK. v0.2.4 is the version.
Submitting author: @thesketh (Travis Paul Hesketh) Repository: https://github.com/thesketh/pygen-structures Version: v0.2.4 Editor: @richardjgowers Reviewer: @dotsdl, @rmeli, @amandadumi Archive: 10.5281/zenodo.3739128
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@dotsdl & @rmeli & @amandadumi, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @richardjgowers know.
✨ Please try and complete your review in the next two weeks ✨
Review checklist for @dotsdl
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @rmeli
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @amandadumi
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper