[REVIEW]: smot: a python package and CLI tool for contextual phylogenetic subsampling

editorialbot commented 2 years ago

Submitting author: !--author-handle-->@arendsee@Bisaloo<!--end-editor-- Reviewers: @Chjulian, @marekborowiec Archive: 10.5281/zenodo.7458325

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/243645cbbc5074c24e30edb8af982673"><img src="https://joss.theoj.org/papers/243645cbbc5074c24e30edb8af982673/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/243645cbbc5074c24e30edb8af982673/status.svg)](https://joss.theoj.org/papers/243645cbbc5074c24e30edb8af982673)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@Chjulian & @marekborowiec, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @Bisaloo know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @Chjulian

📝 Checklist for @marekborowiec

editorialbot commented 2 years ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 2 years ago

Attempting PDF compilation. Reticulating splines etc...

editorialbot commented 2 years ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.29 s (59.4 files/s, 14854.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          10            520            558           2178
Markdown                         3            126              0            397
TeX                              1             25              0            289
make                             1              1             29             83
YAML                             1              2              4             35
INI                              1              1              0              2
-------------------------------------------------------------------------------
SUM:                            17            675            591           2984
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 2 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1371/journal.pone.0009490 is OK
- 10.1093/nar/gkw857 is OK
- 10.1128/mSphere.00275-16 is OK
- 10.1093/molbev/mst010 is OK
- 10.1126/science.1189132 is OK
- 10.1038/nature08182 is OK
- 10.1128/JVI.00459-15 is OK
- 10.1371/journal.pcbi.1002947 is OK
- 10.1126/science.1117727 is OK
- 10.22269/210921 is OK
- 10.1186/s12859-018-2164-8 is OK
- 10.1186/1756-0500-6-145 is OK
- 10.1101/2021.02.13.431075 is OK
- 10.1093/molbev/msz053 is OK
- 10.1101/2021.01.21.427647 is OK
- 10.1186/s12864-018-4620-2 is OK
- 10.1093/bioinformatics/btq228 is OK
- 10.1093/bioinformatics/btp163 is OK
- 10.1128/mra.00673-19 is OK
- 10.12688/f1000research.10446.1 is OK
- 10.1093/molbev/msz053 is OK
- 10.1093/molbev/msw046 is OK
- 10.1016/j.softx.2020.100436 is OK
- 10.1111/2041-210X.12628 is OK
- 10.1093/molbev/msw046 is OK

MISSING DOIs

- None

INVALID DOIs

- 10.1016/s0168-95250300112-4 is INVALID

editorialbot commented 2 years ago

Wordcount for paper.md is 2339

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Bisaloo commented 2 years ago

:wave: :wave: :wave: @arendsee @Chjulian @marekborowiec this is the review thread for the paper. All of our communications will happen here from now on.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#4193 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We usually aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use editorialbot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

If you have reviewed for JOSS in the past, please be aware that our workflow has slightly changed. You can now generate your review checklist by posting a comment with the following content:

@editorialbot generate my checklist

Please feel free to ping me (@bisaloo) if you have any questions/concerns.

Chjulian commented 2 years ago

Review checklist for @Chjulian

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/flu-crew/smot?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@arendsee) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[ ] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Chjulian commented 2 years ago

smot (Simple Manipulation Of Trees) helps colour, annotate, and subset a tree (or extract a subtree rather than sampling a tree). Still, it is unclear what the authors mean by summarising, in phylogenetics, often means to map multiple trees into a single one.

I think that smot could be helpful to reduce the complexity of a tree to help unveil and communicate evolutionary features.

Documentation

General comment: Overall I think the -h and README.rd are not exhaustive or detailed enough. Given that this package deals with graphical tree-like structures, it requires more visual examples. The users should clearly understand what a command is intended for, as this is hard to tell from plain text. The provided examples are difficult to interpret because they are large phylogenies (Also, note that there are no details on what is happening in example 2 with each instruction). A smaller model, i.e. a tree with a dozen of taxa and the display of the most varied tree manipulations and key features (for example displaying the labels on the tree with each of the factoring schemes, displaying examples for each of the three ways of sampling, displaying the difference on using pull versus push when colouring, etc.) would give us an accurate idea of smot’s utility.

Specific comments: README.md. I found it puzzling to list the subcommands rather than the commands with the corresponding subcommands/options. I also think that the distinction between a command, subcommand and option is not consistent across the help.

The installation worked for me, but post-installation, the package required two dependencies that were not flagged as required during the installation.

ModuleNotFoundError: No module named 'parsec'
ModuleNotFoundError: No module named 'click’

smot does not respond when I do not specify the [TREE]. Ideally, the user should receive an error or a warning.

The help informs that the command ‘stat’ should ‘Display statistics for an input tree’, but it seems to return only the number of tip labels.

The command grep -f works when the grep expression is saved without quotation marks. I’m not sure if this is standard practice outside the command prompt, but it is worth checking.

Concerning the command sample: I find it confusing that the options equal, mono and para are listed as commands. There is no further guidance on how to use them. smot mono -h returns the Error: No such command 'mono'. Three factoring schemes appear in the descriptive text but are not listed as options, and I did not find documentation. The examples of sample written to the README.md use the options prop, min-tips, keep and seed. I did not find documentation of these options. Using prop triggers the Error: No such command 'prop'.

I think that some commands need to be chained in a particular order. This is not clear from the help or the README.md.

More details are needed for the command filter. For instance, it is not clear how the input should look like for the option --factor-by-field.

The help for color is not comprehensive. I had to look up the README.rd to work things out. The description informs that the colouring options are highly opinionated. I’m unsure what this means, but I guess it means that there are no default palettes to sample colours from and that the user must indicate each colour one by one. The options of color are listed as commands. The example in the README.md uses smot color leaf -P -p but the options -P and -p are not documented in the help of color. The help prints, There is no simple way to colour a particular branch (and why would you want to do that anyway). So, follow my tree coloring dogma and everything will be fine. I kindly disagree as sometimes a user may need to colour a single tip because it is a reference, the target sequence, etc.

Software paper

General comments: Some aspects of the methodology are not apparent because plain text description of tree-like structures is often not enough. Thus, some additional illustrations may be needed.

Statement of need. I am not sure all the info in the second paragraph fully achieves what it is intended to achieve (note that some of the points raised in this section are paraphrased (or missing) in the next section). Here, I understand that the authors must provide more details on the missing niche(s) that smot intends to fill. For instance, to inform if there is no software available to extract a subtree based on grep or regex and communicate why this could be beneficial (less time consuming, less error-prone, easy to include in near real-time pipelines analysis of sequence data, help to identify features from the tree, help communication of findings, etc.).

Specific comments: I prefer not to use the term “(reference) strain” here because the meaning of strain changes across disciplines and taxonomic groups. I suggest using reference taxa/taxon as it is precise in phylogenetics.

Line 15. genetic (or molecular as the authors use elsewhere in the text) sequence dataset

Line 23. variables instead of parameters?

Line 25. It is not fully clear to me what the authors are implying here with classification because the authors mention, for instance, that geographical location may be incomplete and requires classification. By classification do you mean annotation? that is, the metadata is available but is not represented on the tree. Otherwise, I don’t see how you can make such attributions based on phylogenetic associations alone. One can attribute unknown taxa to a particular taxonomic group because we classify biological entities based on evolutionary relationships (and I think this is implied on lines 34 and 49).

Line 26. I’m unsure about correlation being the correct term here, as correlation is not the only type of relationship that phylogeneticists are looking for. maybe something along the lines of “subsetting and colouring facilitates the communication and interpretation of phylogenies and traits.”

Line 33 r is not defined. Not clear how the scaling works here. Do you mean to reduce the size of each monophyletic group by retaining (if possible) a user-defined number of tips?

Line 35. I am not sure about the use of ‘ingest’. Maybe ‘take’ or ‘process’?

Line 36. I think the authors mean, “remove any monophyletic groups that do not meet certain criteria (lack or a variable value) or size”.

Line 50. What do you mean by purity? Groups are either monophyletic or not.

Line 68. Core algorithm. The paraphyletic versus Monophyletic algorithm is not fully clear to me. I think illustrations would be of great help here. This is important to understand line 127 too.

Plots in figure 1 are mislabelled. I think D corresponds to (missing) C and E corresponds to D.

Line 53. It is unclear what the authors explain here and what they call a hybrid approach. I understand that, given a phylogeny (built with reference taxa), smot attributes the unlabeled taxa to any taxonomic reference groups.

Line 56. Details are not provided about filtering. Also not clear what exactly smot does here concerning colouring. I understand that it creates a nexus output that can be read using FigTree?

Line 126. What is the meaning of n?

In line 119, by “all swine clades with a single member”, do you mean singletons?

Line 125. To understand these more complex trees and the implications of the methods, I need more info on the paraphyletic versus Monophyletic algorithm (schematics).

arendsee commented 2 years ago

Thanks @Chjulian for the review, I'll go through it more carefully this weekend.

arendsee commented 2 years ago

@Chjulian I'm still not quite done with the response, but it is high on my TODO list. I should be able to find a block of time soon.

Bisaloo commented 2 years ago

Hi @marekborowiec :wave:, friendly reminder that your feedback on this piece of software and related paper would be appreciated.

I understand if you are busy so feel free to let me know if you need more time. This message is mainly to keep this on your radar :relaxed:

Bisaloo commented 2 years ago

Hi @marekborowiec :wave:, are you still available to review this piece of software and related paper?

Please let me know if you need more time or if you don't think you will be able to make it :relaxed:

marekborowiec commented 2 years ago

Review checklist for @marekborowiec

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/flu-crew/smot?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@arendsee) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[ ] Installation: Does installation proceed as outlined in the documentation?
[ ] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[ ] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[ ] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

marekborowiec commented 2 years ago

I was not able to install using pip. The README file states the requirements are biopython, parsec, and docopt. The requirements.txt has parsec and click but these are not installed when using pip. I was able to at least run smot in a clean environment after installing biopython, parsec, and click first. In addition to the requirements being consistent and installed along with smot, the documentation could link to each dependency repository. In short, installation requirements need to be improved.

smot has the potential to be a fast and useful tool but I agree with @Chjulian that documentation needs to be better thought-out and improved before this software can be used by others. It needs to state somewhere which tree file formats are supported. Hard to get started without that. I work with phylogenetic trees every day and yet I am having difficulty understanding much of this program's functionality. Examples isolated to individual commands with smaller trees would go a long way.

Going over the individual commands, starting with color . using smot color -h I lean that "coloring options are highly opinionated" and "follow my tree coloring dogma and everything will be fine". Alright, but there is no explanation of how the coloring algorithm works, leaving the user in the dark. I tried to run the command with smot color leaf mytree.nexus > colored.tre. According to the documentation, this should have produced a tree with colored tips. I see no modification to the tree when printed to the screen. When redirected and opened using FigTree I also did not see any coloring. Using smot color branch mytree.nexus > colored.tre did not work at all and it wasn't until I ran smot color branch -h I learned that I may provide a color map (but there are no instructions on how to do this) and that if I don't, smot will automatically apply a color-blind friendly palette. As far as I can tell, however, no modification is done to my tree using smot color branch mono mytree.nexus.

The commands factor and sample seem to be important and potentially very useful but I cannot understand how they work after looking at the examples provided and reading through in-program documentation. Small examples of both trees and table inputs will be helpful in explaining here.

I like the grep, tips, and tipsed commands, which are simple and seem to work as advertised. Very useful.

stat seems to only provide the number of tips and it is not clear that this is what it does until one runs it. I would have expected things like tree length, average branch lengths, etc.

As for the paper, I would say provide an exhaustive list of program capabilities and compare it with other available software. You say that smot "complements the existing Python phylogenetics ecosystem through its subsampling, classifying, and filtering algorithms" but what does that mean, exactly? Are DendroPy or TreeSwift not able to do any of these actions? What can they do instead?

I think these are the main issues I can identify right now. I hope the authors find my comments useful and I commend them to submitting to JOSS with the intent of improving this promising piece of software.

marekborowiec commented 2 years ago

@Bisaloo my review is complete. Thank you for your patience.

arendsee commented 2 years ago

I would like to thank @Chjulian and @marekborowiec for their thoughtful comments. Sorry for the delay, I've been transitioning into a new position and a new city, so am a bit back-logged.

Below are my responses to the main points the raised with respect to the smot code. I will address the comments on the paper next. All the changes I mention below are in smot v0.17.4.

It is not clear what the author meant by "summarizing" the trees

Good point, I have removed the phrase from the README and program description. When I wrote that description, I had more ambitious plans for the smot stat function for returning tree statistics and summaries.

The `smof stat` function just prints the number of tips

Right, that command was a stub. As mentioned above, I had intended to make it more sophisticated. Instead I will just remove it. If you want to know the number of tips in a tree, you can just call smot tips x.tre | wc -l.

Some options or commands are not documented

The reviewers misunderstood the nested documentation paradigm of smot. This is a problem, of course, because if they misunderstand others will as well. For this reason, I have added a help statement at each internal documentation level that gives an example of how to access subcommand help. For example:

$ smot color -h
Usage: smot color [OPTIONS] COMMAND [ARGS]...

  Color the tips or branches.

  The coloring philosophy in smot is to encode information about the specific
  taxon in the taxon label color and to encode data about groups of taxa in
  the branch color.

  If you want to color tips by phylogenetic group, you may first color by
  branch and then push the colors to the tips:

    smot color branch mono --factor-by-field 1 -c colormap foo.tre |     smot
    push

  This will identify all monophyletic groups based on the clade name found in
  the first position of the taxon label (e.g., `>primate|...`. The
  monophyletic branches are colored based a color map, which should have an
  entry such as `primate #00FF00`. Then `smot push` takes colors in the nodes
  and pushes them to their children, thus pushing the parent node colors to
  the taxon labels.

  Conversely, taxon label colors can be pulled up into parent nodes using
  `smot color pull`.

Options:
  -h, --help  Show this message and exit.

Commands:
  branch  Color the branches of a tree.
  leaf    Color the taxa labels on a tree.
  pull    Pull colors from tips to nodes
  push    Push colors from nodes to tips
  rm      Remove all color annotations from a tree

  For subcommand usage, append `-h` (e.g., smot color branch -h)

The last line of the help statement gives can example of how to access subcommand help. Following that line:

$ smot color branch -h
Usage: smot color branch [OPTIONS] COMMAND [ARGS]...

  Color the branches of a tree. You may provide a color map; if you do not,
  smot will automatically map factors to colors from a color-blind friendly
  palette.

Options:
  -h, --help  Show this message and exit.

Commands:
  mono  Color a tree by monophyletic factor.
  para  Color a tree by paraphyletic factor.

  For subcommand usage, append `-h` (e.g., smot color branch mono -h)

This takes us to a terminal command:

$ smot color branch mono -h
Usage: smot color branch mono [OPTIONS] [TREE]

  Color a tree by monophyletic factor.

  smot color branch mono --factor-by-capture="(1B\.[^|]*)" 1B.tre

Options:
  --factor-by-table ?STR    Read factors from a TAB-delimited file (tip labels
                            in the first column and factors in the second
                            column)
  --factor-by-field ?NAT    Factor by 1-based field index (with '|'
                            delimiters, for now)
  --factor-by-capture ?STR  A regular expression with a capture for
                            determining factors from labels
  -c, --colormap PATH       A TAB-delimited, headless table with columns for
                            factor names and hexadecimal colors
  -h, --help                Show this message and exit.

Each level of the documentation shares what is common to all subcommands. At the terminal level of documentation, the options are listed. This approach to documentation avoids the long dump of often conflicing arguments that is common in command line tools (e.g., blast).

The documentation is not extensive enough (especially for the color command)

Very true, I've gone through all the internal documentation in the package and clarified it. The color documentation I have completely rewritten. I have added examples to most of the commands.

Pip installation required manually installing parsec and click

Fixed in v0.17.4

The command grep -f works when the grep expression is saved without quotation marks.

It is standard practice. Specifically, UNIX grep, which `smot grep` is
modeled on, works without quotes. Quotes are only necessary when the
pattern contains spaces or characters that have meaning in the shell (such
as '>' or '\').

The program hangs when no file is given

This is because it is waiting for input from STDIN. UNIX grep and most other tools that can read from STDIN do the same. If you really wanted to, you could type a tree into terminal. Just for fun, run smot tips, type (A,B); and then hit Ctrl-D.

I think that hits the points raised about the program itself. If you have any additional comments, I'm happy to address them. I'll get to the paper as soon as I can.

Bisaloo commented 2 years ago

The reviewers misunderstood the nested documentation paradigm of smot. This is a problem, of course, because if they misunderstand others will as well.

:+1:

This might also be the sign that you should use another (additional) framework for the documentation. It's nowadays pretty common to have a documentation website (typically readthedocs for python packages but I'm sure they are plenty of alternatives) alongside the bundled documentation. A website might make it easier to navigate a nested/hierarchical structure. What do you think?

arendsee commented 2 years ago

@Bisaloo Online documentation is nice for case studies and high-level descriptions where diagrams and figures are helpful, but I would always prefer quick CLI usage statements if I need to know what flags are available. The smot CLI documentation is easy and minimal; I just hadn't explained it.

Bisaloo commented 2 years ago

I'll let the reviewers weigh in but I'm not sure explaining it will be enough to make it more easily navigable. It'll always be difficult to have a general overview of all the possible options, without going inside each one of the items.

Note that I'm not arguing you should remove the CLI documentation. I'm recommending you add an alternative, which some users may find easier to understand.

Bisaloo commented 2 years ago

Hi @arendsee :wave:, it's not very clear to me what the current status of this submission is (if we leave aside our disagreement about documentation for now). Do you feel that you've addressed all other issues raised in the review and we should start another round?

Hi @Chjulian @marekborowiec :wave::wave:, could you take a moment to see if the changes in the documentation are helpful / enough? If not, do you have specific tips on how to improve it? Please also check our discussion in the comments right above and feel free to weigh in either way (do you think a documentation website is helpful or unnecessary overhead?).

marekborowiec commented 2 years ago

Hi @Bisaloo. There are still issues with installation. The README says it needs Python v3.6 or newer but I tried with a 3.6.2 installation and also a clean conda env with 3.6.15 and no luck, there is an issue with from __future__ import annotations. This statement won't work unless you have Python >3.7 and so the documentation is confusing on that point. Also, the documentation states requirements are Biopython, parsec, and docopt but installing with pip in a clean environment installs parsec, zipp, typing-extensions, importlib-metadata, click, smot. I haven't tried all the functionality of the new version, but will it be compromised without Biopython?

Again, there needs to be a discussion of what tree formats smot can take. Several of the individual, nested help messages now say that they will only work with "nexus", for example, but that should be discussed in README/manuscript.

The nested nature of the documentation was pretty clear once I played around with the program. However, the CLI help should state how to us it up front. It seems this is improved now with added statements about nested help messages. My main criticism wasn't related to the layout of command help, just the fact that it is very difficult to understand how to use the many commands and subcommands and options without examples. The documentation is now improved now and it explains more with examples. At the same time, it would still be good to have a birds-eye overview of functionalities of the program so that potential users can see what it can do. As I already mentioned, the examples in README and manuscript are too complex and they obscure the functionality of the program. This could be done in a much more expanded README or, as @Bisaloo suggested, readthedocs or a similar platform. Providing something with smaller examples of each function would increase the program's visibility and user base, so it is in your best interest to work this out. You cannot expect every potential user to first install your program without having a good idea of what it can do, then go through the massive tree of help messages to figure out what they can do with it. No one does that, and so you can expect fewer users/citations for your program.

Issues in the manuscript have not been addressed yet.

Small point, throughout the documentation "leafs" should be "leaves".

marekborowiec commented 2 years ago

Hi @arendsee please see my comments above.

arendsee commented 2 years ago

@marekborowiec I've removed the README comments on requirements; they were out of date. The package does require Python 3.7.

As for formats, I've updated the README with a description of the formatting rationale. Input is Newick or Nexus and output is Nexus unless a --newick flag is set.

While it may be a good idea to make a dedicated documentation page, I don't have the time to do it. The examples in the README give an idea of what the program does and finding specific documentation only requires going one or two steps into the tree (there is no massive tree search needed).

I've gone through most of the comments in the paper. I'll double-check. Sorry for the slow responses lately.

arendsee commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Bisaloo commented 2 years ago

Hi @arendsee, I have opened two issues summarising what seem to be the current blockers to get things moving.

Once this is solved, I can ask the reviewers to have another look at their checklists and see what's left to get this submission through the door.

📝 Checklist for Chjulian 📝 Checklist for marekborowiec

I understand you are particularly busy in this moment but could you provide us with a tentative timeline for these changes please?

arendsee commented 2 years ago

Thanks @Bisaloo, I saw your issues and can address them next week. I guess I can go ahead and write up a dedicated docs page. It isn't too much work. I probably just flatten the internal documentation and include output images of the trees.

arendsee commented 2 years ago

@Bisaloo I went with adding more examples to the README instead of building a whole readthedocs site. The new examples give an overview of what can be done with smot. All the data needed to run the examples is present in the test-data folder, so anyone interested can explore the results using their choice of visualization software.

Bisaloo commented 2 years ago

Thanks for the changes @arendsee, I believe this is a huge improvement and it will surely convince more potential users to give your tool a go.

@marekborowiec @Chjulian, could you have another go at reviewing this submission and update your respective checklists accordingly please? Thanks a lot for your help!

📝 Checklist for Chjulian 📝 Checklist for marekborowiec

marekborowiec commented 2 years ago

@Bisaloo Super slammed but I will try to take a look next week.

Chjulian commented 2 years ago

Dear all,

I tested smot on a macOS M1 with Monterey 12.6 I installed it without any warnings/errors. I ran and checked all the examples provided, and some are not working as intended (see below)

Errors: I used the pdm.tre (ntax=247) stored in smot/test-data/ and the outputs have 233, two, three and four tips. Those outputs do not match figure (A). I suspect this is the wrong tree, given the number of taxa.

There is no explanation behind Example 2. The first two lines from example 2 return the following errors respectively: Error: No such option: --seed Did you mean --keep? Error: No such command 'prop'.

smot sample mono --seed=24601 --factor-by-field=6 -p 0.25 1B.tre Returns the error: IndexError: list index out of range IndexError: Cannot access the 6th field in tip label 1.000

The nested documentation is now evident in my head. The updated readme is more useful. However, as a returning user, I still found it challenging to understand what some of the commands do. Users would have to run each example command and check the trees in another program to figure out what changed in the trees. Also, the changes are not always readily evident because the example trees have many taxa (IMO, trees with tons of taxa are good for evaluating performance but not for showing how to use the program). So, small examples could be added in the future to increase visibility.

arendsee commented 2 years ago

Thanks @Chjulian, I updated example 2. The example with smot sample mono shouldn't raise an error in the current version of smot. Previously smot died on failure to access a field by index in a taxa name. But that was a bit drastic, so now it will skip these cases.

arendsee commented 1 year ago

Hey @Bisaloo and @Chjulian, is everything good to go?

Bisaloo commented 1 year ago

@marekborowiec, if you want to add more comments, please do so by Monday 19. If nothing comes up by then, all looks good to me and I'll accept this submission.

Thanks @arendsee, @Chjulian & @marekborowiec for your hard work on this! Apologies for the slow process but I do believe it resulted in important improvements!

Bisaloo commented 1 year ago

@editorialbot generate pdf

Bisaloo commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1371/journal.pone.0009490 is OK
- 10.1093/nar/gkw857 is OK
- 10.1128/mSphere.00275-16 is OK
- 10.1093/molbev/mst010 is OK
- 10.1126/science.1189132 is OK
- 10.1038/nature08182 is OK
- 10.1128/JVI.00459-15 is OK
- 10.1371/journal.pcbi.1002947 is OK
- 10.1126/science.1117727 is OK
- 10.22269/210921 is OK
- 10.1016/s0168-9525(03)00112-4 is OK
- 10.1186/s12859-018-2164-8 is OK
- 10.1186/1756-0500-6-145 is OK
- 10.1101/2021.02.13.431075 is OK
- 10.1093/molbev/msz053 is OK
- 10.1101/2021.01.21.427647 is OK
- 10.1186/s12864-018-4620-2 is OK
- 10.1093/bioinformatics/btq228 is OK
- 10.1093/bioinformatics/btp163 is OK
- 10.1128/mra.00673-19 is OK
- 10.12688/f1000research.10446.1 is OK
- 10.1093/molbev/msz053 is OK
- 10.1093/molbev/msw046 is OK
- 10.1016/j.softx.2020.100436 is OK
- 10.1111/2041-210X.12628 is OK
- 10.1093/molbev/msw046 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

arendsee commented 1 year ago

@Bisaloo Awesome, thanks for all the help! @Chjulian @marekborowiec Thanks for all the comments and hard work you both put into testing. Happy holidays!

Bisaloo commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Bisaloo commented 1 year ago

@arendsee, all looks good to me. At this point could you:

[x] Make a tagged release of your software, and list the version tag of the archived version here.
[x] Archive the reviewed software in Zenodo or a similar service (e.g., figshare, an institutional repository)
[x] Check the archival deposit (e.g., in Zenodo) has the correct metadata. This includes the title (should match the paper title) and author list (make sure the list is correct and people who only made a small fix are not on it). You may also add the authors' ORCID.
[x] Please list the DOI of the archived version here.

I can then move forward with recommending acceptance of the submission.

marekborowiec commented 1 year ago

@Bisaloo @arendsee Apologies for the long silence. I was able to install smot now and I think that documentation examples are much better now. Smaller trees and even more bite-sized examples would be even better but I guess what is there now is sufficient to use this program. I haven't been able to find the time to play around as extensively as before, but I noticed some deficiencies in the smot color - color taxa labels or branches README section: the first two examples are not in code blocks. Also, the 1B.tre from the example already appears to be colored this way so running the code doesn't change anything. Otherwise I think it looks good!

arendsee commented 1 year ago

Thanks @marekborowiec , I've fixed the code blocks in the README. 1B.tre is a newick file without color in the current github version. Are you sure you didn't overwrite it locally?

You've put a lot of work into reviewing this package, I really appreciate it!

Bisaloo commented 1 year ago

I think the previous message was addressed to @marekborowiec

arendsee commented 1 year ago

@Bisaloo Here's the Zenodo DOI: 10.5281/zenodo.7458325

Bisaloo commented 1 year ago

@editorialbot set 10.5281/zenodo.7458325 as archive

editorialbot commented 1 year ago

Done! Archive is now 10.5281/zenodo.7458325

Bisaloo commented 1 year ago

@editorialbot set v1.0.0 as version

openjournals / joss-reviews