hcdenbakker / sepia

taxonomic classifier based on the kraken2 algorithms and more
GNU General Public License v3.0

JOSS Review #17

Closed telatin closed 2 years ago

telatin commented 3 years ago

https://github.com/openjournals/joss-reviews/issues/3839

Hello there, congrats on this package, I loved it.

Some minor comments:

1) Update LICENSE, removing the placeholders:

<program>  Copyright (C) <year>  <name of author>

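2) Slightly expanded documentation would be beneficial, in particular documenting the file formats (input and output files). The readme is fantastic for a "worked" example, but reference documentation in a separate md file might be a useful addition. Some examples:

• what should taxonomy_ambiguities.txt be checked for (in the build subcommand)?
• what is the format of the summary (the readme describes 3 columns, I found 4)?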

3) Please add a CONTRIBUTING.md briefly defining how to contribute to the project, maybe adding a link to a code of conduct.

4) Installation is easy, but adding the package to BioConda would be very beneficial for bioinformaticians planning to use the tool in pipelines. Is this planned for later?

5) From the statement of need, it looks like the (highly appreciated) flexibility provided by Sepia in terms of database creation could have been achieved with tools that help format reference sequences in a Kraken-compatible format (ad hoc NCBI taxonomy), without reimplementing the whole thing (unless I'm mistaken here). In this light, it would add value for the reader to see a simple comparison of performance and sensitivity/specificity between Kraken2 and Sepia using a similar database.

hcdenbakker commented 2 years ago

Hi Andrea Telatin,

Thank you for your helpful comments and speedy review!

Here is our response to your comments:

  1. Update LICENSE, removing the placeholders:
     • We updated the license and removed the placeholders.
  2. A slightly expanded documentation would be beneficial, in particular documenting the file formats (input and output files). The readme is fantastic to get a "worked" example but a reference documentation on a separate md file might be a useful addition.
     • We are planning to write more extensive documentation, including a section on how to build indices from (for example) the GTDB database. For now, we have addressed the examples that were mentioned.
     Some examples:
     what should taxonomy_ambiguities.txt be checked for (in the build subcommand)?
     • We included an example of what is in the taxonomy_ambiguities.txt file, with a couple of real-life examples.
     what is the format of the summary (the readme describes 3 columns, I found 4)?
     • We included a description of the fourth column of the summary file.
  3. Please, add a CONTRIBUTING.md briefly defining how to contribute to the project, maybe adding a link to a code of conduct.
     • We included the requested CONTRIBUTING.md.
  4. Installation is easy, but adding the package to BioConda would be very beneficial for the bioinformaticians planning to use the tool in pipelines. Is this planned for later?
     • We are planning on adding the package to Bioconda.
  5. From the statement of need it looks like the (highly appreciated) flexibility provided by Sepia in terms of database creation could have been achieved with tools helping to format reference sequences in a Kraken-compatible format (ad hoc NCBI taxonomy), without reimplementing the whole thing (unless I'm mistaken here). In this light, it would be an added value for the reader to see a simple comparison of performance and sensitivity/specificity between Kraken2 and Sepia using a similar database.
     • We added a few sentences comparing the performance and sensitivity/specificity of Kraken2 and Sepia using a similar database.

telatin commented 2 years ago

Awesome, thanks!