[REVIEW]: RedOak: a reference-free and alignment-free structure for indexing a collection of similar genomes

editorialbot commented 2 years ago

Submitting author: !--author-handle-->@cagret@lpantano<!--end-editor-- Reviewers: @swatimanekar, @samhorsfield96 Archive: 10.6084/m9.figshare.21711767

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/1d4506a2b9e02c0d4f2f547673ff25f1"><img src="https://joss.theoj.org/papers/1d4506a2b9e02c0d4f2f547673ff25f1/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/1d4506a2b9e02c0d4f2f547673ff25f1/status.svg)](https://joss.theoj.org/papers/1d4506a2b9e02c0d4f2f547673ff25f1)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@swatimanekar & @samhorsfield96, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @lpantano know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @swatimanekar

📝 Checklist for @samhorsfield96

editorialbot commented 2 years ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 2 years ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/bib/bbw089 is OK
- 10.1111/pbi.12499 is OK
- 10.1093/gigascience/giy125 is OK
- 10.1093/nar/gkp492 is OK
- 10.1186/1471-2105-12-242 is OK
- 10.1093/bioinformatics/btr011 is OK
- 10.1137/070685531 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 2 years ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.93 s (88.2 files/s, 75266.0 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Bourne Shell                     10           4773           5619          29405
m4                                9           1104            323          10470
C++                              17            506           1409           4437
C/C++ Header                     15            604           4059           1872
TeX                              15            268            203           1240
Markdown                          5            499              0           1159
Bourne Again Shell                3            106            223            607
make                              5             98            433            267
Perl                              1             69            147             66
YAML                              2              1              1             41
--------------------------------------------------------------------------------
SUM:                             82           8028          12417          49564
--------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 2 years ago

Wordcount for paper.md is 1353

editorialbot commented 2 years ago

:warning: An error happened when generating the pdf.

samhorsfield96 commented 2 years ago

Review checklist for @samhorsfield96

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://gite.lirmm.fr/doccy/RedOak?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@cagret) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

samhorsfield96 commented 2 years ago

@lpantano there was an issue generating the paper pdf. Is there a way of accessing the newest draft? I see there is one available on biorxiv, but I am unsure whether this is viable for review.

swatimanekar commented 2 years ago

configure: error: in /home/ankush/Downloads/RedOak-master': configure: error: No suitable C++ MPI implementation found. Seeconfig.log' for more details

I am getting the above message while configuring the tool

swatimanekar commented 2 years ago

Review checklist for @swatimanekar

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://gite.lirmm.fr/doccy/RedOak?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@cagret) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

cagret commented 2 years ago

Hi, you need : A modern, C++11 ready compiler such as g++ version 4.9 or higher or clang version 3.2 or higher. A 64-bit operating system. Either Mac OS X or Linux are currently supported. OpenMPI libGkArrays-MPI To proprely install the tool. RedOak uses the libGkArrays-MPI library, which requires the zlib and openMPI development files to be already installed on your system (on Debian/Ubuntu it corresponds to the packages zlib1g-dev and libopenmpi-dev). (sudo apt install zlib1g-dev && sudo apt install libopenmpi-dev).

samhorsfield96 commented 2 years ago

@cagret Is there a pdf version of the paper available? I saw one on biorxiv but I'm unsure if this is the latest version. However, the paper.md on the redoak repo seems incomplete in comparison.

cagret commented 2 years ago

@samhorsfield96 yes the one on biorxiv is up to date. I don't understand why the pdflatex on main.tex doesn't work. Can you please retry with sudo apt install textlive-full first and pdflatex main.tex ? Thank you, Regards, Clément

cagret commented 2 years ago

i updated the git repo : https://gite.lirmm.fr/doccy/RedOak/-/tree/master/RedOak-paper

samhorsfield96 commented 2 years ago

i updated the git repo : https://gite.lirmm.fr/doccy/RedOak/-/tree/master/RedOak-paper

Unfortunately the in-text referencing hasn't worked. I shall review the latest biorxiv version.

samhorsfield96 commented 2 years ago

@cagret Thank you for writing a great piece of software. The algorithm itself and the explanation of how it works is excellent, and the benchmarking is comprehensive and pushes Redoak to its limits. I have some minor comments below which hopefully explain my decisions on the checklist. Please let me know if anything is not clear.

Summary

Overall great idea for a tool. Theory is clearly explained, figures for understanding memory assignment and intersections of core/accessory k-mers are very useful.
Impressive results versus current k-mer counting tools.
When querying sequences, identification of preceding k-mer that overlaps a given position is very useful and a unique function I have not seen in any other kmer-based querying tool.

Repository and software

No explicit "Statement of need" in repository
Installation is very 'hands-on'. I was able to install and run Redoak on my linux machine, however it was reasonably laborious, and not likely achievable by someone with less command line experience. Implementation on conda/docker would solve this.
"README" requires example file formats for config and list input files. These are present in the "resources" directory, but they are not clearly highlighted. Therefore it is difficult to determine file formats required for input.
There is no explanation of the output format or how this could be parsed.
Repository does contain unit tests.
No automated tests (continuous integration). The example given for testing mpi installation does not run on my machine, with below errors:
- [P0] Error: Ignoring unknown argument '../test/test-0025_reads.fastq'
- Ignoring unknown argument '25'
I was unable to test using the same datasets as used in the article due to huge time and memory requirements. Instead, I used a Listeria monocytogenes dataset, consisting of 128 genomes from Kremer et al. (https://pubmed.ncbi.nlm.nih.gov/27998823/) with default parameters, and queried a wgMLST gene (https://bigsdb.pasteur.fr/cgi-bin/bigsdb/bigsdb.pl?db=pubmlst_listeria_seqdef&page=alleleInfo&locus=A118int&allele_id=6).
- K-mers from query were found in some genomes
- 22 core kmers, 14647729 variable kmers and 2764805 specific kmers identified. This is reasonable given large SNP distances within the species that would reduce k-mer matching at larger kmer values.
- Ran in 26:00 minutes using 1.7 Gb memory with 4 mpi threads on personal computer with intel core i7 processor and 16Gb memory.
Output of query results goes to STDOUT, would be better to a designated file for parsing and interpretation.

Paper (reviewed paper here: https://www.biorxiv.org/content/10.1101/2020.12.19.423583v2.full.pdf)

Paper does not contain a section titled "Statement of need"
Good comparisons against existing tools.
Theory behind Redoak implementation is clear.
7th paragraph of Results section, cloud kmer storage in memory is not mentioned. I am unsure whether part of this section is supposed to refer to cloud k-mers, but accidently refers to shell k-mers instead.
The section headings should be changed. Section titled "Results" should be changed to "Methods", and section titled "Discussion" should be titled "Results".
Subsections titled "Implementation" and "Value of k and k1" should be moved to previous section.
Subsection titled "Example of PAV analysis" would benefit from an accompanying figure. This could be an alignment of Pstol sequence to genome GP104, which was detected by Redoak but not Zhao et al. Moreover, this subsection should also include a sensitivity analysis compared to Zhao et al., detailing whether all genomes identified to contain the gene in Zhao et al. were also found by Redoak.
Figure 3 should be split into two panels for time and memory efficiency comparison, as it is currently difficult to read.
Formatting error at base of page 14, sentence starting with "the gene Pstol"

swatimanekar commented 2 years ago

Why am I getting number of indexed genome = 1 though I have specified three genomes. Command as follows, mpirun -n 9 redoak --genome GLA4_1000.fa,GP39_1000.fa,DHX2_1000.fa -k 27 -o output_sam.txt (File size (GLA4_1000.fa,GP39_1000.fa,DHX2_1000.fa) is shorten for testing) And output file is as follows (staring few lines)

GLA4_1000.fa,GP39_1000.fa,DHX2_1000.fa AAAAAAAAAAAAAAAATTAGCTCAGCC (1): 1 AAAAAAAAAAAAAAAATTCCCACGATG (1): 1 AAAAAAAAAAAAAAATTAGCTCAGCCT (1): 1 AAAAAAAAAAAAAAATTCCCACGATGA (1): 1 AAAAAAAAAAAAAATTAGCTCAGCCTA (1): 1 AAAAAAAAAAAAAATTCCCACGATGAT (1): 1 AAAAAAAAAAAAACCAAAAACATGAGA (1): 1 AAAAAAAAAAAAATTAGCTCAGCCTAG (1): 1

I have checked for all the lines in the output file, the 27-mers are different but ‘(1): 1’ this part is the same for all 27-mers. Can you help me with the understanding of output format?

screenshot as follows

redoak

cagret commented 2 years ago

Hello, maybe the doc isn't clear enought, and thanks you :

" "Examples:\v" " To index a single (compressed fasta) file without any extra information:\v" " --genome gen1.fa.gz\v" " To index several (compressed or not) files without any extra information:\v" " --genome gen2_chr1.fa,gen2_chr2.fa.gz,gen2_chr3.fa\v" " To index several files and provide a genome name (notice that the white" " spaces must be escaped or the whole argument must be enclosed within" " braquets):\v" " --genome My\ Third\ genome:gen3_chr1.fa,gen3_chr2.fa.gz\v" " To index several (fastq) files setting both a genome name and a genome" " version:\v" " --genome Unassembled\ genome:Version\ 0.1.2:sample1.fq,sample2.fq.gz\v" " To collect the sequence headers and length (be aware that it consumes a" " lot of time and memory):\v" " --genome ::+:sample5.fq.gz,sample6.fq.gz\v" " To index several files setting a genome name and a minimum number as well" " as a maximum number of occurrences for k-mers to be indexed (notice that" " the genome version is empty):\v" " --genome Another\ genome::[5,150]:sample3.fq,sample4.fq.gz\v" "

You need to use the option --genome for each genome. If you use the coma, redoak will think is it a same genome with several files.

In your exemple you need to to mpirun -n 9 redoak --genome GLA4_1000.fa --genome GP39_1000.fa --genome DHX2_1000.fa -k 27 -o output_sam.txt

Thank you for your feed back.

Clément

swatimanekar commented 2 years ago

Now it is working fine with "mpirun -n 9 redoak --genome GLA4_1000.fa --genome GP39_1000.fa --genome DHX2_1000.fa -k 27 -o output_sam.txt". It is mentioned in the manuscript that for JellyFish, a small modification of the merge tool implementation and also a program has been created that simulates a parallelization of jobs. If you could provide this programme, it would be much easier for me to compare RedOak and JellyFish.

cagret commented 2 years ago

Sure how can i send you the "hacked version" ?

cagret commented 2 years ago

Using the hacked version : step 1: count the k-mers of the files with the count subcommand (well, no need to do it again, this has not been modified). step 2 : use the merge subcommand with the jellyfish binary dumps as input. The output file will look very similar to the one produced by RedOak (but the k-mers are not ordered).

We just merge the jellyfish dumps and measure the time it takes (the memory consumed is very low). This gives us a reliable way to check the validity of the RedOak outputs. We delete the header lines (starting with #) and sort what is left.

Like "tail -n +9 merged-output-from-jf.csv | sort > sorted-merge-output.csv". (then you just have to do an md5sum or a diff)

Another way to do it, sort without deleting the rows starting with # then do a diff ignoring the rows starting with #.

cagret commented 2 years ago

diff -ru jellyfish-2.2.10.orig/include/jellyfish/text_dumper.hpp jellyfish-2.2.10/include/jellyfish/text_dumper.hpp
--- jellyfish-2.2.10.orig/include/jellyfish/text_dumper.hpp 2018-03-29 21:28:08.000000000 +0200
+++ jellyfish-2.2.10/include/jellyfish/text_dumper.hpp  2019-05-14 12:57:23.647366563 +0200
@@ -26,6 +26,15 @@
   void write(std::ostream& out, const Key& key, const Val val) {
    out << key << " " << val << "\n";
   }
+  void write(std::ostream& out, const Key& key, const Val val, const std::vector<bool> presence) {
+    out << key << " (" << val << "):";
+    for (std::vector<bool>::const_iterator it = presence.begin();
+         it != presence.end();
+         ++it) {
+      out << " " << *it;
+    }
+    out << "\n";    
+  }
 };

 template<typename storage_t>
@@ -35,6 +44,7 @@

 public:
   static const char* format;
+  static const char* mergeformat;

   text_dumper(int nb_threads, const char* file_prefix, file_header* header = 0) :
     super(nb_threads, file_prefix, header)
@@ -54,6 +64,8 @@
 };
 template<typename storage_t>
 const char* jellyfish::text_dumper<storage_t>::format = "text/sorted";
+template<typename storage_t>
+const char* jellyfish::text_dumper<storage_t>::mergeformat = "text/merged";

 template<typename Key, typename Val>
 class text_reader {
diff -ru jellyfish-2.2.10.orig/jellyfish/merge_files.cc jellyfish-2.2.10/jellyfish/merge_files.cc
--- jellyfish-2.2.10.orig/jellyfish/merge_files.cc  2018-03-29 21:28:08.000000000 +0200
+++ jellyfish-2.2.10/jellyfish/merge_files.cc   2019-05-14 13:00:17.719359425 +0200
@@ -50,12 +50,24 @@
 typedef std::unique_ptr<RectangularBinaryMatrix> matrix_ptr;

 template<typename reader_type, typename writer_type>
+void do_write(writer_type& writer, std::ostream &out, const mer_dna &key, uint64_t sum, const std::vector<bool>& presence) {
+  writer.write(out, key, sum);
+}
+
+template<>
+void do_write<binary_reader, text_writer>(text_writer& writer, std::ostream &out, const mer_dna &key, uint64_t sum, const std::vector<bool>& presence) {
+  writer.write(out, key, sum, presence);
+}
+
+template<typename reader_type, typename writer_type>
 void do_merge(cpp_array<file_info>& files, std::ostream& out, writer_type& writer,
               uint64_t min, uint64_t max) {
   cpp_array<reader_type> readers(files.size());
   typedef jellyfish::mer_heap::heap<mer_dna, reader_type> heap_type;
   typedef typename heap_type::const_item_t heap_item;
   heap_type heap(files.size());
+  std::vector<bool> presence;
+  presence.resize(files.size(), false);

   for(size_t i = 0; i < files.size(); ++i) {
     readers.init(i, files[i].is, &files[i].header);
@@ -66,17 +78,19 @@
   heap_item head = heap.head();
   mer_dna   key;
   while(heap.is_not_empty()) {
+    std::fill(presence.begin(), presence.end(), false);
     key = head->key_;
     uint64_t sum = 0;
     do {
       sum += head->val_;
+      presence[std::distance(&readers[0], head->it_)] = true;
       heap.pop();
       if(head->it_->next())
         heap.push(*head->it_);
       head = heap.head();
     } while(head->key_ == key && heap.is_not_empty());
     if(sum >= min && sum <= max)
-      writer.write(out, key, sum);
+      do_write<reader_type, writer_type>(writer, out, key, sum, presence);
   }
 }

@@ -133,6 +147,7 @@
   std::ofstream out(out_file);
   if(!out.good())
     throw MergeError(err::msg() << "Can't open out file '" << out_file << "'");
+  format = "text/merged";
   out_header.format(format);

   if(!format.compare(binary_dumper::format)) {
@@ -144,6 +159,24 @@
     out_header.write(out);
     text_writer writer;
     do_merge<text_reader, text_writer>(files, out, writer, min, max);
+  } else if(!format.compare(text_dumper::mergeformat)) {
+      out << "# hostname: '" << out_header["hostname"] << "'\n"
+          << "# Execution time: " << out_header["time"] << "\n"
+          << "# Working Directory: '" << out_header["pwd"] << "'\n"
+          << "# Program: '" << out_header["exe_path"] << "'\n"
+          << "# CmdLine: '";
+      std::vector<std::string> cmds = out_header.cmdline();
+      for (std::vector<std::string>::const_iterator it = cmds.begin();
+           it != cmds.end();
+           ++it) {
+        out << " " << *it;
+      }
+      out << "'\n"
+          << "# Canonical: " << (out_header.canonical() ? "true" : "false") << "\n"
+          << "# Format: '" << out_header["format"] << "'\n"
+          << "# Key_len: " << out_header.key_len() << "'\n";
+    text_writer writer;
+    do_merge<binary_reader, text_writer>(files, out, writer, min, max);
   } else {
     throw MergeError(err::msg() << "Unknown format '" << format << "'");
   }
Seulement dans jellyfish-2.2.10/jellyfish: merge_files.cc~

swatimanekar commented 2 years ago

I have used the following commands (for two genomes) ./jellyfish count -m 27 -C -s 300M -t 9 -L 2 -o redoak_27_GP39 dataset/GP39.fa ./jellyfish count -m 27 -C -s 300M -t 9 -L 2 -o redoak_27_DHX2 dataset/DHX2.fa ./jellyfish merge -o mer_counts_merged.jf redoak_27_GP39 redoak_27_DHX2 May you assist me with such command lines to get the output in the desired format?

cagret commented 2 years ago

Did you change the source code of JellyFish ? By change i mean add the overload write function.

swatimanekar commented 2 years ago

no

cagret commented 2 years ago

It will be much easier if you make the changes I sent you above.

swatimanekar commented 2 years ago

Please find an attachement for the comments, Review_comments_final_RedOak.docx

lpantano commented 2 years ago

@cagret, any comments for @swatimanekar ? @swatimanekar could you update your checklist please? It would be easier for me to help to move this forward.

@samhorsfield96, do you mind to take a look at your checklist and makes comment on what is needed for the missing point, so the author can make changes?

@cagret, could you update the paper so the reviewers can use the latest version for the review.

Thanks everybody.

samhorsfield96 commented 2 years ago

Hi @lpantano, the summary of my review is in the comments above (https://github.com/openjournals/joss-reviews/issues/4363#issuecomment-1142333731), apologies if I didn't make this clear when I added the comment. I've made sure my checklist is in line with my summary. Please let me know if there is anything else you need from me.

lpantano commented 2 years ago

Thank you @samhorsfield96. I will let @cagret and co-authors to work on them.

cagret commented 2 years ago

Hi @lpantano, the summary of my review is in the comments above (#4363 (comment)), apologies if I didn't make this clear when I added the comment. I've made sure my checklist is in line with my summary. Please let me know if there is anything else you need from me.

Hi, sorry for the delay, I thought of forwarding the mail but not to answer it. Thank for the feedback,

The "Statement of need" is in the readme, RedOak requires: A modern, C++11 ready compiler such as g++ version 4.9 or higher or clang version 3.2 or higher. A 64-bit operating system. Either Mac OS X or Linux are currently supported. OpenMPI libGkArrays-MPI Doxygen (recommanded but not mandatory)
We made the choice not to encapsulate our tool with Conda or Docker or other to have an optimal use of the available resources. This implies that the user must master the SGE or SLURM environment or at least MPI.
For the "mpirun -n 10 redoak ../test/test-0025_reads.fastq 25" i fixed this by ~/local_install/bin/mpirun -n 10 ~/local_install/bin/redoak --kmer 25 --genome resources/genome_ex1.fasta (thanks)
For the test on the 128 genomes : Is there a link so I can download the 128 genomes and try the indexing with RedOak? I did not find the link in https://pubmed.ncbi.nlm.nih.gov/27998823/
The output of query results goes to STDOUT use ">" to redirect into a file.

Paper (reviewed paper here: https://www.biorxiv.org/content/10.1101/2020.12.19.423583v2.full.pdf) TODO

samhorsfield96 commented 2 years ago

Hi @cagret, many thanks for your reply.

From my understanding, the statement of need is a short explanation of the novelty of your tool. You have already covered this in the paper in the "Our contribution" section, and so this could be adapted to go in the repository.
I understand the reasoning behind not publishing RedOak on conda and docker. This is more of a suggestion to increase the potential user-base of RedOak.
Unfortunately the Listeria genomes are not available publicly from what I can see, I apologise for this. The read dataset accession numbers are available in supplementary table 1 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5392494/#appsec1:~:text=but%20not%20amoxicillin-,Click%20here%20to%20view.,-(3.3M%2C%20doc))
The reasoning behind my comment on the output of RedOak is that the output to STDOUT is not easily parsed, even if redirected to a file. It may be worthwhile using a standard output format e.g. FASTA.

swatimanekar commented 2 years ago

@lpantano I have updated the checklist

lpantano commented 2 years ago

Hi @cagret, I will wait for you to address the comments from @samhorsfield96. It seems it needs the statement of need to be added to the paper text. I agree that is more about the need of the tool to answer a scientific question. As well, @swatimanekar is missing more documentation to get to a better level? can you confirm this @swatimanekar? if yes, can you point again to some examples where it can be improved. Thanks

swatimanekar commented 2 years ago

Yes. All those I have mentioned in my comments (commented on Jul 14).

lpantano commented 2 years ago

Hi @cagret, do you have an idea on when you can work on the requests from the reviewers? it will give them an idea to be on top when they are done and try to speed up this. Thanks

cagret commented 2 years ago

@swatimanekar I fixed all the review from the docx, except the square which indicates the end of the theorem. I didI did not regenerate the pdf. For the main review i don't understand "Given the non-dedicated tools like jellyfish and BFT, it's not completely clear to me that the time and memory as resource comparisons are fair", I only see time and RAM to compare tools that only use time and memory. Knowing that we have fixed the number of CPU, I'm not sure what else to do. Thank you very much for the feedback.

@samhorsfield96 I have made changes on the statement of need. Tell me if it's ok now.

@lpantano I think it's ok

Thank you very much everyone

lpantano commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:warning: An error happened when generating the pdf.

lpantano commented 2 years ago

@cagret can you check the error during compilation to make sure the document can be rendered. Or I can try to get help here from JOSS team. thanks

cagret commented 2 years ago

Latexmk: If appropriate, the -f option can be used to get latexmk to try to force complete processing. [makePDF] Run #1 Rc files read: NONE Latexmk: Run number 1 of rule 'lualatex' This is LuaHBTeX, Version 1.15.0 (TeX Live 2022) restricted system commands enabled. Latexmk: Getting log file '/tmp/tex2pdf.-00f6c558cb30663e/input.log' Collected error summary (may duplicate other messages): lualatex: Command for 'lualatex' gave return code 1 Refer to '/tmp/tex2pdf.-00f6c558cb30663e/input.log' for details

Error producing PDF. ! Missing $ inserted.

$ l.540 reduce the final score by \frac{l-k}{l}

cagret commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:warning: An error happened when generating the pdf.

cagret commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:warning: An error happened when generating the pdf.

cagret commented 2 years ago

@editorialbot commands

editorialbot commented 2 years ago

Hello @cagret, here are the things you can ask me to do:


# List all available commands
@editorialbot commands

# Get a list of all editors's GitHub handles
@editorialbot list editors

# Check the references of the paper for missing DOIs
@editorialbot check references

# Perform checks on the repository
@editorialbot check repository

# Adds a checklist for the reviewer using this command
@editorialbot generate my checklist

# Set a value for branch
@editorialbot set joss-paper as branch

# Generates the pdf paper
@editorialbot generate pdf

# Get a link to the complete list of reviewers
@editorialbot list reviewers

cagret commented 2 years ago

@editorialbot check repository

editorialbot commented 2 years ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.21 s (406.4 files/s, 339036.7 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
Bourne Shell                     10           4773           5619          29405
m4                                9           1104            323          10470
C++                              17            506           1409           4437
C/C++ Header                     15            604           4059           1872
TeX                              16            278            224           1244
Markdown                          5            498              0           1160
Bourne Again Shell                3            106            223            607
make                              5             98            433            267
Perl                              1             69            147             66
YAML                              3              3              5             59
--------------------------------------------------------------------------------
SUM:                             84           8039          12442          49587
--------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 2 years ago

Wordcount for paper.md is 1399

cagret commented 2 years ago

@editorialbot generate pdf

openjournals / joss-reviews