Closed whedon closed 7 years ago
Hello human, I'm @whedon. I'm here to help you with some common editorial tasks for JOSS. @sjackman it looks like you're currently assigned as the reviewer for this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As as reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all JOSS reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
👋 @sjackman - please carry out your review in this issue by updating the checklist above and giving feedback in this issue. The reviewer guidelines are available here: http://joss.theoj.org/about#reviewer_guidelines
Any questions/concerns please let @biorelated know.
@standage Could the DOIs be made into links using http://dx.doi.org ? https://github.com/dib-lab/khmer/blob/master/papers/paper-v2.1.md#references e.g. doi:10.12688/f1000research.6924.1
I tried to verify the reduced memory usage of --small-count
and ran into an issue, reported at https://github.com/dib-lab/khmer/issues/1722
Could the DOIs be made into links using http://dx.doi.org?
Sure, I'll take care of that.
Thanks, Daniel.
I tried to run examples/stamps/do.sh
and encountered the error unrecognized arguments: --savetable
, reported at https://github.com/dib-lab/khmer/issues/1725
Links fixed in https://github.com/dib-lab/khmer/pull/1727.
support for k > 32 using the non-reversible hash function MurmurHash3
Is there an upper limit on the value of k?
Consider italicizing each occurrence of k, e.g. k-mer and k > 32.
Is there an upper limit on the value of k?
Nope, MurmurHash3 will handle arbitrarily large inputs. Of course, the larger k gets the more likely collisions in k-mer hash space are. But this is offset by two factors: 1) k-mers observed in genomes are sparse with respect to the entire k-mer space, especially as k gets larger; 2) collisions in k-mer hash space are orders of magnitude less frequent than collisions in the probabilistic data structures we use.
Consider italicizing each occurrence of k, e.g. k-mer and k > 32.
Done. Thanks for catching that!
---
title: 'Review of khmer release v2.1: software for biological sequence analysis'
author: Shaun Jackman
date: 2017-06-15
---
The authors describe version 2.1 of their software khmer for various analyses of sequencing data. khmer implements primarily two data structures, the Bloom filter to store k-mer presence/absence and Count-Min Sketch to store k-mer counts. This release adds new features including variable-coverage trimming, supporting k > 32, reduced memory utilization for the Count-Min Sketch by using 4 bit counters rather than 8, and assembling and compacting a de Bruijn graph.
The software installed easily using pip install khmer
. I successfully ran the example stamps
analysis of a synthetic metagenomic data set. I verified the 4-bit counter reduces the memory usage of khmer by half. I successfully ran the de Bruijn graph compaction script extract-compact-dbg.py
.
The 4-bit counter will be useful in reducing the memory requirements of assembling large genomes and metagenomes. Values of k larger than 32 will be useful in assembling 150-bp reads, where values of k around 100 are common. The de Bruijn graph assembly and compaction scripts are a useful contribution to the field of de Bruin graph assembly using compact data structures.
Although likely out of scope for this paper, I would like to see
My best regards, Shaun Jackman
@biorelated Hi, George. I have completed my review of khmer 2.1.
Thank you very much @sjackman for taking time to review. You have raised a couple of points that you would like to see although they might be out of scope of this manuscript. @standage could you confirm whether these are items that you would like to add to the current software package (and manuscript) or it is something that you would like to implement in a future release.
Thanks for your comments and your generosity with time, @sjackman and @biorelated!
Support for GFA output is something we've been discussing recently (see https://github.com/dib-lab/khmer/issues/1467), but this didn't get prioritized for the 2.1 release. I'm hoping it's something we can include by version 3.0.
The graph compaction code is convenient but also intentionally unsophisticated; it’s not interesting enough for a research paper. We’ll document it as we expand the Python interface, probably for 3.0.
A description of khmer's assembly algorithm and comparisons to other tools, while potentially insightful from a scientific perspective, is indeed outside the scope of this software paper. At least one member of our research group is actively developing novel assembly methods and applications, and publication of these developments should provide an opportunity to describe khmer's implicit de Bruijn graph implementation and graph traversal algorithms in depth.
Best wishes,
Daniel Standage
Thank you Daniel (@standage) for the quick response and thank you again Shaun (@sjackman) for your review. @arfon please could we get this ready for publication? Thank you George
@arfon please could we get this ready for publication?
👍
@standage - Could you move the references you currently have in the paper-v2.1.md
file into a paper.bib
file and cite them directly please? (You can read how to do that here)
Could you move the references you currently have in the paper-v2.1.md file into a paper.bib file and cite them directly please?
Done. See latest commit in maint/2.1
branch.
@standage - there's still some problems here sorry.
paper.bib
file in the YAML header: bibliography: paper.bib
(see the example paper here: https://raw.githubusercontent.com/arfon/fidgit/master/paper/paper.md)# References
section to the paper-v2.1.md
file (this is where the pandoc compilation adds your BibTeX references automatically)pandoc-citeproc: reference crusoe2015 not found
pandoc-citeproc: reference zhang2015 not found
pandoc-citeproc: reference pell2012 not found
pandoc-citeproc: reference zhang2014 not found
Updated. Thanks @arfon.
The text already cited those references as is. Perhaps the message appeared because I had not properly included the .bib
file and/or included a header for the References section?
Hopefully this addresses it. Let me know if anything is still amiss.
The text already cited those references as is. Perhaps the message appeared because I had not properly included the .bib file and/or included a header for the References section?
The issue was still there I'm afraid - case sensitive references not being picked up. I've fixed up the issues in https://github.com/dib-lab/khmer/pull/1739
When you've merged https://github.com/dib-lab/khmer/pull/1739 could you make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? I can then move forward with accepting the submission.
Sorry about the case sensitivity issues, and thanks for your PR. The archive is now posted to Zenodo at doi:10.5281/zenodo.822478.
@whedon set 10.5281/zenodo.822478 as archive
OK. 10.5281/zenodo.822478 is the archive.
@sjackman many thanks for your review here and to @biorelated for editing this submission ✨
@standage - your paper is now accepted into JOSS and your DOI is http://dx.doi.org/10.21105/joss.00272 :zap: 🚀 💥
Great news. Thanks to @arfon, @biorelated, and @sjackman for you contributions to the review and publishing process!
Submitting author: @standage (Daniel S. Standage) Repository: https://github.com/dib-lab/khmer Version: v2.1 Editor: @biorelated Reviewer: @sjackman Archive: 10.5281/zenodo.822478
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer questions
Conflict of interest
General checks
Functionality
Documentation
Software paper
paper.md
file include a list of authors with their affiliations?