Closed whedon closed 3 years ago
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @betteridiot, @afrubin it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Reference check summary:
OK DOIs
- None
MISSING DOIs
- https://doi.org/10.1093/bioinformatics/btl647 may be missing for title: Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases
- https://doi.org/10.1093/bioinformatics/btp324 may be missing for title: Fast and accurate short read alignment with Burrows-Wheeler Transform
- https://doi.org/10.1016/j.cell.2017.09.004 may be missing for title: Single-Cell Analysis of Human Pancreas Reveals Transcriptional Signatures of Aging and Somatic Mutation Patterns
- https://doi.org/10.1038/s41587-019-0074-6 may be missing for title: An open resource for accurately benchmarking small variant and reference calls
- https://doi.org/10.1038/nbt.2835 may be missing for title: Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls
- https://doi.org/10.1101/201178 may be missing for title: Scaling accurate genetic variant discovery to tens of thousands of samples
- https://doi.org/10.1186/s13059-017-1248-5 may be missing for title: BRIE: transcriptome-wide splicing quantification in single cells
INVALID DOIs
- None
Hi @betteridiot, @afrubin!, this is the review issue, please accept the invitation if you haven't done it so you can start the review here.
@lincoln-harris, can you work on adding DOI to your references?
Thanks all!
sure! i can do that
It looks like the references aren't formatting properly, possibly due to missing commas in the author names in the .bib file. @lincoln-harris would you please have a look?
This certainly looks like a useful and performant tool for transforming VCF files into a format that is easier to work with and enables researchers to make biological inferences. I've opened some issues (referenced above) but thought it better to put high-level feedback and comments on the paper in this thread.
I have not run the software yet, but plan to follow the updated installation instructions and run some test data after https://github.com/czbiohub/cerebra/issues/83 is addressed.
Test coverage is low (81%) and excludes quite a few cases that seem like they might come up when used on diverse data. Error messages/logging may also need to be improved. For example, this section is not covered by tests and includes output to stdout with limited context: https://github.com/czbiohub/cerebra/blob/3553f707290e57440ca9c2ab617667ab7cc4323a/cerebra/find_peptide_variants.py#L65-L69
Several failure conditions (e.g. https://github.com/czbiohub/cerebra/blob/3553f707290e57440ca9c2ab617667ab7cc4323a/cerebra/germline_filter.py#L114-L115) are not covered by tests, although adding test cases for these ought to be trivial. The multiprocessor use case for germline-filter
is also not tested.
Most usable documentation is in the README. This would be easier to navigate if it were split into multiple files, but another option is to focus on improving the README's content and structure, and dropping the docs/ directory altogether.
Many parts of the documentation seem to be lacking updates and consistency. For example, the instructions in docs/installation.rst are incomplete and different to what's in the README. The docs/authors.rst references a top-level AUTHORS.rst, which doesn't exist (although there is an AUTHORS.md that doesn't include all the contributors). There are similar issues affecting other files.
The repository is in need of a "polish pass" to clean up these kinds of issues and remove unnecessary files (such as .editorconfig in the top level).
The motivation section should make it clear that an intended use case is single-cell RNA-seq data. This isn't introduced until the find-peptide-variants
section, despite being the research example cited by the authors. The motivation section would also benefit from being written in a more scholarly way. The likely audience of this package are bioinformaticians working with VCFs and single-cell data, so the "typo" metaphor is unnecessary.
The authors also use the term "functional predictions" to mean predicted amino-acid changes. This package does not attempt to predict the "functional consequences of the variant", which is a much more ambitious problem. The authors should avoid using the term "functional" throughout the text and focus on what the tool is actually doing - inferring amino acid consequences given variant calls and gene annotations.
There are also numerous missing citations, across packages used (ncls is not cited despite having citation information in its GitHub README), important genomic formats (VCF and HGVS are not cited), and databases (COSMIC is not cited). I should emphasize that this is a partial list of missing citations.
Regarding the use of databases, the authors make the nonsensical claim that "variants that are not found in the COSMIC database [are] less likely to be pathogenic". While there are good reasons to filter for known variants (e.g. focusing on clinical actionability), this isn't one. COSMIC includes a tiny fraction of pathogenic variants, most of which are rare.
For variant discovery, users may want to filter for variants that are not found in COSMIC, and the authors should consider implementing this as an option. This would be especially useful if support for variant databases like gnomAD is added in the future.
Also, the paper mentions dbSNP filtering but it doesn't look like this is implemented based on the documentation.
One small nomenclature issue is that the transcripts in Fig 2 are referred to as "spliceforms" in the text but when alternative start sites are included (like t3) the more general term "isoforms" should be used.
This is a useful and interesting tool that would be appropriate for JOSS but additional work is needed across the project before it fulfills the criteria.
thanks for the comments, these are currently being addressed with https://github.com/czbiohub/cerebra/pull/84
@lpantano For some reason, GitHub is not allowing me to check the boxes (like in previous reviews). I may need another invitation.
@whedon re-invite @betteridiot as reviewer
OK, the reviewer has been re-invited.
@betteridiot please accept the invite by clicking this link: https://github.com/openjournals/joss-reviews/invitations
Thank you. Apparently it was just a browser extension that was causing the problem.
For transparency's sake, I am still actively working on this review. However, I have a thesis committee meeting this week, and will be not be able to get back to the review process until then. Sorry for the delay.
the majority of reviewer comments have been addressed with the latest release. looking forward to additional feedback!
@lincoln-harris I have a few additional comments on the writing in the germline-filter section of the paper and the README.
A more typical phrasing from cancer genomics would be "tumor vs. normal" rather than "tumor/pathogenic vs control."
The text says that germline-filter can be used "so as to not bias the results by including non-pathogenic variants." While I understand the motivation for this, you can't say that germline variants are non-pathogenic - they may well be strongly pathogenic for another disease!
Instead of "control and the experimental tissue" the text should stick with the word "sample" (and tumor/normal) for clarity.
Hi @lincoln-harris,
Any update on the latest @afrubin's comment?
@betteridiot, any update for this review?
Thanks!
Yes, tomorrow I will drop all my notes on the Issues page.
My notes have been posted @lpantano and @lincoln-harris. If they are addressed, then my review will be complete. Sorry for the delay.
@lpantano any updates?
@betteridiot, @afrubin, do you think you can update on the final version? thanks
@lincoln-harris, can you comment on the open issues opened by @betteridiot? if they are addressed or not?
Only issues of mine not addressed yet is https://github.com/czbiohub/cerebra/issues/108. It is only in regard to community guidelines. However, this is a very minor issue as a CONTRIBUTING.md is already provided. Therefore, this paper is ready for publication on my end.
Thank you @betteridiot. @afrubin, Did the authors addressed all your comments? Thanks.
Hey @afrubin, any final comments for this paper? Thanks!
Hi @lpantano, there are still a few open issues (such as https://github.com/czbiohub/cerebra/issues/83#issuecomment-681311968) and I feel like I should read the final version of the manuscript but I can't find a current PDF. I think it is very close though!
sorry I've been away for a few weeks. @afrubin I believe https://github.com/czbiohub/cerebra/issues/83#issuecomment-681311968 has now been addressed.
the paper.md
on the master
branch is fully updated: https://github.com/czbiohub/cerebra/tree/master/paper. would you like me to include a PDF in that folder, or perhaps compile using Whedon and send to you?
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Looks good to me. My only remaining comment is that there's no API-level documentation for the code itself, and it looks like substantial extra work would be needed to generate the docstrings sphinx.ext.autodoc or similar tools would require.
However, given the focused nature of the software package (being an analysis tool not a library) and the detail in the usage guide, this might be OK. Happy to hear @lpantano weigh in on this.
Thanks everybody! I think they didn't aim the tool as a library, and I will feel ok not to mention the API. Although it could be a goal in the long term.
@afrubin, @betteridiot can you check all the items in the review list in the first comment of this review if you agree all has been addressed?
If they check all the items, @lincoln-harris, can you confirm the latest version, and prepare an archive for that version in zenodo where the authors list and title matches the paper and send me the DOI?
Thanks!
here is the DOI: http://doi.org/10.5281/zenodo.4037272
Hey @afrubin, @betteridiot, can you make sure all the items in the checklist are done? I see a couple of items unchecked and I want to make sure you are fine with them.
@lincoln-harris, thank you for the DOI. We need the archive to contain all the code as well, so it should be a copy of the repository at the time of publication. Can you update it, please? Thank you!!!
@whedon set v1.2.0 as version
OK. v1.2.0 is the version.
@lpantano All items are checked now. Thanks for the reminder!
All checked
@lpantano requesting zenodo access from the git organization. will be able to upload the code archive as soon as that happens.
@lincoln-harris thanks for updating this, could you update the title to match the paper’s title? That would be the last thing missing. Thanks!
done, the titles now match!
@whedon set 10.5281/zenodo.4050557 as archive
OK. 10.5281/zenodo.4050557 is the archive.
@whedon accept
Attempting dry run of processing paper acceptance...
PDF failed to compile for issue #2432 with the following error:
/app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-364ded062842/lib/whedon/bibtex_parser.rb:77:in doi_citation': undefined method
encode' for nil:NilClass (NoMethodError)
from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-364ded062842/lib/whedon/bibtex_parser.rb:64:in make_citation' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-364ded062842/lib/whedon/bibtex_parser.rb:50:in
block in generate_citations'
from /app/vendor/bundle/ruby/2.4.0/gems/bibtex-ruby-5.1.4/lib/bibtex/bibliography.rb:149:in each' from /app/vendor/bundle/ruby/2.4.0/gems/bibtex-ruby-5.1.4/lib/bibtex/bibliography.rb:149:in
each'
from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-364ded062842/lib/whedon/bibtex_parser.rb:43:in generate_citations' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-364ded062842/lib/whedon/compilers.rb:246:in
crossref_from_markdown'
from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-364ded062842/lib/whedon/compilers.rb:21:in generate_crossref' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-364ded062842/lib/whedon/processor.rb:95:in
compile'
from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-364ded062842/bin/whedon:82:in compile' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/command.rb:27:in
run'
from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in invoke_command' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor.rb:387:in
dispatch'
from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/base.rb:466:in start' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-364ded062842/bin/whedon:119:in
<top (required)>'
from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in load' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in
ops, @lincoln-harris do you think is related to your paper? did you change something from the last time it compiled here?
@whedon accept
Submitting author: @lincoln-harris (Lincoln Harris) Repository: https://github.com/czbiohub/cerebra Version: v1.2.0 Editor: @lpantano Reviewer: @betteridiot, @afrubin Archive: 10.5281/zenodo.4050557
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@betteridiot & @afrubin, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @lpantano know.
✨ Please try and complete your review in the next six weeks ✨
Review checklist for @betteridiot
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @afrubin
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper