openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
714 stars 38 forks source link

[REVIEW]: RNAsik: A Pipeline for complete and reproducible RNA-seq analysis that runs anywhere with speed and ease #583

Closed whedon closed 6 years ago

whedon commented 6 years ago

Submitting author: @serine (Kirill Tsyganov) Repository: https://github.com/MonashBioinformaticsPlatform/RNAsik-pipe Version: 1.4.9 Editor: @pjotrp Reviewer: @andrewyatz Archive: 10.5281/zenodo.1403976

Status

status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/32d910cf7f5acf29c9b49b82dfb647ce"><img src="http://joss.theoj.org/papers/32d910cf7f5acf29c9b49b82dfb647ce/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/32d910cf7f5acf29c9b49b82dfb647ce/status.svg)](http://joss.theoj.org/papers/32d910cf7f5acf29c9b49b82dfb647ce)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@andrewyatz, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @pjotrp know.

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 6 years ago

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks. @andrewyatz it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands
whedon commented 6 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 6 years ago

--> Check article proof :page_facing_up: <--

pjotrp commented 6 years ago

Hi @serine, thanks for your interesting submission. @andrewyatz will review. To expedite the process, can you go through above checklist and make sure we ought to be able to tick the boxes? And ping us here when you are done. Thanks!

andrewyatz commented 6 years ago

Hi @serine I've been running your installation procedure and have two issues with it. Could you double check your installation scripts on a clean system and just make sure everything is in order please? I'd rather not be opening lots of issues on your repos if I can avoid it plus it lengthens the review process incredibly.

pjotrp commented 6 years ago

Ping @serine. The reviewer needs to be able to run the software. Can you provide the documentation to do just that? Should be straightforward with JVM. If you have already done that and can tick above boxes (e.g. for contribution guidelines) spell it out for the readers. Thanks!

serine commented 6 years ago

@pjotrp and @andrewyatz as I've mention in my previous comment RNAsik depends on a lot of different tools and I some what assumed that dependencies of every tool don't need to be listed under RNAsik. In the case of BigDataScripit it depends on java, golang and ant, given that I said BigDataScript is required I assumed it dependencies.

I since updated the docs to reflect all major/crucial dependencies in the docs, that of course requires a few system installs. I've tested the docs out and it worked for me. If you follow quick start RNAsik should install. As I also mentioned in the docs now, I've only really tested this on ubuntu 16.04 (also works on 14.04). I don't have a mac so can't really test on that, but you wouldn't want to run it on mac anyway given the resources, unless the data is small of course e.g bacteria.

so like I said docs have been updated with new install instruction.

Let me know how it goes. also if you gonna meet some issues on mac do let me know so that I can accommodate mac users too.

thanks

serine commented 6 years ago

Just to follow up on the previous comment above. virtualenv isn't strictly necessary if you already have ansible installed, you can skip virtualenv step.

Every time you are opening a new shell you'll need to export both bds and RNAsik again. You can put those two export lines into your ~/.bashrc if you like. I personally don't like to pollute my ~/.bashrc with random exports.

A general comment on bio-ansible, which I think I already made somewhere, you can change bio.yml to all.ym and add --ask-become-pass which will prompt you for your sudo password. This will do bunch of apt-get installs (again, ubuntu specific) following by tools installation. The problem with doing all.yml this will install potentially many more apt-get packages than you really don't need for RNAsik

I realise now that this is rather involved installation process. So far this is the best we have. If you have any suggestion and/or comments don't hesitate to open an issue under RNAsik issues.

cheers

pjotrp commented 6 years ago

Complicated tools may have complicated setups ;). One solution would be to provide a Docker container (though I am no fan particularly). @andrewyatz are up up for trying? Possibly a VM would be the easiest.

serine commented 6 years ago

yeh, I'm not sold on docker yet, but I can try getting you a docker image if that's preferred. Let me know @andrewyatz

andrewyatz commented 6 years ago

I'm going to try normally. You have my upmost sympathy with respect to dependency management and bioinformatic programs.

pjotrp commented 6 years ago

You may want to look into GNU Guix some day. I don't have any problems with the dependency graph.

pjotrp commented 6 years ago

@andrewyatz how are you doing?

pjotrp commented 6 years ago

Hi @serine. Looks like we have an impasse. Software deployment is the achilles heel of bioinformatics!

Can you look into a deployment scheme? To me Docker, bioconda or GNU Guix are the systems of choice. Apt should not be used because it interferes with other software on a system. It should be easy to run software - at least a version of it. One advantage of choosing one of the three that it includes dependencies and it that gets promoted for future pipelines too. There is a paper coming out soon on comparing work flows, including conda and Guix. If you are interested I can send you a pre-press.

serine commented 6 years ago

@pjotrp It is a shame that you guys having issues with running pipeline, but it is great to get this kind of feedback about "ease of use" so to speak. Like I said before I can't imagine any other pipeline will be any easier to install. At the end of the day the pipeline doesn't require any installation at all, just BigDataScript, that can be downloaded as pre-compiled binary. But of course the pipeline of no use if other tools aren't there.

I think docker beta ready. I'd have to dig into bio-ansible a bit (docker image is also build with bio-ansible, but you won't have to do it. I'll make it) I think internally we are would like to head bioconda way, in fact similar to docker, we are some what in alpha, but at the moment there is no conda channel for Golang and so there isn't a clean way of specifying dependency for BigDataScript at the moment.

I did have a look at Guix briefly, I wasn't convinced.. I get a little confused when package manager gets mixed in with distro.. similar to https://nixos.org/nix/ which is both a package manager and a distro..

Yes please I'd like the pre-print if this is ok. Need to be convinced that docker or bioconda is the way to go.. I personally like to compile from source hence bio-ansible, but do understand "ease of use" and quick access is more important

I have a few things piled up and am helping at running a python course next week. Also down here (Melbourne) we having Easter break from the 30th of March, so might be out of touch for a couple of week, but will try to get this all sorted and report back as soon as I can

Cheers

andrewyatz commented 6 years ago

Hi. Sorry this is more my fault that it's just not been possible for me to devote enough time to test through the existing changes that @serine has implemented. Don't take this as a comment on the changes but my availability.

andrewyatz commented 6 years ago

Okay I've resurrected my work and switched to an Ubuntu box. Still no joy. In fact the same error I see on my mac. This is in MonashBioinformaticsPlatform/RNAsik-pipe#22 with the error.

andrewyatz commented 6 years ago

Sorry can I prod again on this. There's little to no chance of me installing this. As far as I am aware there's been no response to the issues I've raised. The error isn't limited to just my mac as otherwise my VM attempts would have worked. Not sure where else to push this now

serine commented 6 years ago

@andrewyatz sorry about that, I've gone quite, my fault. I just responded to your other issue could you have one more try. Thanks

serine commented 6 years ago

@andrewyatz It does appear that bio-ansible way to install RNAsik is a bit too much. Taking advice from @pjotrp and others around me I've created conda package, for linux only at this stage, which should simplify install great deal.

Could you have a look at updated docs

Basically you'll need to install miniconda and then RNAsik from my "channel" (conda terminology here)

bash Miniconda3-latest-Linux-x86_64.sh 
conda install -c serine rnasik 
conda install -c bioconda qualimap 

QualiMap is separate install, because it itself has fair few dependencies and it was tricky to include into RNAsik package

On my laptop it took about 35 minutes to install, this is subject to reasonably good internet connection as installation creates fair bit of traffic, downloading all of the different dependencies.

Hopefully this will simplify things for you and others

Cheers

serine commented 6 years ago

by the way this is my channel page https://anaconda.org/serine and this is rnasik package page https://anaconda.org/serine/rnasik

serine commented 6 years ago

@andrewyatz sorry, I think you also need to added a couple of channel

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

before installing from conda

also updated docs.

p.s default location of the config is ~/.condarc

mine looks like this

[biostation2]~$ cat ~/.condarc 

channels:
  - bioconda
  - conda-forge
  - defaults
pjotrp commented 6 years ago

@andrewyatz do you have time to look at this? We can look for another reviewer if necessary.

serine commented 6 years ago

@pjotrp @andrewyatz Let me guys know if you need anything else from me.

Just to reiterate that the best way to install RNAsik is to use conda install. I actually shifted myself and now use conda install all the time, cause I tend to jump between diff machines.

Also just to let you know that RNAsik have moved since I submitted for review. I'm now up to 1.5.1 version and making 1.5.2 soonish.. there have been some changes, but all in backward compatible manner.

You can only get 1.5.1 through conda install, but that shouldn't effect the review right?

Cheers

andrewyatz commented 6 years ago

Hi @pjotrp as the delay in my response suggests that I am no longer a good person to do this review. I am really sorry about this and apologise to Kirill and yourself about this. I appreciate the effort you have been spending on improving the process and feel conda is an excellent way to move in

On 29 May 2018, at 23:30, Kirill Tsyganov notifications@github.com wrote:

@pjotrp @andrewyatz Let me guys know if you need anything else from me.

Just to reiterate that best way to install is to use conda install. I actually shifted myself and now use conda install all the time, cause I tend to jump between diff machines.

Also just to let you know that RNAsik have moved since I submitted for review. I'm now up to 1.5.1 version and making 1.5.2 soonish.. there have been some changes, but all in backward compatible manner.

You can only get 1.5.1 through conda install, but that shouldn't effect the review right?

Cheers

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

serine commented 6 years ago

@andrewyatz no problem, thanks for all the feedback that you've provided so far.

@pjotrp hopefully someone else can help out?

Cheers

pjotrp commented 6 years ago

Alright, let's look for someone who can run this pipeline, so we can wrap up review.

pjotrp commented 6 years ago

Hi @serine, sorry about the delay. So, you are saying that a conda install should work. Do you happen to have some example fastq files, i.e, something I can try? I am proposing I run it myself with your help. If it works I can switch to a reviewer role and I'll ask someone else to be the editor. It has happened before.

serine commented 6 years ago

@pjotrp yes, conda installable. I just wrote quick_install.bash script that obviously does quick install, but I'm also using it with my travisCI testing.

As you can see from quick start section in the docs that you can just run the script and export executables into your PATH. The details of the script are explained under installation section or just cat the script obviously

The actual test data again explained in data set for testing section and I've updated the README with the command to run, but here it is just in case

RNAsik -fqDir http://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/fqFiles.txt \
       -align star \
       -fastaRef ftp://ftp.ensembl.org/pub/release-91/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz \
       -gtfFile ftp://ftp.ensembl.org/pub/release-91/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.91.gtf.gz \
       -counts \
       -paired \
       -all

There are a couple of things I should mention.

I'm waiting on bds to make next v2.0rc9 release before making my next release 1.5.2

I understand that there are a lot ifs here. But things are out of my hands with respect to bds development.

Basically if you just run the install script and test command I posted above it will work. Just note that STAR aligner needs fair bit of RAM, the test data set is small and I think 10-15 Gb of RAM should be plenty for mouse or human you really need ~ 32 Gb of RAM

thanks

serine commented 6 years ago

Forgot to mention new feature in RNAsik is -fqDir http://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/fqFiles.txt

where -fqDir can either take a file or directory. If text file is given, fastq files are expected to be one per line and if directory is give it will get traversed recursively looking for fastq files

the content of fqFiles.txt in this case is

http://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/IndustrialAntifoamAgentsYeastRNAseqData/ctrl1_R1.fastq.gz
http://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/IndustrialAntifoamAgentsYeastRNAseqData/ctrl1_R2.fastq.gz
http://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/IndustrialAntifoamAgentsYeastRNAseqData/ctrl2_R1.fastq.gz
http://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/IndustrialAntifoamAgentsYeastRNAseqData/ctrl2_R2.fastq.gz
http://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/IndustrialAntifoamAgentsYeastRNAseqData/Ind-Z1_R1.fastq.gz
http://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/IndustrialAntifoamAgentsYeastRNAseqData/Ind-Z1_R2.fastq.gz
http://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/IndustrialAntifoamAgentsYeastRNAseqData/Ind-Z2_R1.fastq.gz
http://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/IndustrialAntifoamAgentsYeastRNAseqData/Ind-Z2_R2.fastq.gz

RNAsik will handle the download for you. Also note that fqFiles.txt is on url as well, again download and parsing are all handled internally, just run the command as is above.

pjotrp commented 6 years ago

I'll give it a go this week.

pjotrp commented 6 years ago

Sorry about the delay. This is on my list now.

serine commented 6 years ago

great, thanks. 1.5.2 is out now by the way

On Mon, 16 Jul 2018, 19:14 Pjotr Prins notifications@github.com wrote:

Sorry about the delay. This is on my list now.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/583#issuecomment-405189048, or mute the thread https://github.com/notifications/unsubscribe-auth/AGY8Q9U4D0onMHu1BieeuH8hwyzI_PKTks5uHFlqgaJpZM4SF7TN .

pjotrp commented 6 years ago

I installed the software with bioconda bootstrapping from the GNU Guix conda package. Nice work! Mostly worked out of the box. I'll show a writeup when I am done.

@serine - do you have a test dataset I can just download and run? Sorry for the pain, but I don't want to be hunting down NCBI resources. Even a small set will do. I would like a single command to download the data - or does RNASik doe this?

RNAsik -fqDir https://bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/fqFiles.txt \
       -align star \
       -fastaRef ftp://ftp.ensembl.org/pub/release-91/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz \
       -gtfFile ftp://ftp.ensembl.org/pub/release-91/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.91.gtf.gz \
       -counts \
       -paired \
       -all

Fatal error: /home/pjotr/.conda/envs/joss-review-583/bin/../opt/rnasik-1.5.2/src/sikFqFiles.bds, line 94, pos 9. no FASTQ files were found, check -fqDir, -extn or -pairIds
Stack trace:
error "no FASTQ files were found, check -fqDir, -e ...  # /home/pjotr/.conda/envs/joss-review-583/bin/../opt/rnasik-1.5.2/src/sikFqFiles.bds:94
  samplesMap = getSamplesMap( fqFiles,samplesSheet ...  # /home/pjotr/.conda/envs/joss-review-583/bin/../opt/rnasik-1.5.2/src/RNAsik.bds:174
serine commented 6 years ago

sorry about that, will have a look at the issue. I thought I tested out and it worked

[bioinformatics1]~/www/sikTestData/rawData$ tar -tf IndustrialAntifoamAgentsYeastRNAseqData.tar 
IndustrialAntifoamAgentsYeastRNAseqData/
IndustrialAntifoamAgentsYeastRNAseqData/Ind-Z1_R2.fastq.gz
IndustrialAntifoamAgentsYeastRNAseqData/Ind-Z2_R1.fastq.gz
IndustrialAntifoamAgentsYeastRNAseqData/ctrl2_R1.fastq.gz
IndustrialAntifoamAgentsYeastRNAseqData/ctrl1_R2.fastq.gz
IndustrialAntifoamAgentsYeastRNAseqData/ctrl1_R1.fastq.gz
IndustrialAntifoamAgentsYeastRNAseqData/ctrl2_R2.fastq.gz
IndustrialAntifoamAgentsYeastRNAseqData/Ind-Z2_R2.fastq.gz
IndustrialAntifoamAgentsYeastRNAseqData/Ind-Z1_R1.fastq.gz
[biostation2]~$ wget bioinformatics.erc.monash.edu/home/kirill/sikTestData/rawData/IndustrialAntifoamAgentsYeastRNAseqData.tar

Let me know if that works, thanks

pjotrp commented 6 years ago

My report: I can report the software installs and runs to satisfaction. In fact, kudos @serine and colleagues for creating such an advanced pipeline which can be run on almost any Linux system out of the box with nice reporting output. I hope people will start using it. Since it is bionconda based you should also look at making it part of the common workflow initiative https://www.commonwl.org/ with @mr-c.

Let me tick the relevant boxes which means we can process the publication for acceptance. @serine, sorry about the delays, it is not the usual, but then your software is not the usual either :). Good stuff.

pjotrp commented 6 years ago

@whedon generate pdf

whedon commented 6 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 6 years ago

--> Check article proof :page_facing_up: <--

pjotrp commented 6 years ago

The review process is now complete. To finalize your submission and accept your paper in JOSS, we need two things. First, can you confirm that all references in your bibliography have a DOI (if one exists).

Second, we need you to deposit a copy of your software repository (including any revisions made during the JOSS review process) with a data-archiving service. To do so:

  1. Create a GitHub release of the current version of your software repository
  2. Deposit that release with Zenodo, figshare, or a similar DOI issuer.
  3. Post a comment here with the DOI for the release.
serine commented 6 years ago

This is great, thanks heaps !

About the second part; I've done Zenodo here https://zenodo.org/record/1315013#.W0_PY3VuaV4

DOI

I was a bit confused about which DOI to use.. I just tool whatever was there in the pds that you've generated.

About the first bit can you confirm that all references in your bibliography have a

Looking through the references these tools don't have doi

Broadinstitute. n.d. “Picard-Tools.” broadinstitute. http://broadinstitute.github.io/
picard.
Li, Heng. 2013. “Aligning Sequence Reads, Clone Sequences and Assembly Contigs with
BWA-MEM,” March. http://arxiv.org/abs/1303.3997.
Powell, David. 2015. “Degust: Powerfull and User Friendly Front-End Data Analsysis,
Visualisation and Exploratory Tool for Rna-Sequencing.” github. http://degust.erc.
monash.edu

do I need to do something about this?

mr-c commented 6 years ago

Would an RRID URI be acceptable? I like to cite both the idea of a piece of software and any relevant methods papers.

Picard:

https://identifiers.org/rrid/RRID:SCR_006525

Degust:

https://identifiers.org/rrid/RRID:SCR_001878

For bwa-mem I suggest listing both the Arxiv paper and the RRID https://arxiv.org/abs/1303.3997 https://identifiers.org/rrid/RRID:SCR_010910

(There are citation instructions for BWA but it doesn't mention the mem algorithm: http://bio-bwa.sourceforge.net/ just short and sw)

pjotrp commented 6 years ago

@arfon we are ready for acceptance. Can you shine your wisdom on above? Note to others: Arfon is still on holiday, so it may take a little while.

pjotrp commented 6 years ago

And thanks @andrewyatz for your review! You did most of the hard work :+1:

serine commented 6 years ago

great thanks a lot guys, and yes shout out to everyone involved, every little bit of advise, comment and gitissue really helped in shaping this into a better tool !

serine commented 6 years ago

@mr-c was that comment Would an RRID URI be acceptable? to me or at @arfon cause I'm happy with what ever works

mr-c commented 6 years ago

@serine my comment about RRIDs was mostly to JOSS management. I'm just a random person with no special standing here :-)

But, while you're open to feedback from strangers, I would suggest splitting up the Summary section into multiple paragraphs. I find it hard to read as one big block of text.

https://github.com/openjournals/joss-papers/blob/joss.00583/joss.00583/10.21105.joss.00583.pdf

The template is also overlapping the page numbers with the footer.

serine commented 6 years ago

@mr-c how about now https://zenodo.org/record/1315013#.W1A5FHVuaV4 ?

using

pandoc -o check.pdf --bibliography=paper.bib paper.md

builds to this

check.pdf

not sure how it'll look with JOSS compilation.. also don't think I can do much about footer overlap

if you think this looks ok I can commit this change to master and perhaps ask @pjotrp to rebuild final pdf if that's ok?

mr-c commented 6 years ago

@serine much more readable, thank you.

serine commented 6 years ago

thanks, here is commit dd567e44

@pjotrp would you be able to regenerate new JOSS pdf? thanks