Open didillysquat opened 3 years ago
I see now that it is required for the DESeq2 QC that is performed downstream of the salmon pseudo quantification.
Hi @didillysquat ! Apologies for the late response. I am on holiday at the mo.
It's actually required to build the decoy sequences for the Salmon index. If you have a genome fasta available I believe it's advisable to build the index with both the genome fasta and transcriptome fasta. I discussed this with @rob-p whilst adding Salmon support here.
Maybe we should also add support for instances where the genome fasta isn't available though as this issue highlights that particular edge case.
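For readers unfamiliar with decoys, here is a minimal sketch of how the two inputs to a decoy-aware index could be prepared: a list of genome sequence names (the decoys) and a combined FASTA with transcripts first and genome sequences after. The function names and toy sequences are hypothetical and this is not the pipeline's own code.

```python
import io


def decoy_names(genome_fasta):
    """Collect the first token of every FASTA header in the genome stream."""
    names = []
    for line in genome_fasta:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # ignore headers with no name
                names.append(parts[0])
    return names


def write_gentrome(transcript_fasta, genome_fasta, out):
    """Write transcripts first, then the genome, into one combined FASTA."""
    for stream in (transcript_fasta, genome_fasta):
        for line in stream:
            out.write(line if line.endswith("\n") else line + "\n")


# Toy inputs (made up for illustration only).
genome = ">chr1 dna:chromosome\nACGT\n>chr2 dna:chromosome\nGGCC\n"
transcripts = ">tx1 cdna\nACG\n"

decoys = decoy_names(io.StringIO(genome))
combined = io.StringIO()
write_gentrome(io.StringIO(transcripts), io.StringIO(genome), combined)
```

With real files, the decoy list and combined FASTA would then typically be passed to something like `salmon index -t gentrome.fa -d decoys.txt -i <index_dir>`; the file names there are placeholders.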
Hi @drpatelh,
There is no hurry on this at all so please don't disrupt your holidays on my behalf.
For my particular case I'm using your wonderful pipeline as a quick but clean way to get a set of salmon pseudo quantification files from RNA-seq reads that I can then import into DESeq2.
I'm sure you're far more knowledgeable about this than I am but I was simply following the guidance of the salmon tutorial which worked with only an indexed transcriptome fasta (i.e. no genome). For this particular use case, it could perhaps be useful for the pipeline to detect that neither --genome nor --fasta has been provided and so limit the output accordingly (i.e. no DESeq QC) but provide a warning saying that it is doing so (i.e. it could say "no genome provided so skipping XXX").
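To make the suggestion concrete, a rough sketch of the "detect and warn" behaviour might look like the following. It is written in Python purely for illustration (the pipeline itself is Nextflow). The step names and the choice of which steps depend on the genome are assumptions, not the pipeline's real logic.

```python
import warnings

# Hypothetical step names; which steps truly need the genome is an assumption.
ALL_STEPS = ("build_decoy_aware_index", "salmon_quant")
GENOME_DEPENDENT_STEPS = ("build_decoy_aware_index",)


def plan_steps(params):
    """Return the steps to run, skipping genome-dependent ones with a warning."""
    if params.get("fasta") or params.get("genome"):
        return list(ALL_STEPS)

    salmon_only = (
        params.get("pseudo_aligner") == "salmon"
        and params.get("skip_alignment")
        and params.get("transcript_fasta")
        and params.get("salmon_index")
    )
    if not salmon_only:
        # Same message the pipeline currently prints in this situation.
        raise ValueError(
            "Genome fasta file not specified with e.g. '--fasta genome.fa' "
            "or via a detectable config file."
        )

    skipped = [s for s in ALL_STEPS if s in GENOME_DEPENDENT_STEPS]
    warnings.warn("No genome provided so skipping: " + ", ".join(skipped))
    return [s for s in ALL_STEPS if s not in GENOME_DEPENDENT_STEPS]


# Example: the salmon-only route from this issue, with placeholder paths.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    steps = plan_steps({
        "pseudo_aligner": "salmon",
        "skip_alignment": True,
        "transcript_fasta": "transcripts.fa.gz",
        "salmon_index": "transcripts.index",
    })
```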
Having said that, one extremely useful output from your pipeline (after running it providing the --genome information) is the txt2gene.txt file (called 'salmon_txt2gene.txt' in your pipeline) that maps the transcript IDs to the genes and allows the import of the salmon counts to DESeq2 using tximport. If appropriate, it could be useful to provide this in the main salmon output directory.
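For anyone without that file to hand, a transcript-to-gene table can be sketched from a GTF annotation as below. This assumes the GTF carries `transcript_id` and `gene_id` attributes in the usual `key "value";` form. The function name and the toy record are hypothetical, and the pipeline's own file may contain additional columns.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR = re.compile(r'(\S+) "([^"]*)"')


def tx2gene_from_gtf(gtf_lines):
    """Map each transcript_id to its gene_id, keeping first-seen order."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR.findall(fields[8]))
        tx, gene = attrs.get("transcript_id"), attrs.get("gene_id")
        if tx and gene:
            mapping.setdefault(tx, gene)
    return mapping


# Toy records (made up for illustration only).
gtf = [
    "#!genome-build toy\n",
    "\t".join(["1", "toy", "gene", "1", "100", ".", "+", ".",
               'gene_id "AT1G01010";']) + "\n",
    "\t".join(["1", "toy", "transcript", "1", "100", ".", "+", ".",
               'gene_id "AT1G01010"; transcript_id "AT1G01010.1";']) + "\n",
    "\t".join(["1", "toy", "exon", "1", "50", ".", "+", ".",
               'gene_id "AT1G01010"; transcript_id "AT1G01010.1";']) + "\n",
]

mapping = tx2gene_from_gtf(gtf)
tsv = "".join(f"{tx}\t{gene}\n" for tx, gene in mapping.items())
```

The resulting two-column table (transcript ID, then gene ID) is the shape tximport expects for its `tx2gene` argument.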
Thanks for your continued efforts!
Hi @didillysquat ! I was going to have a go at adding this feature for the 3.4 release but it will take quite a bit of refactoring so maybe we can add it in 3.5.
I have, however, added the functionality for the pipeline to be able to publish the salmon_tx2gene.txt files in the salmon counts directory here.
@drpatelh Super! Many thanks for that.
Check Documentation
I have checked the following places for your error: I have checked both of these and looked through the introduction to see which steps might require the genome.
Description of the bug
When running the pipeline with --pseudo_aligner salmon --skip_alignment and providing a valid --transcript_fasta and --salmon_index but not providing --fasta or --genome, the pipeline will not run, requesting that I provide a genome file: Genome fasta file not specified with e.g. '--fasta genome.fa' or via a detectable config file.
Steps to reproduce
Steps to reproduce the behaviour:
nextflow run nf-core/rnaseq --input woltering_samplesheet.csv --pseudo_aligner salmon --skip_alignment --transcript_fasta ../athal_transcriptome/Arabidopsis_thaliana.TAIR10.cdna.all.fa.gz --salmon_index ../athal_transcriptome/Arabidopsis_thaliana.TAIR10.cdna.all.fa.gz.index -profile docker
Genome fasta file not specified with e.g. '--fasta genome.fa' or via a detectable config file.
Expected behaviour
I would expect this specific route of the pipeline to be able to run without access to a genome, as when running quantification with Salmon on the command line I need only provide the transcript fasta and the index. I've asked to skip alignments (which would otherwise require the genome), but which other step in the pipeline is the genome required for?
I would hope that the pipeline could run without access to the genome.
Log files
nextflow.log
Have you provided the following extra information/files:
.nextflow.log
fileSystem
Nextflow Installation
version 21.04.1
Container engine
Additional context