nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

Support for long-reads (e.g. MinION / PacBio) with minimap2 #380

Closed jgolob closed 4 years ago

jgolob commented 4 years ago

Opportunity

Long-read sequencers have some potential advantages for RNAseq over the more typical Illumina short reads. These include:

The current nf-core rnaseq pipeline cannot handle long reads.

Resources

There is an evolving set of tools capable of handling the unique challenges of long reads:

1) [minimap2](https://github.com/lh3/minimap2) to efficiently align the long reads against a reference genome

2) [TranscriptClean](https://github.com/dewyman/TranscriptClean) to filter and correct the alignments for common errors introduced by long-read sequencing technology

Suggestion

Incorporate a module into the nf-core/rnaseq pipeline for handling long reads sourced from cDNA / raw RNA via minimap2 and TranscriptClean, after which the filtered alignments could be processed by the same techniques as other read sources.

Perhaps as a new `--long-reads '*.fastq.gz'` command-line option.
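To make the alignment step of this suggestion concrete, here is a rough sketch of how such a module might assemble the minimap2 command for each long-read file. The preset flags (`-ax splice`, `-uf -k14`, `-ax splice:hq`) are the ones the minimap2 README lists for spliced long-read alignment; the function name, the library labels, and the file names are hypothetical and are not part of nf-core/rnaseq. The TranscriptClean step is deliberately left out, and this only builds the command without running it.

```python
import shlex

# Spliced-alignment presets as listed in the minimap2 README.
# The dictionary keys are hypothetical labels chosen for this sketch.
PRESETS = {
    "cdna": ["-ax", "splice"],                       # long-read cDNA
    "direct_rna": ["-ax", "splice", "-uf", "-k14"],  # noisy Nanopore direct RNA
    "pacbio_isoseq": ["-ax", "splice:hq", "-uf"],    # PacBio Iso-Seq
}


def minimap2_command(reference, reads, library="cdna", threads=4):
    """Return the minimap2 argument list for one long-read FASTQ file.

    Hypothetical helper: it only assembles the command, so it can be
    inspected or passed to subprocess.run by the caller.
    """
    if library not in PRESETS:
        raise ValueError(f"unknown library type: {library!r}")
    return [
        "minimap2",
        *PRESETS[library],
        "-t", str(threads),
        str(reference),
        str(reads),
    ]


if __name__ == "__main__":
    # Placeholder file names; SAM output would go to stdout when executed.
    cmd = minimap2_command("genome.fa", "sample.fastq.gz", library="direct_rna")
    print(shlex.join(cmd))
```

Each file matched by the proposed `--long-reads` glob would get one such command, with the resulting SAM passed on to TranscriptClean before rejoining the existing downstream steps.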

drpatelh commented 4 years ago

Hi @jgolob ! We have a pipeline in development specifically for Nanopore long reads that uses minimap2: https://github.com/nf-core/nanoseq

In any case, it probably doesn't make sense to add this functionality here, because the pipeline is already quite complex and the mapping, QC and downstream processing of long-read data tend to be quite different.

Anyway, please take a look and feel free to join the #nanoseq channel on the nf-core Slack workspace if you have any questions. https://nf-co.re/join

I'll close this in favour of opening issues on #nanoseq :+1: