gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Paired-end reads: are superreads required? And how do I use superreads if MaSuRCA is installed using bioconda? #442

Status: Open. PVlasov93 opened this issue 3 months ago.

PVlasov93 commented 3 months ago

I'm trying to understand whether I can use stringtie with some RNA-seq datasets I received from a collaborator. The reads are paired-end, which differs from the previous times I used stringtie (all of my previous datasets were single-end ribosome profiling experiments). I don't understand how to proceed here:

gpertea commented 3 months ago

The superreads step can be omitted, and I would say it usually is. It is not something we normally use either, and my impression is that the superreads feature does not get much use in general.

That might be partly due to the limited availability of the software needed for that feature, which is not packaged with regular StringTie releases or with software distributions like conda. We used a customized version of the MaSuRCA assembler to implement that feature and provided specialized scripts to drive the process, so just installing MaSuRCA from bioconda will not work.

The proper usage of that superreads feature is documented in this README.

Currently, the only way to build and install the programs needed for the superreads approach is from source. Either download the source of the latest release or clone the StringTie GitHub repository, then follow the quick build instructions here: https://github.com/gpertea/stringtie/?tab=readme-ov-file#the-super-reads-module and the usage documentation here.

I've seen reports that the separate installation procedure documented there may fail or not build cleanly on newer systems. Admittedly, the superreads module has not been updated or maintained in the last few years due to lack of use (at least on my side).

PVlasov93 commented 3 months ago

Thanks for clarifying that. Does that mean I can run stringtie on a BAM file of paired-end reads the same way I'd use it with single-end reads? I'm not familiar with the commands needed for it (is that related to the long-reads feature?). As for superreads, are there similar programs for assembling a single long read from each read pair? My targets usually have fairly low expression levels, so any improvement to the quantification method would help.
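Some background on the paired-end question: in SAM/BAM, pairing is recorded in each alignment record itself. It appears as bit 0x1 of the FLAG field ("template having multiple segments"), with the mate's position in the RNEXT/PNEXT columns. It is not stored in a separate file. The following is a minimal Python sketch for confirming what a collaborator's file actually contains. It assumes uncompressed SAM text as input (for example, the output of `samtools view`) and counts primary, mapped alignments with and without the paired bit. The function `count_pairing` is hypothetical and is not part of StringTie or samtools.

```python
# Hypothetical helper (not part of StringTie/samtools): inspect SAM FLAG bits
# to see whether alignments come from a paired-end library.
# Assumes uncompressed SAM text lines; BAM must be decoded first (e.g. samtools view).

PAIRED = 0x1            # template has multiple segments (read is one mate of a pair)
UNMAPPED = 0x4          # segment unmapped
SECONDARY = 0x100       # secondary alignment
SUPPLEMENTARY = 0x800   # supplementary alignment


def count_pairing(sam_lines):
    """Return (paired, single) counts over primary, mapped alignment records."""
    paired = single = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])  # FLAG is the 2nd mandatory SAM column
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each mapped read once, via its primary alignment
        if flag & PAIRED:
            paired += 1
        else:
            single += 1
    return paired, single


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6\tSO:coordinate",
        "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",   # mate 1 of a proper pair
        "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*", # mate 2 of the same pair
        "r2\t0\tchr1\t500\t60\t50M\t*\t0\t0\t*\t*",        # single-end read
    ]
    print(count_pairing(example))  # (2, 1)
```

Skipping secondary and supplementary records keeps multi-mapped or split reads from being counted more than once. Nonzero counts in both columns would suggest a mixed or partially unpaired file.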