mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

-outFilterMultimapNmax bigger than winAnchorMultimapNmax #69

RRebo closed this issue 4 years ago

RRebo commented 4 years ago

Hi, I am using TEtranscripts in multi mode on the mouse genome, with the following STAR parameters: --outFilterMultimapNmax 200 --winAnchorMultimapNmax 100. I have asked this question on the STAR GitHub, but I think you have probably tested it. I think setting outFilterMultimapNmax higher than winAnchorMultimapNmax doesn't have any effect, right? Is outFilterMultimapNmax effectively capped by the winAnchorMultimapNmax value? I wonder what would change if outFilterMultimapNmax were set to 100. Thank you, Rita

olivertam commented 4 years ago

Hi Rita,

The following is from a previous conversation between the author of TEtranscripts and Dr. Alex Dobin (author of STAR) regarding these parameters. I am unsure whether he has updated his approach in newer releases, so I would recommend taking his responses on STAR's GitHub over this.

The winAnchorMultimapNmax parameter is at the heart of the STAR algorithm. STAR builds alignments out of "seeds": pieces of read sequence that match the genome exactly. The seeds have varying lengths and can map to multiple loci in the genome (up to 10000 by default). First, STAR selects "anchor seeds"; winAnchorMultimapNmax defines the maximum number of multi-mapping loci allowed for an anchor. Next, STAR collects all seeds in the windows around the anchors and stitches them together into alignments. After all alignments are built, STAR filters them with outFilterMultimapNmax.
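
As a rough illustration of the two thresholds described above, here is a toy Python sketch. It is not STAR's implementation: the `Seed` class and both function names are hypothetical, and the stitching step between anchor selection and final filtering is left out entirely. The default values (50 and 10) are STAR's documented defaults for the two parameters.

```python
from dataclasses import dataclass


@dataclass
class Seed:
    """Hypothetical stand-in for a seed: an exact match of part of a read."""
    sequence: str
    n_loci: int  # number of genomic loci this seed matches exactly


def select_anchors(seeds, win_anchor_multimap_nmax=50):
    """Keep only the seeds that map to at most win_anchor_multimap_nmax loci.

    Toy model of anchor selection; seeds above the limit are not used as anchors.
    """
    return [s for s in seeds if s.n_loci <= win_anchor_multimap_nmax]


def passes_multimap_filter(n_alignments, out_filter_multimap_nmax=10):
    """Toy model of the final filter: the read is kept only if it has
    no more than out_filter_multimap_nmax alignments."""
    return n_alignments <= out_filter_multimap_nmax


seeds = [Seed("ACGT" * 5, 1), Seed("ACGT" * 3, 50), Seed("ACGT", 5000)]
anchors = select_anchors(seeds)
# Only the first two seeds qualify as anchors; the third maps to too many loci.
```

The point of the sketch is only that the two parameters act at different stages: one limits which seeds can start an alignment, the other limits how many finished alignments a read may have.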

By default, each anchor can map to no more than 50 loci, but multiple anchors can map to more than 50 loci, allowing for alignments with >50 loci. However, there is no guarantee that all alignments of a >50 multi-mapping read will be found, since anchors mapping to >50 loci are dropped. This is why I recommend making winAnchorMultimapNmax twice as large as outFilterMultimapNmax.

Increasing winAnchorMultimapNmax allows STAR to use shorter seeds as anchors, which increases sensitivity for problematic alignments (with many mismatches/indels). Even though STAR will try to stitch alignments around all the anchors, it will often happen that only one of the resulting alignments has the highest score, and all others have scores lower by >outFilterMultimapScoreRange (=1 by default), in which case the read will be considered uniquely mapped.
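
The score-range rule in the paragraph above can be sketched in a few lines of Python. This is a simplified model under the assumption that an alignment counts toward multi-mapping only if its score is within the score range of the best score; the function name is hypothetical and the scores are made up for illustration.

```python
def count_alignments_in_range(scores, score_range=1):
    """Count alignments whose score is within score_range of the best score.

    Alignments scoring lower than the best by more than score_range
    are ignored, mirroring the outFilterMultimapScoreRange description.
    """
    best = max(scores)
    return sum(1 for s in scores if s >= best - score_range)


# One clear winner: the read behaves as uniquely mapped.
unique_like = count_alignments_in_range([98, 95, 90])

# Three near-identical scores: the read behaves as a multi-mapper.
multi_like = count_alignments_in_range([98, 97, 97])
```

With the made-up scores above, the first read has a single alignment in range and the second has three, even though STAR stitched three candidate alignments in both cases.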

Thus, to directly address your question, it would appear that winAnchorMultimapNmax would have to be increased if you want to increase outFilterMultimapNmax. So if you want outFilterMultimapNmax 200, you would need winAnchorMultimapNmax 400. This will have an impact on alignment speed.
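
A minimal sketch of the "twice as large" rule of thumb, written as a small Python helper that builds the two STAR command-line options. The helper name and the `anchor_factor` argument are hypothetical; only the two option names come from STAR, and the factor of 2 is the recommendation quoted above rather than a hard requirement.

```python
def star_multimap_args(out_filter_multimap_nmax, anchor_factor=2):
    """Return STAR options with winAnchorMultimapNmax set to
    anchor_factor times outFilterMultimapNmax."""
    win_anchor_multimap_nmax = anchor_factor * out_filter_multimap_nmax
    return [
        "--outFilterMultimapNmax", str(out_filter_multimap_nmax),
        "--winAnchorMultimapNmax", str(win_anchor_multimap_nmax),
    ]


# Options for the case discussed in this thread.
args = star_multimap_args(200)
command_fragment = " ".join(args)
```

The resulting fragment would be appended to the rest of a STAR command line (genome directory, input files, and so on), which is not shown here.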

Thanks.