sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

mapping to gtf #125

Closed hmassalha closed 5 years ago

hmassalha commented 5 years ago

Hi all,

I have a general question about how zUMIs handles read mapping (this might be a STAR issue; I hope you can point me in the right direction). The most abundant genes in my samples are Gm... genes and other pseudogenes. What is the logic zUMIs uses when mapping reads to these 'non-informative' genes? Are they bad reads that mapped to junk sequences? How does zUMIs treat reads with mutations? And if I delete these genes manually from the GTF file, will that affect the mapping decision for ambiguous reads?

Could you please point me to a source where I can find the 'decision tree' for mapping low-quality reads?

Best, HM

cziegenhain commented 5 years ago

Sorry, I didn't see this.
Read mapping is done by STAR, and we then look up the features in the GTF file. I don't think it's correct to assume that anything mapping to pseudogenes is automatically junk. How did you determine that the reads contributing to expression counts in pseudogenes are low quality? As for mutations: STAR's default settings allow for a certain number of mismatches, which you can also tune with custom parameters. Refer to the STAR manual, which is pretty good, for this.
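
One way to check what the top genes actually are before editing the GTF is to look up their biotype annotation. The following is only a minimal sketch and not part of zUMIs: the function names `gene_biotypes` and `summarize` are made up for illustration, and it assumes the GTF contains `gene` records carrying either a `gene_biotype` (Ensembl style) or `gene_type` (GENCODE style) attribute.

```python
import re
from collections import Counter

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes(gtf_lines):
    """Map gene_id -> biotype using the 'gene' records of a GTF.

    Hypothetical helper for illustration; not a zUMIs function.
    """
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are used here
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        # Assumption: the biotype is stored under one of these two keys.
        biotypes[gene_id] = (
            attrs.get("gene_biotype") or attrs.get("gene_type") or "unknown"
        )
    return biotypes


def summarize(top_genes, biotypes):
    """Count how many of the given gene IDs fall into each biotype."""
    return Counter(biotypes.get(g, "not_in_gtf") for g in top_genes)
```

With a real annotation this could be used as `with open(gtf_path) as fh: bt = gene_biotypes(fh)`, followed by `summarize(top_gene_ids, bt)`, where `gtf_path` and `top_gene_ids` are placeholders for your own GTF file and the gene IDs with the highest counts in the expression table.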