sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

I want to count how many alignments fall into exonic, intronic and intergenic regions. #328

Closed pandh0607 closed 1 year ago

pandh0607 commented 1 year ago

I am using zUMIs to process RNA sequencing data with UMIs, and I want to count how many alignments fall into exonic, intronic and intergenic regions. In the zUMIs_output\stats folder I found three files, "Example.genecounts.txt", "Example.readspercell.txt" and "Example.UMIcounts.txt", all of which have a 'type' column with values such as 'Exon', 'Intron', 'Intergenic' etc. Could you help me figure out the following:

  1. What are these files intended to be used for?
  2. What kind of alignment/read is considered to fall into an exonic, intronic or intergenic region? Does it have to be totally enclosed, or is something like 50% overlap enough?
  3. Can I use one of these files directly to analyze the genomic origin of reads?

cziegenhain commented 1 year ago

Hi,

  1. You can use these files to summarize statistics of your sequenced samples. If you want to visualize the number of alignments in exonic/intronic/intergenic regions, look at the readspercell.txt file; you do not need to count anything yourself.
  2. By default, an overlap of 1 base is enough to assign a read to an exon or intron. You can set a minimum fraction of the read that must overlap a feature in the YAML file: https://github.com/sdparekh/zUMIs/blob/main/zUMIs.yaml#L81
  3. I do not understand what you mean by this.
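
As a rough illustration of point 1, the per-type totals could be tallied along these lines. This is only a sketch, not part of zUMIs: it assumes the readspercell.txt file is tab-separated and that, besides the 'type' column mentioned above, it has a read-count column called `N` (the name `N`, the barcode column `RG` and the sample rows are assumptions made up for the example, so check the header of your own file).

```python
import csv
import io
from collections import defaultdict


def summarize_by_type(handle, type_col="type", count_col="N"):
    """Sum read counts per 'type' and return {type: (reads, fraction_of_total)}.

    type_col comes from the thread; count_col="N" is an assumed column name.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        totals[row[type_col]] += int(row[count_col])
    grand_total = sum(totals.values())
    return {
        t: (n, n / grand_total if grand_total else 0.0)
        for t, n in totals.items()
    }


# Made-up sample standing in for Example.readspercell.txt (illustration only).
sample = (
    "RG\tN\ttype\n"
    "AAAC\t120\tExon\n"
    "AAAC\t30\tIntron\n"
    "AAAC\t50\tIntergenic\n"
    "GGGT\t80\tExon\n"
    "GGGT\t20\tIntron\n"
)

summary = summarize_by_type(io.StringIO(sample))
for read_type, (reads, fraction) in summary.items():
    print(f"{read_type}\t{reads}\t{fraction:.1%}")
```

For a real file, replace `io.StringIO(sample)` with `open(path, newline="")` and adjust the column names to match the header.
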
pandh0607 commented 1 year ago

Thank you for your reply. I still have some questions.

If a read maps to both exonic and intronic regions, or to both intronic and intergenic regions, which one is given priority?

cziegenhain commented 1 year ago

If a read maps to both an exon and an intron (according to the fraction overlap criterion), the exon is prioritized. Anything that is assigned to neither an exon nor an intron is labelled intergenic.
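
The priority rule described above can be sketched as follows. This is a simplified illustration and not the zUMIs implementation: the function names, the half-open `(start, end)` interval convention and the `min_fraction` parameter are hypothetical, chosen only to mirror the "1 base by default, optional fraction" behaviour mentioned earlier in the thread.

```python
def overlap_bases(read, feature):
    """Number of bases shared by two half-open (start, end) intervals."""
    return max(0, min(read[1], feature[1]) - max(read[0], feature[0]))


def classify_read(read, exons, introns, min_fraction=0.0):
    """Label a read 'Exon', 'Intron' or 'Intergenic'.

    A feature counts as hit if it shares at least 1 base with the read and
    the overlapping share of the read is at least min_fraction (hypothetical
    parameter). Exons are checked first, so they win over introns.
    """
    read_length = read[1] - read[0]

    def hits(features):
        for feature in features:
            shared = overlap_bases(read, feature)
            if shared >= 1 and shared / read_length >= min_fraction:
                return True
        return False

    if hits(exons):
        return "Exon"
    if hits(introns):
        return "Intron"
    return "Intergenic"


# Toy coordinates, for illustration only.
exons = [(100, 200)]
introns = [(200, 300)]

spanning = classify_read((190, 240), exons, introns)        # touches both
strict = classify_read((190, 240), exons, introns, 0.5)     # exon share is only 20%
outside = classify_read((400, 450), exons, introns)         # touches neither
print(spanning, strict, outside)
```

With the default threshold the read spanning the exon/intron boundary is labelled as exonic; with a stricter overlap fraction the exon no longer qualifies, so the read falls through to the intron.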