sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Extract UMI-containing reads from bam files #323

Closed MengjunWu closed 2 years ago

MengjunWu commented 2 years ago

Hi,

I want to extract the UMI-containing reads (i.e. the 5' end reads) from the bam file. After reading the manual section on the bam tags, I am still a bit confused, so I have a few questions for clarification.

  1. After running zUMIs, two bam files are generated, one with the suffix "filtered.tagged.Aligned.out.bam" and another with "filtered.Aligned.GeneTagged.UBcorrected.sorted.bam". What is the difference between the two (I saw that the latter has a few extra bam tags)?
  2. Do both bam files contain both UMI-containing 5' end reads and internal reads? If so, which bam file and which bam tag should I use to extract only the UMI-containing reads?
  3. Can I pass customized GTF or feature files to zUMIs for counting?

Thanks a lot!

Best, Mengjun

cziegenhain commented 2 years ago

Hi,

  1. The "filtered.Aligned.GeneTagged.UBcorrected.sorted.bam" is the final output bam file from zUMIs. Please refer to the wiki for a description of the tags used: https://github.com/sdparekh/zUMIs/wiki/Output#explanation-of-the-bam-tags-zumis-uses
  2. Both bam files contain all reads (internal + 5' end). The UB tag contains the UMI and is empty for internal reads. You could subset the bam file as such: cat <(samtools view -H in.bam) <(samtools view -@ 4 in.bam | grep 'UB:Z:[A-Z]') | samtools view -b -@ 16 -o out.bam
  3. Of course, you can pass any customized GTF file to zUMIs.

Best, Christoph

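
The subsetting step in point 2 could also be sketched as a small filter over SAM text, which checks only the optional tag fields instead of grepping the whole line. This is a minimal sketch, not part of zUMIs: the function name `filter_umi_reads` is hypothetical, and it assumes that a UMI consists only of the letters A, C, G, T and N and that internal reads carry an empty `UB:Z:` tag, as described above.

```shell
# Hypothetical helper (not part of zUMIs): keep SAM header lines and
# alignment records whose UB tag holds a non-empty UMI.
# Assumption: UMIs consist only of A, C, G, T, N.
filter_umi_reads() {
  awk -F'\t' '
    /^@/ { print; next }                   # pass header lines through unchanged
    {
      # optional tags start at column 12 of a SAM record
      for (i = 12; i <= NF; i++) {
        if ($i ~ /^UB:Z:[ACGTN]+$/) { print; next }
      }
    }'
}
```

With samtools available, it could be used as `samtools view -h in.bam | filter_umi_reads | samtools view -b -o out.bam -`, where `in.bam` and `out.bam` are placeholder file names.
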
MengjunWu commented 2 years ago

Thanks a lot for the clear explanations! Just to confirm: if one wants to use a customized GTF only for counting but the whole-genome GTF for STAR mapping, where in the yaml should one pass the customized GTF?

cziegenhain commented 2 years ago

Aha! Now I understand what you meant. There is no built-in option for this; the only way would be to replace the *.final_annot.gtf file in your zUMIs output folder during or after the Mapping stage (before the Counting stage).
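
The replacement described above could be sketched as follows. This is an illustration under assumptions, not a zUMIs feature: the helper name `swap_annotation` is hypothetical, the exact location of the *.final_annot.gtf file inside the output folder is not stated in this thread (so the sketch searches recursively and refuses to act unless it finds exactly one match), and the backup suffix `.orig` is an arbitrary choice.

```shell
# Hypothetical helper (not part of zUMIs): put a custom GTF in place of the
# *.final_annot.gtf file found in a zUMIs output folder.
# Usage: swap_annotation <zUMIs_output_dir> <custom.gtf>
swap_annotation() {
  out_dir=$1
  custom_gtf=$2
  # Assumption: the file may sit anywhere below out_dir, so search recursively.
  matches=$(find "$out_dir" -type f -name '*.final_annot.gtf')
  count=$(printf '%s\n' "$matches" | grep -c .)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one *.final_annot.gtf, found $count" >&2
    return 1
  fi
  cp "$matches" "$matches.orig"   # keep a backup of the original annotation
  cp "$custom_gtf" "$matches"     # overwrite it with the custom GTF
}
```

Per the reply above, this would have to run once the Mapping stage has written the file and before the Counting stage reads it.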

MengjunWu commented 2 years ago

Got it, many thanks!