sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Question about analysing patterned and internal reads together #382

Open Dkaaaaa opened 11 months ago

Dkaaaaa commented 11 months ago

Hi, I am confused about the results and usage of the zUMIs pipeline. Here is my YAML file: pm1-1.yaml.txt

First, my input was paired-end reads that start with a specific pattern sequence. The read 1 file contains 1657623 reads in total, while the STAR log filtered.tagged.Log.final.out reports 846533 input reads. Q1: I think this is because of the filtering conditions and the cDNA range set for read 1 in my YAML file. Am I right? [image]

Secondly, I found that the read IDs in "pm1.filtered.tagged.unmapped.bam" (flags include: 4), "pm1.filtered.tagged.Aligned.out.bam" (flags include: 0, 16) and "pm1.filtered.Aligned.GeneTagged.sorted.bam" (flags include: 0, 16) are the same. Q2: Why are the read IDs in pm1.filtered.tagged.unmapped.bam the same as those in pm1.filtered.tagged.Aligned.out.bam and pm1.filtered.Aligned.GeneTagged.sorted.bam? What's more, "pm1.filtered.tagged.Aligned.toTranscriptome.out.bam" (flags include: 0, 16, 252, 276) is missing some reads that are present in the three BAM files above. The attached file lists the reads missing from pm1.filtered.tagged.Aligned.toTranscriptome.out.bam, as they appear in pm1.filtered.Aligned.GeneTagged.sorted.bam: miss-in-toTranscriptome.bam.txt. I also checked the mapping position of the first missing read; below is the position of ENSG0000014267 in my reference transcriptome, and I see no problem there. [image: Snipaste_2023-12-04_13-23-32] Q3: Why are these reads missing from pm1.filtered.tagged.Aligned.toTranscriptome.out.bam?

Finally, I have separated my raw reads into paired patterned reads and paired internal reads. You should know that my data are similar to Smart-seq3, but are based on 3' polyA capture of the mRNA. The pm1-1.yaml above took the patterned reads as input, with read 1 set for BC and UMI only and read 2 set for cDNA. Now I want to analyse my internal reads together with them. In my new YAML file I set the paired internal reads as file3 and file4, with cDNA range 1-150. When I run with this YAML file, I get the errors below. Q4: How should I analyse my patterned reads and internal reads together? [image] [image] Here is my new YAML file: pm1-2.yaml.txt. The new STAR log filtered.tagged.Log.final.out again reports 846533 input reads, so it seems that file3 and file4 were not included in the analysis. Meanwhile, the number of uniquely mapped reads is lower than in the run without file3 and file4. [image]

I am very puzzled by all of the above and look forward to your reply. Thanks a lot! Dka

cziegenhain commented 11 months ago

Hi,

as mentioned in your other issue, the particular 11 bp pattern "ATTGCGCAATG" is reserved for the processing of Smart-seq3 data. Our pipeline is hardcoded in this case, and I am unfortunately unable to provide support for custom protocols that you might be trying to process. Sorry about this,

Christoph

Dkaaaaa commented 11 months ago

I am still puzzled by your answer. Below is the Smart-seq3 YAML. [image]

What is the function of file3 and file4 in this pipeline? What if I do not separate my data into patterned reads and internal reads, and then just set up file1 and file2 like this: file1: name: /home/ccy/1-scrna-data-2023-11-14/rawdata/star-test-1/patterns_and_internal_1.fq.gz base_definition:

bioinfotec commented 10 months ago

@Dkaaaaa zUMIs filters out some low-quality reads based on the barcode and UMI before they go to STAR, and I think that's why the number of input reads is lower than the number of reads in the read 1 file.
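To make the filtering idea concrete, here is a minimal Python sketch of a barcode/UMI quality filter. It is not zUMIs code: the function names, the base ranges (BC 1-6, UMI 7-12), and the rule "discard a read when at least `num_bases` bases in a segment fall below the Phred cutoff" are all assumptions for illustration, so check the cutoffs in your own YAML file for the actual behaviour.

```python
from typing import List, Tuple


def count_low_quality(qual: str, start: int, end: int,
                      phred_cutoff: int, offset: int = 33) -> int:
    """Count bases in the 1-based inclusive range [start, end] whose
    Phred score is below phred_cutoff (Phred+33 encoding assumed)."""
    segment = qual[start - 1:end]
    return sum(1 for ch in segment if ord(ch) - offset < phred_cutoff)


def filter_reads(reads: List[Tuple[str, str]],
                 bc_range: Tuple[int, int],
                 umi_range: Tuple[int, int],
                 num_bases: int = 1,
                 phred_cutoff: int = 20) -> List[str]:
    """Return the names of reads that pass both the BC and the UMI filter.

    ASSUMPTION: a read fails when num_bases or more bases in the BC
    (or in the UMI) are below phred_cutoff. This is a hypothetical rule,
    not taken from the zUMIs source.
    """
    kept = []
    for name, qual in reads:
        bc_low = count_low_quality(qual, bc_range[0], bc_range[1], phred_cutoff)
        umi_low = count_low_quality(qual, umi_range[0], umi_range[1], phred_cutoff)
        if bc_low < num_bases and umi_low < num_bases:
            kept.append(name)
    return kept


# Toy quality strings: 'I' is Phred 40, '#' is Phred 2.
toy_reads = [
    ("r1", "IIIIIIIIIIII"),  # all bases high quality
    ("r2", "II#IIIIIIIII"),  # one low-quality base inside the BC
    ("r3", "IIIIIII##III"),  # two low-quality bases inside the UMI
]
kept = filter_reads(toy_reads, bc_range=(1, 6), umi_range=(7, 12))
print(f"{len(kept)} of {len(toy_reads)} reads kept: {kept}")
```

With a filter like this, the number of reads handed to the aligner can be much smaller than the number of reads in the raw FASTQ file, which would be consistent with the difference between 1657623 and 846533.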