liumz93 / PEM-Q

A pipeline to process PEM-seq data or similar data, which is more comprehensive than superQ

pass primer filter stitch reads = 0 #4

Open wuzw1 opened 1 year ago

wuzw1 commented 1 year ago

When preparing the PEM-seq library, I used a primer containing a 4-digit barcode, and I first tried a primer sequence starting immediately after the barcode sequence. However, the number of "pass primer filter stitch reads" was 0. I then tried sliding the primer sequence along the reads and still got similar results. Next, I tried the same primer sliding on the example data provided on GitHub. There, the number of pass primer filter stitch reads changed as the primer location slid, but it was never 0. I then carefully compared the sti.sam files generated from the different sequencing data and finally found a difference in RNAME. In the mm10 reference genome, the sequence names are labelled "chr1, chr2, chr3, ..."; in the hg19 reference genome, they are "1, 2, 3, ...". I then checked the result of condition1 in no_primer_filter in align_make_v5.1.py. If I use primer_chr = "chr2" in the PEM-Q command, condition1 returns "False" for all reads. If I use primer_chr = "2" in the PEM-Q command, condition1 returns "True". With that change, the PEM-Q pipeline finally works properly, which solved the problem.

This issue is caused by the format of the sequence names used in the reference genome files, which can vary depending on the source you download them from. Hope this finding helps others!
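
For anyone who wants to check this before running the pipeline, here is a minimal sketch of the comparison described above. It reads the reference names from the `@SQ` header lines of a SAM file and tries both naming conventions for the primer chromosome. The function names `sam_reference_names` and `resolve_primer_chr` are hypothetical helpers written for illustration; they are not part of PEM-Q, and the sketch assumes the sti.sam file was written with its header.

```python
def sam_reference_names(sam_lines):
    """Collect reference names from the @SQ header lines of a SAM file."""
    names = []
    for line in sam_lines:
        if not line.startswith("@"):
            break  # the header ends at the first alignment record
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def resolve_primer_chr(primer_chr, reference_names):
    """Return the reference name matching primer_chr.

    Tries the value as given, then the other convention
    (with or without the "chr" prefix). Hypothetical helper, not PEM-Q code.
    """
    if primer_chr in reference_names:
        return primer_chr
    if primer_chr.startswith("chr"):
        alternative = primer_chr[3:]
    else:
        alternative = "chr" + primer_chr
    if alternative in reference_names:
        return alternative
    raise ValueError(
        f"{primer_chr!r} not found in reference names: {reference_names[:5]}"
    )


# Made-up header lines in the style of a reference that uses "1, 2, 3, ..."
example_header = [
    "@HD\tVN:1.6\tSO:unsorted\n",
    "@SQ\tSN:1\tLN:1000\n",
    "@SQ\tSN:2\tLN:2000\n",
]
names = sam_reference_names(example_header)
print(resolve_primer_chr("chr2", names))  # prints 2
```

With a real file, the lines could come from `open("your_sample_sti.sam")` (file name is a placeholder), and the returned name is the value to pass as primer_chr.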

liumz93 commented 1 year ago

Thanks for sharing!