juicejulia closed this issue 3 years ago.
Hi,
Not directly, no. zUMIs does not support barcodes embedded in the read header line, because that loses valuable barcode quality information. However, it should be possible to write a small awk command that generates a FASTQ file from the header, with arbitrarily set quality scores.
Best, C
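As a rough illustration of the conversion suggested above (the reply mentions awk; this is a Python sketch instead, not part of zUMIs), the code below reads an R1 file and writes a separate barcode FASTQ. Each record's sequence is the i7 + i5 index taken from the header, and its quality is a constant placeholder character. It assumes Illumina CASAVA 1.8+ style headers with the index as the last colon-separated field; all function names are hypothetical.

```python
import gzip
import re
from typing import Iterator, TextIO, Tuple

# Assumption: CASAVA 1.8+ headers whose last colon-separated field is the
# index, e.g. "1:N:0:TAACTTGG+GTCGTGAA" (i7, then optionally "+" and i5).
INDEX_RE = re.compile(r":([ACGTN]+)(?:\+([ACGTN]+))?$")


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def barcode_record(header: str, qual_char: str = "I") -> str:
    """Build a FASTQ record whose sequence is the header's index (i7 + i5)
    and whose quality is a constant placeholder ('I' = Phred 40 by default)."""
    match = INDEX_RE.search(header)
    if match is None:
        raise ValueError(f"no index found in header: {header}")
    barcode = match.group(1) + (match.group(2) or "")
    read_id = header.split()[0]  # keep only the read ID so it matches R1/R2
    return f"{read_id}\n{barcode}\n+\n{qual_char * len(barcode)}\n"


def write_barcode_fastq(r1_path: str, out_path: str) -> None:
    """Write a gzipped barcode FASTQ, in R1 order, built from R1's headers."""
    opener = gzip.open if r1_path.endswith(".gz") else open
    with opener(r1_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for header, _seq, _qual in read_fastq(src):
            dst.write(barcode_record(header))
```

Because records are written in the same order as R1, the barcode file stays paired with R1 and R2 record by record.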
Thank you for your fast response! In that case, should I give the barcodes high quality scores so they won't be filtered out? I am also wondering, from your experience, how much does barcode quality vary between datasets, and how much impact does it have on downstream processing?
Yes, generate them with high quality so the filtering just doesn't apply! (e.g. just Phred 40 = I)
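The "I" above comes from the Phred+33 encoding used by current Illumina FASTQ files: each quality character is chr(Q + 33), and a Phred score Q corresponds to an estimated error probability of 10^(-Q/10). A small sketch of the conversion (helper names are illustrative):

```python
def phred_to_char(q: int, offset: int = 33) -> str:
    """Encode a Phred quality score as a FASTQ quality character (Phred+33 by default)."""
    if q < 0 or q + offset > 126:
        raise ValueError("quality score outside the printable ASCII range")
    return chr(q + offset)


def char_to_phred(c: str, offset: int = 33) -> int:
    """Decode a FASTQ quality character back into a Phred score."""
    return ord(c) - offset


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


# A constant quality string for a 16-nt barcode (8-nt i7 + 8-nt i5).
placeholder = phred_to_char(40) * 16
```

For example, phred_to_char(40) gives "I", while the quality strings in the reads quoted in this thread top out at "J", i.e. Phred 41.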
If the barcodes are sufficiently diverse and cover all color signals well, it shouldn't matter all too much. In my experience, I have seen everything from horrible to super high quality, depending on the dataset & sequencing run ;-)
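One rough way to check the diversity point above is to look at per-position base composition across the barcodes: a position where nearly all barcodes carry the same base is the kind of low-diversity position that can hurt signal balance. This sketch only counts bases; it does not model which dye or channel each base uses, since that depends on the sequencer's chemistry. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def base_composition(barcodes: List[str]) -> List[Dict[str, float]]:
    """Return, for each position, the fraction of barcodes carrying each base.

    A position dominated by a single base indicates low diversity there;
    this is only a rough diagnostic, not a model of the sequencer's signal.
    """
    if not barcodes:
        return []
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes must have the same length")
    composition = []
    for pos in range(length):
        counts = Counter(b[pos] for b in barcodes)
        composition.append({base: counts[base] / len(barcodes) for base in "ACGTN"})
    return composition
```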
Great! I will give it a try. Thank you!
Hello @cziegenhain, I tried to follow your suggestion by appending the barcodes from the header to the end of R1 and matching them with pseudo quality scores. Below is an example of my edited R1 file:
@K00124:663:HG7T2BBXX:7:1101:4097:1415 1:N:0:NTCTACGG+NGATCTCG
NTGTGTGCCTGAGTATGGTACAGCTAATGGCCGTCTTCATTTCCATGCGGTGCACTTTATGCGGACACTTCNTACAGGTNGCGTTNACCCTAANTTTGNTCNTNGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCNTCTACGGNGATCTCG
+
@K00124:663:HG7T2BBXX:7:1101:4746:1415 1:N:0:NAACTTGG+NAGAAGCC
NGCGTGTTTGGATCTGAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCTGGCATTGGACTTTTCTTNTTANACATTTCNGAGCCNCCGGGGCNTACTNGGNTNTTCCGTTTGCCGTTTTTTTCTTTAAAAAAAAAATTTTTTTTTTTAANAACTTGGNAGAAGCC
+
However, I received the following error message when executing zUMIs.
Wed Jun 16 11:07:54 EDT 2021
Filtering...
sh: 1: Syntax error: Unterminated quoted string
sh: 1: Syntax error: Unterminated quoted string
Wed Jun 16 11:07:55 EDT 2021
Warning message:
package ‘ggplot2’ was built under R version 4.0.3
Error in eval(bysub, parent.frame(), parent.frame()) :
  object 'XC' not found
Calls: cellBC -> [ -> [.data.table -> eval -> eval
Attached is the YAML file I used (test_zUMIs.yaml.txt). Could you help me troubleshoot? Thank you so much!
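The workaround described in the post above (moving the header index onto the 3' end of R1) only yields a valid FASTQ record if the quality line grows by exactly as many characters as the sequence. A minimal sketch of that per-record step, assuming the index is the last colon-separated field of the header; the function name is hypothetical:

```python
from typing import Tuple


def append_barcode(header: str, seq: str, qual: str,
                   qual_char: str = "I") -> Tuple[str, str, str]:
    """Append the header's index (i7 + i5, '+' removed) to the 3' end of the read,
    extending the quality string by the same number of placeholder characters."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Assumption: the header has a CASAVA-style comment ending in the index.
    index = header.rsplit(":", 1)[-1]      # e.g. "NTCTACGG+NGATCTCG"
    barcode = index.replace("+", "")       # e.g. "NTCTACGGNGATCTCG"
    return header, seq + barcode, qual + qual_char * len(barcode)
```

Note that the appended barcode only sits at fixed base positions if every R1 read has the same length (for example, untrimmed reads).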
Try gzipping your FASTQ files; I think there are issues with using plain .fq files!
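Compressing plain .fq files is normally a one-liner with the gzip command-line tool; an equivalent Python sketch that streams the file instead of loading it into memory is below (the function and file names are placeholders). It only shows how to produce .gz files; it does not establish whether zUMIs requires them.

```python
import gzip
import shutil


def gzip_fastq(src_path: str, dst_path: str) -> None:
    """Compress a plain FASTQ file (e.g. reads.fq -> reads.fq.gz) by streaming it."""
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```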
Thank you! This indeed seems to be the problem! At least I have moved on to the mapping step now. Are .gz files always required as input?
Hi there,
Thank you for creating the wonderful zUMIs tool. I want to try it on sci-RNA-seq data. We only have R1 and R2 FASTQ files and no separate index read files; instead, the i5 and i7 barcodes are embedded in the R1 and R2 header lines. Below is an example:
@K00124:663:HG7T2BBXX:7:1104:25733:17307 1:N:0:TAACTTGG+GTCGTGAA
ATTCGCCTTGGATCTGAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCAGCTTTTAGGAAATTTATTTTCCTTCCATTTTTTTTTCCTTTGCTCAGGCACCTGCCCAGCAGCCCAGGACCCCTCAGGGGTGGGTCCCACCCCCCTAGG
+
AAFFFJJJJJJJFJJJFJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJ<-<7AJJJ-7--<--JJ-7FFJ---<-A----FJJJJ---7FJ-<-7<-AA-7<-77-77<--7---7-A--7-7-7FF-7A--A7--7--AAJ7----7-
@K00124:663:HG7T2BBXX:7:1104:8613:17324 1:N:0:TAACTTGG+GTCGTGAA
TGTTGTTTTTAACCGCGCTTTTTTTTTTTTTTTTTTTTTTTTTTAGCGAAGATTCTGTCTCTTATACACATCTCCGAGCCCACGAGACTAACTTGGTCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAACAATAAGAACGTACAACTTA
+
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ--<<F7F<7FJFJ-<-7<FJ-A7<A<-A-7-J-7AAJJ-AA7F<FJFJ<JJ-77<AF77A-A-A<<JA7FJJ-A-<7JF<FJJJJA--7------------------
@K00124:663:HG7T2BBXX:7:1104:17848:17324 1:N:0:TAACTTGG+GTCGTGAA
GGAGCTTCGCACTAGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGTTGGCTTTCTCTTTTTCACATCTCCCGGCCCCCGAGACTAAATTTTGTATCTCGTTTTGCGCCTTTTTCCTGAAAAAACAGGACAAATGAGTGAA
+
AAFFFJJJJJJJJFJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFA-F---7-A-F<--77--<-<------7<-----7AJ<A-7--7-----A------7<----<--7----7-7---7F-7FJFJ------------<7---
In this case, could I still use zUMIs for processing? Thank you!
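The headers in the question follow the Illumina CASAVA 1.8+ layout, @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>, where the index field holds i7 and i5 joined by "+". A minimal parsing sketch under that assumption (the IlluminaHeader type and parse_header function are illustrative, not part of zUMIs):

```python
from typing import NamedTuple, Tuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int
    filtered: bool  # True when the flag is 'Y', i.e. the read failed filtering
    control: int
    index: str      # e.g. "TAACTTGG+GTCGTGAA"

    def index_reads(self) -> Tuple[str, ...]:
        """Split the index field into its parts (index 1 / i7 first, then i5)."""
        return tuple(self.index.split("+"))


def parse_header(line: str) -> IlluminaHeader:
    """Parse a CASAVA 1.8+ header line into its named fields."""
    read_id, comment = line.lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = read_id.split(":")
    read, filtered, control, index = comment.split(":")
    return IlluminaHeader(instrument, int(run), flowcell, int(lane), int(tile),
                          int(x), int(y), int(read), filtered == "Y",
                          int(control), index)
```

Under this layout, the index_reads() parts are what the earlier suggestion in the thread would turn into a separate barcode read.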