jorisbalc closed this issue 8 months ago
Hi there, could you please tell me which version of Dorado you used, and with which parameters? I suspect this is due to read splitting. To pin down where the problem is, could you run the following and tell me the stats?
First, grab the list of read IDs in can_pod5_merge.blow5 using:
slow5tools skim --rid can_pod5_merge.blow5 > blow5.rid.list
Check the count:
wc -l blow5.rid.list
Then get the list of read IDs from the Dorado BAM file:
samtools view dorado.bam | cut -f1 > bam.rid.list
wc -l bam.rid.list
Can you tell me the counts you get?
Then we can compare the two lists as follows:
grep -F -f bam.rid.list blow5.rid.list > both_in_bam_and_blow5.list
grep -v -F -f bam.rid.list blow5.rid.list > in_blow5_not_in_bam.list
grep -v -F -f blow5.rid.list bam.rid.list > in_bam_not_inblow5.list
Get the read counts using wc -l for the above lists.
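If the lists are large, the same three-way comparison can also be sketched with sort and comm, which match whole lines exactly rather than substrings. This is only an illustrative alternative to the grep commands above: compare_rid_lists is a hypothetical helper name, and sort -u collapses duplicate IDs, so the counts can differ from the grep approach if a BAM contains the same read ID more than once.

```shell
# Sketch only: compare_rid_lists is a hypothetical helper, not part of
# slow5tools or samtools. It writes the same three output files as the
# grep commands above, using exact whole-line matching.
compare_rid_lists() {
    blow5_list=$1
    bam_list=$2
    # comm requires sorted input; LC_ALL=C keeps sort and comm consistent
    LC_ALL=C sort -u "$blow5_list" > blow5.sorted.tmp
    LC_ALL=C sort -u "$bam_list" > bam.sorted.tmp
    # -12: lines common to both files
    LC_ALL=C comm -12 blow5.sorted.tmp bam.sorted.tmp > both_in_bam_and_blow5.list
    # -23: lines only in the first file (BLOW5)
    LC_ALL=C comm -23 blow5.sorted.tmp bam.sorted.tmp > in_blow5_not_in_bam.list
    # -13: lines only in the second file (BAM)
    LC_ALL=C comm -13 blow5.sorted.tmp bam.sorted.tmp > in_bam_not_inblow5.list
    rm -f blow5.sorted.tmp bam.sorted.tmp
    wc -l both_in_bam_and_blow5.list in_blow5_not_in_bam.list in_bam_not_inblow5.list
}
# Intended use:
#   compare_rid_lists blow5.rid.list bam.rid.list
```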
Also, could you post a couple of reads from the dorado BAM file as well as a couple of reads from the barcode02.fastq?
Thanks for the reply,
Posting everything in order:
$ dorado -v
0.4.1+6c4c636
Basecalling:
$ dorado basecaller --kit-name SQK-NBD114-24 /home/v313/Dorado/models/dna_r10.4.1_e8.2_400bps_fast@v4.2.0 merge_all/ > calls.bam
Demux and grab barcode02 in this case:
$ dorado demux --output-dir fastq/ --no-classify --emit-fastq calls.bam
Count the reads:
$ wc -l blow5.rid.list
569170 blow5.rid.list
Checking the bam reads:
$ samtools view calls.bam | cut -f1 > bam.rid.list
$ wc -l bam.rid.list
570778 bam.rid.list
Read comparison:
$ wc -l both_in_bam_and_blow5.list
567562 both_in_bam_and_blow5.list
$ wc -l in_blow5_not_in_bam.list
1608 in_blow5_not_in_bam.list
$ wc -l in_bam_not_inblow5.list
3216 in_bam_not_inblow5.list
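As a quick sanity check on these counts, the arithmetic below confirms that the three lists add up to the two totals. It also shows that the number of BAM-only IDs is exactly twice the number of BLOW5-only IDs, which would be consistent with each of those 1608 reads having been split into two child reads, although the counts alone do not prove that. The variable names are just labels chosen for this sketch.

```shell
# Counts copied from the wc -l output above
blow5_total=569170
bam_total=570778
both=567562
only_blow5=1608
only_bam=3216

# Each total should equal the shared IDs plus the IDs unique to that file
echo "blow5 check: $((both + only_blow5)) (expected $blow5_total)"
echo "bam check:   $((both + only_bam)) (expected $bam_total)"
# Number of BAM-only IDs per BLOW5-only ID (integer division)
echo "bam-only per blow5-only: $((only_bam / only_blow5))"
```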
A few short reads from the .bam:
d9f22985-c0ba-4fa7-a380-28aa96f72e1d 4 * 0 0 * *0 0 AAGCACATCGAGAATAATAGTCCCATCAAGATCGCCTGAGAAGTCACAAAGATCTCATCAATATACATTCGAACTGAGGTACATATGCGCCACTTACATGGTGAGAGCCTAGCTTCATTATGGCGGTGCTGATGGTTTCTGGCAGTGTTGCTAAGGGGGCGACTAGCTGCAGCGGAGACATTGCTGGCTGATGTAGTAACTCCTTTCCAGCACATGGCTGACTGTTGTGGGTGGGGCCGGTGGTCAGGGGCCAGGTGGCGACTGATCCAGGATCCTGGTGTGGTCCTGGCGGGTGCGTGGCCGGCGTTGTTCATGGCCGCCAACGGCTCCGCGGTCAGCTAGTGCCGGTGCCGCGAGCGTGCAGGTTCCGGCGATTCTTCATTGCAGCTCATAAGCAACTTCTTGCTGACAGTGACCTGCGTAACCTTGTGGCGGCCAGCGGGTGAGCACGTGGTGTGTGGTCCGGTGAGTGCGGTGCGTGTACCACAATGCCGTAGCC )**,*($$$))'(1(&%'&%&+022,*(%$24*'')'&$%'/,#"$**234..*-(''%''*'(&)&$$(,,204*'((&%$#%%-,(&'*(&','&##"'-&$%%'*((%##'0&%%&(+)+,*)*-,)*-0,%&03('%%%&&&&()&$%&&&((,+)*()+%$%(&&%//-,+,&$#%*,&'''(@>>=52)'(&)'&$&*...5/-%)'&+')-1%%%&%)'##"&()%&&)&&&$''''$#&('(%*++-+%#%('&(%&'%(()*')#$$(/3,+'''%&%%&,)'''%$#'%"%'%)+(%)(''$%)&)+2'%(*)(&%&0('$&().'$$',%&'',-+&&))+*)%&&+/0,-+,,)*'&&&''%%%%((%%$&-313,+'''###%%$&()%&&)&%*16&%%(($&,,)')'(''(,-/&&,),.,(%%'*./,%&&+*#$%#''($%$%&$$&%$)&&%&'##$%'')%%#%'./**-42-&%&''& BC:Z:SQK-NBD114-24_barcode02 qs:i:6 du:f:1.914 ns:i:9570 ts:i:934 mx:i:3 ch:i:361 st:Z:2023-06-15T11:48:02.001+00:00 rn:i:14190 fn:Z:can_pod5_merge.pod5 sm:f:105.928 sd:f:28.1222 sv:Z:quantile dx:i:0 RG:Z:654d949156458929eb0d88decfead14747d14885_dna_r10.4.1_e8.2_400bps_fast@v4.2.0_SQK-NBD114-24_barcode02
12d8c732-2f64-4949-9f85-0a51b45afc59 4 * 0 0 * *0 0 TTCTCAGGCACTTCGCCGAATGGGTTCGATTTCATCAGCGTATAAATAACACTGGCTTCCAATCCGAGTAACTTACCTGCATATAATGGATACCCAAGAGCACGCATAGCAGATCTATTCGTTCAACGTCTGCAGGCTGCAGCATTTCTTCGTACAGGCACCGCCAGGTCGGCAAAATGATCCCGTATCTGATGATACAGCTATTTGACACAATACGGATGAACTGCATCAATTTTCAGCAGTAGGCGTCGCATCTTAAGCCGGCGCAGTCGCTGGTGCACGGTGCGAAGTCGTGGATCGCAGTCTGGCGCCTGATCCTGGAGGTGCTGCAGGTGATTGCGCCTGCTTAACATGGTAGGAGAGGCATTACTTCATGAACTTCAAAATCATGTGCAGCGCT ;2%$&'%&%+**1/00''%52*+59872101/*+,+.0-..-0$#%($%$#$#%$&-,(%#&')%#&',''$%%$'')'&#$%,,01'&&,%%%-+*+-+)'%$(+*(*+/2/--5//2,$$&&&&*&%#&$'$,*'+,'#%$%''1%%&)%%%&#'),(&%%().)&%"##&&%(&%%&(()+($%,$%$%'(''))&%&($%%+''''+&&(%*'(*+&$+((&(%&#$$%'%%%$$(&%(##%0((%%$###%(()*11)(,&&'+((''*'&&$#"$#&'++(&*$+(&&(''+'&'%%%&'',1,'.,(((1230*)./.*)'+)$$*(%$%'(,,-636333,-.*+%((('')%%$&$&&&%$$**'(%&**+.)'$+-&##$$%&+$+(&'& BC:Z:SQK-NBD114-24_barcode02 qs:i:6 du:f:1.5268 ns:i:7634 ts:i:898 mx:i:1 ch:i:365 st:Z:2023-06-15T11:47:53.569+00:00 rn:i:17171 fn:Z:can_pod5_merge.pod5 sm:f:97.325 sd:f:25.2987 sv:Z:quantile dx:i:0 RG:Z:654d949156458929eb0d88decfead14747d14885_dna_r10.4.1_e8.2_400bps_fast@v4.2.0_SQK-NBD114-24_barcode02
bb281324-0a75-4d6b-bb9c-293c09686042 4 * 0 0 * *0 0 TGATTAAACAATTTGTCTTCGGCGGCGAATGTGAAACGCCGGTGCGTAAGGCAAAGCGCCGGTGATTACCAGCATACGCGGCGCTGCGTGGTTATGGCCACAAGAGTAAAAACGTAGGCAATTGGCGCATCATCCTAATGCGACGCTTGCACGTCTTATCGGCCTACAAAGAGTGCCGGACCCGTAGGCCGGATCACCGCGTTCACGCCGCATCCGGCAATAAGTGCTCCGATGCCTGATGCGACGCTTGCCGCGTCTTATCAGGCCTGCAAAATGTCCCAGGACCGCGGTAGGGCGGATCGCGTTCACGCCGCATCCGAGCAATAAGTTAATGAGCGCGACTATAACCTTGCCGGTGGTTTCGCCAGCACCGGAGTATCCGCCGCTTGTAGCGCTATAGCGACCGTACAGGCGGGCGACGAGCAGCATCGCGCGGCGAACGGGAAGGAGCGAGGGCGGCAGGCAGCGCCACGTAATTATATGCCAGCCCCAGCGTCAGGCGTCGTACCTCCGCAATATGCGCGAGCAGGCGCTACCGGAGGCGACGCCTGCCACCTCCAGGCACCTAATAACTCTCCGCCTTTGGCATTTTGTAGTGACAAATCTGTAGCGGTGCTGAAAGGGCAAATAATCCCGCGCAACGAATAAAGCGAAAATAAGCGACATTGTTCGTGCACACAGAAAGGCGTCAGTGCCAGTACAATTATAAAGTCAGTCACTGCTGCAATGCGCCGTGGTGAATAACGTCCTGAAATCCTGCCACTTAGCGTATTTTTCCCAGCACCATCCCTAGCCCAACTAACATCATAATAAAGGTCATCGCCGTTTCCGAAAAACCGGAAATAAACGTCATGTATGGCTTTACGTGGCTGAACCGGCAAACACACTGCATTGCCAAACATCGTGGCGGCGAAAATTAACCACGGGGCCGGGCTACGCAAAAAGTGAAATTGTTCGCCGCGCAGATTTCCTTCGCCTCGTCGCGAATTGGCACCAAAAATGACCGTGCCATCACCGCAATATTAAAAACAGCGATCGATAAAAAGGTGTAACGCAGCTTGATTCCTGACTTAAATACATTTAAGCGGAATGCCCAGGCCGGATTGGCGACTGTCATCCCGGAAACCATCCCCGCCACGGCGGCGGTGACTTTTCCAGGGTTTGATAATTTTTATAACACGATCGCTCCGACGCCAAAAATACGCATGCGGAAAGCCGGATACCAGCCGACCAATGGCGAGCATCAGGTTAAGACGAAGAGAGCGTGAACATGGCGTTGCCAATCTAACGCACAACGCCACCAGAAACAACAAGATATGTTTAAGTAAGTAGCGGCTCAAAAGAGTGATCATTGGCGCACCAACCACCACCATGCATAATACGAGATCATATACCAGGCGGCGGAATCGAAATTC 
=11A?=BHA))+::0-*+.))*A:<=9:>77A>>?@;668502.23<5359,('22((0.-12'&')$%((')*'&(&),,145686441.1,+-5:<;/.+)),/1+-200/'###','&(7289;<11144.-.*-/<8102655?32/1.107<663(')40(((+05'&(*(*&$(&103351418/+&'%$$%&'.3032978<8(897=?:95464227566:=76+855;+,,.-('(144:7=@988@<<=9<;9;;4AS56459;6+*$$&)54.,520&/$004??5456**)+*+.,-*)-,++2150(%'''))%$%($.0)'(&,,('-401,-.561))/**)*'&()*2<>=;9<=9994:98=>BACA=;6B:2+**'')'21245466+++41342,,-,+-.&%'-%%'+,,)(%*')*)(+,)&&')'&$'($&+188:;<97:<*((,++72176<9622/00=?:93:9:*)(.-('''&()''&&'(*)-,-$%-%%%.+0%,+'')*+(((('(+)(%'%'%&$%&+,,1777F31)*+5313;333*))+)-*20//@=:&%&&('.;9:''('&)*(&)))%'(.-()1554.,//37;:+&')''*+1240/.'&'''.-)---.&&00/,)((('''&&%&*)%$$,3*($(*((+&'#%%(%$)'*662008H::<JC@?=>=A42358:99788>A;<=;612+('+,+(').69769;CAAAJESBEJ=?::<<;/..&%%1('2;;;>;9:--/B@;=DA5::?DCA@@ABB@@6440139::8>8/--)((3.-042.*)/15/-,-0788:,,76'&%./-107//1=:7&%%,%%%61/346+*577>602++%+*%%%0/04497.-.36?@A999326?<;7358A-,,2>FBC86A=6,-+/++/990/.+;;2,(()''/,))),33113<6582./.6;.-15220.31)'''-..%%0).-''&*&#'*)$%4234;;:=?67412:610**-())5++,&&0=?@FG?=:;=;9%((01''(*'&'&%&(20677;:(&'/%#%&001,+*''5*)'%(&'(#%&&(%$#$)746@<>?,-<=?>?7858;D=:>BBBEB;;<;;<:54578<11'''=:85,'/./07891%%1799<577=433AB<>><<;94=;;;(''0('#$%,3<=:00995-+-/)*+6:88+'%%3342200)*''&/)%$%''&(*+(+(2644459;734)*9335/.+*())&$&()(+4223245767;:6?;:556499=:43**+4'%%'%$%'$$$%&&)%$%,/1+.,+*,(%%+/.0.....)(+9<<?769:22;:@=A>99..6<=32222+*22116875((02.0-,54011 BC:Z:SQK-NBD114-24_barcode02 qs:i:10 du:f:3.8034 ns:i:19017 ts:i:1018 mx:i:3 ch:i:363 st:Z:2023-06-15T11:47:54.376+00:00 rn:i:289157 fn:Z:can_pod5_merge.pod5sm:f:105.054 sd:f:29.4775 sv:Z:quantile dx:i:0 RG:Z:654d949156458929eb0d88decfead14747d14885_dna_r10.4.1_e8.2_400bps_fast@v4.2.0_SQK-NBD114-24_barcode02
d3d13a47-f82c-44ec-863c-c14c9b50489a 4 * 0 0 * *0 0 TGGTCACTGTCCGAGGATACGCACCCCCCGCACTCATGCCGCGAATAAACTCCTTGTCGGTAATCGGTTTATCGACAGTCACCAGATACTTCTCATGATCATTGCCAGCACGCAGTCTTGTTCACCAGATCGCCGTGATTGGTCAGGAGAATCGACCCCTGGGAGTCTTTATCCAGGCGGCCGATCGGGAACACGCGTTGCTGTGGTTAACGAAATGACGGTGTATCGCGCTCGCCATCTTCGGTGGTTGCTGCAATACAACGGGCTTGTTCAGGGCGATAAGTACCAAATCTTCGGCTTGCCGAGGTTCCAATAAATTGACCATTACTTCACAACGTTGGCCGGGTTTCACCTGATCGCCAATGGTGGCTCGCTTGCCATTAAGGAACACATTGCCTTGCTGATCTAGCGATCCGCTTCGCGGCGTCAGCAAATTCCAGCTCGCTGATGTATTTATTTAAACGGACTGATGAGTTGGGCAGCATAAATTCTCCTGTAAAAAACGGAATATACCGTACCTTTGGGTTGATAAAAAATAGTCATGGCGGACTACTTGTTATTTAAGTGGCTTGAATTCGTTACCTGCTCAAATCACTTTCCAGAAGCGATGTGAGCATTCGTGAGCCATGTCGATAAGTCGTTAATTCGCAGTGGTGACATGCCTCCGTGATTTCTTACTAACCGGAGAAGTAATGGAACTGCTTTTACTCGTAACTCGACACTGCCCGGGTAAAGCCTGGCTGGAACATGCACTGCCGCTCATTGCTGGAACGGTTCCGGGTCGCCGCTCAGGCGGTGTTTATCCCTTTCGCTGGCGTAACGCAGACCTGGGATGATTACAGCAAAAC 8.;<;:8866**+++.+*&##&'((48=B767<;>>?<;:++)*.2)*,)0%&''''$$%453561124512+''(&%$*/13+*-3342511268001,**1*+/-99)))9''&1:7('(.3449;78?9<<:<:;=90//%%'22,/5,('(((BBD:<B=;<555ECHBD;:6/'''<.++%%&CB>=:<443;>@><=C:;8965342745'%%'))'+'(%%%&&%'''34*+4;6013720/-+*'&#%'((#&3.2766@977-,04DB(((*)(*&&'1/67469611075&&7447?=01-)-1006.-,./5((*)&&2/55334.-(&()438C:672+++6100=67::56@ABBA@>)&&+G87:.-.)(%&%*/.--01-.07;886&%&%%%**++,*)&#&0-//,-...+)'.%%+%1)&%$$$$5;RG@?>>666DCADE=;==.-,76313/.,1%%&;:510*(%/.+666>CGGB?CSI<B?547<:778;879:AA?@A-,76844466?>496****21(()(&(%-,+(+/;=:::>.+'(&'&)+()/742349278BH@=(--988889CDD@:55=.;(((67((*)*0)%%&),.4?::32)+''(*)(%%'/29975&&$(&,53$#$*/('(93846--.52*))478/02@;=AD<;;=9;?657B?>=559ED,'($$%')+/0***+&#$$%+'':9)&(*()5/..333511))'*)-,,,2434-++3632)'%.:-**,&''$591,+,0*-+,,,'878///:67101/---,,-+(++,,44/.+-.146%%/17+))+*)$&'##191 BC:Z:SQK-NBD114-24_barcode02 qs:i:10 du:f:2.3804 ns:i:11902 ts:i:1024 mx:i:4 ch:i:349 st:Z:2023-06-15T11:48:00.377+00:00 rn:i:8774 fn:Z:can_pod5_merge.pod5sm:f:95.0257 sd:f:26.9928 sv:Z:quantile dx:i:0 
RG:Z:654d949156458929eb0d88decfead14747d14885_dna_r10.4.1_e8.2_400bps_fast@v4.2.0_SQK-NBD114-24_barcode02
And a few reads from the .fastq:
@0fb1396a-b875-4400-9f92-84abf01dde3c
CTTCGACGATGAACCGCCGCCGATGGAATGATGACTGCTGTGAGCGGCTTCATCGCCATTATCACGTGACTTGCATACCGTTTTCATGCAGCGGCAGCTTCTTCATTATGAAAACGCATGCCGTTGTCGCGCAGGGAAATCAGCTCATGCCGTCCCTGTGCCAGCAGCAGGACGCTTGCGCGCTCTTCTACGTCGCGGAATTGGAACGGAACACTGTCAATACGCGGTGGCAGCTACGTACCTGCAAGACCCATAATAATGGCGATATAGGTGTGGTGGCCTTACCCGTCAGCGACAGTGACGCCCTGGGACATCAGGCGGTATAAGCGAGTAACGCTATCCAGTAAGCCTTTTGACAGAATCATCGACGAACTGTCCTTCATAGGCCTGCGGTATGTGGGAAAATGGAACCAAATTACCACCTAGACGTGGATCCGCTGATCTGATAATGTCCTGACAAGGTGAACTGACTAGGTGACGAACTAATAACAGCGCATAGTGTAGAGGGGAACGCGGCGACTGGCTTAACTCGCATCGAATTAAACTAATCAAAGCCGCGATTTAACTAATATTGCGATGATTGTATCAAGTGGTGTGTAGATGTAAAGATAAGTCTCGTGGTTTCACACCAATTTGCCAGCGCTTCACCGACGCCTACGGTCGTCCCATACAAAAATACTGTTCGTCCAACGCCATGCCGATGAATCACGCGGCAGTGGTGCCTCCGAGGGTGATGACGACCAGATGTAATGCCTGGGCCAGCGGCATTAAACGCCACCAACTTAGTTCACTGGCGCGATACGGCAGATCAGGCGAGGATAATGCCGACCACTGGGTGTTGCCTGGTGACCAGTGACGCTATCGACGGGCGGCAGCACGCCAATAACTTCGCGGCAGGCGGAAGAGCGGTGCTCGCGACACCTCGGCTTCGCGCAGCGCGGCAGCGATGGCTCCGATGCGCTGCGCGAACTTGAGGAGGTGCCTTTATACCGCCTTGCCGCCGTTTGAAGTTATCGGCGTGCTGCCGGCCCGTCGATCACGTCACTTTGAGCCTTAGGGTAACCCGGTGGCGGCTCCGCCGATCTGCATATCACCGTGAGTGAAGTCTGGCGTGTTCGATCCCGCTCGCCGGCGTTACATTCTGCGGTGGTATCACCTTTGATCTGCCGCCGTGACGTCGCATCAGGGTTGTGGCTGTCCTGGTACGAACAGTCTTTGTATGGGAATGACCGCGGCGACGTCGTCAGCTGGCGGCTGCAAGATTGGTATGAAACCCTAACTGGTACCTGTTACATCTCAAAACGCTACTAGAACAATCGTCCGATCGGCTCGATGGCGGCTTTTGCAACATAATTGATCTGGATGACGCCGTCGCGCCGCGTTCAGCCTTACACTATGCGCTGCTTATAACCGTTCTTTGGAAGTCCGTCACCTTGTAGGACTATTATCATAATTAGTCTGTCGACATGTTTAAGGTGGGGTGACCCTCATCTGCCGGCGTGAGCCTATGAAGGTCGACAACGTCGATCATCTGGTCGAAAAAGCTTACTGGATAGCGTCTCACGTTGCATCAACGTTATCGTTCACTCGCTGAGCGGTAAAGGCCAGCCGCACCGATATCCGCCATACGCTTATGGACAGCTGGGGCAGCCTGCCGCAACGGATAATGGACGTGCTCCCGGTTTATTCGCAACGTAAAGCGCGAAGCGCCTGCTGGGACGGCATGGGAGTGGATTTCCGGCACGACGACGGGATGCGTTTCATAACGGCAACCTACGGTTATGCAAAATCCACGCCTAACGGCAATGAATCATCTACAGCAAAACGCCCGGCGGCGTTTATCATTGGAGAGTATCAACTGCAACTGATCGTGTTAAACTAGCCGATAAGT
+
551*./3((()'20)%$&%$&'%%('(*'&$$#$$&$)&(.%%&%$""%,''3../6&&(33-$++/-+&%''().,-/048;>/,,,0&('+,-+)')+'*%$$$&%&&-/031'%%-'%,,-047%%%+-%''5:3-((''((##((-((%(-.00623)%&,)(+*+(&%%%$('%$')6839:661-'&'%%),-&%''%&.,%*..+(')'%#"#&'$%&%((+**%%%(&$$'$'(2++**-+''3-**01(()+...(('(%'+//28;<7334:7/34811*((,.11(''0+)'$$%#$'()$&'%##%&&'&%$#$$%%$&-.47../%',,+,.--/1..2,,&&*+),0(,*+''*))+3.0,'$'(,-//$$(,/0&%$())(%('.01/+()&$$&'''%%$##%++))4((*(%%##$''#""$#$%$#""$'&&$$%.31140+,54(())0+'''#%''$$$)&(.2/13555,-.))&($$%-(#"#&)7;=75**+./112;02267**--#$&'-%%%0*))54++,46-+*:2&(//(()012447..2,,,5?78:434710'&#$%'-%%%&'&2../...%$+.+*$###(($#%$(*)))++,55044=A43&),0.0%'%++('%$#"',%%$***'*&1850.1)*/5/-,/..,,'%'%%+'(%%'*)&(*,+/*,0/*&&#%$)'(&&''(&%%(%&$%$/,%$#$$$%%%&*(%&4595:;8:;9+-69')(-'''('%#"-1+'(&%%&%#%&$"#%'&%&**,%%%'')''2:6455.((-)'%%/.%##%%%/../3&%'*(+*&&&%$#(*+.&&&)**+)(102221/++($$45%%%.//.+)02)),32230*&%%&#$%(*()).0&&'/20&$##&&'&%'''&&''(3('),&202368,,-))),++)()2%%$(..5498::/0))-+-'&)()'*(&%'(,3438-,,50492$$*65'-'--&%'%%''''014..*+*1;60&(45/--))'%###&&%$$'$$(&'(,'$#&-)''(,2&%'+)&%$%%%'*()(,+('''$##'$##%((%&&%)(,'(&%%'((#$%$(&&1./4/-.&&&2'%(.,)%&$$$#0)&#('&%&%&&&&',.*$$%&&$$'..'+&('##%'%$#$#$#$-)%$%$%&+&$%-1*&'$$#&*'%$%#$(.'(''$%++&($$$()(#%$%##$##&'%$&(11..(''-)($$$%$$#%%%%$%'-((*,&%%('%####%%#$&&%&'-$$)*0)'*,&%'$$'(-))+)&'((%%&%#%%)%%&&###%%$$&(+(&$$$"""#$*%$##&%%&&%####,''--,/48)))()%##$&'*(,'%&''%,,-.''$$##&()&%%(&&,)'&(.'&%&'%%##%10))##'($$%&''*($$#$%'%##$&##$((+&'%*)+('***$#%%%'%$$''%#%&'%&'&#$$&#$%&*/'((%+()('')+'$##'(*(*($(-/-+,'$$-41+*%)1'**)/0/0++%%(,)(('./6521%$$&)'(%%$%$)*(*,&&'''%&%&)())*&&&,&%&..+(((4+*'()(%#'%#"#$%&($'%%&(%#&+%%&&####$$%'((&%$$&%&%$$%$$%(%%##$$(2+&$',&'',/-%%&)'%%%&'&((.3-*&$'()'&%$-)('()&$#'%'$%%'('*-&(+('))%&)**+.,**.0/++.6+**,+-,-.''()(%%#(%+'&&)&&/&*//,,)**+&''%&&+*+%$*)%&*()(,*+'$$(&$#$'-)((&%%&$%%$#$#$$()*,(*'$$###$$#&###%$##&%%%))'''%%$$$'&%'%$#$
@600c63d9-ee9d-4529-99bb-797c013f9baf
TCACTTGGCCTTCAGCAGAAACCTGAAAAACCACACCACCATTTTCTCCGAGTGGTATGCCTGATATCATGAAGGAGCCGGGATGGCGCGGTGGTGGGTTGTACCGCACTGTGGTCGAAACCGGTGAAGTGGTTTATTTTCAAGCCCGCGCTACCGTGCTGGCAACTGGCGAGGGCGGGCTGTTGCTGGTCCCACCACCGACGCCGCGTTTGACACCAGCGACGGTCGGCATGGCTATCCGTGCCGGCGTACCGTGCCGGGTATGGAAATGTGGCAGTTCCACCGACCGGCATTGCCGGTGCGGGCGTACTGGTCACCGAAAGCGTGCCGTGGTGAAGGCGGTTATCTGCTGAACAAACGTGGCGAACGTTTTATAGGAGCGTTATGCGCAAACGCGGGAAGAACCTGGCGGGCCGTGACGTGGTTGCGGCTCATCATGATCGAAATCCGTGAAGGTCGCGGCTGTGATAGTACGTGGGGGCCACGCCCAGAAACTGAAACTCGATCACCTGGATTAAAGAAGTTCTCAAGATCCGTCTGCCGGGTATCCTGGCGAGCTTTACCGTACCTTCGCTCGCGTCGATCCGGTGAAAGAGCCGATTCCGGTTATCCCAACCTGTCACTACATGATGGGCGGTATTCCAACCAAAGTTTACCGGTCAGGCACTGACTGTGAATGAGAAAGGCGAAGATGTGGTTGTTCCGGGACTGTTGCCGTTGGTGAAATCGCTGTCTATCGGTACGCGGCGCTAACCGTATGGGCGGCAACTCGCTGCTGGACC
+
2,$$%5*$%%&'((+(53151056''*:??;7776:=788711;<87'</--:87832*)$$''-++'%%&,,*&(&%'$$()('''*+&)3365008344S?>B@>223>=)$$'+*1552469><<;;54245,,--3(&'*+2<(('10'&'+432,*(&$#&((45+(*-*+-,('%%'&%,-,**'.1''(('''&'65/+*)-,3-.1&&&)('*&''*++,2=5426465<>?>>543-*+,+%&()&%$('(+))('-68<832+*&))*537,-.0/138:?<99:<:0348812:<++&')*++(&&&.,/,-$%/''&'+467::;>?>=?>>>H;:8::**22'%))%'*2:%%8?8667923&'+0,+('(22)0/4#%)),+&&&'+1,-'-8<?@11CG:**,0412--.0-($$#$$%()*-.//)((4==644AB><<::40.)1;H:9654('+&$#%2110/.1/.)))++)'784926C8//)(*6.,+,=)((*3**)*&.*%&*.,)')((.5))(.*++11B@65788('&&'(&',-%%(3.))*)1(138.))*6+++:8<;>@@BIES>:<958662;679=:<9:=//7<?><.8CB@B444@@98FH77642113&&&0.(()('('',/6<99<C?424==<>653)3.'%%'258===;BCD>:3.54710-++()12/4/.,*),3335::77<76&$$%$%$'(0'',3..%%(('**+''(-))&('-DA<<;1028510&.01''',,
@6aafef91-8d14-4d55-a0bc-c6839a8d9fd2
TGTTGAAAGATGGGATCTGCCGGATTCAATCGGCTCTCGCAAGGCATCCGTGTAGCCGTTCAATTCAGGTAGCTCATCAAGAAAAAGCACACCATTATGCGCCAGCGAAATTTCACCGGGCCCTGGAATTGCGCCACCGCCTTCATCGCAGATTAACGATTGCACTGTGATGGTCAGCGGAACGGGCGCTGCCGCCATTGTTTTTGTACTGATTCAGCATTACCAACTTCATATCGCAGCACTCTAAGTGCCTCTTCATTGCTTAAATCTGGCAATCGGCGTGATACCGGCTGGCGGCATTGCTCTTACCTGTTCCGGCGGCCCAATAGGTAAAAGGGTACAGCCACAGCGGATAATTTCCAGTCCTGGCTTTCCTTGTTACCTAACGATAACATCACTGAGATCATCTTGTAGCACCGGATACTGCATAGAAACCCATTTCGGCATTGAGGAGGCATACCTCCAGAAACACAAGAACAACTTGCAGATAATCGCTATAGGCATACCTCCGCCGTTTAATTAGCCCCACTTCATCTTCGTTATCTTCACCGACGATAATTTTTCTGCCCCGACTTAATAGCTTCAGTTGCACTGGAGATTGCACCGGGAACGCCGCCGAGCGCTGTTAAGCGCCAGTTCTCCGACTAATTCATATTCATCTAACTTATTGGCTGTAAGCTGTTCTGAGGCCGCCAGCAACGCAATGGCGATGGGTGAATCATATCGTACCCTCTTGGCAGATGGGCTGAACCAGGTTGATGGTGATTTTTTTCGCCGGATATTCATATCCATTTGATAATGGCGCTGCGCACGCGATGCGCAAGCTTCTTCATCTGGTAAGCCCACCATCGTTAAGCCGGTAAGCCTTTACTGATATGTACCTCAACAGTGATGGGGGCGCATTATTCCAGGGCTGCGCGGTTATGAACAATTGACAGTGACATAAGCCCTCCTCGTCACCATTATGTGCATAAGGATCTCGCTGCTGTAGCCCGCTAATTCGTGAATTTTAGTGGCTCATTCCTGTTTATTTGTGCAAGTGAAGTTCGTGTTCTGGCGGTGGAATGATGCTCGCAAAATAACGACAAAGGATCAACTACAAGGAACAACATAATTCTGAAAATAAATTTTTTCCACTTCACTTATTTATTTTTAAAAAACAACAATTTATATTGAAATTATTAAAACACGTTCATAAAAATCGGCCAAAAAATATCTTGTACTGTACAAAACCTATGGTAACTCTAGGCATTCCTAGAACAAGTGCAAGAAAAAGCAAAATGACAGCCCTTCTACGAGTGATTAGCCTGGTCGTGATTAGCGTGGTGGTGATTATATCCACCGTGCGGGGCTGGCTGGACGAGGAAAGGCTTAAGATCAAGCCTAAGCGACTAGAGCCCGCACCGAAAGGTGCAGATTTTGACCTTAAAAGCATAACCGAGAGCAGACAATGAATAACAGCACAAATTCTGTTTCTCAGTTCAGGAGCGGGGAACTAACTATGAATGGCGCACAGTGGGTAGTGTCAGTGCGTTACGACGGGTATAAACACGTTCGGTTATCCGGGTGGCGCAATTTGTGCCGGTTACGATGCATTGTATGACGGCGGCGTGAGCACTTGCTATGCCGACATGAGCGGGTGCGGCAATGGCGGCTATGGGTTATGCTCGTGCTACCGGCAAAACTGGCATATATACGCCACGTCTCGTCCCGGCGCAACCAACCTGATAACCGGGCTTGCGGACACACTGTTAGATTCAGTCCCTCTTGTTGCCATCACCGGTCAAGTGTCCGCACCGTTTATCGGCACTGACGCATTTCAGGAAGTGGATGTCCTCCGGTGTTGGTGCACCTCTACCAGCACAGCTTTCTGGTGCAGTCGCTGGAAGGTTGCCGCGCATCATGGCTGAAGCATTCGACGTTGCCTGCTCAGGTCGTCCTAAGTCCAGGTTCTGGTCGATATCCCGAAAAGATATCCAGTTAGCCAGCGGTGACCTGAA
AGCGTGGTTCACCACCGTTGAAAACGAAGTGACTTCCCACATGCCGAAGTTGAGCAAGCGCGCCAGATGCTGGCAAAAGCGCAAAAACCGATGCTCTGCGTTGGCGGTGGCGTGGGTTGTCGGCAGTTCCGGCTTTCGTGAATTTCTCGCTGCCACAAAAATGCCTGCCACCTGTACATCTGAAGGTCCGACGCGGTCGAAGCGGTTCTCCGTTACTATCTGGGCATGCTGGGGATGCACGGCACCGAAGCGGCAAACCGCGGTGCAGGAGTGTGACCTGCTGATCGCCGTGGGCACACGTTTTGATGACCGGTAACCGGCAAACTGAACACCTTCGCGCCACACGCCAGTGTTATCCGTATGGATATCGACCCGGCAGAAATGAACAAGCTGCGTCAGGCACATGGCATTACAAGGTGATTTAAATGCTCTGTTACCAGCATTACAGCAGCCGTTAAATCAATGACTGGCAGCAACACTCCGCGCAGCTGCGTGATGAAGAACATCCTGGCGTTCGACCATCCCGGTGACGCTATCTACGCGCCGTTGTTGTAAAACAACTGTAGGGATCGTAAACCTGCAGGATTGCGTCGTGACCACGGTGTGGGGCGGCACCAGATGTGGCTGCGCAGCGCATCGCCCACACTCGCCGGGAAAATTTCATCACCTCCAGCGGTTTAGGGTCCATGGGTTTTGGTTACCAGGCGGCGGTTGGCGCACCAGTCGCGACCGAACGATACCGTTGTCTGTATCTCCGGTGACAGCTCTTCATGATGAATGTGCAAGAGCTGGGCACCGTAAAACGCAAGCAGTTACCGTTGAAAATCG
+
+,,)&1900.-''))'&'%%&''3:;2../0,-2B21&$%&''0/)&'2+(+))%&+++)())-0.+039,+,,,*+*,,3-<3999:22%&&&&'.88;455244/-17459;;>@33<@DSA@6;952(''),,14799D9'&&&,-)))(##$$&())'('*+7://0++6(''-(&'+'*1==***763..-%$%%&&)-852211---.()*+'&(&'+-++/1--.&&&**')))&&)+%"#&&$$%%,.'7:?:31-&))0..%%&$%%$"$%"#%((%%(',-96440(&&,*''$$%)'&#&%&'''/03***A=-+/*'1*..0765/*&%#&%+.)#&&)0+(+2222,++'+)16%%%'0/--,0(())().'&$#%%&'&$##%&(%$&(..-.(('##$&&&+,0031.-*'&$'%%%)('''%'*-'$&0('''&+(&'(%%%#%$%#$%'%,,'$&&(&&)))%$%/-,,./''(*&$#.**&(.///)(&)#$##"##(.-0,,*,-61898762456;3-++.,,,.34)(&%'0168336<==C12054-=5313*&*-0./75025.;9=10189:3++(+-&&'*.33.///*&+,$$'**-+-/+)0(&'23-,44155>9::>9==/-430201545;:>>BD<;:?GABAA<9;?=/./2523;??G6340.-./000/35;<=999+*3;''33*%&'(&&&(-.25+74.DB;9/-$$+.<.%-+0((/:=:;<435557:;;BB:-,,922:C9:92/.)*('&$%(%%%%(*259///897989*+*32+)&&'))*+,)*.%$#"%%)(&$&%+8::;//(&&4//062-783*3**9:99;::;=<<=@>>;::<>@867544))9A77))%&,)%%'**26BBD775:6<?=;=((11)),(((3...957++&'(11;=206*(()5676123413'&'+'(&'''&'&+1::99;>976@B=:>99;>*)),+0.-10&%%*,/9*)*+-4424368;AC<><C?@AFB@=<00&%&,)))*).688:8006-534697+((%%)/*%&$$&'*4;9202%$%*/'($%%#$',,--))%$&(#$,.393/'(61037691*%'(&%)+*.&'(.//2442)(.9;70+08../<</,(44458>95&%(/,.32//%%*+/0,6;>>=)(-255.36634))''(&++21'%%&&27324620345/,))(0&46>///457&&&340,,&'32,/7915,+'&$+.,&%%#$('-.('358677;8687755;<588500./0.+,-248982,,+,,%%%%((59<;62.-&'())-'"#+&*++**+24004803+&%&./'%%'-10)3*'&&(-'$(*&'>51+**,,26754**'&&%&*+''(/52).153))0+,-10+6(14459:112/,-.''(*(()4./2./.%%%%.))%$#$'%)*0*,&&.:@@<;:689?::''*)*)+,-&%#%%,-41$&&,.(%&$/5,--+*&$$(+78,)&'.2((%&(*%&(+((%%#&*(''%'&'%%*-'''335323.'%&)/0*+*+5,,,98.--0./0.2).0056<6E>==CDB43366,+*+'*8622,-/0/1((*88&&&2''=<2+*+22-*+/41--,,.5429735)..&$$&#$$'(/1752723/01((+%%*)'&%'2&&+;@8>@<=>445@>>@/.'(')&)()*5.(%'/.*($&*1;?=))*1226?AA?<:;9:>;;667I=?:B21/120+2700-+,4666654/.*+,3&&+204.018:864../'%%($#%$%)*('&'';(().+*%/0/*2+*;;:=++7II:;;1.)()(*141)-*)*1203/1179'&'B75965/003%%%)((%%%&(--)((,9866012$#%&&&&))0766/015'&%//''.*(1A<<;5/78>::87.--,%&0557876442(3
)(%$*,0574-56>=:23::2@@?954>;<222:73=A;;;??CD97CB@7+**&&()3)('1))++1244.,-45@:72))%+43/085230-,$%#$#+.,6>ABD@??@?::@4/*+((*.'%&)'*-/9@AA'',.-1/..03)**097:000;AB@@;:<?DD?322C:70+#""$$$*(&#$('%&%%$&))&&+,,&&$'%'#$'*+&%'*,'(*,IE632668<S@;:41./,.2)()(&2+'',-*)6)),5,,5<>8;;:=<><?>>?B@AA6<;5//266((++*(&&&'+*++876::117:&%'87).((9115;??666C1+01789>>;?:::0//<599D54)()669>><<21135B<<<;<@BCD:<@@CCCCDAA=>??=?>85..)00%$$&')217?656;<;=?@?>>:<<><9;=3356333203;995@>;?>>??;8:=112<69E?>?))+++.'''63=8>>N@=880//31/..+%'&$56..100.'&+5898554)()677458,,+,+/(''///..021,+)27BA**,80.%%&,+*($$%'*'(.((%&'&+*))*1100323@97*&%)/35:;<)()(''165569,-2A?=67,/0)**20)()5*)&'*&&&4**164430-..38>;==<1979;7763/+*,58()+*)->43>C6'(0++(&')<<<=><<4116;:))())//(-.12644751))--.,,+(*'((-*++/.0.19566$$%3&2)'3<778??BJ@>>?16998560068)'(-228==6-,-9;?999<;<833766;:96'%
Seems like there aren't that many reads missing, but I'm still interested in why.
This is read splitting; I am now more confident. Could you please take one of the read IDs in in_bam_not_inblow5.list and grep for it:
samtools view calls.bam | grep "readid"
and see whether the "pi" auxiliary tag (parent read ID) is present for that read?
If so, we need to locate the parent read IDs for the split reads. For Guppy, the strategy I used is explained under demultiplexing here. You can check whether the FASTQ generated by Dorado for those split reads provides a similar tag.
Otherwise, I can think of a strategy to extract this through the BAM file using some bash commands, which would require us to use the BAM file tags.
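One possible shape for such a BAM-based approach is sketched below. It assumes that split reads carry the parent read ID in a pi:Z: tag and that the barcode is stored in a BC:Z: tag, as in the records posted above. The function name parent_ids_from_sam is made up for this example, and the barcode argument is optional.

```shell
# Sketch only: parent_ids_from_sam is a hypothetical helper.
# Reads SAM records on stdin. For each record it prints the pi:Z: value
# if the tag is present (split read), otherwise the read's own ID.
# If a barcode string is given as $1, only records whose BC:Z: tag equals
# that string are kept.
parent_ids_from_sam() {
    awk -F'\t' -v bc="$1" '{
        id = $1
        keep = (bc == "")
        # Optional tags start at column 12 in SAM
        for (i = 12; i <= NF; i++) {
            if (substr($i, 1, 5) == "pi:Z:") id = substr($i, 6)
            if (bc != "" && $i == ("BC:Z:" bc)) keep = 1
        }
        if (keep) print id
    }' | LC_ALL=C sort -u
}
# Intended use (assumes samtools is installed and calls.bam is the Dorado output):
#   samtools view calls.bam | parent_ids_from_sam SQK-NBD114-24_barcode02 > read_ids.list
```

The sort -u at the end collapses the children of the same parent into a single entry, so each parent read ID appears once in the list.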
Another solution is to use the buttery-eel wrapper on top of ONT's Dorado basecall server. ONT's Dorado basecall server can be downloaded from the ONT software page (this is the version of Dorado they use in MinKNOW for live basecalling). Then you can run buttery-eel on the BLOW5 directly, like:
buttery-eel -g /path/to/ont-dorado/bin --config dna_r10.4.1_e8.2_400bps_5khz_fast.cfg --device 'cuda:all' -i can_pod5_merge.blow5 -o reads.fastq --port 5555 --use_tcp --barcode_kits SQK-NBD114-24 --do_read_splitting
This will create a file called barcode_summary.txt, which has a proper format like here, and the parent_read_id and barcode_arrangement columns can be used to easily generate the read_ids.list to be fed to slow5tools.
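A minimal sketch of that last step is shown below. It assumes barcode_summary.txt is tab-separated with a header row, and that the barcode_arrangement column holds values such as barcode02. Both are assumptions, so check them against your own file. ids_for_barcode is a hypothetical helper name.

```shell
# Sketch only: ids_for_barcode is a hypothetical helper.
# Assumes a tab-separated summary with a header row that contains the
# parent_read_id and barcode_arrangement columns.
ids_for_barcode() {
    summary=$1
    barcode=$2
    awk -F'\t' -v bc="$barcode" '
        NR == 1 {
            # Locate the two columns by name instead of by position
            for (i = 1; i <= NF; i++) {
                if ($i == "parent_read_id") p = i
                if ($i == "barcode_arrangement") b = i
            }
            if (!p || !b) { print "required columns not found" > "/dev/stderr"; exit 1 }
            next
        }
        $b == bc { print $p }
    ' "$summary" | LC_ALL=C sort -u
}
# Intended use, followed by extraction with slow5tools get:
#   ids_for_barcode barcode_summary.txt barcode02 > read_ids.list
#   slow5tools get can_pod5_merge.blow5 --list read_ids.list -o barcode02.blow5
```

Looking the columns up by header name avoids depending on the column order, which may differ between buttery-eel versions.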
Yes, the pi tag is there. Seems like it was the read splitting. I'll follow the workflow to grab the parent read IDs as described under the demultiplexing strategy. I'll make sure to re-open if anything comes up. Thank you for the great tool!
Greetings,
I'm trying to demux a pod5 with skipped reads. I merged the pod5, basecalled it using dorado, and demuxed the bam into fastq files. I converted the merged pod5 into a blow5 with blue-crab:
I extract the read ids from the barcode02.fastq:
Afterwards, I try to extract from the blow5 and get the following:
It manages to extract some reads but stops at this particular read. I'm wondering what could cause a read to get lost. Any help is appreciated!
EDIT: it seems that with every 4096-read batch it extracts fewer and fewer reads. Is there any way to avoid this? Why are the reads missing?