isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

Empty Output file when using Racon #221

Open an22an opened 1 year ago

an22an commented 1 year ago

Hello, I have an issue with an empty output after polishing with racon. When I run racon on an assembly and a polishing file, I get an empty output, although the tool seems to finish without an error.

To get the overlaps I used minimap2 with this command:

root@3f197b9f145e:/data/polishing/Racon/EC_ONT_Racon# minimap2 -ax map-ont Ecoli_R.fna SRR8494940.fastq.gz > aln_EC.sam

[M::mm_idx_gen::0.189*0.83] collected minimizers
[M::mm_idx_gen::0.227*1.17] sorted minimizers
[M::main::0.228*1.17] loaded/built the index for 1 target sequence(s)
[M::mm_mapopt_update::0.244*1.16] mid_occ = 10
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
[M::mm_idx_stat::0.258*1.15] distinct minimizers: 932055 (97.19% are singletons); average occurrences: 1.044; average spacing: 5.353; total length: 5206604
[M::worker_pipeline::0.500*1.70] mapped 113 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -ax map-ont Ecoli_R.fna SRR8494940.fastq.gz
[M::main] Real time: 0.511 sec; CPU: 0.859 sec; Peak RSS: 0.178 GB

For polishing I ran racon with the following command:

root@3f197b9f145e:/data/polishing/Racon/EC_ONT_Racon# racon -t 5 SRR8494940.fastq.gz aln_EC.sam Ecoli_R.fna > polished_assembly_EC.fasta

[racon::Polisher::initialize] loaded target sequences 0.055208 s
[racon::Polisher::initialize] loaded sequences 0.022606 s
[racon::Polisher::initialize] loaded overlaps 0.013794 s
[racon::Polisher::initialize] aligning overlaps [====================] 0.004770 s
[racon::Polisher::initialize] transformed data into windows 0.002734 s
[racon::Polisher::polish] generating consensus [====================] 0.042274 s
[racon::Polisher::] total = 0.142599 s

What could be the reason for the output being empty? When I use -u, the file contains the unpolished target sequences, as it should. Thank you in advance for your answer! BG

rvaser commented 1 year ago

Hello, you might have insufficient coverage for Racon to polish anything. How many reads are in the fastq file? Only 113 seem to have an alignment with your target file.

Best regards, Robert
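
To check how many reads actually received an alignment, the SAM file can be inspected directly. The following is a minimal sketch, assuming a standard SAM layout (read name in column 1, FLAG in column 2); count_mapped_reads is a hypothetical helper name, not part of minimap2 or racon.

```shell
# Count distinct reads that have at least one mapped record in a SAM file.
# count_mapped_reads is a hypothetical helper name used for illustration.
count_mapped_reads() {
    awk -F '\t' '
        /^@/ { next }                          # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }          # FLAG bit 0x4 set: read is unmapped
        !($1 in seen) { seen[$1] = 1; n++ }    # count each read name once
        END { print n + 0 }
    ' "$1"
}
```

For example, count_mapped_reads aln_EC.sam prints a single number that can be compared against the number of reads in the input file.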

an22an commented 1 year ago

Thank you for the answer! When I run awk '{s++}END{print s/4}' SRR8494940.fastq I get 113 as a result, which means the mapping was correct, right?
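
The awk one-liner above divides the line count by four, so it only gives a record count when every record occupies exactly four lines. A small sketch that performs the same count for plain or gzip-compressed files and warns when the line count is not a multiple of four; count_fastq_records is a hypothetical helper name.

```shell
# Count records in a FASTQ file that uses four lines per record.
# Works on plain or gzip-compressed input; count_fastq_records is a
# hypothetical helper name used for illustration.
count_fastq_records() {
    file=$1
    case "$file" in
        *.gz) lines=$(gzip -dc "$file" | wc -l | tr -d '[:space:]') ;;
        *)    lines=$(wc -l < "$file" | tr -d '[:space:]') ;;
    esac
    if [ $((lines % 4)) -ne 0 ]; then
        # A remainder suggests a truncated file or wrapped records.
        echo "warning: $file has $lines lines, not a multiple of 4" >&2
    fi
    echo $((lines / 4))
}
```

Running it on both SRR8494940.fastq and SRR8494940.fastq.gz would show whether the two files hold the same number of records.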

rvaser commented 1 year ago

The dataset you are using has 158590 records, which makes the 113 alignments insufficient for polishing. The dataset was uploaded at the beginning of 2019, so its accuracy may be too low for minimap2. Try running minimap1.

an22an commented 1 year ago

I understand, thank you for pointing that out; I didn't notice. I tried using this fastq dataset https://www.ebi.ac.uk/ena/browser/view/SRR8494940 with minimap and then racon, with basically the same commands as before, but I got the following error:

root@4cdaf30ae520:/data/polishing/EscherichiaColi/Racon/EC_ONT_Racon# racon -q 10 -t 5 SRR8494940_1.fastq map_minimap.sam Ecoli_R.fna > polished_minimap_1.fastq
[racon::Polisher::initialize] loaded target sequences 0.054086 s
terminate called after throwing an instance of 'std::invalid_argument'
what(): [bioparser::FastqParser] error: invalid file format
Aborted

The head of this file looks like this (I tried to ignore the quality scores by using the -q option):

root@4cdaf30ae520:/data/polishing/EscherichiaColi/Racon/EC_ONT_Racon# head -n8 SRR8494940_1.fastq
@SRR8494940.1 1/1
AAATTGCTTCGTTCGGTTGCATTGTTGCTAAGGTTCAGACTATAACTCTGCCTTTAGAACAACTTTCCTACCGACAGTTTTACCAGCCAAAAGCTGCCGGTGCTTCGGTATGACGCCTCTAACCGTGGAACCGTAGCCTCAATGGCGAAGTAATACCTGTCTGAACGCGAAAACCGCGCCGATGATTGCCGGTAAAGCGACTGACGAAAGGATCACCTATTGTGCAGCAAGTGCTGGAACTGATCCTGGTGCAGCCGATGCTATTAAAACAGCCTGGCGCTCGACTCGCGTATATTCTGAAAAAGATTAAGCCGGTGTAAACGCTGCACAACCGCAATACCAGGCAAGCCGCGTTTGTGGTGTTGAAATCGCCGTCGGTTCAAGTCGGTATGGTGGAAATATCGTTGTCACCAGCCCGGAAGAAGAGCGGCTGTTAGGCACTGCAGCGTTTCGCCAGAAAATCGCCTGGCGATTGCCGAGAGCATGGTAATCGGATTCATTTCCACTGGTTTGACAGCAGAAGCACGCTCGGCAGTGGTAATTAGAGCCGACGCGCACCGGGTTAGGCGGTTTCAGCAAGCCTTCGGGATGCATTCAGCGGTTGGGCGCAGGTCGTCGGCGCAGAATTTGTCGAAGACGCTAACCGCGCAAGCAACGGCAGCAGGCACTCGATTTTTACGCATAATAGTGGTGTTTTCTGAACAAACGAGCGTCAACTTTATCTCCGGTGAGGCGTCACCTGCTTCCGCCACCGCTCATCGTCCAGGAACACCGGGCGCAGTTTCAGGCGATGAGGCGTTTCGTGGTACATCCACATAATCCGTATCCCCACCAACCACACCGAATATACACTTTTTTTATTGCCGAAAAGCCGGGTGCAAGGGCCGTCTGATAATAGTTTGGCGCTGGCTTCGATTTAACCTTTCTATGGTTTTGAAAGACACTGCGCCATTCCTCTTCCGCACCGCCGTGACCCTGTGCCTGTTTGGCGAAGACGTTTGCATCCGCGTTTTAAAAAGAGTGAAGTTGCGACGAATACTTCTACCTCAAACATCATAAACGAACAGCGCGGCATTGGCGGGCTGTTCGGCATGATGATCCTGAACACGCCAGATTTCGACCACTGTTTTACGCCTTTGTATGCGGGCATTGGGCAAAGACTACACCGACGCTTATTACCAGTGTAGAGCGACGAAACGATGGCCTACGGCGAGCGCGAGCGCATTCAACTCTACCGTCGCGGTCGTTATGTCGATTCAAATCGGTCTGGGATCGCGGCACGCTGTTTGGCCTACGAAGCGGCGGGCGCACCAGTCGATCCTGATGTCAATACCGCCACTGGTACGCTGGGGTATGATTATCAGCAAAAGATGGCAGCCCGAAGCGGCTTTAAGTGGGACTAGTAAAGGTCCGGGGATTGGGTAACTCCCTCACCCCCGCATCCGCTGATGCAACGTCAGTGACAGCTCTCGGCTTAAACAGTTACTGATAATCCATTAACCGTGCCCAAAGATGCCAATCCCCTGCTGCATGGCAGCGTCTTTTTACCGTTGTGCTTTGTGACCACGGACATGAGTTCGCGGCGCACGGCGTTCAGGCAATACGTTTTCAGCCCACGCGTTCCGGACCAATGCCCAGAAATGGCGTGAAACGCGATTTTTTGTAGCGTGCGGCGGCTGACATGCAGTTGATTACCAGTCCAGCACCGTCACCGGACTGGACATGTTTTCCAACACACATATTACAGGCGCGGGAAATAATCGGCAGTAACTCCTGATGACTGATACTCCCGCCATCATCATCAACTGCACTTCTTCAACATATCCACTTATCGCCATTAACAAAGTATCCCCCTAACACTTTTCACACTTGCGGTTTGGTTGCCGGGTTCATGAGCGTCGCCGGCGCAGTGGACGAAGCCTGGCGCGACATGTAACAACAAGCCGGCCGTTTTGACTGCGGCATGTACTGACGCGTCCGGATTGTGCGAAGTTGGCTGT
TGATGACGTCTTTCAGAAGCTACGCCAAGGATCGTCGTCCGGCGTACCAAATTCAAACTCAGTTTCCACCAGGCGCGTCGCAATTTCTGCGCCGAAGACATTTGCGAACCGATGAGCCACTGCTCACCGCGCGTCGCCGGAATGCCAAACCAGGGTGGGTTCGGCGGGCCGGGCACGACTGACGCAGCGCCGAAACCAGTGTATTCACGAAAAACCTGAATATCATCAGGTAAGAATTTCCGTAAATTCACCCATGAAACTTGCCCGGATGCGGCTGATCGTAAATCTGCTGCGGAGCGAAATAGTATCCACTTATGTTCATAGACGTCCGTTGTCTATCGTTGATGGTACCGTATCCACTCGACCTTCAGCATGAAACTCAAGTTTTCAGGCAAAGTTCATAATAAAATGAAATTACAAATTAGCTGTACGGGTCTTTTTTCATGATGTTTAATGCCGAGGTGTTGTAGGACACCCGGCACCTCAACTGAGTTAATGGGGCTTGAGACGGTAACGACTACTGCGTTTACGTAACGTCCCGGCAGAAAGAAGAGCTGTTCGAGCGCATTTTCTGGCTTTTTTTCAACGACCAGCCAAGAGTGGACGGCCAAAAGCTGTCGCGGGTCGTTTCACGGGTGCTGGGTCGTTGGCGGGCGTATAACTCGTCAGCAGATTCCGCGACGATAACTGGCGCGATCGGGTTCAACAGGGTTGCTTCGGCTGGCAGTTAAAAAACCAGTAAGCAGCCGTTCAAGATGATCATCGTCTGAAAGCCAGGCCGATCTCGCCCGACTGATGTACACCAGGTACCATCACTGACGCTGGCTCTCGTCCAGCGCCGCACGACACGCCGCCAGATCACCCTTCCACCAGTTCTGCACTGCACTGCACCACACACACACTACCTACCACCACTGCCACTGGATGGGATCAAACTGGGACAGGTCGTTAGCTGCTTTGAGCATGGCGCATCGGCTGCATCTATCGCGGCGACCATGCCGTCCGTCTATGTCCCAGTGCGCGTGATCGGCAACCTCAGGTGCGCACGCTGGTTGGATTACGCGCGATTTCAACTTTCCGTCGGTAAAGGCGTTACAGGCTGCCGCTTTACGCGGCGCAGCTTCTGGGGTAAAAAATGCTGCCGAGTAGTTGGTTTCAGACGGTGGCGGGACATAGGTCTGCGCTGCACGTCGGCAGATCTAACGCTGCATCAATGCCGTAGGTCGCTTCAGCGGCGTGCCTGGGCGTCGCCATCGGATGTGCGGCGATCGAAGCGTTGATGAAGATAAGAACCGGTACGCGAAACTACATGTGCAGAGGGAAAGGCTGTGTTTTTCTGCGTCATTAATACGTAGCGCCATTTTCAATATTTGCAACTTGCATCGCATCCAGACCAGCACGCGCTTCCGCCGGGTTTGGTCACCAAACAATCGCTTTCACGGTCGGTGACGGGCGTGTGCAGCGCCCGCGTACGGCACGACGACCATATGCCACTTCGTTGTCATAGCGGTGCTGATCATGCAATAGCGGGTGGGCATCATCTGAATCAGCAGAAATCGGCGGGGCTACCGGTATGCGGCGAAAGCAATTTCATTCACATTAAAATTCCGGCGTTAACAGAGCAATCACCACATAGCGGACCTGGAAAAAATCTAAAAAAGCCAGCATGATGCCTCCCTTATCGGGTCATGTTGATGCCGGATGCTTTCTGCTCCAGCATACGTTTCGCCAAATCCACAATGACGGCTGCGGCTTCAACCGGCGTGCCGCCTGGTGAGTATTAAAGAGATACAGGTGCGATCGACTCAACCCCGGTGAATAACCGCGGCGGTAAACGGCGTAACGGAGAAGCTTTCTGGCTACTAGCGCTTGGACGTTCCGCCCTAACGGGATCTACTGCGCCCTGGATCTCACGCAAATCGATCTTCAATCTTCCACACGGACCGCGCTTGTGCAAAGAATGGCGTGCCCGGGCTTATTACGGCAGCTTCGGTTGCCAT
CGGCGGCGGCGGGATCTCTTCGTAGTTCACGTTGATCGCGTCGGTTGGTGTCAGAAGCAGCGTCCGGGATTGGCATGCTGCTGGTAACGGCGCTGGCGGCTGCTGCGCTAAACGCGGCAGCCCATATCCGGACAGATCAGGCAGAAGATTTTTATCGCCTGATTTCAGAACGGACTTCAACAAACTGCGCTTTTCTGCATAACTTCGGTACTTCTTTCAGGAGGGCAAGTATCTTCAGACAAGGTAGTCGGCGGTACCAGCAGCTTGCGCACGGGTGCGCAGGACGCGAACCGGCACGATAGCCGAAAGTACAAACGCAGGCCACGGTGCTGCGACGCAGTTCTGTTATACGTCCGCGCGATGTGGGTTTTCCACCAATCACACTTTGCTTCCGCGAACCTAAATCAACCGCGCAACTCCGAATTCACCAGTACCACTGGTGTGATAGCACCACTTTACTTCTGACAGCGGGGGGGCGTTGTGCGTGACGCGCCATCACGCGCATTTAATTTCTTTGTCTGTTTGATCCGGTGTGATTTGTCTCCGCGTCATCAGAAGAACGGTGACGGATGTACGCCGTTTGGTCAGCGGGCGACCGTTTATATGAAAGAAGAAGAATGTATGCTTTTCCAGCCGGCGTTCAAACTCCGGTGACGGACACAGAGTAATAACAACACCTAGTAGCGAGTATCATTAAATCTGATAGTCTAATGGTCAAAACATAATGTCATCGCAACGGCATCCCGGAAATGTAGTTGCAGCCTGCGGTGGCGAGCAGGGATCAACCAGGTTTTCGTTGAGGTTCTGGTCAGCGTCAGCGTGATTTGGTGTAGCAGCGGTCACAACCCATAGAGATGCCGCTCAGCTTGCCCCATAAAGTGATCTTTAAGCCCGCACGAGATGTCAGCGGTCGTTGTAAAGATACTCCGGCCCAATAAAACTTTCGACCACGGTGATGACTTGATAAACGGATCGTAATGACGCGCCAGCCCCATAGTTACGCGCTTCATCGGCAAATTGCTGTTCTCCGCAAAAGGTAGAGAAAGTAGTCTTAACCTTAGCAATACGC
+
#$$&'%*4.238/=)04.'/$7&),--35(.4:;2+)&)*(('(&)-32:>A<E++&('('''2:*/5/46@6??.19=2*34@0,/;+/4-&+,/8;-0-%+//2-43()$((+(+'''9/A;;?;=1,3)+/.)0*)0)0)+-)'-0%+4--&339<?A=<:=?<<<5@B7,,+448@:9/1&)).>D5:(97,+.0,..21A*(*)118982?;+&/5,617(5.4.,.+*)),%"$%+70<(%*(5)0'+5<7*5-B:/404(3//--0=66..*-:24&02=49,//03<9;867CDA716:>;@5DE85:&.4?)779;53%&+,/6-*/+*&*&&'''-9*1;3>9;F<8;AA=@:;7:7545==678;/5=>0()&+1((**40<86&66/,-767*+')),:@8>8=625,)+.*+,.-2)--/2+'-0/'*,+.(67;:E-+(--/06BE@:?>>>))(*&&;,1<(*++'&+$*.(''0/7*.'3&+-4>=7=%1+75>E;:22,,1'&+)'(7;+;>//+'&)--92/,93-0,1==65<8/&.,1:8AC=>/%)&00/,+(&$),(&6<7E4(5-)&%)'.'('>B4B7(,.,)22)/58;,0&/+./3&5#$++*'97*(-+(%&).4*0+,4,(++%&(&(.;75-A*.'*%$%)$&)<AC5,,&())176'5<.13+,678-+8/120/42+0*5.0.&%$''&#%%'45'&0++,649/'(31+(/-7(/(*:03428=<;?@<A<*.+5)2.1473==>58:*CD+*73589=5''1+8/3;+*,/,/015:;3*$&%&3C94.,/)420E?141*$),$)'*--,3+'*&&'*.77;CC1(,%%(().689.B:51100>2*,+((,,-/..(**('%84?/),)&'%.*8:D;759=:.01/)-(*:835)77@+57A*-&)$'%%)1:A@?74')/7>;.(++8@:=?/22<=C+-)2&,'%$)<+.(*,341293=>/$),567;<4:=5.6ED.,'**,'*&(.---719,,..*23;/.-0889/65/4CD)/+4-512:=6==<<8B?1636:6@;><89)/''&)$%(+('))*+*02A6%6()&&($6833=14@.64/,+0(%&'+*-;4.,0-+-9CD6*5+,+.2AB.,%298902;0.2-/201/*+.),'9**,25-589-&2/).((,5::1%()).72(*3%')($)&,+//1++-/.*+,302)),(.:;8=B=;89='%,0/33,+'+80,(.-(C?=148=9,9..04:E:252.*'-)/6**.?912::9@6<2('**(+0<<+.0334@).+&,+*:;8:54<<@B??<0-.+5399A?A<<.0)3-A70.02<A;4'88/.5334.2/8@@4?.0)3**'&$$5.<)/)%7'.--*%0,01&%'(+988--41:CEE309<@?.+4:8;<945)3278408<,3*(+$112+0)''?9:<57;7-//3:<BEDC90137*4*%-&,?,9;0(:7):9,0-;;4*)'*,,././@($(()CB7?FC9*9F9::44/,+-.2++*(),/.//,+,/'2&.'*)6./0BA=A;92>D,*<@02')'1)-AF,A-.0A4%.'-.-3>)A08:9=::-=4,-%,)*47A85,+35*3333'7?DEF13.)././36;+4;))&.+,)/()&/.=?=..()'%+303034:B<:4.$*'-.%&()2)9)0+8=2*6):+010/+76..1+-'610++'*%0-/.6,-/(1;5(3.2../,**,'*23*-;4%&+(0+,=979/,..)2*)$++'1-//)'./),*-))&-2/&%)')+1#)+-80B2:?;-+1157;95,&))/347772%%'#)3*0.3)&&%*$*+&')0?-09()-&(=9)<,'',*%*&0$+'0-+,''+'.+'),*---(+%(%%&%-:5/3,$+)0+0-83:50.%(,/01*)('')(,)--104618):/';*)/5/%*.,2A6A9-3)
*)'110588%.1*/&*&)#$%$-+..-,.+$('%/*,-4.*/.).&*'/5'8).0@8585%,*9.6.&*&&'452/3361644:AD*&&*&&,%&(.*4-77,/2-''353+1(0&10%'*+:46868)3668>3'(,-=-845<7=EA?D785,*&+0=9@<5@&.->:73*)'')'(((+EAC'0=;;.*-(%90:191/)3,,*++19<C39@36264+-*),2(-,+2+-&)),011-2/5<:DC2DB)$.*28,*4/,76,3@/*$'0)628?..+*.B989@A@>31+.$)>11..*(&-,/2))%',331>@;+*)*./(/23/83//7*++1-02137'(%$)(&*035.&'''&1/.,340/:+1,28/./-+%&0DB9911/'60.2?D@**3>+30.*,()('-+353?/0/.'/,.0)21?211:1E;G57+22))'+/)2--)=<,'0'))B@;78>@?>+343065384.4'&&((&/4;3<22@@>;782802?3>==;8303/*05;@6>B/&(%'((,-7?>A9A@-+3:,&-*(,)+23BA511328BB<GG94.24184/>GB<.2*-/&B>?4CE9*('((*)'')'&')&)%**75.,'*'32>A=;E)/$$//&(&%&'*&*%,)1/'+-&&%')''))((87.D-.1:01589/+/469CDCC=;((0,,*,13<0C?3=(%%%%,-):/(*05+*(*)&.4&06A9;9-1,3..-.00)-%$%$$*,$*2/10')*&-/47;9.-()('-)))*.))+,-65%(,)*%.,333-)&.+6.)/*,*2-%'3')%'(**2',(((&&''**36-789@+.'))).*,/23*-306987(.<584-*#&&%%##&$#%"%$#%"%$#)&)-+-,../-.12223(&#'&&-)(+'%'&#"")$")/4,&%&&&$#%'&)$%$*,%09<@1-3431>;7-4::22>0,*/.2=/012.658734889=3669>AA86,6;7-8,%$),')&(1;2/+*%('&:;??)*)(*)'/5/2'00-%*,/-.)+('/40'('-(%'''%%&,.=52*,*/7,((%$$552;C/*26683((#(%#)028$%38)'(&))?5&.'))*4)(+-(((*5<<D131.157?2-+//2:C625/*/%(%-4B<@7:1B@6>,015>22))*(+.01.14.-%+/*5,)&-)./172675<9@676?A+2&/$$((*@6C-/4;=:*0(+3*+2-('%)+9@;=?7E?-()$&&+-**.6''%$$)0-+*9113493:4/.56*<02+3.1,-=351343+)&)+)1%318,0.)2/10<28@EF/+$$')+6)1*,'(*-./0+...72?CBB<..,),5+'3/2,%&),/15-/)/2.+10<6+'&#''(&)/<1,)*&5:,21'+$.2*7*10+,2.64,'%$'($)'21820()$(4;<?:<=A=C(.-(.4&(+',+0+3.&./..0+2;18/65<A.4012)(&$(*%'%650:1'%,%%*.&/&,&*,$''*6/,**+,-20)52*.3B>1*-.'%:1(23813((3*,+(/)0&+(-,.0)(*)*,'*+',(*+-)6.,0&'(*+3+0,-'*,3./**)*))(///>45/5.&-(%*''.(#''#%&)-490())*3CA560,--)(,%,,03205./5)5--.D.B<=9(-&((23=2752.),/,:<,-003+@+)+@/0,1+3?6465:7BA8<+*+8A5538/506767.&#$).&840,(,(,7;457)-'-,+*./7@77))(.,))-'0-046/009,$%('*&2-3,+031(-()%().0242(*+7@0-''+)8574=0.*)*$'*2:,,0,+(*3++',**&%,+40+-<2420?2+%'/%12+*)$$&*-/*24,*2-.==85&+&**5.-.26463<,>,,*A5>('--.,--41*&(&(((1(''/)&'521>:78)>1-(0*1,+$))').*)$*((*&*'+)&(.*/
//?59%%**,/==;A:-+'(.(4%61A;/60*-,,4519.79/80-,'''(*+++(&'%%%),9%722+.7.+*%((,*/.'/0,*&'%)&,+04()-')+**)*,'''*-)/&',<83:6@6/142A)*,'&'&&*,987(+&*+*34@?@0/;=37+%-41E-+-,(%<.B==>79.)2?@(/(%**+/77-*'&+%)#%)/82&,.%++)*/>722*0)18)7*'$%,414/-'*,+,*&$('*.26)/'+*+0(0(4/'((%$'*-,/0885<:.2(83?A42'&<23+.*()*)'*&23,)(*%'%).E4>9:*,1;0%%+.16>CCD?A??C=?CE@;0:051*2///8-4457998?>A+29=*/-*2665..()))+-)%(+.0-%)'1<8<51@-12((2-25-11/.+0&'%$*&'$+)&$2)><3>98))%)%))''(&*('(,()'&+&&#%(..%+*642<=23.40&'%'(7/''%&$+/17:6-(+'&#(&%,.2:7.+.-,12,.041,-).(''%'*&)/8/'98774/03<D70A,--9426>(7467E;E3/(&"')7;2;FA::AC*.*-598<<@@4?C1()(((.-)%-*',+))'&#*&)'(*,*,/.2'47*9;>CA6-7-0=84/,+-.6270,#&)&(%))-*'((4--&++$&$.&$$'((,211&6'62+.)1,+1.0B>714-(*'$%/;22346B5'+8;@D'3.*00)??4@/(,+0)),)*''6,(,15A9<<>>CEEE-/(()2:@52;.,%,),72=GF3=B>E1,8A7EA?@CDC(CA;?=*11*2.%/0(,*+*)*(0'+,,87*#&%%.23-03-()(,).-*//1*)*(*+178--.@B5.789;:>=.;1*.?8*.%-,+;>220/%.*-,,&-0&-%60+'12214:.1.9515<45;91+(&&'&50578*2,,%.*++%'-/701',)0;?A,5--*-03.2*13?,3F7+*4*(,*(*''%)1.A4;=)'''*122.455BA945,(0.:9.(2')()%15/)(())%/,:/<457--.0?)()%%
@SRR8494940.2 2/1
TCTATGCTTCGTTCAGTTACGTATTGCTAAGGTTAAGTACTTTCTGCCTTTGCAGGAACAGCACCTGCGATGGCGGAGGCATGTGGTCTTTATAGAAGCCCAGATTAGTTACCCGTAAAGAACTGGAAAAAACTAAATGCACAGAAGTGGAACTGGCCACACACTCACGTGGTGGACTGGGTATAACAATCGATGATTGCTGGAAGGCTGGGCTCCTCCGGCAGAGCAGAAGAAACATTAATGGCACAAATGGGTGATGATCTGGCGGCCACAAGTTCTGGATAAAACCTCCTCCGAGCCAGGGGCGGTTCAAGTTGTTAAAAGAGTATTGCTGGTTTCATGATTTCAGTCAAACCTTATGGACATAGGTGAATACTTTTTTGTCACTTTAGCATCTAAGACATAGAGAATTAGTAAGACAGATAATAATTAACCAGCAAGTAAGCTTAAGACAGAACTCATGGCCTGCGAAGAATCCGCCAGCCAAAGGCCCTGCTCCGGTGGTGATTGGTGGCAGCTGGGTTTGATCGCAGAGAGAGAAGGCGCCACTCTCCGCCCGGGCGAAAACTCCCCACCCGGAACGCAGACTGGCAAACAGTTTTGACGTCTCAATCCCACTCATTGCGTATTGAGGCGATTCAACATCTCAAGCGAAGGGGCTTGTTAGCTAAACCGTCCGCCCGGGGTGGCGGCACTTTTGTCCAGGCGGCCTTGGCAAGCCAGCGATCGCTGGTGGGCTGCTCTCCGACCGTCACAGATCCACCAGCTATGACTTTGCTCGAGAGCACGACGCCACGGAAGGGTATCTCGCCGCTTATTACGCCGCGCTGCGTAGTACCAGTAGAGACAAGGAACGCATCCGTGAACTCCACCACGCCATGCGAGCTGGCGCAGCTTGATCAGCGTCTGGACGCGGAATCAAATTACCGTACTCCAGTATCAGATTTTGCCGTCACCCGGAAAGAAAAAGCACAATTGTGGTTCTGCTTCATCTGCTAAGGTGTAGACCGATGTTGGCTCAGAATGTCCCGCCCAGGCTTGGTGCTCTATTCGCGTCGCGGATGCGCTGTTAGTAGTCACCACCCCACATGAAGCCCAGAAAAATTAGCCAGTAAGCGGAAGAAGCCCGCGCAAGCTATCATCGCCATCTGGCCTTTATCGAAGGGGAAATTTTGCTCGACCGGAAGCCGTGAGAGAGGAGCCGCCGTGAGCCGTTCTCTGCGTCGTCTGGAGCAACGAGAAAAATTAGTGATTTTTCTGGTAAAATTATCAGGAAGATGTTGTAAATCAAGCGCATAAAGCCTTGTAGCAATAGAACCTATCTTATTAGGCTTTCCGGCGAGGAGTTTCAAATGGGACAAGTTCAGAAAAATCAACGTTATTAGATAGATAAGGAATAACCCATGTCAGAACGTTTTCCCCAAATGACGTGGATCCGATCGAAACTCGACTGGCTCCAGGGCGATCGAATCAGTCATCCGTGAGAGAGAAGGTGTTGAGTGCTCGGTATCGATCGACCAGCTGCTTGCTGGAGCCCGCAGGCGGGTGTATGGCTGCGGGCGCACATCAGCAGCTACGTAATACCCATCCCCGTTGAAGGAACAACCGGGTGTCGAGTGACCTCTCCTCTGGAACTGGAACGCCGTATTCATTCCAGCTATCCGCTTCCAGGAACATAATACTGGCAATTGCTGCATGCGTCGAAAAAAAGACCACCTCAGACTGGGCGGCCTTTTTGTGGCGTCGCCTTCCAGTCTTCCGCAGCCATGATGTGTGCGCAACCGCACTTCTTGCGTACCGCGCACCTGTGAGCGGGATGGCGGCGACACGGTTTTACTTCGGCCACATCTCTCCCCGGGCGTGTACGCTCGATGCTGGAAGATCAGTACTTTCTATTCTATAAAGGCAAATGTAGTCATAACCTGCATAATACATGAAA
+
##%$&//1=2-/<3)./-&+,-)-3-00/6-,---,-0'(+*/-,)+$(*)'/$-+'))(219@*(&&(**/3<,6+25++(&'2++)+/&$)))*-*+,+*-,/'%*.*)86+(,4:.*/+32>B0273/*'*'()0'*+)1007255C<?=-(*1+*,*.+/+0'))/1-(,5>=8),&-+.3?88E44($&(&-.)+*57+)),):76,11>*(+&+62)+.,.-+,.&)-)*).2477031#$(*-2-.*%'%*,78<5/70/1;0*)-),<=<<:<CB?C7.84,5++.'*(%0,%61.'()9.D8?9>A<7324:<,')&/%5%%086/3843')('+A56.).-(+)&'')(11;AC621.84./74*.(*3;A31-,$('3+'$*,39,,'(%#-1))-'1/5:.-31'-1/,,6035/0,0(.(*)*'50.+'0+''))&.*-0',*/3032,5'.'&%5&**'+68<?:>D@.;;6=,38;0+-(+/91)(20,'.+/50*(-9812367;45D:139,')%%****+*,+,,0-00322331,)-81(9DA7297)-3?5?1-27/5;(6971''559.9.-/-),:6ED,.+.0>6++'-+470%0/44365281),*('152)'06;0)0021,,+3&((+(()1+075-.&$&)0(&'&'+<3/,13*220%*0,+4:4,1-0/+&371.54.,/..)'%&&+.$%,'*,+*+3++,12=7/38:858872-039E@-/%$#$;4@&%*''&*,)+**-3?<,0<2*+*-(3/1,$(()'%&1<++/7),*2112)+4.044%.((-3560/+-/,6,0*+2/6.)+3*&+)..,*9*<,+357;;:-2.1A@:82(6+'+27<63''),/3266542/.,('),81*(+'558:FB@465<<>?<85+/%$3=695:89,/.<<@@6502,('(./07;,0/,+**/.&.37)'$'#%)%,%((.2<?56,-.14+:<::38/'.*-'(''+**)&0.5//-45;11,3445?3)'))(),+)+(4,3),,0/.5:0))*..2.-0-)++'(/'&*')**&)*2.,&,02..857*,)(06**$0+/.,/,(*4+(%*()(+0*,*(+/.4;:-1$%(&%&',+,'+.*0113:'-('#$'+*22)+*,.-)*))07?E23A?;1:*//+-*+-(.&'&+-+.-)%'(%)-,*)%$'$)4,%,<901+012>B=94/(0003/,)177394-))*,34BEE82./0,26.*-0764()*)(()=(,)$)+---456%'%%#$%+((+&&$$%$%0/=44-8:4=1*2+A+)>+'(-012B1.('&%'++&(2E591<,-8DCB=='9;>?,).398,,/2>3=72.6033<?)/)'/4/'9**6(A41/50:810,==.=DA+>95,((&+*&*,'-6<=4*,''''+<0.,.1.1-4688<'%/&#()567&)*+/.58?>F@4442-')+*',()4+/-+.%'&(2/=*97854('/9=ACC2-2--.=222-*.'*45,*/,(%#)/(('$'&4B?>51,0,-'+('*8<.00/(2-)+%)&)$(9:*,=:@A3,2)))*-+4:52'390.',%23(#*(%'(++.+/'*/,)4,,11228>=.48001=B=.2)+&)*8:;4>6*/21,(%*'.??-./('$'$,,96(>($(20/&.,2,49/18?EE9/-20+*1726,,3-,(./*+,-.21.-&$**50@98),((-02-19<6<<0,+'%'%%*231--('('(42+6+%)+,-4(&&%*(0&+-(*(,%*&$%&%$>?&AA;C=?4373510)1-81CB94:3&,'(,-,-,4((((/AB4*>AA:9'1293.67863*23:-A-0,0%'%+$(-',&)'203/)%/:0+)3/+,('%''(-'*'&&%,).*(/24781'%)%&

As I am using Docker, I tried to find a Docker image of minimap1 but couldn't find one, so I will need to create one myself. For now I also tried using bwa mem instead and got the same issue with an empty polished output file. Thank you again for your help! BR
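
Since the invalid file format error above is raised while parsing the FASTQ file, one thing worth checking is whether every record follows the four-line layout (header starting with @, sequence, a line starting with +, and a quality string of the same length as the sequence). The sketch below does only that; check_fastq is a hypothetical helper name, it assumes unwrapped records, and it does not reproduce the checks that the parser inside racon performs.

```shell
# Report the first record that breaks the four-line FASTQ layout.
# check_fastq is a hypothetical helper name; it prints "ok" and returns 0
# when no problem is found, otherwise prints a message and returns 1.
check_fastq() {
    awk '
        NR % 4 == 1 && !/^@/ {
            printf "line %d: header does not start with @\n", NR; bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }    # remember the sequence length
        NR % 4 == 3 && !/^\+/ {
            printf "line %d: separator does not start with +\n", NR; bad = 1; exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality length %d != sequence length %d\n", NR, length($0), seqlen
            bad = 1; exit
        }
        END {
            if (!bad && NR % 4 != 0) {
                printf "file ends mid-record after line %d\n", NR; bad = 1
            }
            if (!bad) print "ok"
            exit bad + 0
        }
    ' "$1"
}
```

A call such as check_fastq SRR8494940_1.fastq would point at the first line that deviates, which helps distinguish a truncated download from records that were wrapped or otherwise altered.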

rvaser commented 1 year ago

I am able to get it running with the following commands:

# Overlap phase
minimap -Sw5 -m0 -L100 SRR8494940_1.fastq SRR8494940_1.fastq > ovl.paf

# Layout phase
miniasm -f SRR8494940_1.fastq ovl.paf > layout.gfa
awk '/^S/{print ">"$2"\n"$3}' layout.gfa > layout.fa

# Consensus phase
minimap layout.fa SRR8494940_1.fastq > iter1.paf
racon SRR8494940_1.fastq iter1.paf layout.fa > iter1.fa

minimap iter1.fa SRR8494940_1.fastq > iter2.paf
racon SRR8494940_1.fastq iter2.paf iter1.fa > iter2.fa

You can also use assembler Raven, which runs 2x Racon for polishing, as below:

raven SRR8494940_1.fastq > asm.fa
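
The two consensus rounds above follow the same pattern: map the reads to the current draft, then polish that draft. A minimal sketch of that pattern as a loop, reusing the minimap and racon invocations exactly as written above; polish_rounds and the DRY_RUN variable are hypothetical names introduced for illustration, and the empty-output check is an added assumption rather than part of either tool.

```shell
# polish_rounds READS DRAFT ROUNDS
# Hypothetical wrapper around the consensus commands shown above.
# With DRY_RUN=1 it only prints the commands it would run.
# The last line of output is the name of the final polished file.
polish_rounds() {
    reads=$1; draft=$2; rounds=$3
    i=1
    while [ "$i" -le "$rounds" ]; do
        paf="iter$i.paf"; out="iter$i.fa"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "minimap $draft $reads > $paf"
            echo "racon $reads $paf $draft > $out"
        else
            minimap "$draft" "$reads" > "$paf" || return 1
            racon "$reads" "$paf" "$draft" > "$out" || return 1
            # Stop early if a round yields nothing to polish further.
            [ -s "$out" ] || { echo "round $i produced an empty $out" >&2; return 1; }
        fi
        draft=$out    # the next round polishes the output of this one
        i=$((i + 1))
    done
    echo "$draft"
}
```

Running DRY_RUN=1 polish_rounds SRR8494940_1.fastq layout.fa 2 prints the same four commands as the consensus phase above, followed by iter2.fa.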

an22an commented 1 year ago

Thank you! I will try this out now. Does using either sam or paf files make a difference here? Also, you used minimap1 and not minimap2, right? May I also ask why you chose two runs in a row? Thank you for your help. BR

an22an commented 1 year ago

Hello, I tried exactly the same commands as you and still get this error when running racon:

root@dbce972afaeb:/data/EscherichiaColi/Racon/EC_ONT_Racon# racon SRR8494940_1.fastq iter1.paf layout.fa > iter1.fa
[racon::Polisher::initialize] loaded target sequences 0.096035 s
terminate called after throwing an instance of 'std::invalid_argument'
what(): [bioparser::FastqParser] error: invalid file format
Aborted

Also, I have an assembly I actually want to polish, which is why I tried the following command:

minimap -Sw5 -m0 -L100 Ecoli_R.fna SRR8494940_1.fastq > ovl_ec.paf
[M::mm_idx_gen::0.461*0.86] collected minimizers
[M::mm_idx_gen::0.639*1.42] sorted minimizers
[M::main::0.639*1.42] loaded/built the index for 1 target sequence(s)
[M::main] max occurrences of a minimizer to consider: 8
[M::main] Version: 0.2-r123
[M::main] CMD: minimap -Sw5 -m0 -L100 Ecoli_R.fna SRR8494940_1.fastq
[M::main] Real time: 10.077 sec; CPU: 20.485 sec

I get a .paf file, but the layout.gfa file after running miniasm ('miniasm -f SRR8494940_1.fastq ovl.sam > layout.gfa') is empty:

miniasm -f SRR8494940_1.fastq ovl_ec.paf > layout.gfa
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::0.046*0.63] read 31620 hits; stored 33082 hits and 12286 sequences (164738066 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::0.052*0.68] 597 query sequences remain after sub
[M::ma_hit_cut::0.052*0.68] 6630 hits remain after cut
[M::ma_hit_flt::0.052*0.68] 6114 hits remain after filtering; crude coverage after filtering: 3.36
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::0.052*0.68] 581 query sequences remain after sub
[M::ma_hit_cut::0.053*0.68] 184 hits remain after cut
[M::ma_hit_contained::0.053*0.69] 5 sequences and 8 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 8 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 0 arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 3 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (3 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: miniasm -f SRR8494940_1.fastq ovl_ec.paf
[M::main] Real time: 5.453 sec; CPU: 1.474 sec

Do you have an idea why the .gfa file is empty?