isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

Racon output is empty after polishing #143

dcm9123 closed 4 years ago

dcm9123 commented 4 years ago

Hi! It's me again! I am having a new problem and I am not sure what could be happening here. Following issue #140, I ran these commands to correct my nanopore FASTQ reads:

Minimap2 command: minimap2 -ax ava-ont --dual=yes barcode84.fastq barcode84.fastq > ava84.sam

Racon command: racon -u -f -w 50 -q 9 -t 20 barcode84.fastq ava84.sam barcode84.fastq > barcode84_polished.fasta

However, after it finishes running I get an empty file as a result. This happens for three of my 96 samples. The head and tail of the minimap2 output for sample 84 look like this:

minimap2 head -n 10:

@SQ SN:6595baca-84bc-486b-81cf-721bfdb712a7 LN:412
@SQ SN:fb246a2b-7761-4868-b4f4-892ad36c1285 LN:414
@SQ SN:df50ce03-e54e-45d6-82bc-e6c4771b484a LN:416
@SQ SN:87f34c90-58aa-4383-a20d-82aac14e374d LN:372
@SQ SN:dd7d588e-1dcd-4f79-950f-a16d5af3e6c0 LN:385
@SQ SN:84cb5126-28bf-4405-bb6c-81ae43198b78 LN:419
@SQ SN:63083b19-1177-4be6-976d-8918094ae663 LN:422
@SQ SN:5a374858-0968-40a1-bf1b-1f3c5f80ecb7 LN:406
@SQ SN:b6c1f089-c90d-4922-a97d-80c31a5c3cf0 LN:421
@SQ SN:df37edda-25fd-4be5-9937-43fb5f35c9a5 LN:402

minimap2 tail -n 10:

2c955d68-7906-4227-90fd-1e6b2be41450    256 fdcd89cf-c770-4adf-a107-edd317e5431b    2   0   6M1I15M4D8M1D8M1I5M2D25M1D87M1D4M2D8M1I8M1I2M2D36M2D8M2I30M171S *   0   0   *   *   NM:i:22 ms:i:400    AS:i:400    nn:i:0  tp:A:S  cm:i:19 s1:i:108    de:f:0.0532 rl:i:261
2c955d68-7906-4227-90fd-1e6b2be41450    256 95354c62-c54e-4a6f-a16b-30f2cf5a623d    2   0   29M1D1M1D1M4D7M1I30M1D36M1D8M1D11M1I43M1I22M1D3M1D19M1D11M2I40M6I17M138S    *   0   0   *   *   NM:i:32 ms:i:400    AS:i:400    nn:i:0  tp:A:S  cm:i:23 s1:i:121    de:f:0.0788 rl:i:261
2c955d68-7906-4227-90fd-1e6b2be41450    256 61bb5016-ec1a-4b1c-b476-ca745427ed47    1   0   6M1I9M1D6M2D16M1I6M5D18M3I68M2I31M3I7M1D17M1D11M1I18M2I17M3I20M5I19M2I14M121S   *   0   0   *   *   NM:i:40 ms:i:398    AS:i:398    nn:i:0  tp:A:S  cm:i:17 s1:i:104    de:f:0.0738 rl:i:261
2c955d68-7906-4227-90fd-1e6b2be41450    272 0c409388-3722-4b87-aa7e-9b8d0167048b    1   0   188S8M2I13M2D35M1D1M1D120M1D14M2D21M4D16M1I8M   *   0   0   *   *   NM:i:17 ms:i:394    AS:i:394    nn:i:0  tp:A:S  cm:i:20 s1:i:100    de:f:0.0451 rl:i:261
2c955d68-7906-4227-90fd-1e6b2be41450    256 fb1175cf-80f9-40ab-83fb-79ace8469c4c    1   0   8M2I12M4D16M1I94M2I33M1I8M1I13M1D3M1D19M1D11M2I14M187S  *   0   0   *   *   NM:i:16 ms:i:390    AS:i:390    nn:i:0  tp:A:S  cm:i:20 s1:i:100    de:f:0.0415 rl:i:261
2c955d68-7906-4227-90fd-1e6b2be41450    256 37ac12b0-a16a-4556-a71d-0f7de915290d    2   0   6M1I15M4D16M1I80M1I51M2D6M1I2M2D14M1D19M1D11M2I10M191S  *   0   0   *   *   NM:i:16 ms:i:388    AS:i:388    nn:i:0  tp:A:S  cm:i:23 s1:i:113    de:f:0.0417 rl:i:261
2c955d68-7906-4227-90fd-1e6b2be41450    256 700f1088-d181-4d96-9a0a-42e6b495e1be    1   0   6M1I32M5D4M2I35M1I27M1D2M1D49M1I8M1I44M1D11M2I19M182S   *   0   0   *   *   NM:i:21 ms:i:372    AS:i:372    nn:i:0  tp:A:S  cm:i:17 s1:i:115    de:f:0.0607 rl:i:261
2c955d68-7906-4227-90fd-1e6b2be41450    256 f4cc6d90-b912-4eca-b007-a3ea214535e0    25  0   22S16M1I60M1I2M1I13M4D20M3I3M2I17M1I6M3I4M1D5M1D35M3I2M1I3M2I28M1I2M1I7M6I56M2I15M1I22M1D7M1D10M1I5M1I34M2S *   0   0   *   *   NM:i:77 ms:i:350    AS:i:350    nn:i:0  tp:A:S  cm:i:15 s1:i:119    de:f:0.1523 rl:i:261
2c955d68-7906-4227-90fd-1e6b2be41450    256 d2c6cfba-6c9f-4bd7-bdc8-1ac59eba13ed    465 0   6M1I15M4D10M2I4M1I5M1I5M1D6M6D14M2I34M2I5M1D2M1D35M1I3M1D2M1D28M1I7M1D1M2I29M4I34M5I1M3I5M2I4M1I3M1I5M2D6M2I13M1D14M2I4M2I4M1I27M1D17M2I4M1I32M4S   *   0   0   *   NM:i:88 ms:i:348    AS:i:348    nn:i:0  tp:A:S  cm:i:14 s1:i:106    de:f:0.1466 rl:i:261
2c955d68-7906-4227-90fd-1e6b2be41450    256 a82f4f44-a4ea-4229-af0a-0e7dc246235f    6   0   7S15M2D15M3I5M3D26M1I9M1D5M1D21M1D16M1D14M2D40M1D3M2D5M2D20M1I7M1D11M2I12M189S  *   0   0   *   *   NM:i:29 ms:i:310    AS:i:310    nn:i:0  tp:A:cm:i:13    s1:i:102    de:f:0.0837 rl:i:261

The head of the barcode84.fastq file looks like this:

@6595baca-84bc-486b-81cf-721bfdb712a7 runid=7c6c69b0c1ae60abfe91b0d0c44ea2031ed4d1a4 read=11 ch=72 start_time=2019-08-01T00:25:13Z flow_cell_id=FAL13168 protocol_group_id=cpmp_and_msp2 sample_id=cpmp_and_msp2
TACTATCCAAGATCGTTAATATCGAATATTTTTCTTTTTAGAATGCATTGCTTTATAAACAAAGAGGTTAGACAATAACATCTAAATCGTTTCACAAGATATTTCATCCCATTCCATCACATATATTTCTATATTTCTGATCGTTTGTTGTTGAATTCTCAGTAATATTATATGTATTATCCTCGCTTTTCACATATAATATCATTAAAACATATTTTACCAATATTTCCAGGAAAATCCTTTGGTTCAACTCTTTTAAATAGGATCTGATACCTATAGCTTCCCATATTTTCATATTTAATATTTTCGTTTATAATACCTGTTATTAATAACGATACCCTCCCTTGTTCATTTCTTAATCTATTTAAAAATGTATTATTTTCTCGTTTCATTATAACAATCATATATAATC
+
F<&+=%%DOI:>;B?QKRF?*$>&&-AC9BFI?@JJOHD7*6((#$$$(,+//1<ENB:4F@,0%'/33''2.>>DN?=4.A8E<94@5DDE=&%;;.5<+3F7B9%%33108@?715AB@D<89:><<7954566F886;=5BBC;GGAD;GGF<<7.<&*3;40112=;><==><>A**2,2&&;:==40/5?DIN@:388416?>F>?BJNFGF<4CDO?DLGJD2B89&',B>9:><<I:>B;7/69ADLPMFD;@9::?73=.8)%6?@@400;9999?B6332=><G>@B<8KWIEECHJ=51@@6<FK4//ABHBEJ@?>@329(##%%%>><5$-$$42,3=//IJIPA<0:BGGCIOQTDC;**554;66DJ>-9,-4<;9*),('%$%0(&()2<<CIWG<=
@fb246a2b-7761-4868-b4f4-892ad36c1285 runid=7c6c69b0c1ae60abfe91b0d0c44ea2031ed4d1a4 read=17 ch=51 start_time=2019-08-01T00:25:18Z flow_cell_id=FAL13168 protocol_group_id=cpmp_and_msp2 sample_id=cpmp_and_msp2
CTATCAAGATCGTTAATATCGAATATTTTTCTTTTTAGAATACGTGCTTTATAAACAAAAGAGTTGAACAATAACATCTAAATCGTTTCCAAGATGTTCATCCATTCCATCACATATATTTCTATATTTCTGATCGTTTGTTGTTGAATTCTCCAGTAATATTATATGTATTATCCTCCCTTTTTCACATATAATATCATTAAACATATTTTACCAATATTTCCAGGGAAAATCCTTTGGTTCAACTCTTTTAAATAGGATCTGATACCTATAGCTTCCCTTTTCATATTTAATATTTTCGTTTATAATACCTGTTATTAATAACGATGCCCTCACTTGTTCATTTCTTGAAATCTATTTAAAAATGTGGTATTTTCTCGTTTCATTATAACTTAATCCATATATAATCCATAA
+
)1,&BD?6=891=>K<A,):25;AF:9DG63'FKID=<@411$%A)0BCP@GJC41KK8'+,2E6309<J61KBE=382>>7.+-246#(754?<@E;?:;?79AC?>51?86=DCDFGHCC6546'-188?;;>4DDD&&*4;<5687=FF&?=7<GEEH67=<>:8;=6:A2298/=/'7:>3,&')-6;G<@;88144%),47;=;G90.68J?A@@CA$=+560.3=8--/.=B5=J5?0,-7<<HHFIAI@@:D====E41287-)&&'$(-#(')...67>B@CF=800/BD=A5BAD@2Q>76>>EC=HD=AC=:<.*-.12;;:+4$"%1<C>@?;E?996%,07::B><ALUOE@?-,&%$;==?E=9;8>A?8DA@GCBB4%%&7478<>@:+A=UE>BFEJWN
@df50ce03-e54e-45d6-82bc-e6c4771b484a runid=7c6c69b0c1ae60abfe91b0d0c44ea2031ed4d1a4 read=17 ch=419 start_time=2019-08-01T00:25:17Z flow_cell_id=FAL13168 protocol_group_id=cpmp_and_msp2 sample_id=cpmp_and_msp2
ACTATTAAGATCGTTAATATCGAATATTTTTTCTTTTGAAATACGTGCTTTATAAACAAAGAGAGTTGAATTAATAACATCTAAATCGTTTTCTAAGATGTTCATCCATTCCATCACATATATTTCTATATTTCTGATCGTTTTATTGAATTCTCAGTAATATTATATGTATTATCCTCCTTTTCACATATAATATCATTAAAACATATTTTACCAATATTTCCAGGAAAATCCTTTGGTTCAACTCTTTTTAAATAGGATCTGATACCTATAGCTATATTTTCATATTTAATATTTTCGTTTATAATACCTGTTATTAATAACGATGCCCTCACTTGTTCATTTCTTAATCTATTTAAAAATGTATTATTTTCTCGTTCATTATAACAATCCATATATAATCCATAAATTTTAAT

The tail looks like this:

+
?C:6GKD<86:&3,)6F3>=;B$$9<=>@JI>@HGI=B9C@;A13B6;<@BA?IHF>LC-+/7GDHGBGXEHQB<2,A-@<6;=(;85=6:<&/:>BI6F@8:@?<>:7:;A76622***785A=::9::6188;E49KOFMM9DAECC<C:6?;?CO<=>?C<?>,-,-,3,((-,5?;BG2->@A<LOWGG>>A87DPKG0017878:000-)//03=$>,2+*2?:./827A9<I5FB;<=:>HNC@@2//)3;-/<?<?7>>6-13@DH@:76+*'789:@<;=6AIUIKLKNIA<5=?KEITCD>99::03++,-,/9::8<33?A92>(#%&4;;AB7GBGNDK?;ACC@GRXQMGC>49:8B34CF:?,64::6B:;=>>R?AME..E9A@*FJTE892=++17@EA?EC:FB@CC=
@787dcd9d-15ea-45cb-bbb3-18e12594d10e runid=7c6c69b0c1ae60abfe91b0d0c44ea2031ed4d1a4 read=3140 ch=394 start_time=2019-08-01T00:55:56Z flow_cell_id=FAL13168 protocol_group_id=cpmp_and_msp2 sample_id=cpmp_and_msp2
CTATCAAGATCGTTAATATCGAATATTTTTTCTTTTAGAATACGTGGCTATAAACAAAGAGAGTTGAACAATAACATCTAAATCGTTTCTAGATGTTCATCCATTCCATCACATATATTTCTATATTTCTGAGTCGTTTGTTGTTGAATTCTCAGTAATATTATATGTATTATCCTCCTTTTCACATATAATATCATTAAAACATATTTTCCAATATTTCCAGGGAAAATCCTTTGGTTCAACTCTTT
+
1'42GQF;99@6NAOA>:7C227<<<<?@@5<FJIC9783,,##3%&'65;=@@LNK94%)%*0+0/1<OD?OCC;3@489=3-0AA@=,(':7;A;6?33CFCA:>=9>>>:644556>FHCDJKM=7(*)%/((.063,71<@HIH?B:0A.2:M700/11+68@BAFF>67=>:>8>D@;9%.ECIO3,5:CBAGEBC512;9084/'-@6448;D(@361&'9?=++.,/4+/A-CC><?:EDF
@2c955d68-7906-4227-90fd-1e6b2be41450 runid=7c6c69b0c1ae60abfe91b0d0c44ea2031ed4d1a4 read=3183 ch=366 start_time=2019-08-01T00:55:54Z flow_cell_id=FAL13168 protocol_group_id=cpmp_and_msp2 sample_id=cpmp_and_msp2
AAGTCATTTAAAATTTATGGATTATGGATTGTTATAATGGAAACGAAAATAATACATTTTAAATAGATTAAGAAATGAACAAGTGAGGGCATCGTTATTAATAACAGGTATTATAAACGAAAATATTAAATATGAAAATATGGAAGCTATAGGTATCAGATCCTATTTAAAAGAGTTAGACAAAGGATTTTCCTGAAATATTGGTAAAATATGTTTAATGATATTATATATGTGAAAAGGAGGATAATACATATAATATTACTGAGAATTTAATTCAACAACAAACGATCAGAAATATAGAAATATGTATGATGGAATGGATGAACATCTCTTAGAAACGATTTAGATATTATTGTTCAACTCTTTGTTATAAATACGTATTCTAAAAAGAAAAAAATATTCGATATTAACGATCTTGATAGTAACT
+
3/75A7660;8AC2A020:75%%#%%51<?1CDHEJ...'/1.++,(LF>L?<9;?9@JGDCFDED?5A8*-95D05BBDA.(,(&*,69:;;5CA:-.C?8=+0111@D?BDGID669=JLA;<69?=-**#'6?@=869>?8;*&(22320<87>76.)7EEC@B@D>7(,05;+)).;@<(*0,BFD943(%'2=;<97()7GGE?5><@OKGHFCHB>?>=84,&'&.53EC<*.*+6=EMTFE6>=?==@:989;8<@;6(.,'-++$-03?AGJDERF>AC($$)(35*$'+827:=;5>:326@@ACJCECJ9=%%%/==::0*?85**/6675=?K?=+%%%,$55:F@EFB>@<DB8--,.,21(%235IJIBC;IH=(%4;5HHE@<8888037FCF@N?/3<;:5=D;77,22*##

The racon output message looks like this:

[racon::Polisher::initialize] loaded target sequences 0.543878 s
[racon::Polisher::initialize] loaded sequences 0.546433 s
[racon::Polisher::initialize] loaded overlaps 948.163245 s
[racon::Polisher::initialize] aligning overlaps [=>                  ] 297.733730 s
[racon::Polisher::initialize] aligning overlaps [==>                 ] 298.759600 s
[racon::Polisher::initialize] aligning overlaps [===>                ] 343.238948 s
[racon::Polisher::initialize] aligning overlaps [====>               ] 418.431091 s
[racon::Polisher::initialize] aligning overlaps [=====>              ] 492.589928 s
[racon::Polisher::initialize] aligning overlaps [======>             ] 568.323078 s
[racon::Polisher::initialize] aligning overlaps [=======>            ] 642.197297 s
[racon::Polisher::initialize] aligning overlaps [========>           ] 714.705943 s
[racon::Polisher::initialize] aligning overlaps [=========>          ] 789.758194 s
[racon::Polisher::initialize] aligning overlaps [==========>         ] 867.548292 s
[racon::Polisher::initialize] aligning overlaps [===========>        ] 943.996312 s
[racon::Polisher::initialize] aligning overlaps [============>       ] 1019.564962 s
[racon::Polisher::initialize] aligning overlaps [=============>      ] 1098.164147 s
[racon::Polisher::initialize] aligning overlaps [==============>     ] 1173.451047 s
[racon::Polisher::initialize] aligning overlaps [===============>    ] 1247.213945 s
[racon::Polisher::initialize] aligning overlaps [================>   ] 1347.385537 s
[racon::Polisher::initialize] aligning overlaps [=================>  ] 1520.181542 s
[racon::Polisher::initialize] aligning overlaps [==================> ] 1718.189453 s
/home/dcmogollon/.lsbatch/1570856003.605247.shell: line 13: 158720 Killed                  racon -u -f -w 50 -q 9 ../trim50_dem_q9/barcode84.fastq ava84.sam ../trim50_dem_q9/barcode84.fastq > ../racon_output/barcode84_polished.fasta

I noticed the error at line 13 in the output above; however, line 13 of the ava84.sam file looks like this: @SQ SN:6e8c4f51-e8ad-4059-b40b-4e273c696dad LN:416

and line 13 of the barcode84.fastq file looks like this: @87f34c90-58aa-4383-a20d-82aac14e374d runid=7c6c69b0c1ae60abfe91b0d0c44ea2031ed4d1a4 read=41 ch=383 start_time=2019-08-01T00:25:39Z flow_cell_id=FAL13168 protocol_group_id=cpmp_and_msp2 sample_id=cpmp_and_msp2

Do you know what could be the problem in this case? I am getting similar output for 2 other samples; the remaining 93 are fine.

Thanks again!

rvaser commented 4 years ago

Hi again, the racon log message says the process was killed, which usually happens when you run out of memory. Can you tell me how much RAM you have and how large the input files are for the 3 datasets that did not finish?

Best regards, Robert
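
For readers following along: a `Killed` message from the shell means the process received SIGKILL, and shells conventionally report that as exit status 137 (128 + 9). The signal alone does not say who sent it; on a cluster it may come from the kernel's out-of-memory handling or from the scheduler enforcing a memory limit. A minimal sketch, assuming a POSIX-like shell; the helper name `describe_exit` is hypothetical and not part of racon:

```shell
#!/bin/sh
# Hypothetical helper (not part of racon): interpret a command's exit status.
# Shells report death-by-signal as 128 + signal number, so SIGKILL (9) gives 137.
describe_exit() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "failed with exit status $status"
    fi
}

# Usage (commented out so the sketch has no side effects):
# racon -u -f -w 50 -q 9 -t 20 barcode84.fastq ava84.sam barcode84.fastq > barcode84_polished.fasta
# describe_exit $?
```

Checking the status right after the racon call in the job script makes a killed run easy to tell apart from a run that simply produced no output.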

dcm9123 commented 4 years ago

Sure, no problem. My barcode84.fastq file is 58 MB, and its ava84.sam file is 80 GB. The barcode86.fastq file is 31 MB, and its ava86.sam is about 89 GB. For the last one, the SAM file is about 85 GB and the fastq is 40 MB. The three SAM files are definitely the largest of all of my samples; could that be it? Also, I am running it on a server, where I request the following:

#BSUB -n 20
#BSUB -R "span[hosts=1]"
#BSUB -R "rusage[mem=4500]"
#BSUB -M 5000

Should I increase the RAM and try again?

Thanks!

rvaser commented 4 years ago

Yeah, try allocating at least 100 GB, as all overlaps from the SAM file are stored in memory.
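
Since the overlaps are held in memory, one rough pre-flight check is to compare the size of the SAM file with the memory requested for the job. The sketch below assumes that file size on disk is a usable proxy for the memory needed, which is only an approximation; the function names `sam_size_mb` and `fits_in_memory` are hypothetical and not part of racon or LSF.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of racon or LSF).
# Assumption: SAM size on disk is only a rough proxy for racon's memory use.

# Print the size of a file in whole megabytes (1 MB = 1024 * 1024 bytes).
sam_size_mb() {
    bytes=$(wc -c < "$1")
    echo $((bytes / 1024 / 1024))
}

# Succeed (exit 0) only if the file is smaller than the given memory in MB.
fits_in_memory() {
    sam_mb=$(sam_size_mb "$1")
    [ "$sam_mb" -lt "$2" ]
}

# Usage (commented out so the sketch has no side effects):
# if ! fits_in_memory ava84.sam 5000; then
#     echo "ava84.sam is larger than the requested memory; request more" >&2
# fi
```

With the sizes reported above, an 80 GB SAM file is far larger than a request of a few thousand MB, which is consistent with the job being killed. How a 100 GB request is written in the `#BSUB` directives depends on the memory units configured on the cluster, so that part is left out here.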

dcm9123 commented 4 years ago

Yes, that was it! Thanks again!