isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

ERROR: Reads are not specified in a format #26

Closed blsfoxfox closed 6 years ago

blsfoxfox commented 7 years ago

Hi,

I am using racon and get the following error:
[01:45:17 main] Using PAF for input alignments. (/gpfs0/home/lzb0021/ouc/pacbio/ajap.paf)
[01:45:17 main] Loading reads.
ERROR: Reads are not specified in a format which contains quality information. Exiting.

The PacBio reads are indeed in FASTQ format, and the quality values look like this:

+ -+/,"..".-'--$&$$-./.(.-),..--,//.#$',,.)////--//+'&$-.+()%#)/(,++)&-++%&+(+(.).-)...-#.)/,&+.../.,.($#&&&#$#$)&&+%-%+& '''-&'##$&(&&'&%+&(%/(),$.&+.++#+,(..((&)%--)(&&)-+##'#.-"..$,,,++(..$')()$%#&)+&+&##$#)&)'($##$($#$'$'((+&'%$'',, ,+,)-,#()&,".).-&(($(%#'$&&%&&%#&-"+-%(-(-)+,+(--+%(.,)%,%#(+'+#($(*%$'-.%,))&&)&,(%'*&&')+#&,)..&,'&),,%&++(+#)%&+) ..%-/'#(,,+#$###"$%))#',.($%##%%(,'(#),%&%$)%+++&,+**+(+%(,$$#$$'(%'##+,+$*)-$"(.$.+..#(--,#&,,).,&(,++#(''(+. &,%(.#.%////+-#-('&,$%$#$$$$$####%(#+,-&'"'#'+()$&.-.-,$'('-%$$'-$%&.-+().-#++)&&)(%')$&'&&$&)$',%,&-),&&),''(,& &$+++)./&.-%-$'/.)$#&#"###$$#$#%(.),$$,-.&+.)&#/'-(//()+%.#%-()...++$((#$$-)&)(.-&))(-#--&$&$#$+)#-&&((('-)),(,() &'-&-,-"####%'+'&"$'-(%)+(-)*'&'-%$&-&&&&&&#%'.)$)#%--$'''$'&('&&+-)%%&&&',-(()&&%)(..,-/-,+(-&&&'&"%&)'.-.,+(&,,).,*

$$#########""#"&+,+$"''-)..,.-$'"$#&#+(-.&+(+"+&),&+++&)+'),+(/&',%+'%*(&',))&%&#$&"$$$#$%%$()+((%&%(&$,'('%%$#%$'&&')

)%+&),,,+)/./)+#$$##$$##$(%#"%%&,-'%&)(%%-)-'.).#'%)&+'$%&&$(-&%&###'&()+'$.+.-&'',+-+()++#))',(#)&%&,.,/.(,%$-)#+% +.$(-&,%"%"#$',,+%')'.-,'-'#$"$$$$$%)-$)(,+'%#&-$#%#"$'&.))../+)))&(()))*)#(-),-.&$+++)&+'%&)..+--%--,')/+'-','+("$$ $#"$$$######%&.#&.+.&#$'-#.,),,).,.+%+'$(.%//..&%&///..-,/.-,)+,'&(+'%'&'&&#'&)%%$(++)&//-/'..(+/+.)&."//+.../% )%&,%&%&(-().))#&%$$$%"+--)./(+-')//-,,+(.,)*,,-**&,(.-&.+$&+%+$%%$$#####"&-'--%&#'#$#&+($&$(&'('%%(,-%'(&&+$%$(-#%#+#, %)),+%#%'%-''+-++-+#,+)(,$%-%-".(--$$%$"#""####$#&(%&-)+'$%",'-,'('%#%$%(--''.-#+'.)'')"''..-+&%$'(-.+#(..'),%+-.(-()+/ //"+/..,))*''##&#-&,%%#%"##'+#%'-,+-#%..())(-$#')&$$',#$$$,&'%#'(,(-%&%#%%&'#,,,,)./,,+'.#)..+,(#'##$$%&&+&('.)) &.+/%..(&-(%$#$%'-('-,'&+-+-+-.(.$&'#&,'#&#'-+..$,#.&).($(..',,+(#'$%$%#$#%%''((./)(-///.'//,)).().)),##%#&##+$#('% &($%&,&,-.&#))$%)&&#$#')'#'%+%-&++'&&&'('++)(+-)%$'$)',.&,''.-$&%($%$##%""$#-#'(%&%(-))($%##-+'"#'-.))'$-','($-'..,). ++#'+.&',,,)++++++++++,++)...++$(()(%+%$'+##$(-*.',%./++'(-#%$#$##%%)/'$/--'&%%$%&'#((&-.('**#.,%-/,,+#%)&( &())'+&$%&%)-/-+'((*&)'%+,),$#./..+,++..#&%#--(+/+'-+(").+//.+.)'--,',,-++++%$')&,&$%&%)#(($((('',&&()%$),.&...++ +-,,,++(',###(./',(')'(%%%%%($%$$$#########$%#'$'*)%)-+(#.--"&.-+&+**)(%')&())))%')))(()()(#)'&#&).(.",+($--&$#&%)#.+&)/" ,+,.'.+-++,('(/#)$.-%'('$$$#$&$&%##$%$"###%($...))%%-%//..+/,-,,&$-+$"#(#&(('-&%).,%&-",%%%#$$(.(/,".+-,,.(/%/,(.+(%,'# %#),)$%'%')%%#%&%''%)&,/.,+..##&&)))(+"$+)''$%)&'&'').$.$..#&-,,',,)/+,&(&-'$$&)--,)"&().///%+-'&((%'()&&)('(((+-).$ +&,).#(/+)+&+'$"#$&,+()(&-#$&#,%&+#&'#%##&,-((#''($-$(()$*)(.(".%&%$#$%+$&'%")'-,-%-'&&)##'%')&--)'+%-.'#)/&+,,+ +&)+'(+.'+).,-""&#%$(-*.,-.%)$'$'&&%%)%('-$'$$(##''

Could you please provide any suggestions here?

Thanks!

rvaser commented 7 years ago

Hello, could you please provide the full command which caused this error?

Best regards, Robert

blsfoxfox commented 7 years ago

Hi Robert,

Thanks for your response! After grepping several lines following that read, I found that there are FASTA sequences in that FASTQ file. I believe that is the issue.

Best,

blsfoxfox
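For readers who hit the same error, the check described above (looking for FASTA records inside a FASTQ file) can be sketched in Python. This is only an illustration, not part of racon: the function name `find_fasta_records` is hypothetical, and the sketch assumes unwrapped, four-line FASTQ records. It only inspects lines where a record header is expected, so quality strings that happen to start with `>` or `@` are not misreported.

```python
def find_fasta_records(lines):
    """Return (line_number, header) for each FASTA-style header found
    where a FASTQ header was expected.

    Assumes four-line FASTQ records (no wrapped sequence or quality lines).
    """
    hits = []
    it = enumerate(lines, start=1)
    pending = next(it, None)
    while pending is not None:
        lineno, line = pending
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # FASTA header in a record-start position: report it.
            hits.append((lineno, line))
            # Skip the FASTA sequence lines up to the next header of either kind.
            pending = next(it, None)
            while pending is not None and not pending[1].startswith(("@", ">")):
                pending = next(it, None)
        elif line.startswith("@"):
            # FASTQ record: skip its sequence, '+' and quality lines.
            for _ in range(3):
                next(it, None)
            pending = next(it, None)
        else:
            # Blank or unexpected line: move on to the next one.
            pending = next(it, None)
    return hits


# Example usage (the file name is a placeholder):
# with open("reads.fastq") as handle:
#     for lineno, header in find_fasta_records(handle):
#         print(lineno, header)
```

Any reported line numbers can then be inspected by hand to decide whether those records should be removed or replaced.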

elenasantidrian commented 7 years ago

I am experiencing the same problem. I first ran Graphmap:
/graphmap/bin/Linux-x64/graphmap align -r scaffolds.fasta -d long_reads.fastq -o long_reads_scaffolds.sam
And then I ran Racon:
racon/bin/racon --sam long_reads.fastq long_reads_scaffolds.sam scaffolds.fasta nanopore_racon.fasta

And it gives me this error: ERROR: Reads are not specified in a format which contains quality information. Exiting

However, I have checked my FASTQ file with the long reads, and it is indeed in FASTQ format. Any suggestions? Thanks!

rvaser commented 7 years ago

Hello Elena, I executed your commands on a small dataset containing lambda reads in FASTQ format and everything worked fine. Are you sure that your long_reads.fastq file is in a valid FASTQ format? If you are allowed to, please send me the output of the following commands: "head -n 16 long_reads.fastq" and "tail -n 16 long_reads.fastq".

Sorry for the delayed answer and best regards, Robert

elenasantidrian commented 7 years ago

Hello Robert, here I send you the head and tail of my long reads. They look fine, I think: there are as many quality characters as nucleotides in the sequences in all cases. So maybe the problem is that there are some sequences in the middle without quality values? If this is the case, I don't know why it happened...
output_head_tail_long_reads_fastq.txt

rvaser commented 7 years ago

Hello Elena, the sample file you provided looks fine. How did you obtain the whole long_reads.fastq file?

Best regards, Robert

elenasantidrian commented 7 years ago

Hello Robert, my long reads FASTQ file is the result of 5 MinION flowcells. Every run output 1 FASTQ file, and I concatenated the 5 FASTQ files to obtain the long_reads.fastq file. I also think the file looks OK, so I guess there are some "intruder" sequences in my file without quality values or with an incorrect number of quality values... it's the only explanation. Do you know how I can check this? I read somewhere about fastQValidator. Best regards,

Elena
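One known way a concatenated FASTQ file can end up with malformed records is when an input file lacks a trailing newline, so its last quality line fuses with the next file's first header. Whether that happened here is not known from the thread. The sketch below, in Python, shows a concatenation that guards against it; the function name `concatenate_fastq` and the file names in the usage comment are hypothetical.

```python
def concatenate_fastq(paths, out_path):
    """Concatenate files in order, inserting a newline after any input
    that does not already end with one."""
    with open(out_path, "wb") as out:
        for path in paths:
            last = b"\n"  # an empty input needs no separator
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")


# Example usage (file names are placeholders):
# concatenate_fastq(["run1.fastq", "run2.fastq"], "long_reads.fastq")
```

Working in binary mode keeps the copy byte-for-byte identical apart from the added separators.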

rvaser commented 7 years ago

Hello Elena, I wrote you a simple Python script that checks FASTQ validity (https://pastebin.com/6anD3d2q). Copy the code into a file and run it with 'python <file> long_reads.fastq'.

Best regards, Robert
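The contents of the linked script are not reproduced in this thread. As a rough idea of what such a check can look like, here is a minimal Python sketch written for illustration only; it is not Robert's script. The function name `validate_fastq` is hypothetical, and the sketch assumes four-line records, so wrapped sequences or blank lines would be reported as problems.

```python
def validate_fastq(lines):
    """Return a list of (record_start_line, message) problems.

    Assumes each record is exactly four lines: '@' header, sequence,
    '+' separator, quality string of the same length as the sequence.
    """
    problems = []
    record = []
    start = 1
    for lineno, raw in enumerate(lines, start=1):
        if not record:
            start = lineno  # remember where the current record begins
        record.append(raw.rstrip("\r\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@"):
                problems.append((start, "header does not start with '@'"))
            if not plus.startswith("+"):
                problems.append((start, "third line does not start with '+'"))
            if len(seq) != len(qual):
                problems.append((start, "sequence and quality lengths differ"))
            record = []
    if record:
        problems.append((start, "truncated record at end of file"))
    return problems


# Example usage (the file name is a placeholder):
# with open("long_reads.fastq") as handle:
#     for line_number, message in validate_fastq(handle):
#         print(line_number, message)
```

An empty result means every record passed these three checks; it does not guarantee that a given tool will accept the file.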

elenasantidrian commented 7 years ago

Thanks! I will let you know whether that was the issue and if it works :) Thanks again!

Elena

elenasantidrian commented 7 years ago

Hello Robert, I ran it with long_reads.fastq; it runs, it finishes, and it does not report anything. I also tried it with the head and tail I gave you, but with one quality line trimmed on purpose, and it points that out. So I assume the FASTQ file is OK... Does the SAM file need to have quality values as well, or is it enough for the read file to have them? Anyway, when I used Graphmap I mapped the scaffolds against that same long_reads.fastq file, so I assume it should have them...

rvaser commented 7 years ago

Hello Elena, I am not sure what the problem is. Do you have the latest graphmap version? Did graphmap finish without errors? Please paste here the output of the command 'head -n 15 long_reads_scaffolds.sam' as well as 'tail -n 15 long_reads_scaffolds.sam'.

Best regards, Robert

elenasantidrian commented 7 years ago

Hello Robert, no, graphmap did not give any error. I think I used the latest version, but I have just installed it again and I am repeating the alignment. It should not be something related to memory, right? The long_reads.fastq is 14.64 Gb, long_reads_scaffolds.sam is 19.04 Gb, and scaffolds.fasta is 1.646 Gb. I have just checked the head and tail of the SAM file, and they look different. Is this normal?

/ rvaser removed the snip /

Thanks!!! Elena

rvaser commented 7 years ago

Hello Elena, the sam file looks okay to me. If you fail to run racon with the repeated alignments, please try minimap instead of graphmap.

Best regards, Robert

rvaser commented 7 years ago

Hello Elena, were you able to run racon one way or another?

Best regards, Robert

elenasantidrian commented 6 years ago

Hi Robert, sorry for my super late reply. It finally worked: I reinstalled Racon and it ran without problems. Thanks!