isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116
Note: This was the original repository, which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

Racon gives empty output (fasta file) #241

Closed: RacheliHadjez closed this issue 3 months ago.

RacheliHadjez commented 3 months ago

Hi, I am very new to genome assembly. I have an assembly I created with NextDenovo from PacBio data, and I aligned the reads to it with minimap2 using this command:

minimap2 -ax map-pb -t 20 --cs=long /dorotheeh/hadjez/nd24.asm.fasta /dorotheeh/hadjez/FilteredResults_Q20.fastq > pb_alignment_minimap2.sam

Now I am trying to run Racon, but the output file I get is empty. This is my command:

racon -m 8 -x -6 -g -8 -w 500 -t 14 -q 20 -u /dorotheeh/hadjez/FilteredResults_Q20.fastq /dorotheeh/hadjez/pb_alignment_minimap2.sam /dorotheeh/hadjez/nd24.asm.fasta > nd_Racon.fasta

Can you please help me understand what I'm doing wrong? Thank you in advance, Rachel

isovic commented 3 months ago

Hi Rachel, Your command lines look correct. Unfortunately, I cannot reproduce your issue when I run the same command lines on the small sample data available in the repo. Does this produce output for you as well?

minimap2 -ax map-pb -t 20 --cs=long test/data/sample_layout.fasta.gz test/data/sample_reads.fastq.gz > aln.sam
racon -m 8 -x -6 -g -8 -w 500 -t 14 -q 20 -u test/data/sample_reads.fastq.gz aln.sam test/data/sample_layout.fasta.gz > out.fasta
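To check the result of these test commands without opening the file, here is a minimal shell sketch. The function name check_fasta_output is hypothetical, and it assumes the output was redirected to out.fasta as above. It reports whether the file is empty and, if not, how many FASTA records it contains, since each record starts with a '>' header line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a Racon output FASTA is empty and,
# if not, how many records it contains.
check_fasta_output() {
    local fasta=${1:-out.fasta}
    if [ ! -s "$fasta" ]; then               # -s: file exists and is non-empty
        echo "$fasta is missing or empty"
        return 1
    fi
    local n
    n=$(grep -c '^>' "$fasta" || true)       # every FASTA record starts with '>'
    echo "$fasta contains $n sequence(s)"
}
```

Run it as check_fasta_output out.fasta after the racon command finishes; a non-zero return status means the file was empty or missing.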

A couple of questions:

  1. Does Racon successfully finish the entire process?
  2. Does the SAM file contain alignments that look reasonable?
  3. What happens if you remove the -q 20 option from Racon?
  4. What is the size of your input reads file, the input SAM file and the input reference file? (If they are very large, it is possible that they cannot fit into memory and the process gets killed.)
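For question 4, a rough check can be scripted. This is a minimal sketch assuming Linux (total RAM is read from /proc/meminfo) and a hypothetical helper name, inputs_vs_ram. It prints each input's size in GiB and compares the total with MemTotal. Sizes of compressed (.gz) inputs understate how much memory their contents need, and Racon's real peak usage is likely higher than the raw input size because of its internal data structures, so treat the result only as a lower bound.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print each input file's size and compare the
# total with the machine's RAM. Assumes Linux (/proc/meminfo).
inputs_vs_ram() {
    local total=0 f size mem_kb
    for f in "$@"; do
        size=$(wc -c < "$f" | tr -d ' ')            # file size in bytes
        total=$((total + size))
        awk -v b="$size" -v f="$f" 'BEGIN { printf "%-45s %8.1f GiB\n", f, b / 2^30 }'
    done
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)   # reported in kB
    awk -v b="$total" -v m="$mem_kb" 'BEGIN {
        printf "inputs total: %.1f GiB, MemTotal: %.1f GiB\n", b / 2^30, m / 2^20
        if (b > m * 1024) print "warning: the inputs alone are larger than total RAM"
    }'
}
```

For example:

inputs_vs_ram FilteredResults_Q20.fastq pb_alignment_minimap2.sam nd24.asm.fasta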

Best regards, Ivan.

RacheliHadjez commented 3 months ago

Hi Ivan! First of all, thank you for your reply! I ran the commands you suggested, but they still gave the same result.

  1. Racon ran for about 2 days and generated an empty file, so I killed the job. After it stopped, it generated an error file (136,360 KB) that I couldn't open.
  2. The SAM file looks like this:

     @SQ SN:ctg000900 LN:437423
     @SQ SN:ctg000910 LN:102996
     @SQ SN:ctg000920 LN:1577448
     @PG ID:minimap2 PN:minimap2 VN:2.17-r941 CL:minimap2 -ax map-pb -t 20 --cs=long /dorotheeh/hadjez/nd24.asm.fasta /dorotheeh/hadjez/FilteredResults_Q20.fastq
     m64047_230223_061922/29/ccs 16 ctg000300 4682658 60 1288M1D3493M1I3M1I2692M1D877M1I322M1D9337M * 0 0 GTTACGTTAATTTACAGCTTTTAAAATTCCTTTTTATATCTATTCATCAGAAATAAAACAATTTAATTTAAATATATCAACAACATATGAATCAATGTGCATCCAAAATGTACGGGATTAGTTGGAATTTTTATAATGATTTTAATTTATGATCTTTCAATAATTTTTGATTTAATTAATAAATGTAATAAATAAAAGAGTTTTGGTCAAATATA.....

     I copied part of it so you can see it. It looks different from the SAM files I generated with BWA (short reads), for example. Does the format I pasted look normal for a long-read alignment?

  3. Nothing changed when I ran it without -q 20.
  4. The sizes are: long reads in FASTQ: 67,773,498 KB; genome assembly in FASTA: 72,934 KB; SAM file: 119,878,001 KB.

I don't think it's a memory problem, otherwise it would have stopped running, no? Maybe I'm mistaken (once again, I'm very new to this). Thank you for your patience and help!! Sincerely, Rachel

isovic commented 3 months ago

Hi Rachel!

Thanks for getting back! Can you also copy/paste the output of Racon from your terminal if you still have it, so I can see where it is hanging? Did it reach the step with the sliding arrow that says "generating consensus"?

The SAM file you sent looks normal, thanks for sharing it!
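For reference, each alignment line follows the standard SAM layout: tab-separated mandatory columns QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ and QUAL, followed by optional tags. Compared with short-read output, the main visible difference is that the CIGAR strings and sequences are much longer. The sketch below (hypothetical function name sam_summary, POSIX awk assumed) skips header and unmapped records and prints the read name, FLAG, contig:position and MAPQ of each alignment, together with the number of reference bases its CIGAR string covers (the M, D, N, = and X operations).

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize mapped SAM records, including the reference
# span computed from the CIGAR string. Header lines and unmapped reads are skipped.
sam_summary() {
    awk -F'\t' '
        /^@/ { next }                               # header line
        int($2 / 4) % 2 == 1 { next }               # FLAG bit 0x4: read unmapped
        {
            cigar = $6; span = 0
            # Consume one CIGAR operation at a time, e.g. "1288M", then "1D", ...
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                len = substr(cigar, 1, RLENGTH - 1) + 0
                op = substr(cigar, RLENGTH, 1)
                if (op ~ /[MDN=X]/) span += len     # operations that consume the reference
                cigar = substr(cigar, RLENGTH + 1)
            }
            printf "%s\tflag=%s\t%s:%s\tmapq=%s\tref_span=%d\n", $1, $2, $3, $4, $5, span
        }' "$@"
}
```

Run it as sam_summary pb_alignment_minimap2.sam | head. For the record pasted above (FLAG 16, i.e. aligned to the reverse strand; MAPQ 60, the highest value minimap2 reports), the CIGAR string covers 18,015 reference bases.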

Racon will try to load all reads into memory at the beginning of the process. Your FASTQ file is ~64 GB and the input SAM file is ~114 GB. How much memory does your machine have available? If it doesn't have this much RAM (in practice somewhat more than this, because of Racon's data structures and algorithms), the process could be killed by the system, or it could start swapping and become so slow that it looks frozen. In that case, the solution is to split the contigs from your reference file into multiple files, extract only the alignments from the SAM file that map to each contig group, and then run Racon on each group separately. (This can be scripted.)
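The splitting can be sketched in shell roughly as follows. This is a hypothetical sketch, not part of Racon: it assumes bash, POSIX awk, an uncompressed reference FASTA and SAM file, and racon on PATH, and the function names and part_<g> file layout are made up for illustration. It assigns contigs to groups round-robin, makes one pass over the reference and one over the SAM file, and then polishes each group with the same Racon options used in this thread. It only splits the reference and the SAM file, as described above; every Racon run still receives the complete reads file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: polish a large assembly in contig groups.

# Split the reference into NGROUPS files (contigs assigned round-robin) and route
# each SAM alignment to the group of the contig it maps to. SAM header lines are
# copied into every group; unmapped records (RNAME "*") are dropped.
split_by_contig_group() {
    local ref=$1 sam=$2 ngroups=$3 outdir=$4
    mkdir -p "$outdir"
    awk -F'\t' -v n="$ngroups" -v dir="$outdir" '
        FNR == NR {                                 # first file: reference FASTA
            if (/^>/) {
                name = substr($0, 2)
                sub(/[ \t].*/, "", name)            # contig name = first word of header
                grp[name] = nctg++ % n
            }
            print > (dir "/part_" grp[name] ".fasta")
            next
        }
        /^@/ { for (g = 0; g < n; g++) print > (dir "/part_" g ".sam"); next }
        ($3 in grp) { print > (dir "/part_" grp[$3] ".sam") }
    ' "$ref" "$sam"
}

# Run Racon once per group and write all results to stdout.
polish_groups() {
    local reads=$1 outdir=$2 fa sam
    for fa in "$outdir"/part_*.fasta; do
        [ -e "$fa" ] || continue                    # glob matched nothing
        sam=${fa%.fasta}.sam
        if [ -f "$sam" ] && grep -qv '^@' "$sam"; then
            # Same options as in this thread.
            racon -m 8 -x -6 -g -8 -w 500 -t 14 -q 20 -u "$reads" "$sam" "$fa" || return 1
        else
            # No alignments for this group: pass its contigs through unpolished.
            cat "$fa"
        fi
    done
}
```

With Rachel's files, and an arbitrarily chosen 8 groups, it would be called like this:

split_by_contig_group nd24.asm.fasta pb_alignment_minimap2.sam 8 racon_parts
polish_groups FilteredResults_Q20.fastq racon_parts > nd_Racon.fasta

Groups without any alignments are copied through unpolished, which roughly mirrors what -u does for contigs that cannot be polished. Routing all alignments in one awk pass means the large SAM file is read once rather than once per group; some awk implementations limit how many output files can be open at once, so keep the number of groups modest.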

Still, this does not explain why the small test example doesn't produce any output for you. Can you rerun it with /usr/bin/time so we can record the exit status? For example, like this:

minimap2 -ax map-pb -t 20 --cs=long test/data/sample_layout.fasta.gz test/data/sample_reads.fastq.gz > aln.sam
/usr/bin/time --format="cmd: %C\\nreal_time: %e s\\nuser_time: %U s\\nsys_time: %S s\\nmax_rss: %M kB\\nexit_status: %x" -o out.memtime \
racon -m 8 -x -6 -g -8 -w 500 -t 14 -q 20 -u test/data/sample_reads.fastq.gz aln.sam test/data/sample_layout.fasta.gz > out.fasta

After this, please copy the out.memtime contents and the stderr output from your terminal (the output from Racon).
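Once out.memtime exists, its memory line can be compared with the machine's RAM. This is a minimal sketch with a hypothetical function name, summarize_memtime, assuming the format string above (so max_rss is in kB) and Linux, where MemTotal in /proc/meminfo is also reported in kB. A peak RSS close to total RAM would support the memory explanation, and a non-zero exit status would show that Racon did not finish cleanly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the out.memtime file written by /usr/bin/time
# with the format string above, and relate max_rss (kB) to total RAM (kB).
summarize_memtime() {
    local memtime=${1:-out.memtime}
    local mem_kb
    mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)   # Linux only
    awk -F': ' -v mem="$mem_kb" '
        /^exit_status/ { print "exit status: " $2 }
        /^max_rss/ {
            rss = $2 + 0                            # "12345 kB" -> 12345
            printf "peak RSS: %.2f GiB of %.2f GiB RAM (%.0f%%)\n", rss / 2^20, mem / 2^20, 100 * rss / mem
        }
    ' "$memtime"
}
```

Run it as summarize_memtime out.memtime.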

Best regards, Ivan.

RacheliHadjez commented 3 months ago

Hi Ivan! Sadly, I could not run it, and then we had a problem with the Linux cluster at our university, so in the end I ran it on the Galaxy website and it gave me a great output! Thank you so much for your help!!! Rachel

isovic commented 3 months ago

Glad to hear it's resolved!

Best regards, Ivan.