isovic / racon

Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads. http://genome.cshlp.org/content/early/2017/01/18/gr.214270.116 Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/racon
MIT License

Assembly too small for polishing? #174

Closed JWDebler closed 3 years ago

JWDebler commented 3 years ago

Hi, I am trying to polish a mitochondrial assembly of just over 50,000 bp.

I keep getting this error though:

minimap2 AlKewell_mito.fasta ../nanopore/AlKewell.nanopore.all.fastq.gz > AlKewell.paf

racon -m 8 -x -6 -g -8 -w 500 ../nanopore/AlKewell.nanopore.all.fastq.gz AlKewell.paf AlKewell_mito.fasta > AlKewell_mito.racon.fasta

[racon::Polisher::initialize] loaded target sequences 0.000429 s
[racon::Polisher::initialize] loaded sequences 116.784445 s
[racon::Overlap::transmute] error: unequal lengths in target and overlap file for target AlKewell_mito!

I used the same command on the full genome assembly and that worked fine. Is my fragment too small, given that it might be shorter than some of the reads?

rvaser commented 3 years ago

Hi Johannes, there are no boundaries on sequence length in Racon. Can you please paste the output of the following commands:

head -n 1 AlKewell.paf
tail -n +2 AlKewell_mito.fasta | wc

Best regards, Robert

JWDebler commented 3 years ago

head -n 1 AlKewell.paf
532abcbc-a198-494f-b540-cc61a136b2c5    723     15      710     -       AlKewell_mito   52512   8653    9351    504     704     60      tp:A:P  cm:i:76 s1:i:504        s2:i:0  dv:f:0.0373     rl:i:0

tail -n +2 AlKewell_mito.fasta | wc
    657     657   53826

rvaser commented 3 years ago

Are there carriage return (\r) characters in AlKewell_mito.fasta? Please run head -n 2 AlKewell_mito.fasta.
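The numbers already posted are consistent with this suspicion. The PAF reports a target length of 52512, while wc counts 53826 bytes over 657 lines. The difference is 53826 - 52512 = 1314 = 2 x 657, that is, two terminator bytes per line, which is what \r\n line endings would produce. A minimal sketch for checking this directly follows; the helper names count_cr and seq_len are made up for illustration, and seq_len assumes a single-record FASTA with one header line.

```shell
# Count carriage-return bytes in a file (0 means plain Unix line endings).
# count_cr is a hypothetical helper name, not part of any tool.
count_cr() {
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Sequence length of a single-record FASTA: skip the header line,
# drop every \r and \n, and count what is left.
seq_len() {
    tail -n +2 "$1" | tr -d '\r\n' | wc -c | tr -d ' '
}
```

Running count_cr AlKewell_mito.fasta and seq_len AlKewell_mito.fasta, then comparing the second value with column 7 of the PAF, would show whether the FASTA and the overlap file agree once line terminators are ignored.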

JWDebler commented 3 years ago

Wow, thanks. Yep, what a noob mistake. I exported the FASTA from Geneious, which seems to have inserted invisible characters.

dos2unix fixed it.
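Where dos2unix is not installed, the same cleanup can be sketched with tr. The function name strip_cr is hypothetical. It writes to a separate output file, because redirecting onto the input file would truncate it before it is read.

```shell
# Remove every carriage return from a file and write the result elsewhere.
# strip_cr is a hypothetical helper; usage: strip_cr INPUT OUTPUT
# INPUT and OUTPUT must be different paths.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, strip_cr AlKewell_mito.fasta AlKewell_mito.unix.fasta would produce a copy with Unix line endings (the output filename here is only an illustration).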

Is this a problem with how minimap2 or Racon parses the FASTA? I will notify the Geneious developers, as I now remember running into a similar problem in the past after exporting a FASTA.

Cheers!

rvaser commented 3 years ago

The problem is that the parser in Racon does not strip whitespace when the sequence spans multiple lines. I will have to fix this :)
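To illustrate what whitespace-tolerant handling of a multi-line FASTA looks like, here is a rough awk sketch that reports the length of each record after dropping spaces, tabs, and carriage returns. It is not Racon's parser, only an independent illustration; the function name fasta_lengths is made up.

```shell
# Print "name<TAB>length" for every record in a FASTA file, ignoring
# spaces, tabs, and carriage returns inside sequence lines.
# fasta_lengths is a hypothetical helper, not part of Racon or minimap2.
fasta_lengths() {
    awk '
        { sub(/\r$/, "") }                 # drop a trailing CR on every line
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)           # record name: first word after ">"
            len = 0
            next
        }
        {
            gsub(/[ \t\r]/, "")            # strip whitespace within the sequence
            len += length($0)
        }
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

The lengths it prints can be compared against the target length in column 7 of the PAF to spot the kind of mismatch reported in the error above.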

rvaser commented 3 years ago

Fixed as of 1.4.20 (maybe even earlier).