mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

polishing assembly from other tool #278

Closed dcopetti closed 4 years ago

dcopetti commented 4 years ago

Hello, I would like to use Flye to polish an assembly I got with NECAT, but I get an error. Here is what happens:

$ time /usr/local/Flye_2.7b-b1526_Andreas/bin/flye --polish-target Tims_2kb_200603_all_sl.fa --nano-raw reads_2kbq7a.fq.gz --iterations 2 --out-dir polish_tims_necat_200624 --threads 44
[2020-06-24 18:25:00] INFO: Running Flye polisher
[2020-06-24 18:25:00] INFO: Polishing genome (1/2)
[2020-06-24 18:25:30] INFO: Running minimap2
[2020-06-24 18:26:33] INFO: Separating alignment into bubbles
[2020-06-24 18:27:29] INFO: Alignment error rate: 0.000000
[2020-06-24 18:27:29] INFO: No reads were aligned during polishing

real    2m29.700s
user    3m47.330s
sys     1m10.981s

The FASTA file has headers with a complex structure:

>000000F-001-00 start=000155110:E end=004326704:B length=41483 size=4 identity=1.00 coverage=1.00
>000000F-001-01 start=000155110:E end=004326704:B length=42263 size=4 identity=0.97 coverage=0.82
>000005F-001-00 start=007743575:E end=003096594:B length=25256 size=4 identity=1.00 coverage=1.00
[...]
>000196F linear length=86460
>000197F linear length=23153
>000198F linear length=41043

These are the headers for "bubbles" and "contigs", respectively. I wonder whether they could be the cause of the problem, or what else it might be. Thanks,

Dario

mikolmogorov commented 4 years ago

FASTA headers should not affect the pipeline.
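As a rough illustration of why long headers are usually harmless: aligners such as minimap2 take only the first whitespace-delimited token of a header as the sequence name, so the `start=... end=...` annotations are ignored. The sketch below extracts those names and reports any duplicates. The function names are hypothetical and this is not part of Flye; it is only a quick sanity check one could run on the assembly.

```python
from collections import Counter


def sequence_ids(fasta_lines):
    """Return the first whitespace-delimited token of every FASTA header line."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip headers that are empty after '>'
                ids.append(tokens[0])
    return ids


def duplicated_ids(fasta_lines):
    """Return the sorted list of sequence names that occur more than once."""
    counts = Counter(sequence_ids(fasta_lines))
    return sorted(name for name, n in counts.items() if n > 1)


# Headers copied from the report above.
HEADERS = [
    ">000000F-001-00 start=000155110:E end=004326704:B length=41483 size=4 identity=1.00 coverage=1.00",
    ">000000F-001-01 start=000155110:E end=004326704:B length=42263 size=4 identity=0.97 coverage=0.82",
    ">000196F linear length=86460",
]

if __name__ == "__main__":
    print(sequence_ids(HEADERS))
    print(duplicated_ids(HEADERS))
```

For a real file, passing the open file handle works the same way, e.g. `duplicated_ids(open("Tims_2kb_200603_all_sl.fa"))`.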

The log says that no reads ended up aligned, which caused the error. This never happens with regular assemblies. I suggest mapping the reads manually with minimap2 to verify that there are alignments at least 500 bp long with at least 75% identity.
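The manual check suggested above could be sketched as follows, assuming the reads were mapped with something like `minimap2 -x map-ont Tims_2kb_200603_all_sl.fa reads_2kbq7a.fq.gz > aln.paf` (the output name `aln.paf` is a placeholder). The script counts PAF records that reach both thresholds. Identity is approximated here as matching bases divided by alignment block length (PAF columns 10 and 11); how Flye computes identity internally is not stated in this thread, so treat the numbers as a rough guide only. The sample records are made up for illustration.

```python
def paf_identity(fields):
    """Approximate identity: matching bases (col 10) / alignment block length (col 11)."""
    n_match = int(fields[9])
    aln_len = int(fields[10])
    return n_match / aln_len if aln_len else 0.0


def count_passing(paf_lines, min_len=500, min_identity=0.75):
    """Return (total records, records with length >= min_len and identity >= min_identity)."""
    total = passing = 0
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns; skip malformed lines
            continue
        total += 1
        if int(fields[10]) >= min_len and paf_identity(fields) >= min_identity:
            passing += 1
    return total, passing


# Made-up PAF records, for illustration only.
SAMPLE = [
    "read1\t10000\t0\t9000\t+\t000196F\t86460\t100\t9100\t8000\t9000\t60",  # long, ~89% identity
    "read2\t3000\t0\t400\t+\t000197F\t23153\t0\t400\t390\t400\t60",  # shorter than 500 bp
    "read3\t5000\t0\t2000\t-\t000198F\t41043\t0\t2000\t1200\t2000\t10",  # 60% identity
]

if __name__ == "__main__":
    total, passing = count_passing(SAMPLE)
    print(f"{passing} of {total} alignments pass the thresholds")
```

On real data, `count_passing(open("aln.paf"))` gives the same pair of counts; a passing count of zero would be consistent with the "No reads were aligned during polishing" message.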

dcopetti commented 4 years ago

I aligned the reads with minimap2, and there are plenty of alignments. I then restarted the polishing and it completed, so maybe there was a typo somewhere?

Anyway, it is working now. Thanks!