natir / fpa

Filter of Pairwise Alignment
MIT License

Keeping only dovetails + assembling with miniasm #14

Closed RolandFaure closed 2 years ago

RolandFaure commented 2 years ago

Hi,

I am experimenting with fpa to see whether it could improve my assemblies. However, I observe a very strange behaviour that I cannot explain. I produced an all-vs-all alignment of my reads with minimap, named alignments.paf, and wanted to see how keeping only the dovetail alignments affected my assembly. So I used fpa to keep only the dovetails:

cat alignments.paf | fpa keep -d > alignments_dovetails.paf

But when I then assemble with miniasm, I get an absurdly big and complex assembly, which looks as if miniasm had not merged the reads together (there is no such problem when I use the minimap file directly).

Have you already observed this behaviour with miniasm? Do you know of an alternative assembler that would accept an input file containing exclusively dovetail alignments?

Many thanks, Roland

natir commented 2 years ago

I assume your alignments_dovetails.paf has a very low coverage, between 1 and 2. Miniasm has a minimal coverage parameter, -c, with a default value of 3.

I think your assembly could be better if you set this parameter to 1 or 0.
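One rough way to check this assumption is to count how many overlaps each read still takes part in after filtering. The sketch below is only an illustration: the function name `overlaps_per_read` is hypothetical, and the count is a proxy, not the exact coverage that miniasm evaluates for -c. It relies only on the standard PAF column layout (query name in column 1, target name in column 6).

```python
from collections import Counter


def overlaps_per_read(paf_lines):
    """Count, for each read, the number of PAF records it appears in.

    This is a rough proxy for overlap coverage, not miniasm's own computation.
    """
    counts = Counter()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, target = fields[0], fields[5]
        if query == target:
            continue  # ignore self-overlaps
        counts[query] += 1
        counts[target] += 1
    return counts
```

It can be fed an open file, for example `overlaps_per_read(open("alignments_dovetails.paf"))`; if most reads end up with only one or two records, lowering -c is worth trying.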

If you can share your data, I could try it myself and maybe find a bug.

RolandFaure commented 2 years ago

It does not seem to be a question of coverage: I have a read coverage of >50x, and changing the -c value of miniasm does not fix the problem. Someone else told me he had the same problem, so it may be an artefact of miniasm. You can have a look at my data here: https://mega.nz/folder/DRl2zaQB#srVSRs-HYczu71sqrofOHQ (E. coli data from the SRA, an in silico mix of two strains).

Thanks a lot !

natir commented 2 years ago

Hi, I found time to look into your problem.

I ran this pipeline:

minimap2 -x ava-ont -t 12 nanopore.fq.gz nanopore.fq.gz > nanopore.paf 2> minimap2_keep.log
fpa -i nanopore.paf keep -d > nanopore_keep.paf 2> fpa_keep.log
miniasm nanopore_keep.paf > assembly_keep.gfa 2> miniasm_keep.log

For comparison, I ran a simple minimap2 + miniasm assembly pipeline:

minimap2 -x ava-ont -t 12 nanopore.fq.gz nanopore.fq.gz > basic.paf 2> minimap2_basic.log
miniasm basic.paf > assembly_basic.gfa 2> miniasm_basic.log

And indeed, the assembly built from dovetails only isn't as clean as the basic miniasm assembly.

In the fpa publication I only filtered out internal matches and small overlaps (drop -i -l 2000). I don't remember clearly, but I think that's because miniasm needs the containment overlaps in one of its steps before it filters them out itself.
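To make the three overlap types concrete, here is a minimal sketch of how a PAF record can be sorted into internal match, containment or dovetail. It follows the classification rule described in the miniasm paper, with that paper's default thresholds (maximum overhang 1000, overhang-to-mapping-length ratio 0.8). The function name `classify_overlap` is hypothetical, and the exact criteria applied by fpa may differ.

```python
def classify_overlap(paf_line, max_overhang=1000, int_frac=0.8):
    """Classify one PAF record as 'internal', 'containment' or 'dovetail'.

    Sketch of the rule from the miniasm paper; not fpa's actual code.
    """
    f = paf_line.rstrip("\n").split("\t")
    l1, b1, e1 = int(f[1]), int(f[2]), int(f[3])  # query length, start, end
    strand = f[4]
    l2, b2, e2 = int(f[6]), int(f[7]), int(f[8])  # target length, start, end
    if strand == "-":
        # Express the target coordinates on the same strand as the query.
        b2, e2 = l2 - e2, l2 - b2
    # Unaligned bases that should have aligned if the overlap were end-to-end.
    overhang = min(b1, b2) + min(l1 - e1, l2 - e2)
    maplen = max(e1 - b1, e2 - b2)
    if overhang > min(max_overhang, maplen * int_frac):
        return "internal"
    if b1 <= b2 and l1 - e1 <= l2 - e2:
        return "containment"  # query is contained in target
    if b1 >= b2 and l1 - e1 >= l2 - e2:
        return "containment"  # target is contained in query
    return "dovetail"
```

Under this rule, keep -d retains only the records for which the function returns "dovetail", while drop -i removes only the "internal" ones and leaves the containments for miniasm.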

To check this, I ran this pipeline:

minimap2 -x ava-ont -t 12 nanopore.fq.gz nanopore.fq.gz > nanopore.paf 2> minimap2.log
fpa -i nanopore.paf drop -i -l 2000 > nanopore_drop.paf 2> fpa_drop.log
miniasm nanopore_drop.paf > assembly_drop.gfa 2> miniasm_drop.log

The drop method seems to produce an assembly closer to the basic assembly.

The drop method saves less space than the keep method, but it seems miniasm really needs the containment overlaps to produce a good assembly:

405M nanopore.paf
222M nanopore_drop.paf
167M nanopore_keep.paf
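As a quick piece of arithmetic on the sizes above, the share of the original file removed by each filter can be computed as follows. The helper name `percent_removed` is made up for illustration; the sizes are the rounded megabyte values from the listing.

```python
# File sizes in MB, taken from the listing above.
sizes_mb = {
    "nanopore.paf": 405,
    "nanopore_drop.paf": 222,
    "nanopore_keep.paf": 167,
}


def percent_removed(original, filtered):
    """Percentage of the original size that the filter removed."""
    return round(100 * (1 - filtered / original), 1)


base = sizes_mb["nanopore.paf"]
for name in ("nanopore_drop.paf", "nanopore_keep.paf"):
    print(f"{name}: {percent_removed(base, sizes_mb[name])}% smaller than nanopore.paf")
```

So drop removes a bit less than half of the file and keep removes a bit more than half.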

I hope this is clear and answers your question.

Thanks for your interest in fpa. Feel free to close this issue, or to continue the discussion if you have other questions.

RolandFaure commented 2 years ago

Thank you very much for the time you took to look at my problem. The answer is very clear :-)