yangao07 / abPOA

abPOA: an SIMD-based C library for fast partial order alignment using adaptive band
MIT License

Incremental update of the graph #16

Open huguesrichard opened 3 years ago

huguesrichard commented 3 years ago

Hello,

Thank you very much for abPOA, it is a nice tool, installation was fast and easy.

I would like to be able to incrementally update the POA graph. This would make it practical to use the graph as a compressed, aligned representation of the sequences and to add sequences to it as they accumulate over time.

A typical use case would be to first generate a graph (for instance in GFA format) and then add sequences to it with additional commands, for instance with an --increment option:

abpoa -r 3 seqs.fa > graph.gfa

abpoa --increment newseqs.fa graph.gfa > newgraph.gfa

Best regards,

Hugues

yangao07 commented 3 years ago

This is theoretically doable, I will give it a try. I will post the updates here when it's ready.

Yan

huguesrichard commented 3 years ago

Hello again,

As a complement to my previous request: the GFA graphs produced by abPOA are usually quite large, and it was easy to transform them into unitigs using Heng Li's gfatools, e.g. `gfatools asm -u graph.gfa > graph_unitig.gfa`

It would be great if graph_unitig.gfa could be provided to abPOA as input. The graph could then be used in practice as a database for short sequences.

Hugues

yangao07 commented 3 years ago

Thanks for the suggestion! I will try to add this feature in the next version.

Yan

yangao07 commented 3 years ago

@huguesrichard Please try out the latest abPOA v1.1.0. It now can incrementally align sequences to an existing GFA or MSA. Let me know if this works for you.

Yan

huguesrichard commented 3 years ago

Hello @yangao07,

I tried adding sequences to a gfa produced by abPOA and this worked directly. That's really a great feature, thank you!
I will try it out a little more in the next few days and let you know if I see anything strange in the resulting MSAs.

I also tried with a gfa simplified to unitigs (using Heng Li's gfatools) but in this case abPOA did not recognise the gfa file.

Also, it would be great to have a few information messages printed to stderr as abPOA runs. I am running it on a few thousand sequences now and I am always unsure where it is in terms of processing the files.

Anyway, thanks again for adding the feature.

huguesrichard commented 3 years ago

Also, I could not get access to the release, I guess it was not published yet.

yangao07 commented 3 years ago

> Also, I could not get access to the release, I guess it was not published yet.

I haven't pushed it to the release yet.

> I also tried with a gfa simplified to unitigs (using Heng Li's gfatools) but in this case abPOA did not recognise the gfa file.

The unitigs produced by gfatools have no P lines, which are required for incremental graph alignment; that is why they are not supported. On the other hand, I think it is not hard for abPOA to output GFA with unitigs. I will try to add this feature.
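
For readers who want to check whether a given GFA file still carries P (path) lines before passing it to abPOA, here is a minimal Python sketch. It is not part of abPOA or gfatools; the function names `gfa_record_counts` and `has_path_lines` are hypothetical, and the only assumption about the format is that each GFA record starts with a one-letter type field followed by a tab.

```python
from collections import Counter

def gfa_record_counts(lines):
    """Count GFA records by their first tab-separated field (H, S, L, P, ...)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        counts[line.split("\t", 1)[0]] += 1
    return counts

def has_path_lines(lines):
    """Return True if at least one P (path) record is present."""
    return gfa_record_counts(lines)["P"] > 0

# Tiny hand-written GFA for illustration only (not actual abPOA output).
example = [
    "H\tVN:Z:1.0",
    "S\t1\tACG",
    "S\t2\tT",
    "L\t1\t+\t2\t+\t0M",
    "P\tseq1\t1+,2+\t*",
]

print(has_path_lines(example))      # the P record is present
print(has_path_lines(example[:4]))  # same graph with the P record dropped
```

Both functions accept any iterable of lines, so an open file handle can be passed in directly in place of the `example` list.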

huguesrichard commented 3 years ago

A graph output with unitigs would be really helpful. From my small tests on viral genomes, I got around 50-fold compression when generating the unitig version.
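
To illustrate where this kind of compression comes from: a unitig merges a maximal non-branching chain of nodes into a single segment, so long stretches where all sequences agree collapse into one node. Below is a minimal Python sketch of that idea. It assumes a directed acyclic graph without duplicate edges, given as a hypothetical `labels` dict and `edges` list; it is not how gfatools or abPOA implement compaction, and it returns only the merged sequences, not the links or paths between them.

```python
from collections import defaultdict

def compact_unitigs(labels, edges):
    """Merge maximal non-branching chains of nodes into single segments.

    labels: dict node_id -> sequence string
    edges:  iterable of (src, dst) pairs of a directed acyclic graph
    Returns a list of merged sequence strings, one per unitig.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    def extends(u, v):
        # v can be appended to u's chain only if u -> v is the sole
        # edge leaving u and the sole edge entering v.
        return len(succ[u]) == 1 and len(pred[v]) == 1

    unitigs = []
    for node in labels:
        # A node starts a unitig unless it continues its single predecessor.
        if len(pred[node]) == 1 and extends(pred[node][0], node):
            continue
        seq, cur = labels[node], node
        while len(succ[cur]) == 1 and extends(cur, succ[cur][0]):
            cur = succ[cur][0]
            seq += labels[cur]
        unitigs.append(seq)
    return unitigs

# Toy graph: a shared prefix A-C-G, then a T/A variant, then a shared C.
labels = {1: "A", 2: "C", 3: "G", 4: "T", 5: "A", 6: "C"}
edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 6), (5, 6)]
print(compact_unitigs(labels, edges))  # ['ACG', 'T', 'A', 'C']
```

In this toy graph six single-base nodes become four unitigs; in a graph built from closely related genomes, the non-branching stretches are much longer, so the reduction in node count is correspondingly larger.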