yangao07 / abPOA

abPOA: an SIMD-based C library for fast partial order alignment using adaptive band
MIT License

Consensus coverage in diploid mode #29

Open ldenti opened 2 years ago

ldenti commented 2 years ago

Hi all, first of all: great library!

I was playing around with diploid mode and I ran into strange segmentation faults (when freeing cons_cov). From what I could see, in diploid mode, cons_cov is not used at all: https://github.com/yangao07/abPOA/blob/637c79fce8e703d2bb3ea1964009de9bec7d3430/src/abpoa_graph.c#L817-L818 So, I'm assuming that it's not possible to get consensus coverage in diploid mode. Is this true?
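
For illustration, a minimal sketch of a defensive cleanup pattern that avoids this kind of crash when an output buffer may never have been allocated. The struct `cons_out_t` and the helpers `cons_out_init` and `cons_out_free` are hypothetical and not part of the abPOA API; only the name `cons_cov` comes from the library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for consensus output; NOT the abPOA API. */
typedef struct {
    int n_cons;      /* number of consensus sequences */
    int **cons_cov;  /* per-base coverage, one array per consensus; may stay NULL */
} cons_out_t;

/* Zero every field so that unset pointers are NULL rather than garbage. */
static void cons_out_init(cons_out_t *out) {
    assert(out != NULL);
    out->n_cons = 0;
    out->cons_cov = NULL;
}

/* Free only what was allocated. Safe on a never-filled struct and safe to call twice. */
static void cons_out_free(cons_out_t *out) {
    assert(out != NULL);
    if (out->cons_cov != NULL) {
        for (int i = 0; i < out->n_cons; ++i)
            free(out->cons_cov[i]);  /* free(NULL) is a no-op */
        free(out->cons_cov);
        out->cons_cov = NULL;
    }
    out->n_cons = 0;
}
```

With this pattern, calling `cons_out_free` after a mode that never fills `cons_cov` does nothing, instead of freeing an uninitialized pointer.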

Thanks, Luca

yangao07 commented 2 years ago

You are right. The diploid mode is not working as well as I expected, so I did not add it to the release of abPOA. Right now it can only provide a two-consensus result, and no further functions have been implemented for it.

By the way, what is your scenario for using this diploid mode? Does the result look good to you? I may continue to update this function at some point, though probably not soon.

Yan

ldenti commented 2 years ago

Ok, thanks!

I'm working on genotyping SVs and, at least from the few examples I opened in IGV, it seems to work quite well: when I clearly see two alleles in the reads, I get two consensus sequences.

Since the weight of each consensus is quite important to me, is it possible (using the currently available functions) to get the heaviest weighted consensus (in "haploid" mode) and then get the second heaviest consensus? Do you see any easy way to do this?
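
To make the notion of a "heaviest weighted consensus" concrete, here is a minimal sketch of finding the heaviest source-to-sink path in a toy weighted DAG by dynamic programming. This is not abPOA's consensus routine: `edge_t`, `heaviest_path`, and `MAX_NODES` are hypothetical names, and the sketch assumes that nodes are numbered in topological order and that edge weights are non-negative.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64

/* Hypothetical edge of a toy weighted DAG; nodes are numbered in topological order. */
typedef struct {
    int from, to;  /* requires from < to */
    int weight;    /* e.g. number of reads traversing the edge; must be >= 0 */
} edge_t;

/*
 * Heaviest path from node 0 to node n_nodes - 1.
 * Writes the node ids into `path` and the summed weight into `total_weight`.
 * Returns the number of nodes on the path, or 0 if the sink is unreachable.
 */
static int heaviest_path(const edge_t *edges, int n_edges, int n_nodes,
                         int *path, int *total_weight) {
    int score[MAX_NODES], pred[MAX_NODES];
    assert(edges != NULL && path != NULL && total_weight != NULL);
    assert(n_nodes > 0 && n_nodes <= MAX_NODES);

    for (int v = 0; v < n_nodes; ++v) { score[v] = -1; pred[v] = -1; }
    score[0] = 0;

    /* Relax outgoing edges node by node, in topological order. */
    for (int u = 0; u < n_nodes; ++u) {
        if (score[u] < 0) continue;  /* not reachable from the source */
        for (int e = 0; e < n_edges; ++e) {
            if (edges[e].from != u) continue;
            int v = edges[e].to;
            if (score[u] + edges[e].weight > score[v]) {
                score[v] = score[u] + edges[e].weight;
                pred[v] = u;
            }
        }
    }

    int sink = n_nodes - 1;
    if (score[sink] < 0) return 0;

    /* Count the path length, then fill `path` from the sink backwards. */
    int len = 0;
    for (int v = sink; v != -1; v = pred[v]) ++len;
    int i = len;
    for (int v = sink; v != -1; v = pred[v]) path[--i] = v;

    *total_weight = score[sink];
    return len;
}
```

The sketch covers only the first step. How to derive a meaningful second consensus from the same graph is exactly the open question here, and the code makes no claim about it.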

Luca

yangao07 commented 2 years ago

I will try to work on that and will get back to you when I have any updates.

Yan

ldenti commented 2 years ago

Oh, great! Thanks!

Best, Luca

yangao07 commented 2 years ago

Hi Luca,

Sorry that I haven't posted an update for such a long time. Do you have any diploid example/test data? I am currently updating the diploid mode of abPOA, and any data would be very helpful!

Thanks, Yan

ldenti commented 2 years ago

Hi Yan, I just extracted these two small FASTA files from some reads I'm working on (hete-examples.zip). They contain portions (substrings) of the reads covering potential heterozygous events (a deletion and an insertion).

In the zip you can also find the IGV screenshots of the two regions I considered (the alignments clearly show some differences between the haplotypes).

Let me know if they are a good starting point or if you need anything else.

Best, Luca

yangao07 commented 2 years ago

Hi,

I just pushed the latest version to GitHub. Please try out the multiple consensus sequences mode: set -d/--max-n-cons to the desired number of consensus sequences. All the relevant variables, including cons_cov, are now properly set in this mode. See abpoa.h for more details.

https://github.com/yangao07/abPOA/blob/bfe4ac0a4945ed3eadf68282776fc816b299947e/include/abpoa.h#L101-L111
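
Once per-consensus coverage is populated, a weight for each consensus can be summarized from it. The sketch below assumes a simplified, hypothetical layout (`multi_cons_t`, with `n_cons`, `cons_len`, and a two-dimensional `cons_cov`) and a hypothetical helper `mean_cons_cov`. The real field names and types are the ones defined in abpoa.h at the link above and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a multi-consensus result; NOT copied from abpoa.h. */
typedef struct {
    int n_cons;      /* number of consensus sequences produced */
    int *cons_len;   /* length of each consensus, size n_cons */
    int **cons_cov;  /* coverage of each consensus base, size n_cons x cons_len[i] */
} multi_cons_t;

/* Mean per-base coverage of consensus `i`; returns 0.0 for an empty consensus. */
static double mean_cons_cov(const multi_cons_t *mc, int i) {
    assert(mc != NULL && i >= 0 && i < mc->n_cons);
    if (mc->cons_len[i] == 0) return 0.0;
    long sum = 0;
    for (int j = 0; j < mc->cons_len[i]; ++j)
        sum += mc->cons_cov[i][j];
    return (double)sum / mc->cons_len[i];
}
```

Comparing `mean_cons_cov(&mc, 0)` with `mean_cons_cov(&mc, 1)` would then give a rough view of how the reads are split between two consensus sequences.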

Yan

ldenti commented 2 years ago

Thanks Yan!! I'll check it out.

Best, Luca