Closed Dmitry-Antipov closed 3 years ago
Hello, the problem with the consensus algorithm (heaviest path) is that it can pick up long insertions at both ends of the graph when there is no branching. In Racon, we check the coverage of each base in the consensus sequence and trim bases from both ends if their coverage is low. You can get the coverage from the MSA if you are using spoa through the command line, or, if you are using it as a library inside your code, you can pass a vector to the function at https://github.com/rvaser/spoa/blob/master/include/spoa/graph.hpp#L169.
Best regards, Robert
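For readers who want to try the trimming idea described above, here is a minimal sketch in Python. It is not Racon's or spoa's code. The function names `column_support` and `trim_low_coverage_ends`, the gapped-row input format, and the threshold of 2 are all assumptions made for illustration. Coverage is approximated as the number of MSA rows that agree with the consensus in each column.

```python
def column_support(msa_rows, consensus_row):
    """Count, for each non-gap consensus column, the rows with the same base.

    Assumes all rows (and the consensus row) are gapped with '-' and have
    equal length. This is an illustrative approximation of per-base coverage.
    """
    support = []
    for col, base in enumerate(consensus_row):
        if base == "-":
            continue  # gap columns contribute no consensus base
        support.append(sum(1 for row in msa_rows if row[col] == base))
    return support


def trim_low_coverage_ends(consensus, coverage, min_coverage):
    """Drop bases from both ends while their coverage is below min_coverage."""
    if len(consensus) != len(coverage):
        raise ValueError("consensus and coverage must have the same length")
    begin, end = 0, len(consensus)
    while begin < end and coverage[begin] < min_coverage:
        begin += 1
    while end > begin and coverage[end - 1] < min_coverage:
        end -= 1
    return consensus[begin:end]


# Toy data (hypothetical, not the sequences from this issue):
# one read carries an extra leading T, the other two do not.
rows = ["-ACGT", "-ACGT", "TACGT"]
consensus_row = "TACGT"

coverage = column_support(rows, consensus_row)
trimmed = trim_low_coverage_ends(consensus_row.replace("-", ""), coverage, 2)
print(coverage, trimmed)
```

The threshold is the main design choice. Something proportional to the number of reads is a natural starting point, but the right value depends on the data.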
Great, thank you.
Hi, thank you for the useful tool.
I've noticed that the consensus (in global alignment mode) sometimes tends to be the longest sequence in the set, even when it is clearly not the "right answer". For example, for the test below I receive TTATAGTATATATTATATAATATATAAATATAATATACATTAAT as the consensus sequence, regardless of the scoring function (I tried default, edit distance, and some others, e.g. -e -1 -g -8 -l 1 -m 10 -n -8) or the order of the reads. The MSA itself looks OK. Switching to local alignment did not help either.
Do you have any recommendations on how to overcome this issue?
It seems that this issue mostly happens when there is an insertion at the beginning of one of the sequences, e.g. in the test below there is an extra T at the first position. Possibly in such cases the "correct" paths through the POA graph are not scored?
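The effect of a leading insertion can be reproduced with a toy heaviest-path traversal. The sketch below is a simplified dynamic program over a small hand-built DAG, not spoa's actual implementation. The function `heaviest_path`, the graph layout, and the edge weights are hypothetical. It models three reads `ACG` and one read `TACG`, so the extra `T` has a single outgoing edge of weight 1 and no competing branch.

```python
def heaviest_path(bases, edges):
    """Return the heaviest path through a DAG as a string.

    bases: node labels, indexed in topological order.
    edges: dict mapping (u, v) with u < v to an edge weight.
    """
    score = [0] * len(bases)
    pred = [None] * len(bases)
    # Visit edges by target node so score[u] is final before it is used.
    for (u, v), weight in sorted(edges.items(), key=lambda item: item[0][1]):
        if pred[v] is None or score[u] + weight > score[v]:
            score[v] = score[u] + weight
            pred[v] = u
    # Start from the best-scoring node and walk predecessors back to a source.
    node = max(range(len(bases)), key=lambda i: score[i])
    path = []
    while node is not None:
        path.append(bases[node])
        node = pred[node]
    return "".join(reversed(path))


# Nodes: T (insertion seen in one read), then A, C, G (seen in all four reads).
bases = ["T", "A", "C", "G"]
edges = {(0, 1): 1, (1, 2): 4, (2, 3): 4}

print(heaviest_path(bases, edges))  # prints "TACG"
```

Because node `A` has only one incoming edge, the backtrack always continues into the low-weight `T`. In this simplified model nothing at the end of the graph penalizes the weakly supported prefix, which is why trimming by coverage afterwards helps.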