rlorigro / GFAse

Tool for globally phasing diploid assembly graphs with orthogonal data
Mozilla Public License 2.0

Incorporate basic overlap awareness in the unzipping process #15

Closed jeizenga closed 1 year ago

jeizenga commented 1 year ago

The current unzipping algorithm naively trusts overlaps without checking them for mutual consistency, and it arbitrarily chooses one sequence or the other in the case of mismatches. This could be improved in the future, but I think this gets us off the ground.
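To make the behavior described above concrete, here is a minimal sketch, not the code from this PR. It assumes an overlap can be summarized as a single exact length (real GFA overlaps are CIGAR strings and may contain indels), and both function names are hypothetical. The first function shows what "naively trusting" an overlap looks like when two sequences are joined; the second shows the kind of consistency check that is currently skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of GFAse): count the positions at which the
// last `overlap_len` bases of `left` disagree with the first `overlap_len`
// bases of `right`. Assumes a gapless overlap of exactly `overlap_len` bases.
std::size_t count_overlap_mismatches(const std::string& left,
                                     const std::string& right,
                                     std::size_t overlap_len) {
    assert(overlap_len <= left.size() && overlap_len <= right.size());
    const std::size_t left_start = left.size() - overlap_len;
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < overlap_len; ++i) {
        if (left[left_start + i] != right[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}

// Hypothetical helper (not part of GFAse): join two sequences while trusting
// the claimed overlap length. The overlapping bases are taken from `left`,
// so any mismatch is silently resolved in favor of `left`.
std::string join_trusting_overlap(const std::string& left,
                                  const std::string& right,
                                  std::size_t overlap_len) {
    assert(overlap_len <= right.size());
    return left + right.substr(overlap_len);
}
```

For example, joining `ACGTAC` and `TTCGG` with a claimed overlap of 3 yields `ACGTACGG` even though the overlapping regions (`TAC` vs. `TTC`) differ at one position. A better consensus step could call something like `count_overlap_mismatches` first and decide how to proceed.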

This turned out to be a major refactor because overlaps were not previously being tracked. In essence, I changed the basic data type from HashGraph + IncrementalIdMap to HashGraph + IncrementalIdMap + Overlaps. This required me to touch most modules to ensure that the overlap data is passed along to every place where it might be needed.
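As a rough illustration of why a third component has to travel with the graph and the ID map, the sketch below shows one way overlap information could be stored per edge. It is an assumption-laden stand-in, not the `Overlaps` class from this PR: `OrientedNode` and `OverlapTable` are hypothetical names, overlaps are reduced to a single length, and the equivalence between an edge and its reverse-complement traversal is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical stand-in for an oriented node reference (not the real graph
// handle type): a numeric node ID plus a strand flag.
struct OrientedNode {
    std::size_t id;
    bool is_reverse;

    bool operator<(const OrientedNode& other) const {
        return std::tie(id, is_reverse) < std::tie(other.id, other.is_reverse);
    }
};

// Hypothetical overlap store keyed by directed edge (from, to).
// An edge with no recorded overlap is treated as blunt (length 0).
class OverlapTable {
public:
    void record(const OrientedNode& from, const OrientedNode& to,
                std::size_t length) {
        lengths_[std::make_pair(from, to)] = length;
    }

    std::size_t get(const OrientedNode& from, const OrientedNode& to) const {
        auto it = lengths_.find(std::make_pair(from, to));
        return it == lengths_.end() ? 0 : it->second;
    }

private:
    std::map<std::pair<OrientedNode, OrientedNode>, std::size_t> lengths_;
};
```

Because the table is keyed by node identifiers, any module that creates, renames, or deletes nodes also has to keep such a table in sync, which is consistent with the refactor touching most modules.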

Now that everything is in place, it should be relatively easy to modify the unzip algorithm to improve the consensus process if we decide that it's important.