rlorigro / GFAse

Tool for globally phasing diploid assembly graphs with orthogonal data
Mozilla Public License 2.0

Produce correct phased FASTA files when using the default chaining algorithm #27

Closed: jeizenga closed this 4 months ago

jeizenga commented 4 months ago

The FASTA output for the Hamiltonian chaining algorithm was pretty badly broken. The root cause was that we used the chainer's collection of path handles to identify which nodes should go into each of the two haplotype FASTAs. However, the GFA unzipping algorithm invalidates these handles: it destroys the old paths and makes new ones consisting only of the unzipped node. The hand-off between these two phases of the algorithm is now implemented in terms of path names instead of path handles, since the names remain stable.
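To illustrate the failure mode, here is a minimal toy model in Python. It is not GFAse's code and does not use the real graph API: `ToyPathGraph`, its methods, and the path name `hap_a` are all hypothetical, and it assumes that unzipping recreates the path under the same name. It only shows why a handle captured before unzipping goes stale while a lookup by name still works.

```python
class ToyPathGraph:
    """Toy stand-in for a path-bearing graph (hypothetical, not the GFAse API)."""

    def __init__(self):
        self._next_handle = 0
        self._nodes_by_handle = {}  # path handle -> list of node names
        self._handle_by_name = {}   # path name -> current path handle

    def create_path(self, name, nodes):
        # Every new path gets a fresh handle, even if the name is reused.
        handle = self._next_handle
        self._next_handle += 1
        self._nodes_by_handle[handle] = list(nodes)
        self._handle_by_name[name] = handle
        return handle

    def has_handle(self, handle):
        return handle in self._nodes_by_handle

    def get_handle(self, name):
        return self._handle_by_name[name]

    def get_nodes(self, handle):
        return list(self._nodes_by_handle[handle])

    def unzip(self, name):
        # Assumption: unzipping destroys the old path and recreates it
        # under the same name, containing only the single unzipped node.
        old_handle = self._handle_by_name[name]
        old_nodes = self._nodes_by_handle.pop(old_handle)
        return self.create_path(name, ["+".join(old_nodes)])


def demo():
    graph = ToyPathGraph()
    # The "chainer" remembers a handle before unzipping happens.
    stale_handle = graph.create_path("hap_a", ["n1", "n2", "n3"])
    graph.unzip("hap_a")
    # Looking the path up by name after unzipping yields a valid handle.
    fresh_handle = graph.get_handle("hap_a")
    return graph.has_handle(stale_handle), graph.get_nodes(fresh_handle)
```

In this sketch, `demo()` returns `(False, ["n1+n2+n3"])`: the handle saved before unzipping no longer refers to anything, while the name resolves to the new single-node path.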

This fix works as a patch, but IMO it would be better to refactor things so that the FASTA-writing phase doesn't need the chainer's memory at all. The current implementation is pretty ugly and opaque.

Resolves https://github.com/rlorigro/GFAse/issues/23