vgteam / libbdsg

Optimized sequence graph implementations for graph genomics
MIT License

Overlays don't work well with haplotype paths #158

Open adamnovak opened 2 years ago

adamnovak commented 2 years ago

Right now, overlays like PackedPathPositionOverlay will index all the paths that for_each_path_handle() returns.

For backward compatibility, I have for_each_path_handle() omit haplotype paths, at least for graphs like GBWTGraph where there are thousands of them. To enumerate haplotype paths, you need to use the more advanced PathMetadata search methods.

But this means that you can put a PackedPathPositionOverlay over a GBWTGraph, get a handle to a haplotype path by name, and then ask about positions on it even though it was never indexed during construction of the overlay. This in turn is going to break e.g. vg inject into a haplotype path.

The overlays need to be modified to handle haplotype paths. The haplotype paths could be enumerated and indexed up front (which is probably too slow), detected and excluded (which is kind of useless, because people probably want to be able to use them), or lazily indexed by the overlays (which might be inefficient, but at least avoids indexing thousands of samples to inject into one).
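
To make the lazy-indexing option concrete, here is a minimal, self-contained sketch. It does not use the libbdsg API: ToyPath, ToyGraph, LazyPositionOverlay, for_each_listed_path() and position_of_step() are all hypothetical names invented for illustration, and paths are identified by a plain index rather than a path handle. The only behavior it borrows from the description above is that the default path iteration skips haplotype paths.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a path: a name, a haplotype flag, and the
// length of each step in order. Not a libbdsg type.
struct ToyPath {
    std::string name;
    bool is_haplotype;
    std::vector<std::size_t> step_lengths;
};

// Hypothetical stand-in for a graph whose default path iteration
// skips haplotype paths.
struct ToyGraph {
    std::vector<ToyPath> paths;

    template <typename F>
    void for_each_listed_path(F&& visit) const {
        for (std::size_t i = 0; i < paths.size(); ++i) {
            if (!paths[i].is_haplotype) {
                visit(i);
            }
        }
    }
};

// Overlay that eagerly indexes the listed paths and indexes any other
// path the first time a position on it is requested.
class LazyPositionOverlay {
public:
    explicit LazyPositionOverlay(const ToyGraph& graph) : graph_(&graph) {
        graph.for_each_listed_path([this](std::size_t path) { index_path(path); });
    }

    // Offset of the start of the given step along the given path.
    // Non-const because it may build the index for the path on demand.
    std::size_t position_of_step(std::size_t path, std::size_t step) {
        auto found = offsets_by_path_.find(path);
        if (found == offsets_by_path_.end()) {
            found = index_path(path);  // lazy indexing happens here
        }
        assert(step < found->second.size());
        return found->second[step];
    }

    bool is_indexed(std::size_t path) const {
        return offsets_by_path_.count(path) != 0;
    }

    std::size_t indexed_path_count() const {
        return offsets_by_path_.size();
    }

private:
    using OffsetIndex = std::unordered_map<std::size_t, std::vector<std::size_t>>;

    // Store the cumulative start offset of every step on one path.
    OffsetIndex::iterator index_path(std::size_t path) {
        std::vector<std::size_t> offsets;
        std::size_t total = 0;
        for (std::size_t length : graph_->paths.at(path).step_lengths) {
            offsets.push_back(total);
            total += length;
        }
        return offsets_by_path_.emplace(path, std::move(offsets)).first;
    }

    const ToyGraph* graph_;
    OffsetIndex offsets_by_path_;
};
```

In this sketch, constructing the overlay over a graph with one listed path and thousands of haplotype paths indexes only the listed path; calling position_of_step() on a single haplotype path indexes that path alone. The cost is that position queries are no longer const and would need synchronization if the overlay were shared between threads, which is one way the lazy approach could turn out to be inefficient.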