pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

Fix position to print all paths for all bed regions #453

Open ASLeonard opened 2 years ago

ASLeonard commented 2 years ago

Fixes a presumed error reported in #440 where coordinates in a bed file are not translated into every path. Now, given P paths in a graph and N lines of bed file coordinates, there are P * N lines of output.
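To make the expected line count concrete, here is a minimal sketch of the P * N pairing. It does not call odgi; the path names, the BED tuples, and the `expected_output_pairs` helper are all hypothetical, and the ordering shown is an arbitrary choice rather than the order odgi emits.

```python
from itertools import product


def expected_output_pairs(paths, bed_regions):
    """Pair every BED region with every path (one output line per pair)."""
    # Region-major order is an assumption made only for this illustration.
    return [(region, path) for region, path in product(bed_regions, paths)]


# Hypothetical inputs: P = 3 paths, N = 2 BED lines as (name, start, end).
paths = ["pathA", "pathB", "pathC"]
bed_regions = [("pathA", 0, 100), ("pathA", 250, 400)]

pairs = expected_output_pairs(paths, bed_regions)
print(len(pairs))  # P * N = 3 * 2
```

With the 12 paths and 5 bed lines from the timing test below, the same pairing gives 60 output lines.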

This is a fairly ugly fix: it iterates get_position with an additional parameter to pick out each path, but it is the best I can do given my understanding of handles. The change to get_position should not affect other calls to it, and I've tested it with both -r <path> (unchanged behaviour) and -R <paths_file> (now fixed behaviour). I haven't looked at the other forms of input or at liftover, so this is probably incomplete.

For a ~800 megabyte og file with 12 paths and a 5-line bed file, I observed a 5x speedup (100 seconds old, 20 seconds new) with this fix versus having to call odgi position -r <path> in a for loop. I assume loading the graph takes a decent amount of time, so it is a big advantage to have this work internally.
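As a rough back-of-the-envelope check on that assumption, the sketch below fits a simple two-term cost model to the two timings above. The model itself is an assumption, not something measured: the old workflow is taken to pay one graph load plus one query per path, and the new one to pay a single load plus the same per-path query work. The `solve_load_and_query` helper is hypothetical.

```python
def solve_load_and_query(n_paths, old_total, new_total):
    """Solve the assumed model for load time and per-path query time.

    Assumed model (not measured):
        old_total = n_paths * (load + query)   # one odgi call per path
        new_total = load + n_paths * query     # one call covering all paths
    """
    # Subtracting the two equations leaves (n_paths - 1) * load.
    load = (old_total - new_total) / (n_paths - 1)
    query = (new_total - load) / n_paths
    return load, query


load, query = solve_load_and_query(n_paths=12, old_total=100.0, new_total=20.0)
print(f"load ~ {load:.1f} s, per-path query ~ {query:.1f} s")
```

Under this model most of the old runtime would come from reloading the graph on every call, which is consistent with the guess above, but the split is only as good as the model.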