Closed henrivkgt closed 1 year ago
I can reproduce the issue. To me, it is triggered when two intervals share the same end coordinates:
echo -e "x\t1\t2\tA\nx\t1\t2\tB"
x 1 2 A
x 1 2 B
odgi inject -i test/t.gfa -b <(echo -e "x\t1\t2\tA\nx\t1\t2\tB") -o - | odgi view -i - -g | tail -n 3
P x 1+,2+,3+,5+,7+,8+,10+,11+,13+,14+,16+,17+ *
P A *
P B 2+ *
echo -e "x\t1\t2\tA\nx\t1\t2\tB"
x 1 2 A
x 1 3 B
odgi inject -i test/t.gfa -b <(echo -e "x\t1\t2\tA\nx\t1\t3\tB") -o - | odgi view -i - -g | tail -n 3
P x 1+,2+,3+,4+,6+,8+,9+,11+,12+,14+,15+,17+,18+ *
P A 2+ *
P B 2+,3+ *
The issue is that paths are created, but no steps are added to them.
I checked with my graph and indeed it only happens when two bed entries share the same end coordinate.
Thank you for confirming that!
Still investigating, and things are going worse.
cat t.gfa
H VN:Z:1.0
S 1 ATCG
P x 1+ *
odgi inject -i t.gfa -b <(echo -e "x\t1\t2\tA\nx\t1\t3\tB") -o - | odgi view -i - -g
H VN:Z:1.0
S 1 A
L 1 + 2 + 0M
S 2 T
L 2 + 3 + 0M
S 3 C
L 3 + 4 + 0M
S 4 G
P x 4-,3-,2-,1- *
P A 3- *
P B 3-,2- *
odgi inject -i t.gfa -b <(echo -e "x\t1\t2\tA\nx\t1\t3\tB") -o - | odgi paths -i - -f
>x
CGAT
>A
G
>B
GA
Confusing (@ekg):
cat t.gfa
H VN:Z:1.0
S 1 ATCG
P x 1+ *
odgi view -i t.gfa -g
H VN:Z:1.0
S 1 ATCG
P x 1- *
The last two strange behaviors were triggered by the presence of spaces instead of a TAB between 1+ and *, so it is all good(?).
@henrivkgt, I think I've fixed the issue in https://github.com/pangenome/odgi/pull/476. Could you please give it a try?
I built the odgi from that branch, but running the command with this build gives me an error:
[odgi::algorithms::inject_ranges] injection end point for interval 97103_geno_T0000001_exon_1 is not at a node boundary: off by 2305 vs 5796 on a node 260bp long
The offending line seems normal:
97103_genome_v2.5~1~1 2024 2305 97103_geno_T0000001_exon_1
with odgi from the release the command finishes without errors (but with the path bug)
Thank you for the prompt check. So there is still work to do.
Just in case it might be useful, could you please share your GFA and BED, or small parts of them that trigger the error?
watermelon6_100kb.tar.gz Sure, I made a small example that triggers the error for me
I debugged a bit more and catched the error. I've tested the update also with your example too. Now it should be the good one!
Seems good now. Thanks for fixing this!
Thank you for your feedback!
Hello,
Recently I have run into an issue with odgi inject when trying to inject genes and exons as paths in a graph. When running odgi inject using a bed file with overlapping coordinates (for instance when adding two transcript variants of the same gene), only the second entry is added correctly. The first entry can still be found in the graph when running odgi paths, but it is given a length of 0, and using odgi extract on the corresponding region does not extract the path. Is this intended? My current workaround is to separate the bed file into multiple files with only non-overlapping entries, then running odgi inject on all of them adding the paths recursively to the graph, but it is a bit inefficient as the graph has to be completely written to disk every time.
Best, Henri