pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License
196 stars 40 forks source link

`odgi extract` somehow keeps the node which needs to be removed #577

Open yeeus opened 5 months ago

yeeus commented 5 months ago

I also met this problem using odgi extract like #572 , odgi somehow keeps the node which needs to be removed in my bed file

I have a node 1435085 in the raw graph which is too long and so simple with the depth only two:

$ odgi position -i HLA_H_to_J.gfa -g 1435087 -I
#target.graph.pos       target.path.pos dist.to.ref     strand.vs.ref
1435087,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700,167958,+      0       +
1435087,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138,167964,+   0       +

$ odgi depth -i HLA_H_to_J.gfa -g 1435087
#node.id        depth   depth.uniq
1435087 2       2

which needs to be removed in my case, so I made a bed file containing the intervals with length longer than 1k and depth lower than 5:

$ grep -e  '^HG00096_hap1_chr6\|NA21487_hap2_chr6' HLA_H_to_J.gfa.depth_low_5_1k.bed | less
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       56842   58458
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       85426   87672
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       113506  114634
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       126078  127190
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       128886  131659
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       139423  141156
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       149926  153037
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       161924  168157
HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700       322952  324609
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    56832   58448
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    113497  114624
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    126068  127180
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    128876  131667
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    139431  141164
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    149934  153048
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    161935  168163
NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138    322992  326051

but when I extracted with the following command, I got:

$ odgi extract -i HLA_H_to_J.gfa -t12 -P -I -b HLA_H_to_J.gfa.depth_low_5_1k.bed -R HLA_H_to_J.gfa.path_to_fully_retain.txt -o HLA_H_to_J.gfa.remove_depth_low_5_1k.og
$ odgi position -i HLA_H_to_J.gfa.remove_depth_low_5_1k.og -g 1435085 -I
#target.graph.pos       target.path.pos dist.to.ref     strand.vs.ref
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:114634-326513,47290,+ 0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:127190-326513,34734,+ 0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:153037-326513,8887,+  0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:141156-326513,20768,+ 0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:131659-326513,30265,+ 0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:87672-326513,74252,+  0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:0-322952,161924,+     0       +
1435085,0,+     HG00096_hap1_chr6#0#HG00096_hap1_chr6#0[69137-58532926]:29676187-30002700:58458-326513,103466,+ 0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:127180-326553,34755,+      0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:141164-326553,20771,+      0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:114624-326553,47311,+      0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:153048-326553,8887,+       0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:131667-326553,30268,+      0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:0-322992,161935,+  0       +
1435085,0,+     NA21487_hap2_chr6#0#NA21487_hap2_chr6#0[15186305-58426289]:14495585-14822138:58448-326553,103487,+      0       +

$ odgi depth -i HLA_H_to_J.gfa.remove_depth_low_5_1k.og -g 1435085
#node.id        depth   depth.uniq
1435085 15      15

it seems the node was not be removed...

And I have two questions:

  1. Why the node wasn't be removed?
  2. What's the meaning of depth.uniq in the output of odgi depth, which seems always equals with depth?

Best wishes!

zhangyixing3 commented 5 months ago

For the second question, depth.uniqis not equals with depth, One path may contain a node multiple times for PGGB gfa, in minigraph cactus gfa ,depth.uniqis equals with depth. My own understanding.

yeeus commented 5 months ago

OK, fine. The graph I used was generated by minigraph cactus, so depth.uniq always equals with depth. And maybe dear developers (@ekg @subwaystation) can provide another output representing the depth for only one haplotype. That means in the above case:

#node.id        depth   depth.uniq
1435085 15      15

will be

#node.id        depth   depth.uniq     *something*
1435085 15      15    *2*(only in HG00096_hap1 and NA21487_hap2)