pangenome / odgi

Optimized Dynamic Genome/Graph Implementation: understanding pangenome graphs
https://doi.org/10.1093/bioinformatics/btac308
MIT License

New `odgi similarity` command #498

Closed by AndreaGuarracino 1 year ago

AndreaGuarracino commented 1 year ago

This PR moves part of the `odgi paths` functionality into a dedicated command, `odgi similarity`.

AndreaGuarracino commented 1 year ago

By default, we emit similarity measures:

odgi similarity -i MHC.og -D '#' -p 2 | column -t | head -n 5 

group.a     group.b     group.a.length  group.b.length  intersection  jaccard.similarity  cosine.similarity  dice.similarity  tanimoto.similarity  estimated.identity
chm13#1     chm13#1     3316253         3316253         3316253       1                   1                  1                1                    1
mSymSyn1#2  GRC38#1     3087908         3366997         1743857       0.370163            0.540826           0.54032          0.370163             0.54032
mPonPyg2#2  GRC38#1     3780557         3366997         2296869       0.473514            0.643779           0.642701         0.473514             0.642701
mPonAbe1#2  GRC38#1     3669057         3366997         2322227       0.492642            0.660703           0.660094         0.492642             0.660094
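
As a sanity check on the columns above, the printed values are consistent with the usual set-based definitions applied to `group.a.length`, `group.b.length`, and `intersection`. The sketch below is inferred from the numbers in the table, not taken from the odgi source; the function name `similarity_measures` is hypothetical. In this output `tanimoto.similarity` coincides with `jaccard.similarity`, and `estimated.identity` coincides with `dice.similarity`, so the sketch simply reuses those values.

```python
import math


def similarity_measures(len_a, len_b, intersection):
    """Recompute the similarity columns from the three count columns.

    Assumption: formulas inferred from the printed table, not from odgi's code.
    """
    union = len_a + len_b - intersection
    jaccard = intersection / union
    cosine = intersection / math.sqrt(len_a * len_b)
    dice = 2 * intersection / (len_a + len_b)
    return {
        "jaccard.similarity": jaccard,
        "cosine.similarity": cosine,
        "dice.similarity": dice,
        # In the table above these two columns match jaccard and dice.
        "tanimoto.similarity": jaccard,
        "estimated.identity": dice,
    }


# Row "mSymSyn1#2  GRC38#1" from the table above.
row = similarity_measures(3087908, 3366997, 1743857)
for name, value in row.items():
    print(f"{name}\t{value:.6g}")
```

Running this on the other rows should reproduce their values up to the rounding used in the table.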

With -d/--distances, we emit dissimilarity measures plus Euclidean and Manhattan distances:

odgi similarity -i MHC.og -D '#' -p 2 --distances | column -t | head -n 5

group.a     group.b     group.a.length  group.b.length  intersection  jaccard.distance  cosine.distance  dice.distance  tanimoto.distance  estimated.difference.rate  euclidean.distance  manhattan.distance
chm13#1     chm13#1     3316253         3316253         3316253       0                 0                0              0                  0                          0                   0
mSymSyn1#2  GRC38#1     3087908         3366997         1743857       0.629837          0.459174         0.45968        0.629837           0.45968                    1722.55             2967191
mPonPyg2#2  GRC38#1     3780557         3366997         2296869       0.526486          0.356221         0.357299       0.526486           0.357299                   1598.07             2553816
mPonAbe1#2  GRC38#1     3669057         3366997         2322227       0.507358          0.339297         0.339906       0.507358           0.339906                   1546.48             2391600
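
The distance columns can be checked the same way. In the table above, each dissimilarity equals one minus the corresponding similarity, `manhattan.distance` equals `group.a.length + group.b.length - 2 * intersection`, and `euclidean.distance` equals its square root. This is a sketch inferred from the printed numbers, not from the odgi source; `distance_measures` is a hypothetical name.

```python
import math


def distance_measures(len_a, len_b, intersection):
    """Recompute the distance columns from the three count columns.

    Assumption: formulas inferred from the printed table, not from odgi's code.
    """
    union = len_a + len_b - intersection
    # Amount covered by exactly one of the two groups.
    symmetric_difference = len_a + len_b - 2 * intersection
    jaccard = intersection / union
    cosine = intersection / math.sqrt(len_a * len_b)
    dice = 2 * intersection / (len_a + len_b)
    return {
        "jaccard.distance": 1 - jaccard,
        "cosine.distance": 1 - cosine,
        "dice.distance": 1 - dice,
        "euclidean.distance": math.sqrt(symmetric_difference),
        "manhattan.distance": symmetric_difference,
    }


# Row "mSymSyn1#2  GRC38#1" from the table above.
row = distance_measures(3087908, 3366997, 1743857)
for name, value in row.items():
    print(f"{name}\t{value:.6g}")
```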