merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Functional homogeneity index is confused #1265

Closed: meren closed this issue 4 years ago

meren commented 4 years ago

I was looking at @mooreryan's unit test (#1264) and realized that the functional homogeneity index is a bit confused :) @mahmoudyousef98, the functional homogeneity should be 1.0 for all of these cases, but this is what we currently get:

compute_functional_index(['A-A', 'A-A', 'A-A'])
1.0

compute_functional_index(['AAA', 'AAA', 'A-A'])
0.77

compute_functional_index(['-AA', 'A-A', 'AA-'])
0.33

meren commented 4 years ago

The geometric homogeneity should differ for these, not functional.

mooreryan commented 4 years ago

The way gaps are handled is actually the reason I went through the whole unit testing exercise. http://merenlab.org/2016/11/08/pangenomics-v2/#functional-homogeneity-index mentions that gaps are ignored in the functional index. Does that mean any pair of residues in which at least one is a gap is ignored, or only a pair in which both are gaps? (See this test for more on gap behavior: https://github.com/merenlab/anvio/blob/1238cbb19f4072ac4b9a5816afb9d1720af12d25/anvio/tests/unit/test_homogeneityindex.py#L88)
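To make the two readings of "gaps are ignored" concrete, here is a minimal sketch. It is not anvi'o's implementation: `toy_functional_index` and its `skip` parameter are hypothetical names, and the scoring is simplified to exact residue identity instead of amino acid group similarity. It only illustrates how the two gap policies diverge on the three alignments from the original report.

```python
from itertools import combinations

GAP = "-"


def toy_functional_index(sequences, skip="any_gap"):
    """Fraction of compared residue pairs that are identical (toy scoring).

    skip="any_gap":   ignore a pair if either member is a gap.
    skip="both_gaps": ignore a pair only if both members are gaps;
                      a gap paired with a residue counts as a mismatch.
    """
    if skip not in ("any_gap", "both_gaps"):
        raise ValueError(f"unknown gap policy: {skip!r}")

    matched = compared = 0
    # zip(*sequences) walks the alignment column by column.
    for column in zip(*sequences):
        for a, b in combinations(column, 2):
            gaps = (a == GAP) + (b == GAP)
            if gaps == 2 or (gaps == 1 and skip == "any_gap"):
                continue  # this pair is ignored under the chosen policy
            compared += 1
            matched += a == b
    # Assumption: an alignment with nothing left to compare scores 1.0.
    return matched / compared if compared else 1.0


cases = [
    ["A-A", "A-A", "A-A"],
    ["AAA", "AAA", "A-A"],
    ["-AA", "A-A", "AA-"],
]
for seqs in cases:
    print(seqs,
          toy_functional_index(seqs, "any_gap"),
          round(toy_functional_index(seqs, "both_gaps"), 2))
```

Under the "any gap" policy all three alignments score 1.0 in this toy model. Under the "both gaps" policy they score 1.0, 7/9 and 3/9, which is close to the values reported at the top of the issue. That is only an observation about the sketch and does not establish what the anvi'o code actually does.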

mooreryan commented 4 years ago

I think I found another bug in the functional index. Take a look at this pull request: https://github.com/merenlab/anvio/pull/1269.

It's a unit test showing that the functional index/score changes depending on the order in which an ambiguous residue is compared.
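For readers who want to see what an order-dependence bug of this kind can look like, here is a small sketch. It is not the test from the pull request and not anvi'o code: the `SIMILAR_TO` table, the scorers, and `toy_index` are all hypothetical, built only to show how an asymmetric lookup makes the result depend on input order, and how a permutation check catches it.

```python
from itertools import combinations, permutations

# Hypothetical, deliberately asymmetric table: the ambiguous residue 'X'
# is listed as similar to 'A', but 'A' is not listed as similar to 'X'.
SIMILAR_TO = {"A": {"A", "X"}, "X": {"X"}}


def asymmetric_score(a, b):
    """Toy scorer whose result depends on argument order."""
    return 1.0 if b in SIMILAR_TO.get(a, set()) else 0.0


def symmetric_score(a, b):
    """Same table, but looked up in both directions."""
    return max(asymmetric_score(a, b), asymmetric_score(b, a))


def toy_index(sequences, score):
    """Mean pairwise score over all columns and all sequence pairs."""
    scores = [score(a, b)
              for column in zip(*sequences)
              for a, b in combinations(column, 2)]
    return sum(scores) / len(scores)


def is_order_independent(sequences, score):
    """True if every ordering of the input gives the same index."""
    values = {toy_index(list(p), score) for p in permutations(sequences)}
    return len(values) == 1


print(toy_index(["AA", "XA"], asymmetric_score))  # 1.0
print(toy_index(["XA", "AA"], asymmetric_score))  # 0.5
print(is_order_independent(["AA", "XA"], symmetric_score))  # True
```

A permutation check like `is_order_independent` is cheap for the handful of sequences used in unit tests, but the number of orderings grows factorially, so it is only practical on small inputs.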

mahmoudyousef98 commented 4 years ago

I just saw this issue. Thank you for fixing the original issue. I'm working on the other bug.