rsonthal / TreeRep

Learning Tree structures and Tree metrics
GNU General Public License v3.0

Negative distance output by TreeRep #6

Closed: elichienxD closed this issue 2 years ago

elichienxD commented 2 years ago

Hi @rsonthal ,

I found that when the input metric is the pairwise distance matrix of many random points (n=128) in a 3d unit ball, TreeRep often returns a tree with negative edge weights. Is this an implementation bug?

Also, I found the following in src/Tree Rep.ipynb: [screenshot from the notebook omitted]

Does that mean TreeRep is known to produce negative edge weights? I don't think simply removing those negative edges is reasonable, as removing them would disconnect the tree and induce infinite distances between leaf nodes, right?

Below is a minimal working example (in Python) that reproduces my issue. Please let me know if I have misunderstood something.

import TreeRepy
import numpy as np

def generate_rand_Emetric_debug(num_nodes, E_rank=3):
    # Sample random points in the unit ball: random directions
    # scaled by a random radius in [0, 1).
    X = np.random.normal(size=(num_nodes, E_rank))
    X = X / np.linalg.norm(X, axis=1, keepdims=True) * np.random.rand(num_nodes, 1)
    # Build the symmetric pairwise Euclidean distance matrix.
    D_metric = np.zeros((num_nodes, num_nodes))
    for i in range(num_nodes):
        for j in range(i):
            D_metric[i, j] = np.linalg.norm(X[i, :] - X[j, :])
            D_metric[j, i] = D_metric[i, j]
    return D_metric

np.random.seed(0)  # for reproducibility
A = generate_rand_Emetric_debug(128)
A = np.around(A, 4)  # ensures the min distance difference in A is greater than tol=1e-5
print(A[A < 0])  # sanity check: the input metric has no negative entries
W = TreeRepy.TreeRep(A)
print(W[W < 0])  # negative edge weights returned by TreeRep
rsonthal commented 2 years ago

Unfortunately, if the data is far from being a tree, then the algorithm might produce negative weights. I think this is an algorithmic issue when the data is far from a tree, not an implementation bug, but I am not so sure right now; I would need to work through the math again to be sure.

Here, it doesn't remove those edges; instead, it sets those edge weights to 0. The tree structure is stored in G2, and that still has the edge (at least in the Julia code).
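
For illustration, a minimal sketch of that behavior on the Python side (not the repository's actual code; it assumes, as in the example above, that TreeRep returns a dense weight matrix W whose nonzero entries are the tree's edges):

import numpy as np

def clamp_negative_weights(W):
    # Keep the topology (which entries are edges) separately, since
    # clamping a negative weight to 0 would otherwise make that edge
    # indistinguishable from a non-edge in the matrix representation.
    edges = W != 0
    W_clamped = np.where(W < 0, 0.0, W)
    return W_clamped, edges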

elichienxD commented 2 years ago

Hi @rsonthal ,

After some research on neighbor joining (NJ) methods, I found that it is common practice to set negative edge weights (branch lengths) to 0. However, in theory we should transfer the difference to the adjacent branches. See here.

I wonder whether your method handles this the same way NJ-based methods do. Is it also okay to simply replace negative weights with 0? A sketch of the NJ-style correction is below.
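
For concreteness, a minimal sketch of that NJ-style correction (a hypothetical helper, not TreeRep's behavior; it assumes the tree is a networkx graph with a 'weight' attribute on each edge, and the choice of which adjacent branch absorbs the deficit is illustrative):

import networkx as nx

def fix_negative_branches(T):
    T = T.copy()
    for u, v in list(T.edges()):
        w = T[u][v]["weight"]
        if w < 0:
            T[u][v]["weight"] = 0.0
            # Transfer the (negative) length to one adjacent branch, so
            # path lengths through (u, v) and that neighbor are preserved.
            nbrs = [x for x in T.neighbors(v) if x != u]
            if nbrs:
                x = max(nbrs, key=lambda n: T[v][n]["weight"])
                T[v][x]["weight"] += w  # w < 0, so this branch shrinks
    return T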

Thanks, Eli

rsonthal commented 2 years ago

I agree that setting the weights to 0 (during the algorithm) seems to be an okay thing to do, as would distributing the edge weight. I am curious how that would affect the algorithm.

So my method has some similarities with NJ: mine is top-down rather than bottom-up like NJ, but I think it operates on similar concepts.