ms609 / TreeDist

Calculate distances between phylogenetic trees in R
https://ms609.github.io/TreeDist/

The TreeDist distance matrix output of the Generalized RF and Nye et al. methods is zero #102

Closed. Jigyasa3 closed this issue 1 year ago.

Jigyasa3 commented 1 year ago

Dear @ms609,

Thank you again for the detailed manual and explanation of the methods! I have a question about the ClusteringInfoDistance() and NyeSimilarity() functions.

I am running the following code.

For the distance matrix:

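library(ape)        # read.tree(), unroot()
library(TreeTools)  # RandomTree(), used for the p-values below
library(TreeDist)   # ClusteringInfoDistance(), NyeSimilarity()

tree1 <- read.tree(file = "hosttree-d__Bacteria_p__Desulfobacterota_COG0215_tips_1.nwk")
tree2 <- read.tree(file = "symbionttree-d__Bacteria_p__Desulfobacterota_COG0215_tips_1.nwk")
tree1 <- unroot(tree1)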

# Generalized RF (clustering information distance)
dist_rf <- ClusteringInfoDistance(tree1, tree2, normalize = TRUE)

# Nye et al. (returned as a distance)
dist_ny <- NyeSimilarity(tree1, tree2, normalize = TRUE, similarity = FALSE)

For the p-values:

# Generalized RF
nRep <- 100000 # Use more replicates for a more accurate estimate of the expected value
randomTrees <- lapply(logical(nRep), function (x) RandomTree(tree1$tip.label))
randomDists <- ClusteringInfoDistance(tree1, randomTrees, normalize = TRUE)
expectedCID <- mean(randomDists)

dist12 <- ClusteringInfoDistance(tree1, tree2, normalize = TRUE)
# Count the random trees that are at least as similar to tree1 as tree2 is
nThisSimilar <- sum(randomDists <= dist12)
pValue <- nThisSimilar / nRep

# Nye et al.
nRep <- 100000 # Use more replicates for a more accurate estimate of the expected value
randomTrees <- lapply(logical(nRep), function (x) RandomTree(tree1$tip.label))
randomDists <- NyeSimilarity(tree1, randomTrees, normalize = TRUE, similarity = FALSE)
expectedNye <- mean(randomDists)

dist12 <- NyeSimilarity(tree1, tree2, normalize = TRUE, similarity = FALSE)
# Count the random trees that are at least as similar to tree1 as tree2 is
nThisSimilar <- sum(randomDists <= dist12)
pValue2 <- nThisSimilar / nRep

I am getting zero for both the distances and the p-values with the attached trees, Tree1 (https://github.com/Jigyasa3/errors/blob/master/hosttree-d__Bacteria_p__Desulfobacterota_COG0215_tips_1.nwk) and Tree2 (https://github.com/Jigyasa3/errors/blob/master/symbionttree-d__Bacteria_p__Desulfobacterota_COG0215_tips_1.nwk). The two trees are completely identical to each other, yet the distance is 0. Why do you think that's happening?

Looking forward to your reply!
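On the zero p-values: when tree1 and tree2 are identical, dist12 is 0, and in practice no random tree comes that close to tree1, so nThisSimilar is 0 and the estimate is exactly 0. That only says p < 1/nRep. A common Monte Carlo convention, sketched below as an assumption rather than anything TreeDist does, counts the observed tree as one extra replicate so the estimate can never be exactly zero. MonteCarloP() is a hypothetical helper, and the input distances are made up purely for illustration.

```r
# Hypothetical helper (not part of TreeDist): one-sided Monte Carlo p-value
# for "the observed tree is closer to tree1 than random trees are".
MonteCarloP <- function (observed, randomDists) {
  # Random replicates at least as similar (distance <= observed)
  nAsSimilar <- sum(randomDists <= observed)
  # Add-one correction: treat the observed tree as one more draw, so a finite
  # number of replicates never yields an estimate of exactly zero
  (nAsSimilar + 1) / (length(randomDists) + 1)
}

# Illustrative values only: 999 made-up random distances, none as low as 0
set.seed(1)
fakeRandomDists <- runif(999, min = 0.5, max = 1)
MonteCarloP(0, fakeRandomDists)  # (0 + 1) / (999 + 1) = 0.001
```

With the real data this would be MonteCarloP(dist12, randomDists); for nRep = 100000 and no equally close random tree it gives 1/100001 rather than 0.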

ms609 commented 1 year ago

If the trees are identical (notwithstanding edge lengths, which are ignored), then they should have a distance of zero, and a high similarity. Distance increases as trees become less similar.

Or have I misunderstood your question?
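A minimal check of this point, assuming the TreeTools helpers BalancedTree() and PectinateTree() to build small example trees with tips t1 to t8: identical topologies give a distance of zero under both measures even when edge lengths differ, and different topologies give a positive distance. The random edge lengths on treeB are there only to show that they are ignored.

```r
library(TreeTools)  # BalancedTree(), PectinateTree()
library(TreeDist)   # ClusteringInfoDistance(), NyeSimilarity()

set.seed(1)
treeA <- BalancedTree(8)                      # tips t1..t8
treeB <- BalancedTree(8)                      # same topology as treeA
treeB$edge.length <- runif(nrow(treeB$edge))  # arbitrary edge lengths
treeC <- PectinateTree(8)                     # same tips, different topology

# Identical topologies: both normalized distances should be zero
dAB_cid <- ClusteringInfoDistance(treeA, treeB, normalize = TRUE)
dAB_nye <- NyeSimilarity(treeA, treeB, normalize = TRUE, similarity = FALSE)

# Different topologies: both normalized distances should be above zero
dAC_cid <- ClusteringInfoDistance(treeA, treeC, normalize = TRUE)
dAC_nye <- NyeSimilarity(treeA, treeC, normalize = TRUE, similarity = FALSE)

c(identical_CID = dAB_cid, identical_Nye = dAB_nye,
  different_CID = dAC_cid, different_Nye = dAC_nye)
```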

Jigyasa3 commented 1 year ago

Dear @ms609 ,

Thanks for the quick reply! Could I please confirm that, when I use ClusteringInfoDistance() with normalize = TRUE, a lower distance means higher similarity?

Similarly, when I use NyeSimilarity() with normalize = TRUE and similarity = FALSE, does a lower distance also mean higher similarity?

Using normalize = TRUE should let me compare the two distances with each other, so that if ClusteringInfoDistance() gives me 0 while NyeSimilarity(similarity = FALSE) gives me 0.2, I can say that the two trees are very similar to each other by both methods.

# Generalized RF (clustering information distance)
dist_rf <- ClusteringInfoDistance(tree1, tree2, normalize = TRUE)

# Nye et al. (returned as a distance)
dist_ny <- NyeSimilarity(tree1, tree2, normalize = TRUE, similarity = FALSE)

ms609 commented 1 year ago

Yes, that's right: you need the similarity = FALSE argument to ask NyeSimilarity() to return a distance (i.e. difference); then the interpretation of the two distance measures is equivalent (though the absolute values will differ).
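Because the absolute values of the two distances differ, one option (an assumption, not something proposed in this thread and not a TreeDist function) is to express each observed distance relative to the mean distance to random trees, which the p-value code above already computes as expectedCID. A ratio of 0 means identical topologies; a ratio near 1 means no closer than a random tree on average. RelativeToRandom() is hypothetical, the example trees come from the TreeTools helpers BalancedTree() and PectinateTree(), and nRep is kept small for speed.

```r
library(TreeTools)  # BalancedTree(), PectinateTree(), RandomTree()
library(TreeDist)   # ClusteringInfoDistance(), NyeSimilarity()

set.seed(1)
treeA <- BalancedTree(8)
treeC <- PectinateTree(8)
nRep <- 1000  # kept small for speed; more replicates give a steadier mean
randomTrees <- lapply(logical(nRep), function (x) RandomTree(treeA$tip.label))

# Hypothetical helper (not part of TreeDist): observed distance divided by the
# mean distance from the same reference tree to random trees
RelativeToRandom <- function (observed, randomDists) observed / mean(randomDists)

cidRandom <- ClusteringInfoDistance(treeA, randomTrees, normalize = TRUE)
nyeRandom <- NyeSimilarity(treeA, randomTrees, normalize = TRUE,
                           similarity = FALSE)

relCID <- RelativeToRandom(
  ClusteringInfoDistance(treeA, treeC, normalize = TRUE), cidRandom)
relNye <- RelativeToRandom(
  NyeSimilarity(treeA, treeC, normalize = TRUE, similarity = FALSE), nyeRandom)
c(CID = relCID, Nye = relNye)
```

Both ratios share one reading (0 for identical trees, larger for less similar), which may be easier to compare side by side than the raw normalized distances; it remains a heuristic rather than a test.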