Closed: srujan741 closed this 4 years ago.
First, if you are obtaining such high p-values for clearly distinct distributions, there may be a bug in the code, or you may be calling the method with the wrong parameters, because that should not happen. Can you provide an example of how you are using the method?
As for how the test works, the complete procedure is explained in the original article by Székely and Rizzo.
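For reference, with samples $x_1, \dots, x_n$ and $y_1, \dots, y_m$, the two-sample energy statistic from that article can be written as below. This is a restatement in its all-pairs (V-statistic) form; the exact normalization used by dcor's `energy_test_statistic` may differ.

$$
\mathcal{E}_{n,m} = \frac{nm}{n+m}\left( \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\lVert x_i - y_j\rVert - \frac{1}{n^2}\sum_{i=1}^{n}\sum_{k=1}^{n}\lVert x_i - x_k\rVert - \frac{1}{m^2}\sum_{j=1}^{m}\sum_{l=1}^{m}\lVert y_j - y_l\rVert \right)
$$

The term in parentheses is the sample energy distance. It is non-negative and shrinks towards zero when both samples come from the same distribution, but approaches a positive constant when they differ, which is why the factor $nm/(n+m)$ makes the statistic grow without bound in the second case.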
I will summarize the method:
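1. We compute a statistic based on the energy distance (`energy_test_statistic` in the code) between the two samples. This statistic converges if the samples have the same distribution, but tends to infinity as the sample sizes grow if they have different distributions.
2. We pool the observations of both samples and randomly permute them a number of times (`num_resamples`), each time splitting them into two new samples of the original sizes and recomputing the statistic. We then compare the statistics obtained with the original one and compute the proportion of permuted statistics larger than the original. This proportion is the estimated p-value.

I will close this as there is no answer from @srujan741.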
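The two steps summarized above can be sketched from scratch in NumPy as follows. This is a minimal illustration under stated assumptions, not dcor's implementation: the function names `energy_statistic` and `permutation_energy_test` are hypothetical, the statistic uses all-pairs (V-statistic) averages, and the p-value uses the common (count + 1) / (num_resamples + 1) correction rather than the plain proportion. The demo uses simulated 2-D Gaussian data: one pair of well-separated clusters and one pair drawn from the same distribution.

```python
import numpy as np


def _mean_pairwise_distance(a, b):
    """Average Euclidean distance ||a_i - b_j|| over all pairs (i, j)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()


def energy_statistic(x, y):
    """Hypothetical helper: n*m/(n+m) times the V-statistic energy distance."""
    n, m = len(x), len(y)
    energy_distance = (2 * _mean_pairwise_distance(x, y)
                       - _mean_pairwise_distance(x, x)
                       - _mean_pairwise_distance(y, y))
    return n * m / (n + m) * energy_distance


def permutation_energy_test(x, y, num_resamples=499, seed=None):
    """Hypothetical helper: return (statistic, p_value) of a permutation test."""
    # Treat each row as one observation; 1-D input becomes a single column.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x.reshape(x.shape[0], -1)
    y = y.reshape(y.shape[0], -1)
    rng = np.random.default_rng(seed)

    # Step 1: statistic on the original split.
    observed = energy_statistic(x, y)

    # Step 2: shuffle the pooled observations, re-split them into groups of
    # the original sizes and count how often the statistic is >= observed.
    pooled = np.concatenate([x, y])
    n = x.shape[0]
    count = 0
    for _ in range(num_resamples):
        idx = rng.permutation(pooled.shape[0])
        if energy_statistic(pooled[idx[:n]], pooled[idx[n:]]) >= observed:
            count += 1

    # (count + 1) / (B + 1) correction; with B = 0 this is always 1.
    p_value = (count + 1) / (num_resamples + 1)
    return observed, p_value


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(50, 2))
b = rng.normal(5.0, 1.0, size=(50, 2))  # cluster far away from a
c = rng.normal(0.0, 1.0, size=(50, 2))  # same distribution as a

stat_ab, p_ab = permutation_energy_test(a, b, num_resamples=199, seed=1)
stat_ac, p_ac = permutation_energy_test(a, c, num_resamples=199, seed=1)
stat_0, p_0 = permutation_energy_test(a, b, num_resamples=0)

print(f"separated: statistic={stat_ab:.2f}, p={p_ab:.3f}")
print(f"same distribution: statistic={stat_ac:.2f}, p={p_ac:.3f}")
print(f"separated, no resamples: p={p_0}")
```

With `num_resamples=0` this sketch returns a p-value of exactly 1 regardless of the data, and with B resamples the smallest attainable p-value is 1/(B + 1). A p-value that is stuck at 1 is therefore worth checking against the number of resamples actually passed to the test.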
I have an experiment in which I have two groups of customers with the same attributes. I wanted to run a multivariate homogeneity test on them, so I used the `dcor.homogeneity.energy_test()` method on both groups. My problem is that I always end up with a p-value of 1 or close to 1. I simulated a 2-D dataset in two cases: a) two distinct, well-separated clusters; b) overlapping clusters. The p-value came out as 1 in both cases, although the test statistic was different. I want to understand how the homogeneity test works. Any help is much appreciated.