Closed CompRhys closed 5 years ago
Can you provide an example of the input and the current and desired output? Currently, the functions only allow passing instances of two random vectors. I was trying to implement pairwise computation of these measures (see the develop branch), but it is not publicly available right now, and I intended to use a separate function for that, because I think it is clearer that way.
Sure. I think the issue is that I don't follow what you mean by instances of random vectors. Here is a minimal example:
import numpy as np
import dcor
a = np.array([1, 2, 3, 4])
b = np.array([5, 8, 6, 2])
c = np.column_stack((a, b))  # column_stack takes a tuple; c is a (4, 2) matrix
So for `dcor.distance_correlation(a, a)` we'd expect 1.0, and for `dcor.distance_correlation(a, b)` I get 0.795. For `dcor.distance_correlation(a, c)` I'd expect back the vector [[1.0] [0.795]], but I instead get a single scalar, 0.886.
`distance_correlation` interprets those as follows: `a` and `b` both contain 4 evaluations of a random variable. `c` contains 4 evaluations of a random vector with 2 elements. Thus `distance_correlation(a, c)` is well defined, as distance correlation is defined even for two random vectors with different dimensions, and the result is a single number.
Ahh okay, now I see, thanks! I hadn't really thought about the fact that we could have vectors with different dimensions, since the distance matrices are constructed from the norms, and that's what was confusing me.
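To illustrate why the dimensions need not match, here is a minimal NumPy sketch of the (biased) sample distance correlation. It is not dcor's implementation; the helper names `centered_distances` and `sample_distance_correlation` are made up for this example. Each input is reduced to an n-by-n matrix of pairwise Euclidean distances, so only the number of samples n has to agree, and the result is always one number.

```python
import numpy as np

def centered_distances(x):
    """Double-centered pairwise Euclidean distance matrix of the n samples in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # a 1-D array is read as n samples of a scalar variable
    # Shape (n, n), whatever the dimension of each sample is.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def sample_distance_correlation(x, y):
    """Biased sample distance correlation of x and y (hypothetical helper)."""
    ax, by = centered_distances(x), centered_distances(y)
    dcov2 = (ax * by).mean()                              # squared distance covariance
    denom = np.sqrt((ax * ax).mean() * (by * by).mean())  # product of distance variances
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))

a = np.array([1, 2, 3, 4])
b = np.array([5, 8, 6, 2])
c = np.column_stack((a, b))  # 4 samples of a 2-element random vector

scalar_vs_vector = sample_distance_correlation(a, c)  # a single number
# The per-column values originally expected have to be requested explicitly.
per_column = [sample_distance_correlation(a, c[:, j]) for j in range(c.shape[1])]
print(scalar_vs_vector, per_column)
```

The first entry of `per_column` compares `a` with itself and the second compares `a` with `b`, which is the pairwise result asked for above.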
A pairwise implementation would be good, but I can just refactor my code to use dcor.distance_covariance, which avoids the redundant calculation of dvar(Y) when iterating over arrays of random variables.
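The idea of avoiding the repeated dvar(Y) computation can be sketched as follows. This is a plain NumPy illustration under the usual biased sample definitions, not code from dcor, and `columnwise_distance_correlation` is a hypothetical helper name. The centered distance matrix and distance variance of `y` are computed once and reused for every column of `X`.

```python
import numpy as np

def centered_distances(x):
    """Double-centered pairwise Euclidean distance matrix of the samples in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def columnwise_distance_correlation(X, y):
    """Distance correlation of each column of X with y (hypothetical helper)."""
    by = centered_distances(y)
    dvar2_y = (by * by).mean()  # computed once, reused for every column
    out = []
    for j in range(X.shape[1]):
        ax = centered_distances(X[:, j])
        dcov2 = (ax * by).mean()
        denom = np.sqrt((ax * ax).mean() * dvar2_y)
        out.append(0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom)))
    return np.array(out)

a = np.array([1, 2, 3, 4])
b = np.array([5, 8, 6, 2])
c = np.column_stack((a, b))

result = columnwise_distance_correlation(c, a)  # one value per column of c
print(result)
```

With dcor itself, the same structure would presumably compute the distance variance of `y` once and call `dcor.distance_covariance` inside the loop.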
dcor returns a scalar for the distance correlation of a matrix and a vector. I don't yet understand why this is the case: isn't the distance correlation defined between two vectors, so that I would expect a vector of correlations as the output?
Could you explain what's going on?