Open jeslago opened 6 years ago
I have also compared the Shogun implementation with the implementation that you provided in the summer school to be used with Tensorflow. I still get inconsistent results:
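import numpy as np
import shogun as sg  # assumed alias, matching the sg. prefix used below
import tensorflow as tf
import mmd  # MMD implementation from the DS3 summer school

data = np.load("blobs.npz")
X = data["X"]
Y = data["Y"]
sigma_median = 5.079463545341823

# Summer school (TensorFlow) implementation
with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    print("MMD and ratio:", sess.run(mmd.rbf_mmd2_and_ratio(X, Y, sigma=sigma_median)))

# Shogun implementation
feat_p = sg.RealFeatures(X.T.astype(np.float64))
feat_q = sg.RealFeatures(Y.T.astype(np.float64))
kernel = sg.GaussianKernel(2 * sigma_median**2)
# Named shogun_mmd so that it does not shadow the imported mmd module
shogun_mmd = sg.QuadraticTimeMMD(feat_p, feat_q)
shogun_mmd.set_kernel(kernel)
shogun_mmd.set_statistic_type(sg.ST_BIASED_FULL)
print(shogun_mmd.compute_statistic())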
I think Shogun returns a scaled version of the MMD statistic. Note that this does not matter for the test itself: either the permutation test is used, which is unaffected by a constant scaling, or the spectral test is scaled appropriately.
A better way to compare would be to compute the p-value of the statistic in both implementations and compare those. They should be similar, and more similar the more permutations you use.
@lambday what does the compute_statistic call compute exactly? The docs do not reflect it: http://shogun-toolbox.org/api/latest/classshogun_1_1CQuadraticTimeMMD.html
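To illustrate why comparing p-values sidesteps the scaling question, here is a minimal NumPy-only sketch of a permutation test on the biased MMD^2 estimate. It is not Shogun's or the summer school's code: the function names, the `scale` argument and the Gaussian kernel parameterisation exp(-||a - b||^2 / (2 sigma^2)) are assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(Z, sigma):
    """Gram matrix of k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)) (assumed form)."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def biased_mmd2(K, idx_x, idx_y):
    """Biased (V-statistic) MMD^2 estimate read off a joint Gram matrix."""
    Kxx = K[np.ix_(idx_x, idx_x)].mean()
    Kyy = K[np.ix_(idx_y, idx_y)].mean()
    Kxy = K[np.ix_(idx_x, idx_y)].mean()
    return Kxx + Kyy - 2.0 * Kxy

def permutation_p_value(X, Y, sigma, num_permutations=500, scale=1.0, seed=0):
    """Fraction of permuted statistics that are >= the observed statistic.

    `scale` is a hypothetical knob that multiplies every statistic by the
    same positive constant, mimicking a library that reports a scaled MMD.
    """
    rng = np.random.default_rng(seed)
    n_x = len(X)
    K = gaussian_gram(np.vstack([X, Y]), sigma)
    n = K.shape[0]
    idx = np.arange(n)
    observed = scale * biased_mmd2(K, idx[:n_x], idx[n_x:])
    null = np.empty(num_permutations)
    for i in range(num_permutations):
        # Reassign samples to the two groups at random.
        perm = rng.permutation(n)
        null[i] = scale * biased_mmd2(K, perm[:n_x], perm[n_x:])
    return float(np.mean(null >= observed))
```

Because the observed and permuted statistics are multiplied by the same positive factor, the comparison `null >= observed` is essentially unchanged, so calling `permutation_p_value` with `scale=1.0` or with any other positive `scale` should give the same p-value for a fixed seed. Two implementations that draw different permutations will only agree approximately.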
Thanks for the answer. Indeed, the p-values are actually more similar, so I guess the MMD statistic from shogun is somehow scaled. It would be good though to know the scaling of it!
hey @jeslago, the MMD statistic in Shogun (the one that the method compute_statistic returns) is $\frac{n_x \times n_y}{n_x + n_y}$ times the $\mathrm{MMD}^2$ estimate, where $n_x$ is the number of samples from P and $n_y$ is the number of samples from Q. You can check the API doc for the CMMD class (link below) for more details.
HTH :)
CMMD class API doc: http://www.shogun-toolbox.org/api/latest/classshogun_1_1CMMD.html
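As a small sketch of the scaling described above, the helpers below convert between an unscaled MMD^2 estimate and a statistic multiplied by n_x * n_y / (n_x + n_y). The helper names are hypothetical and not part of Shogun or the summer school code; the conversion only makes the two numbers comparable if both sides use the same kernel and the same (biased or unbiased) estimator.

```python
def mmd_scale(n_x, n_y):
    """Scaling factor n_x * n_y / (n_x + n_y) from the reply above."""
    return n_x * n_y / (n_x + n_y)

def to_scaled_statistic(mmd2_estimate, n_x, n_y):
    """Scale a plain MMD^2 estimate so it is comparable to the scaled statistic."""
    return mmd_scale(n_x, n_y) * mmd2_estimate

def to_mmd2_estimate(scaled_statistic, n_x, n_y):
    """Undo the scaling to recover the plain MMD^2 estimate."""
    return scaled_statistic / mmd_scale(n_x, n_y)
```

For the example in this thread one would pass `n_x = X.shape[0]` and `n_y = Y.shape[0]`, assuming samples are stored as rows of `X` and `Y`.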
Hi,
I was one of the attendees of the DS3 summer school. I was trying to go over the things that we learnt and repeat them using Shogun (my goal is to use hypothesis testing via Shogun for my own research). However, I cannot get the same results in Shogun and in the code developed at DS3.
Something as simple as computing the MMD metric outputs different results using shogun w.r.t. using the MMD implementation of the summer school. I can show that with the following example:
Any guess on what might be happening? The MMD implementation should be the same in the toolbox and in the code. Might it be the kernel?
Edit: I have tried to use linear kernels and I still get different MMD values. (I used the linear_kernel method from the summer school.)
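For the linear-kernel case there is a value that can be checked by hand: with k(a, b) = a . b, the biased MMD^2 estimate equals the squared distance between the two sample means. The sketch below uses only NumPy, with hypothetical function names, and does not reproduce either implementation; it just gives a reference number that each library's output can be compared against.

```python
import numpy as np

def linear_biased_mmd2(X, Y):
    """Biased MMD^2 estimate with the linear kernel k(a, b) = a . b."""
    Kxx = (X @ X.T).mean()
    Kyy = (Y @ Y.T).mean()
    Kxy = (X @ Y.T).mean()
    return float(Kxx + Kyy - 2.0 * Kxy)

def mean_difference_sq(X, Y):
    """Squared Euclidean distance between the sample means (rows are samples)."""
    diff = X.mean(axis=0) - Y.mean(axis=0)
    return float(diff @ diff)
```

If one implementation matches `mean_difference_sq(X, Y)` and the other differs from it by a constant factor that depends only on the sample sizes, that would point to a scaling of the statistic rather than to the kernel.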