mespadoto / proj-quant-eval


Can't reproduce the metrics results #3

Closed BeiJohann closed 2 years ago

BeiJohann commented 2 years ago

I could not get the whole project working, so I took parts of it to measure my own projections. I downloaded the .npy files for the datasets and used the same PCA from scikit-learn, but my results for normalized stress and Shepard goodness differ from yours. I spent hours looking through your code for the mistake, but I cannot find one. I double-checked everything: the projection, the dataset, the projected result, and the metric code are all the same, yet I get different results. How is this possible? Mathematically, the results should be identical, but they aren't. Did you use the datasets uploaded to the website? Was the data changed somewhere, or were the results?

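from scipy import spatial, stats

def compute_distance_list(X):
    # Condensed vector of pairwise Euclidean distances
    return spatial.distance.pdist(X, 'euclidean')

def compute_distance_matrix(X):
    # Full symmetric matrix of pairwise Euclidean distances
    D = spatial.distance.pdist(X, 'euclidean')
    return spatial.distance.squareform(D)

def metric_pq_shepard_diagram_correlation(X_high, X_low):
    # Spearman rank correlation between high- and low-dimensional distances
    D_high = compute_distance_list(X_high)
    D_low = compute_distance_list(X_low)
    return stats.spearmanr(D_high, D_low)[0]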

I do exactly the same as above, but for the “bank” dataset I get 0.5329917367864474 rather than your 0.766496244905907, and for “cifar10” I get 0.7688356643971043 rather than your 0.884418301185305. Did I miss something?

from sklearn.decomposition import PCA
pca = PCA(n_components=2, random_state=42)
result = pca.fit_transform(data)

This is my PCA.

import numpy as np

# mmap_mode='c' opens the arrays as copy-on-write memory maps
data = np.load(dataRootPath + dataName + '/X.npy', mmap_mode='c')
label = np.load(dataRootPath + dataName + '/y.npy', mmap_mode='c')

This is how I load the data.

mespadoto commented 2 years ago

I believe the differences you are seeing are due to a rescaling of the metrics' values to the interval [0,1], which was computed over all results. This rescaling was needed to compute the aggregated metrics in the paper.
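As a rough illustration of that kind of rescaling, here is a minimal sketch that assumes a plain min-max rescaling over the collected metric values. The helper name `rescale_to_unit_interval` is hypothetical, and the -1.0 and 1.0 entries are invented placeholders for the extremes of the full result set, which is not shown in this thread. It is not necessarily the exact procedure used for the paper.

```python
import numpy as np

def rescale_to_unit_interval(values):
    """Min-max rescale a 1-D collection of metric values to [0, 1]."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All values identical: avoid division by zero
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Placeholder extremes (-1.0, 1.0) plus the two raw values quoted in this thread
raw = np.array([-1.0, 0.5329917367864474, 0.7688356643971043, 1.0])
scaled = rescale_to_unit_interval(raw)
print(scaled[1], scaled[2])
```

With those placeholder extremes, the two raw values map to about 0.766496 and 0.884418, which agree with the reported numbers to roughly six decimal places. That is consistent with the rescaling explanation, though it does not confirm the exact minimum and maximum that were used.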

If you simply want to use this code to compute metrics for your own data and/or projections, it should work just fine. However, since this version is tightly coupled with the evaluation code, it'll probably be easier to use the code in this link. You'll have to rescale the values of all variables in the dataset to the interval [0,1].
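A minimal sketch of that workflow, assuming per-variable min-max scaling with scikit-learn's `MinMaxScaler` and the same PCA and Shepard correlation shown earlier in the thread. The synthetic array `X` is made up for illustration and stands in for a real dataset; whether the distances should be taken on the scaled or the unscaled data is also an assumption here.

```python
import numpy as np
from scipy import spatial, stats
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a dataset: 200 samples, 10 variables on different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) * rng.uniform(1, 100, size=10)

# Rescale every variable (column) to the interval [0, 1]
X_scaled = MinMaxScaler().fit_transform(X)

# Project the rescaled data to 2-D with PCA
X_low = PCA(n_components=2, random_state=42).fit_transform(X_scaled)

# Shepard diagram correlation: Spearman rank correlation of pairwise distances
D_high = spatial.distance.pdist(X_scaled, 'euclidean')
D_low = spatial.distance.pdist(X_low, 'euclidean')
rho = stats.spearmanr(D_high, D_low)[0]
print(rho)
```

The value printed here is a raw Spearman correlation in [-1, 1]; it is not rescaled across results the way the published numbers were.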