ITML accuracy worse than guessing

drjosephliu commented 4 years ago

I'm using the Omniglot dataset, which has 964 classes of images with 20 samples each; each image is 1 x 105 x 105.

I'm embedding these samples down to 512 dimensions. So the final dataset has shape (964*20, 512).

To implement a 5-way one-shot task, for each of the 964 classes I create a support set of five images: the first is another sample of the same class, and the other four are from different classes. Each tuple therefore pairs the query image with one of the support set images.

This is what it looks like below (I'm using preprocessor with indices):

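import numpy as np
from sklearn.utils import shuffle
from metric_learn import ITML

rng = np.random.RandomState(0)  # assumed: rng was not defined in the snippet; seed is arbitrary

# For each class, make a 5-way 1-shot task with four other randomly selected classes
indices = []
n_classes, n_examples, dim = train_embeddings.reshape(-1, 20, 512).shape
for i in range(n_classes):
    # Query sample (ex1) and same-class support sample (ex2); ex1 may equal ex2
    ex1 = rng.randint(n_examples)
    ex2 = rng.randint(n_examples)
    indices.append([i * 20 + ex1, i * 20 + ex2])  # First pair is from the same class

    # Remaining four pairs are from different classes
    for j in range(4):
        random_class = rng.randint(n_classes)
        while random_class == i:
            random_class = rng.randint(n_classes)
        ex3 = rng.randint(n_examples)
        indices.append([i * 20 + ex1, random_class * 20 + ex3])

# 1 marks the same-class pair, -1 marks each different-class pair
labels = [1, -1, -1, -1, -1] * n_classes
indices, labels = shuffle(indices, labels)

# With a preprocessor, each pair is given as row indices into train_embeddings
itml = ITML(preprocessor=train_embeddings)
itml.fit(indices, labels)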

At test time, instead of using predict(), I use the score_pairs() function and return the index with the highest score. If it matches the index of the positive pair in the gold labels, I count the task as correctly classified:

pair_scores = model.score_pairs(encoded_pairs)
if np.argmax(pair_scores) == np.argmax(targets):
    return 1

With all this in mind, I'm getting a 5-way accuracy of 0%. 3-way comes out to 12%, so it's not completely aberrant, but it's still far worse than random guessing. Am I using this algorithm correctly?

wdevazelhes commented 4 years ago

Hi @drjosephliu, thanks for reporting this interesting example. For ITML, according to the documentation, the score_pairs function returns the distance between the two samples of each pair (http://contrib.scikit-learn.org/metric-learn/generated/metric_learn.ITML.html?highlight=itml#metric_learn.ITML.score_pairs), so I think you should check for the minimum score at test time, not the maximum.

It's true that this is a bit misleading, since we would expect a score to be high when the pair consists of similar samples. I don't remember why we made that decision, but this example is good to keep in mind in case we want to rethink the API. I hope this helps.
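To illustrate the point about distances, here is a minimal NumPy-only sketch. It does not call metric-learn; the helper mahalanobis_distances, the identity matrix M, and the toy data are all assumptions standing in for a learned metric and real embeddings. It only shows that when the per-pair value is a distance, the matching support item is found with argmin, while argmax picks a non-matching one.

```python
import numpy as np

def mahalanobis_distances(pairs, M):
    """Return sqrt((a - b)^T M (a - b)) for each pair; pairs has shape (n_pairs, 2, dim)."""
    diffs = pairs[:, 0, :] - pairs[:, 1, :]
    return np.sqrt(np.einsum("ij,jk,ik->i", diffs, M, diffs))

rng = np.random.RandomState(0)
dim = 8
M = np.eye(dim)  # assumption: identity stands in for a learned metric matrix

# Toy 5-way episode: four support items far from the query, one very close to it
query = rng.randn(dim)
support = rng.randn(5, dim) + 5.0
support[2] = query + 0.01 * rng.randn(dim)  # the same-class item
targets = np.array([0, 0, 1, 0, 0])

pairs = np.stack([np.stack([query, s]) for s in support])  # shape (5, 2, dim)
distances = mahalanobis_distances(pairs, M)

predicted = int(np.argmin(distances))  # smallest distance = most similar pair
wrong = int(np.argmax(distances))      # largest distance = least similar pair
```

Here predicted agrees with np.argmax(targets), whereas wrong can never be the matching item, which is consistent with the below-chance accuracy reported above.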

bellet commented 4 years ago

It is true that the term "score" is a bit misleading. But you should really use decision_function, which follows the sklearn convention (the larger the value, the higher the probability of the pair being positive).
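The larger-is-more-similar convention can be sketched as an episode-level accuracy check. This is a NumPy-only illustration under stated assumptions: n_way_accuracy, negated_euclidean, and make_episode are hypothetical helpers, and negated Euclidean distance stands in for whatever a fitted model's decision function would return.

```python
import numpy as np

def n_way_accuracy(score_fn, episodes):
    """Fraction of episodes where the highest-scoring pair is the true match.

    score_fn maps pairs of shape (n_way, 2, dim) to scores of shape (n_way,),
    where a larger score means a more similar pair.
    """
    hits = [int(np.argmax(score_fn(pairs)) == np.argmax(targets))
            for pairs, targets in episodes]
    return float(np.mean(hits))

def negated_euclidean(pairs):
    # Assumption: stand-in for a larger-is-more-similar decision function
    return -np.linalg.norm(pairs[:, 0] - pairs[:, 1], axis=1)

def make_episode(rng, n_way=5, dim=8):
    # Toy episode: one support item sits next to the query, the rest are far away
    query = rng.randn(dim)
    support = rng.randn(n_way, dim) + 5.0
    match = rng.randint(n_way)
    support[match] = query + 0.01 * rng.randn(dim)
    pairs = np.stack([np.stack([query, s]) for s in support])
    targets = np.zeros(n_way, dtype=int)
    targets[match] = 1
    return pairs, targets

rng = np.random.RandomState(0)
episodes = [make_episode(rng) for _ in range(50)]

acc_similarity = n_way_accuracy(negated_euclidean, episodes)
# Taking argmax over raw distances (sign flipped) never selects the match
acc_distance = n_way_accuracy(lambda p: -negated_euclidean(p), episodes)
```

On this toy data the similarity convention recovers every match, while argmax over raw distances recovers none. With a fitted model, one would presumably pass its decision_function as score_fn, with pairs in whatever form the model was fitted on (for example, index pairs when a preprocessor is used).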

scikit-learn-contrib / metric-learn

ITML accuracy worse than guessing #286