ml-unito / DeepWordLearning


Training on unseen classes; Moving to mini-Imagenet? #5

Open Pibborn opened 6 years ago

Pibborn commented 6 years ago

Vinyals et al. introduce the mini-ImageNet dataset through:

  1. image downsampling (84x84)
  2. a reduced number of classes (100)
  3. a reduced number of images per class (600)

with respect to regular ImageNet. Some authors (I actually only know of Larochelle doing it) have adopted this as a standard of sorts for one-shot learning tasks. Should we move to it? One strong reason is to enable better comparisons with the one-shot learning literature: the CNN model should not have seen the classes you are performing one-shot learning on (see note 1), so we would probably have to retrain our models if we want to compare against those papers. As previously discussed, this is of little use right now since we are not doing one-shot learning, but it is a direction I would be interested in.
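For reference, the three reductions above can be sketched as a small preprocessing script. This is just a sketch of the idea, not code from the repo: the directory layout, function name, and sampling seed are all assumptions.

```python
# Sketch of the mini-ImageNet reductions: sample 100 classes, 600 images
# per class, downsample each image to 84x84. Assumes an ImageNet-style
# layout of one subdirectory per class (wnid); all names are illustrative.
import os
import random
from PIL import Image

IMAGE_SIZE = (84, 84)    # 1. image downsampling
NUM_CLASSES = 100        # 2. class-count reduction
IMAGES_PER_CLASS = 600   # 3. per-class image reduction

def build_mini_imagenet(imagenet_dir, out_dir,
                        num_classes=NUM_CLASSES,
                        images_per_class=IMAGES_PER_CLASS,
                        size=IMAGE_SIZE, seed=0):
    rng = random.Random(seed)  # fixed seed so the class split is reproducible
    classes = sorted(os.listdir(imagenet_dir))
    for wnid in rng.sample(classes, num_classes):
        src = os.path.join(imagenet_dir, wnid)
        dst = os.path.join(out_dir, wnid)
        os.makedirs(dst, exist_ok=True)
        for fname in rng.sample(sorted(os.listdir(src)), images_per_class):
            img = Image.open(os.path.join(src, fname)).convert('RGB')
            img.resize(size, Image.BILINEAR).save(os.path.join(dst, fname))
```

Note that the original mini-ImageNet split fixes a specific set of 100 classes rather than sampling them, so for exact comparability we would want to use the published class list instead of a random draw.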

Note 1: this is interesting to me, since it seems that the similarity between the classes the base model was trained on and the classes we are evaluating on could be a major driver of the final one-shot accuracy. What about the following experiment:

  1. for each evaluation class, compute the average L2 distance between its CNN features and those of the training classes.
  2. check whether the one-shot model does markedly better on the closest classes, no matter which one-shot model is used. Point being, the CNN features are still doing most of the heavy lifting. Or maybe some one-shot models contradict this assumption, which would be kind of interesting!
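Step 1 above could look something like this. A minimal sketch, assuming features have already been extracted with the base CNN; the dict-of-arrays layout and function names are my own, not repo code.

```python
# Per-class centroid distances between CNN feature spaces.
# feats_by_class maps class name -> (n_images, feat_dim) feature array.
import numpy as np

def class_centroids(feats_by_class):
    """Mean CNN feature vector for each class."""
    return {c: f.mean(axis=0) for c, f in feats_by_class.items()}

def distance_to_base(base_feats, eval_feats):
    """For each evaluation class, the L2 distance from its centroid
    to the closest base-training-class centroid."""
    base = class_centroids(base_feats)
    return {c: min(np.linalg.norm(cent - b) for b in base.values())
            for c, cent in class_centroids(eval_feats).items()}
```

One could then correlate these per-class distances with per-class one-shot accuracy (e.g. a rank correlation) to see whether feature-space proximity predicts performance.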

A general note: one-shot models do not have the same level of "software maturity" as other deep models. Implementations are probably hard to find, and we may have to move to Lua to compare against them (for example, the one-shot LSTM work by Larochelle is in regular Torch).

Pibborn commented 6 years ago

Some ideas/instructions on how to move to mini-ImageNet.

I was able to find some code online to preprocess ImageNet to obtain mini-ImageNet, but to be honest it looks quite rough and some people have had issues with it.

I should probably start from the tf-slim ImageNet code, build some scripts around it that perform the downsampling and the class-count/example-count reductions, and release them. They should also probably create TFRecords. For this purpose, this tf tutorial will probably be helpful.
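The TFRecord-writing part could be sketched roughly as follows, independently of tf-slim. The feature keys mirror the convention used in the TensorFlow slim image datasets, but the function name and the (jpeg bytes, label) input format are assumptions of mine:

```python
# Serialize (encoded image, integer label) pairs into a TFRecord file.
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def write_tfrecord(examples, path):
    """examples: iterable of (jpeg_bytes, label) pairs."""
    with tf.io.TFRecordWriter(path) as writer:
        for image_bytes, label in examples:
            ex = tf.train.Example(features=tf.train.Features(feature={
                'image/encoded': _bytes_feature(image_bytes),
                'image/class/label': _int64_feature(label),
            }))
            writer.write(ex.SerializeToString())
```

The records can then be consumed at training time with `tf.data.TFRecordDataset` plus a parsing function that decodes the two features.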