sicara / easy-few-shot-learning

Ready-to-use code and tutorial notebooks to boost your way into few-shot learning for image classification.
MIT License

Evaluation method #124

Closed wjtan99 closed 1 year ago

wjtan99 commented 1 year ago

Problem: Explore few-shot learning for image classification, for example bird species classification.

Considered solutions: N/A

How can we help

Thanks a lot for your excellent code and tutorial.

I just went through the first tutorial, my_first_few_shot_classifier.ipynb, and have the following question. I am still new to meta-learning, although I am experienced in other machine learning fields.

In your dataloader, N(K+Q) samples are randomly sampled in every episode, and the accuracy is calculated on them during training. I understand this is fine for the training process. But from an application perspective, people do not care whether you use N-shot learning or some other method; they care about the overall accuracy, as in an image search or retrieval task. So after training is done, the final evaluation metric should not be limited to N(K+Q) images per batch (or episode). More importantly, the query images should not be restricted to the classes in the support set.
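To make the N(K+Q) setup concrete, here is a minimal sketch of how one episode might be sampled and scored. It is not the library's code: it uses only the Python standard library, synthetic 2-D features, and a nearest-prototype rule, and the names `sample_episode` and `prototype_accuracy` are hypothetical. Note that the query items are drawn from the same N classes as the support set, which is the restriction the question is about.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, n_shot, n_query, rng):
    """Return (support_idx, query_idx) for one episode of N * (K + Q) items."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Only classes with at least K + Q items can take part in an episode.
    eligible = sorted(c for c, items in by_class.items()
                      if len(items) >= n_shot + n_query)
    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], n_shot + n_query)
        support += picked[:n_shot]   # K labelled examples per class
        query += picked[n_shot:]     # Q items to classify, same classes
    return support, query


def prototype_accuracy(features, labels, support, query):
    """Classify each query by the nearest class mean among the support classes."""
    grouped = defaultdict(list)
    for i in support:
        grouped[labels[i]].append(features[i])
    prototypes = {c: [sum(col) / len(vecs) for col in zip(*vecs)]
                  for c, vecs in grouped.items()}

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    correct = 0
    for i in query:
        pred = min(prototypes, key=lambda c: sq_dist(features[i], prototypes[c]))
        correct += pred == labels[i]
    return correct / len(query)


# Synthetic data (an assumption for illustration): 10 classes, 20 items each,
# features clustered around well-separated class centres.
rng = random.Random(0)
labels = [c for c in range(10) for _ in range(20)]
features = [[10 * c + rng.gauss(0, 1), rng.gauss(0, 1)] for c in labels]

support, query = sample_episode(labels, n_way=5, n_shot=5, n_query=10, rng=rng)

# Episodic evaluation: average the per-episode accuracy over many episodes.
accuracies = [
    prototype_accuracy(features, labels, *sample_episode(labels, 5, 5, 10, rng))
    for _ in range(100)
]
mean_accuracy = sum(accuracies) / len(accuracies)
```

The reported number is therefore an average over many small N-way tasks, not accuracy over the full label set, which is why it is not directly comparable to a retrieval-style metric.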

Is my understanding reasonable? Or is this how meta-learning is normally evaluated?

wjtan99 commented 1 year ago

I read a few papers and codebases that use CUB-200-2011 for fine-grained image classification, metric learning, and few-shot learning: https://paperswithcode.com/dataset/cub-200-2011. I understand now that these tasks simply use different evaluation metrics, so I think your code is fine for few-shot learning evaluation. Thank you. You have done excellent work putting so much in one place.

ebennequin commented 1 year ago

Thank you! I share your opinion on the limitations of the evaluation process in Few-Shot Learning. I published a paper a while ago on this issue: "Few-Shot Image Classification Benchmarks Are Too Far From Reality: Build Back Better With Semantic Task Sampling".