Zero-shot Computer Vision - Draft Outline

mmhamdy commented 9 months ago

This is an early draft outlining the Zero-shot Computer Vision chapter. Below I'll give a brief overview of the chapter content. There are also presentation slides covering the main concepts with some (little😃) details.

🔹 Introduction

This section lays the groundwork for the rest of the chapter. Each subsection was initially a section of its own, but we then decided it would be better to merge them under one heading.

[x] On Generalization
[x] Zero-shot Learning (ZSL), History and Definitions
[x] Comparison With Other Techniques: This part aims to differentiate between zero-shot learning and some other methods such as Open Set Recognition (OSR), Domain Adaptation, and Out-of-Distribution (OOD) Detection.
[x] Relationship with Transfer Learning: This part discusses how zero-shot learning is related to transfer learning, and differentiates between homogeneous and heterogeneous transfer learning. It will only cover the parts related to ZSL, as there is already a transfer learning chapter.

🔹 Side Box: How Humans Recognize New Objects

This is a collapsible box for the interested reader about why humans are good at identifying new, unseen objects and why the same is not true for machines.

🔹 Zero-shot Learning methods

[ ] Attributes and Descriptors: Discusses what attributes are and why ZSL needs them.
[ ] Visual and Semantic Spaces: Discusses different embedding spaces for the visual and semantic data.
[ ] ZSL Baselines: Discusses the Direct Attribute Prediction (DAP) algorithm as a baseline for ZSL.
[ ] ZSL Algorithms: Discusses some selected zero-shot learning algorithms such as Embarrassingly Simple Zero-Shot Learning (ESZSL), Deep Visual-Semantic Embedding (DeViSE), Attribute Label Embedding (ALE), Structured Joint Embedding (SJE), and Semantic Autoencoder (SAE)
[ ] ZSL Benchmarks: Discusses some ZSL benchmarks such as Animals with Attributes 2 (AWA2), Caltech-UCSD Birds-200-2011 (CUB), Attribute Pascal and Yahoo (aPY), and the SUN Attribute database (SUN).
[ ] Evaluation: How zero-shot learning is evaluated, and the ZSLGBU Framework
[ ] ZSL vs. Generalized ZSL: What Generalized Zero-shot Learning (GZSL) is, and why it is a more realistic setting than plain zero-shot learning.
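
As a rough illustration of what the Evaluation and GZSL items could cover: ZSL results are commonly reported as top-1 accuracy averaged per class, and GZSL results as the harmonic mean of the seen-class and unseen-class accuracies. The sketch below is only a toy example; the function names and the labels are made up for illustration and are not taken from any benchmark.

```python
import numpy as np

def per_class_top1(y_true, y_pred, classes):
    """Average the top-1 accuracy of each class, so rare classes count equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accs = []
    for c in classes:
        mask = y_true == c
        if mask.any():  # skip classes that have no test samples
            accs.append((y_pred[mask] == c).mean())
    return float(np.mean(accs))

def gzsl_harmonic_mean(acc_seen, acc_unseen):
    """Harmonic mean of seen and unseen accuracy; low if either one is low."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# Toy labels (hypothetical): classes 0 and 1 are "seen", 2 and 3 are "unseen".
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 0, 2, 0, 1, 3]

acc_s = per_class_top1(y_true, y_pred, classes=[0, 1])
acc_u = per_class_top1(y_true, y_pred, classes=[2, 3])
h = gzsl_harmonic_mean(acc_s, acc_u)
print(acc_s, acc_u, h)
```

The harmonic mean is used rather than the arithmetic mean so that a model which simply predicts seen classes everywhere cannot score well.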

🔹 Zero-shot Learning with CLIP and friends

[ ] How is CLIP Different From Previous Approaches: CLIP has been introduced in previous chapters. Here, we will discuss briefly (I hope) the parts related to zero-shot learning.
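
To make the idea concrete, here is a minimal sketch of CLIP-style zero-shot scoring: class names are turned into text prompts, and the image is assigned to the prompt whose embedding has the highest cosine similarity with the image embedding. No real model is loaded here. The embeddings are hand-made stand-ins, and the temperature of 100 is an assumed value, so this only shows the scoring step, not CLIP itself.

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, temperature=100.0):
    """Softmax over scaled cosine similarities between one image and N prompts."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (text_embs @ image_emb)
    logits = logits - logits.max()  # for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

class_names = ["cat", "dog", "zebra"]
# In a real pipeline these prompts would be passed through the text encoder.
prompts = [f"a photo of a {name}" for name in class_names]

# Stand-in embeddings (assumption): one orthogonal direction per prompt.
text_embs = np.eye(len(prompts), 16)
# Stand-in image embedding (assumption): close to the "zebra" prompt.
image_emb = text_embs[2] + 0.1 * np.ones(16)

probs = zero_shot_probs(image_emb, text_embs)
pred = class_names[int(np.argmax(probs))]
print(pred, probs.round(3))
```

Adding a new class only requires adding a new prompt, which is the main practical difference from the attribute-based methods above.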

🔹 Zero-shot Learning in Computer Vision

This part illustrates how zero-shot learning can be used in the context of many different computer vision tasks. CV tasks were introduced in previous chapters.

Zero-shot Object Recognition/ Image Classification

[ ] Methods
[ ] Code Example

Zero-shot Object Detection

[ ] Methods
[ ] Code Example

Zero-shot Instance Segmentation

[ ] Methods
[ ] Code Example

Other CV Tasks

Besides the three most common CV tasks mentioned above, in this section, we may discuss other interesting CV tasks in the ZSL context.

🔹 Advantages of Zero-shot Learning

This section discusses why zero-shot learning is important. This will span a paragraph or two at most.

🔹 Applications of Zero-shot Learning

This section aims to provide some real-world applications of zero-shot learning in a computer vision context. There are no specific ones yet.

🔹 Challenges and Limitations of Zero-shot Learning

[ ] Bias
[ ] Domain Shift
[ ] Hubness
[ ] Semantic Loss
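
For the Hubness item, one simple way to show the problem is to count how often each class prototype ends up among the k nearest neighbours of the test samples. If a few prototypes collect most of the hits, they are acting as hubs. The sketch below assumes Euclidean distance and uses tiny made-up points; `k_occurrence` is a name chosen for this illustration.

```python
import numpy as np

def k_occurrence(queries, prototypes, k=1):
    """Count how many times each prototype appears in a query's k nearest neighbours."""
    # Squared Euclidean distances, shape (n_queries, n_prototypes).
    diffs = queries[:, None, :] - prototypes[None, :, :]
    d2 = (diffs ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.bincount(nearest.ravel(), minlength=len(prototypes))

# Toy 2-D data (hypothetical): two prototypes, three projected test samples.
prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])
queries = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0]])

counts = k_occurrence(queries, prototypes, k=1)
print(counts)
```

A strongly skewed count distribution over prototypes is the usual sign of hubness in the embedding space.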

🔹 Frontiers

This is a paragraph, or a little more, mentioning the current state of the art and/or recent experimental approaches in zero-shot learning.

🔹 Chapter Summary

This is another paragraph (or two 😅) that aims to condense the main ideas discussed in the chapter and Key Takeaways.

👩‍💻 Hands-on Notebook

This is a hands-on notebook that shows two things:

A from-scratch implementation of a classic ZSL algorithm, for example ESZSL.
A ZSL pipeline built from scratch around CLIP or one of its friends.
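
To give a feel for the first item, here is a minimal sketch of the ESZSL closed-form solution, V = (XXᵀ + γI)⁻¹ X Y Sᵀ (SSᵀ + λI)⁻¹, where X holds the image features, Y the ±1 class labels, and S the class attribute signatures. The data below is synthetic and the hyperparameters are arbitrary, so this is only an outline of what the notebook might contain, not its final form.

```python
import numpy as np

def eszsl_fit(X, Y, S, gamma=1.0, lam=1.0):
    """X: (d, m) features, Y: (m, z) labels in {-1, +1}, S: (a, z) signatures.
    Returns V with shape (d, a), mapping features to attribute space."""
    d, a = X.shape[0], S.shape[0]
    left = np.linalg.inv(X @ X.T + gamma * np.eye(d))
    right = np.linalg.inv(S @ S.T + lam * np.eye(a))
    return left @ X @ Y @ S.T @ right

def eszsl_predict(X, V, S_new):
    """Score each sample against the signatures of the (unseen) classes."""
    return (X.T @ V @ S_new).argmax(axis=1)

# Synthetic toy data (assumption): features are a noisy linear map of attributes.
rng = np.random.default_rng(0)
d, a, z_seen, z_unseen, per_class = 8, 4, 3, 2, 20
S_seen = rng.random((a, z_seen))
S_unseen = rng.random((a, z_unseen))
W_true = rng.normal(size=(d, a))  # hidden map, used only to generate toy data

labels = np.repeat(np.arange(z_seen), per_class)
X_train = W_true @ S_seen[:, labels] + 0.1 * rng.normal(size=(d, labels.size))
Y = -np.ones((labels.size, z_seen))
Y[np.arange(labels.size), labels] = 1.0

V = eszsl_fit(X_train, Y, S_seen)

# Classify samples from classes that were never seen during training.
test_labels = np.repeat(np.arange(z_unseen), 5)
X_test = W_true @ S_unseen[:, test_labels] + 0.1 * rng.normal(size=(d, test_labels.size))
preds = eszsl_predict(X_test, V, S_unseen)
print(preds, test_labels)
```

Because training is a single closed-form expression, it makes a good first exercise before moving on to the CLIP-based pipeline.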

Notes

The chapter contents may seem overwhelming, but we hope to arrive at a much shorter and denser version once we start working on the details.
The Algorithms part is the most volatile one (high probability of change). There are a lot of ZSL algorithms out there, and we are trying to choose a representative sample showing different approaches.
We will make sure that the ratio of plain ZSL to CV ZSL stays within a reasonable range.

Other Resources:

ATaylorAerospace commented 9 months ago

This looks to be a great chapter and has an incredible amount of content! :-)

On the notebook for the chapter: since a generalized ZSL method (ESZSL) will be included in the notebook, should you also include a more robust example for top-1 accuracy, like SOMZSL, TGMZ, or Cosmo? That way the students can work through a simple linear example with ESZSL but also get to work with methods that are more accurate.

alperenunlu commented 9 months ago

Comprehensive content. This is just terrific.

Looking forward to this. 🚀 🚀 🚀 🚀 🚀

mmhamdy commented 9 months ago

This looks to be a great chapter and has an incredible amount of content! :-)

On the notebook for the chapter: since a generalized ZSL method (ESZSL) will be included in the notebook, should you also include a more robust example for top-1 accuracy, like SOMZSL, TGMZ, or Cosmo? That way the students can work through a simple linear example with ESZSL but also get to work with methods that are more accurate.

Thanks, @ATaylorAerospace. We're still not sure which algorithm to use in the notebook, but we will surely keep that in mind.

mmhamdy commented 9 months ago

Comprehensive content. This is just terrific.

Looking forward to this. 🚀 🚀 🚀 🚀 🚀

Thanks, @alperenunlu for taking the time to read it.

lunarflu commented 9 months ago

Looks awesome! My guess is that for a first try we want to shorten and condense a bit, and then in follow-ups we can go more in depth (it could be faster to iterate that way and not get hung up on releasing everything completed).

What do you think?

mmhamdy commented 9 months ago

Looks awesome! My guess is that for a first try we want to shorten and condense a bit, and then in follow-ups we can go more in depth (it could be faster to iterate that way and not get hung up on releasing everything completed).

What do you think?

Yeah, of course. This is going to get much shorter bit by bit. The plan is to grow the chapter (async) and then work on pruning. Releasing section by section will also make the process much easier.

johko commented 9 months ago

This really is an extensive outline. Thanks for all the work, it looks awesome. I thought I knew some stuff about ZSL, but I could not have come up with that much material :smile:

Do you have any prioritization on which parts you want to work on first? In my opinion the Zero-shot Learning methods part can have a little less priority for now as I think it is more important for people to see some actual applications as you will do in Zero-shot Learning in Computer Vision

mmhamdy commented 9 months ago

Do you have any prioritization on which parts you want to work on first? In my opinion the Zero-shot Learning methods part can have a little less priority for now as I think it is more important for people to see some actual applications as you will do in Zero-shot Learning in Computer Vision

Yeah, we will take an inside-out approach and start from Zero-shot Learning in Computer Vision (which still needs an outline of its own, I think 😅), and then work on the rest of the sections once it is finished. The ZSL section will have its own share, but we will keep it brief, kind of like introducing Q-Learning before talking about Deep Q-Learning in reinforcement learning. The outermost sections (Introduction, Advantages, Applications, and Frontiers) are mostly a couple of paragraphs long and won't take up much of the chapter.

merveenoyan commented 9 months ago

@mmhamdy great outline! my only concern is that this is too broad and that you might find yourself overwhelmed in the process, so make sure to prioritize at first and we can iteratively release.

mmhamdy commented 9 months ago

my only concern is that this is too broad and that you might find yourself overwhelmed in the process, so make sure to prioritize at first and we can iteratively release.

It's too broad, I agree. But a lot of the non-CV-specific content will be pruned and condensed to just provide a smooth transition from plain ZSL to CV ZSL. We are starting with the Zero-shot Learning in Computer Vision section in order not to get overwhelmed by the other sections. Once finished, we will branch out from there and start working on the rest of the sections.
