johko / computer-vision-course

This repo is the home base of a community-driven course on Computer Vision with Neural Networks. Feel free to join us on the Hugging Face discord: hf.co/join/discord
MIT License

Zero-shot Computer Vision - Draft Outline #43

Closed: mmhamdy closed this issue 3 months ago

mmhamdy commented 9 months ago

This is an early draft outlining the Zero-shot Computer Vision chapter. Below is a brief overview of the chapter content. There are also presentation slides covering the main concepts, with some (little 😃) details.

🔹 Introduction

This section will lay the groundwork for the rest of the chapter. Each subsection was initially a section of its own, but we then decided it would be better to merge them under one heading.

🔹 Side Box: How Humans Recognize New Objects

This is a collapsible box for the interested reader about why humans are good at identifying new, unseen objects and why the same does not hold for machines.

🔹 Zero-shot Learning methods

🔹 Zero-shot Learning with CLIP and friends

🔹 Zero-shot Learning in Computer Vision

This part illustrates how zero-shot learning can be applied to a range of computer vision tasks, which were introduced in previous chapters.

Zero-shot Object Recognition / Image Classification
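With CLIP-style models, zero-shot image classification comes down to comparing one image embedding against the text embeddings of one prompt per candidate class (for example "a photo of a {label}") and picking the most similar. The sketch below shows only that scoring step in NumPy. The hand-written 3-dimensional vectors are hypothetical stand-ins for real encoder outputs; in practice they would come from a pretrained CLIP model (for example `CLIPModel.get_image_features` and `get_text_features` in 🤗 Transformers). The `logit_scale=100.0` default is an assumption approximating CLIP's learned temperature.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """CLIP-style zero-shot classification from precomputed embeddings.

    image_emb: (D,) embedding of one image
    text_embs: (K, D) embeddings, one prompt per candidate label
    Returns (best_label, probs), where probs is a softmax over the K labels.
    """
    img = image_emb / np.linalg.norm(image_emb)                     # unit-normalize image
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)  # unit-normalize prompts
    logits = logit_scale * (txt @ img)        # scaled cosine similarity per label
    exp = np.exp(logits - logits.max())       # subtract max for numerical stability
    probs = exp / exp.sum()                   # softmax over the candidate labels
    return labels[int(np.argmax(probs))], probs

labels = ["cat", "dog", "car"]
prompts = [f"a photo of a {label}" for label in labels]  # what a text encoder would receive
# Hypothetical stand-ins for encoder outputs: one row per prompt, plus one image.
text_embs = np.array([[1.0, 0.2, 0.0],
                      [0.2, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
image_emb = np.array([0.9, 0.3, 0.1])

pred, probs = zero_shot_classify(image_emb, text_embs, labels)
print(pred, probs.round(3))
```

No class-specific training happens here: adding a new class only means adding a new prompt row, which is what makes the approach zero-shot.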

Zero-shot Object Detection

Zero-shot Instance Segmentation

Other CV Tasks

Besides the three most common CV tasks above, this section may discuss other interesting CV tasks in the ZSL context.

🔹 Advantages of Zero-shot Learning

This section discusses why zero-shot learning is important. This will span a paragraph or two at most.

🔹 Applications of Zero-shot Learning

This section aims to provide some real-world applications of zero-shot learning in a computer vision context. No specific applications have been chosen yet.

🔹 Challenges and Limitations of Zero-shot Learning

🔹 Frontiers

This is a paragraph or a little more mentioning the current state of the art and/or recent experimental approaches in zero-shot learning.

🔹 Chapter Summary

This is another paragraph (or two 😅) that aims to condense the main ideas discussed in the chapter into key takeaways.

👩‍💻 Hands-on Notebook

This is a hands-on notebook that shows two things:

  1. A from-scratch implementation of a classic ZSL algorithm, for example ESZSL.
  2. A from-scratch implementation of a ZSL pipeline using CLIP or another of its friends.
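For the first notebook item, here is a minimal NumPy sketch of ESZSL's closed-form solution as we understand it from the original paper (Romera-Paredes and Torr, "An embarrassingly simple approach to zero-shot learning", 2015): `V = (X X^T + gamma I)^-1 X Y S^T (S S^T + lambda I)^-1`, where `X` holds training features (one column per sample), `Y` is a {-1, +1} seen-class label matrix and `S` holds per-class attribute signatures. A new sample `x` is assigned the unseen class `s'` that maximizes `x^T V s'`. The synthetic data (a random linear map from attributes to features, 8 seen and 3 unseen classes), the defaults `gamma = lam = 1`, and the helper names are illustrative assumptions, not choices made for the course notebook.

```python
import numpy as np

def eszsl_train(X, labels, S, gamma=1.0, lam=1.0):
    """Closed-form ESZSL: V = (X X^T + gamma I)^-1 X Y S^T (S S^T + lam I)^-1.

    X:      (d, m) training features, one column per sample
    labels: (m,) seen-class index of each sample (a column of S)
    S:      (a, z) attribute signatures of the z seen classes
    Returns V: (d, a), a bilinear map between features and attributes.
    """
    d, m = X.shape
    a, z = S.shape
    Y = -np.ones((m, z))              # -1 for every wrong class...
    Y[np.arange(m), labels] = 1.0     # ...+1 for the true class
    B = np.linalg.solve(X @ X.T + gamma * np.eye(d), X @ Y @ S.T)  # (d, a)
    # S S^T + lam I is symmetric, so B (S S^T + lam I)^-1 = solve(S S^T + lam I, B^T)^T
    return np.linalg.solve(S @ S.T + lam * np.eye(a), B.T).T

def eszsl_predict(X, V, S_new):
    """For each column of X, pick the class in S_new with the highest score x^T V s."""
    return (X.T @ V @ S_new).argmax(axis=1)

# Synthetic toy data (purely illustrative): features are a fixed random
# linear map of the class attribute vector plus small Gaussian noise.
rng = np.random.default_rng(0)
d, a, n = 16, 4, 50
W = rng.normal(size=(d, a))
S_seen = np.hstack([np.eye(a), -np.eye(a)])  # 8 seen classes, 4 attributes
S_unseen = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, -1, 1, -1]], dtype=float).T
S_unseen /= np.linalg.norm(S_unseen, axis=0)  # 3 unseen classes, unit-norm columns

def sample(S, n_per_class, noise=0.05):
    """Draw n_per_class feature vectors for every class (column) of S."""
    labels = np.repeat(np.arange(S.shape[1]), n_per_class)
    X = W @ S[:, labels] + noise * rng.normal(size=(d, labels.size))
    return X, labels

X_tr, y_tr = sample(S_seen, n)               # train on seen classes only
V = eszsl_train(X_tr, y_tr, S_seen)
X_te, y_te = sample(S_unseen, n)             # test on classes never seen in training
acc = np.mean(eszsl_predict(X_te, V, S_unseen) == y_te)
print(f"top-1 accuracy on the 3 unseen classes: {acc:.2f}")
```

The whole model is a single linear map learned in closed form, which is why ESZSL makes a good from-scratch baseline before moving to CLIP-style pipelines.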

Notes

  1. The chapter contents may seem overwhelming, but we hope to arrive at a much shorter and denser version when we start working on the details.
  2. The Algorithms part is the most volatile one (high probability of change). There are a lot of ZSL algorithms out there, and we are trying to choose a representative sample showing different approaches.
  3. We will make sure that the ratio of plain ZSL to CV ZSL stays within a reasonable range.

Other Resources:

ATaylorAerospace commented 9 months ago

This looks to be a great chapter and has an incredible amount of content! :-)

On the notebook for the chapter: since a generalized ZSL method (ESZSL) will be included in the notebook, should you also include a more robust example with better top-1 accuracy, like SOMZSL, TGMZ, or Cosmo? That way, students can work through a simple linear example with ESZSL but also get to work with more accurate methods.

alperenunlu commented 9 months ago

Comprehensive content. This is just terrific.

Looking forward to this. 🚀 🚀 🚀 🚀 🚀

mmhamdy commented 9 months ago

Thanks, @ATaylorAerospace. We're still not sure which algorithm to use in the notebook, but we will surely keep that in mind.

mmhamdy commented 9 months ago

Thanks, @alperenunlu, for taking the time to read it.

lunarflu commented 9 months ago

Looks awesome! My guess is that for a first try we want to shorten and condense a bit, and then go more in-depth in follow-ups (it could be faster to iterate that way and not get hung up waiting to release everything fully completed).

What do you think?

mmhamdy commented 9 months ago

Yeah, of course. This is going to get much shorter bit by bit. The plan is to grow the chapter (async) and then work on pruning. Releasing section by section will also make the process much easier.

johko commented 9 months ago

This really is an extensive outline. Thanks for all the work; it looks awesome. I thought I knew some stuff about ZSL, but I could not have come up with that much material 😄

Do you have any prioritization on which parts you want to work on first? In my opinion, the Zero-shot Learning methods part can have a little less priority for now, as I think it is more important for people to see some actual applications, as you will do in Zero-shot Learning in Computer Vision.

mmhamdy commented 9 months ago

Yeah, we will take an inside-out approach and start from Zero-shot Learning in Computer Vision (which still needs an outline of its own, I think 😅) and then work on the rest of the sections once it is finished. The ZSL section will have its own share, but we will keep it brief, kind of like introducing Q-Learning before talking about Deep Q-Learning in reinforcement learning. The outermost sections (Introduction, Advantages, Applications, and Frontiers) are mostly a couple of paragraphs long and won't take up much of the chapter.

merveenoyan commented 9 months ago

@mmhamdy great outline! My only concern is that this is too broad and that you might find yourself overwhelmed in the process, so make sure to prioritize at first, and we can release iteratively.

mmhamdy commented 9 months ago

It's too broad, I agree. But a lot of the non-CV-specific content will be pruned and condensed to just provide a smooth transition from plain ZSL to CV ZSL. We are starting with the Zero-shot Learning in Computer Vision section in order not to get overwhelmed by the other sections. Once finished, we will branch out from there and start working on the rest of the sections.