johko / computer-vision-course

This repo is the home base of a community-driven course on Computer Vision with Neural Networks. Feel free to join us on the Hugging Face Discord: hf.co/join/discord
MIT License

CV Tasks: Segmentation, Detection, etc... #31

Closed kausmeows closed 7 months ago

kausmeows commented 1 year ago

Hey everyone!

As discussed in the discord previously, here's a rough template and summary of what we think could be a good start for this section. Upon further discussion with the team we wanted to also get some input from the HF team before we seriously start implementing everything.

Various use-cases of CV and their details

Notebooks

Let us know if this sounds like a plan and we can iterate and improve upon this 🤗!
@johko @lunarflu @merveenoyan

Team members: @adhiiisetiawan @sarthak247 @vijiv11

johko commented 1 year ago

Just to name it: in my opinion the foundational CV tasks are Classification, Detection, and Segmentation. These should be covered in any case. When writing about the tasks, I would start by stating the main problem they solve and which sub-tasks there are (like instance segmentation), then introduce the main models used for them (not in detail like architecture etc., just their main attributes and what makes them stand out).

If you want to cover stuff like few-shot classification that is nice, but remember that we also have a dedicated section about zero shot computer vision tasks.

Class Incremental Learning sounds more like a bonus to me and maybe even better fitting into the Computer Vision In The Wild section. But for now I would leave it out.

kausmeows commented 1 year ago

@johko okay that makes sense. What about the notebook sections?

merveenoyan commented 1 year ago

@kaustubh-s1 You can see the main tasks here: https://huggingface.co/tasks, besides this, there's image similarity search. Some tasks are too comprehensive (e.g. Image Segmentation covers panoptic, instance and semantic segmentation). I think these are good ideas:

Intro to few-shot classification and its application. Deeper concepts intro like Class Incremental Learning and Few-shot class incremental learning (FSCIL)? Not sure if we want to add this

As Johannes said, we have a dedicated section for foundational models, so you can do a very brief introduction. On CIL I agree with Johannes as well.
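
To make the point about Image Segmentation covering semantic, instance and panoptic segmentation a bit more concrete, here is a minimal toy sketch in plain Python. The 4x4 "image", the class ids (`GRASS`, `CAT`) and the helper `to_panoptic` are all hypothetical names made up for illustration; real datasets and libraries encode these label maps in their own ways.

```python
# Toy 4x4 "image": two cats (a countable "thing" class) on grass (a "stuff" class).
# All ids below are hypothetical and chosen only for illustration.
GRASS, CAT = 0, 1

# Semantic segmentation: one class id per pixel; the two cats are not told apart.
semantic = [
    [CAT, CAT, GRASS, GRASS],
    [CAT, CAT, GRASS, GRASS],
    [GRASS, GRASS, CAT, CAT],
    [GRASS, GRASS, CAT, CAT],
]

# Instance segmentation: one id per object; 0 means "no object here".
instance = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]


def to_panoptic(semantic, instance):
    """Pair each pixel's class id with its instance id (0 for stuff pixels)."""
    return [
        [(cls, inst) for cls, inst in zip(sem_row, inst_row)]
        for sem_row, inst_row in zip(semantic, instance)
    ]


# Panoptic segmentation: every pixel gets both a class and an instance id.
panoptic = to_panoptic(semantic, instance)

# Distinct segments in the toy image: grass, cat #1, cat #2.
segments = sorted({pixel for row in panoptic for pixel in row})
print(segments)
```

In this sketch the semantic map alone cannot separate the two cats, the instance map alone says nothing about the grass, and the panoptic map carries both pieces of information per pixel.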

kausmeows commented 1 year ago

Thank you for getting back on this @merveenoyan. I got the point on how to structure things. cc @adhiiisetiawan @sarthak247 @vijiv11

johko commented 1 year ago

@kaustubh-s1 I think notebooks definitely make sense here, especially when you show people how to do these tasks with transformers, torchvision, and maybe timm. I would be cautious about overly detailed implementations like the U-Net approach you mention. That might be interesting for a few people, but I feel like 90% of people would not look at it. Then again, if you have the time and really want to do it, feel free. In the new structure, the chapters with mdx files are more in focus, and having an additional notebook in the notebooks folder would not break the learning experience, I guess.

kausmeows commented 1 year ago

Cool @johko, in the team discussion we were thinking of focusing on practical implementations at the start (as you said, most people would be looking for that), and at the very end we could add a reference section with a detailed from-scratch implementation in, say, PyTorch, just for those who want to go deeper.

Can totally skip this in the first version of the PR. And if you guys think it might make a nice extra addition just let us know along the way, we can arrange for that 🤗

lunarflu commented 1 year ago

Yep, I would definitely suggest splitting into multiple PRs, keeping things atomic, changing + improving one thing at a time. It's like a tree, you need the trunk to grow first before the flowers bloom on the branches, right? (something like that 😉 )

kausmeows commented 1 year ago

Nice poetic analogy there @lunarflu 🕵️🚀

bastienpo commented 1 year ago

Hi everyone!

Here is the refined and detailed outline for this section that we've worked on with the team. Please feel free to provide us with some feedback.

Computer vision Tasks Outline

Image Classification

Object Detection

Image Segmentation

merveenoyan commented 1 year ago

Hello @Skower! I think overall it looks good. Maybe we could have fewer ConvNet-based models (given the amount of resources already out there) and more transformer-based models. WDYT?

kausmeows commented 1 year ago

That makes sense @merveenoyan We can do that 🙌

johko commented 1 year ago

I agree with Merve - looks like super cool content.

Regarding the demo notebooks, feel free to orient yourselves on the models available in transformers; the docs already give pretty good examples.

But if you'd rather go with things like YOLO, that is also fine with me 🙂