CV Tasks: Segmentation, Detection, etc...

kausmeows commented 1 year ago

Hey everyone!

As discussed on Discord previously, here's a rough template and summary of what we think could be a good start for this section. After further discussion with the team, we also wanted to get some input from the HF team before we seriously start implementing everything.

Various use-cases of CV and their details

This section will briefly touch on the areas that make heavy use of computer vision, such as semantic and instance segmentation of images, and object detection in different scenarios.
A little bit about the various architectures used in these cases, like YOLO, ResNet, R-CNN, Faster R-CNN, etc.
Intro to few-shot classification and its application.
Deeper concepts intro like Class Incremental Learning and Few shot class incremental learning (FSCIL)? not sure if we want to add this, would need some input from the maintainers @merveenoyan @lunarflu @johko

Notebooks

Since this section seems more like hands-on code, people can quickly refer to the main application and reproduce it at their end if required, so providing notebooks seems sensible here.
For starters, these notebooks can be direct implementations of these CV tasks using torchvision and transformers. Then, for more detail on the inner workings, I would want to show an actual implementation of, let's say, U-Net for segmentation on a standard dataset. This would be more like replicating the architecture from scratch. Do we want to do this?

Let us know if this sounds like a plan and we can iterate and improve upon this 🤗!
@johko @lunarflu @merveenoyan

Team members- @adhiiisetiawan @sarthak247 @vijiv11

johko commented 1 year ago

Just to name them, in my opinion the foundational CV tasks are: Classification, Detection, Segmentation. These should be covered in any case. When writing about the tasks I would start by stating the main problem they are solving, which sub-tasks there are (like instance segmentation), and introducing the main models that are used for them (not in detail like architecture etc., just their main attributes and what makes them stand out).

If you want to cover stuff like few-shot classification that is nice, but remember that we also have a dedicated section about zero shot computer vision tasks.

Class Incremental Learning sounds more like a bonus to me and maybe even better fitting into the Computer Vision In The Wild section. But for now I would leave it out.

kausmeows commented 1 year ago

@johko okay that makes sense. What about the notebook sections?

merveenoyan commented 1 year ago

@kaustubh-s1 You can see the main tasks here: https://huggingface.co/tasks. Besides these, there's image similarity search. Some tasks are quite broad (e.g. Image Segmentation covers panoptic, instance and semantic segmentation). I think these are good ideas:

Intro to few-shot classification and its application. Deeper concepts intro like Class Incremental Learning and Few shot class incremental learning (FSCIL)? not sure if we want to add this

As Johannes said, we have a dedicated section for foundational models, so you can do a very brief introduction. On CIL I agree with Johannes as well.

kausmeows commented 1 year ago

Thank you for getting back on this @merveenoyan. I got the point on how to structure things. cc @adhiiisetiawan @sarthak247 @vijiv11

johko commented 1 year ago

@kaustubh-s1 I think notebooks definitely make sense to have here, especially when you show people how to do these tasks with transformers, torchvision, maybe timm. I would be cautious when it comes to very detailed implementations like the U-Net approach you mention. That might be interesting for a few people, but I feel like 90% of the people would not have a look. Then again, if you have the time and really want to do it, feel free to do it. In the new structure the chapters with mdx files are more in focus, and having an additional notebook in the notebooks folder would not break the learning experience, I guess.

kausmeows commented 1 year ago

Cool @johko, in the team discussion we were thinking of starting with more practical implementations (as you said, that is what most people would be looking for), but at the very end we can have an additional reference section with a detailed from-scratch implementation in PyTorch, let's say. Just for those who might want to go deeper.

Can totally skip this in the first version of the PR. And if you guys think it might make a nice extra addition just let us know along the way, we can arrange for that 🤗

lunarflu commented 1 year ago

Yep, I would definitely suggest splitting into multiple PRs, keeping things atomic, changing + improving one thing at a time. It's like a tree, you need the trunk to grow first before the flowers bloom on the branches, right? (something like that 😉 )

kausmeows commented 1 year ago

Nice poetic analogy there @lunarflu 🕵️🚀

bastienpo commented 1 year ago

Hi everyone!

Here is the refined and detailed outline for this section that we've worked on with the team. Please feel free to provide us with some feedback.

Computer vision Tasks Outline

Image Classification

Overview of Image Classification
- Introduction to the concept and importance of image classification in computer vision
- Brief presentation of popular models (link to Vision Transformer section)
Example Application of Image Classification
Evaluation Metrics for classification (Multiclass, multilabel and unbalanced data)
- Accuracy, Precision, Recall, F1 score, Confusion matrix, ROC curve, AUC, MCC, Balanced accuracy
Hands-on Notebook
- Using a pretrained ResNet with a Hugging Face dataset for an image classification problem
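
To make the "unbalanced data" point in the metrics item concrete, here is a minimal pure-Python sketch of accuracy, precision, recall, F1 and balanced accuracy. The function names and the toy cat/dog labels are illustrative assumptions, not part of the planned notebook; in practice a library such as scikit-learn would likely be used instead.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class; 0.0 when a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = [precision_recall_f1(y_true, y_pred, c)[1] for c in classes]
    return sum(recalls) / len(recalls)


# Toy imbalanced example (hypothetical data): 8 cats, 2 dogs,
# and a model that always predicts "cat".
y_true = ["cat"] * 8 + ["dog"] * 2
y_pred = ["cat"] * 10

plain = accuracy(y_true, y_pred)              # looks good despite ignoring dogs
balanced = balanced_accuracy(y_true, y_pred)  # penalises the missed class
dog_precision, dog_recall, dog_f1 = precision_recall_f1(y_true, y_pred, "dog")
```

On this toy data plain accuracy is 0.8 while balanced accuracy is 0.5, which is the kind of gap that motivates listing more than accuracy for unbalanced datasets.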

Object Detection

Overview of Object Detection
- Introduction to object detection and its significance in computer vision.
- Classification plus localization
- Brief presentation of a popular model for object detection --> DETR
Families of object detection approaches
1. Single Stage: Presentation of YOLO
2. Two Stage: Presentation of Faster R-CNN
Example Application in Object Detection
Evaluation Metrics for Object detection
- IoU (Intersection over Union) and mAP (Mean Average Precision)
Hands-on Notebook
- Using YOLO for Object Detection with ultralyticsplus
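
As a rough illustration of the IoU metric listed above, here is a minimal sketch for two axis-aligned boxes. The `(x_min, y_min, x_max, y_max)` box format, the function name, and the 0.5 matching threshold are assumptions made for the example; mAP builds on this by matching predictions to ground truth at one or more IoU thresholds.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamped to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: the prediction is shifted half a box width to the right.
ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)

iou = box_iou(ground_truth, prediction)  # overlap 50, union 150
is_match = iou >= 0.5                    # assumed threshold for a true positive
```

Here the overlap is 50 and the union is 150, so the IoU is 1/3 and the prediction would not count as a match at a 0.5 threshold.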

Image Segmentation

Overview of Image Segmentation
- Explaining the concept
- Brief presentation of a popular segmentation model --> Segment Anything Model (SAM)
Types of segmentation
1. Semantic Segmentation
2. Instance Segmentation
3. Panoptic Segmentation
Example Application in Image Segmentation
- Applications in Autonomous Vehicles (AV), medical imaging, robotics, etc.
Evaluation Metrics for image segmentation
- IoU (also known as the Jaccard index), mIoU, pixel accuracy, Dice coefficient
Hands-on Notebook
- Using Mask R-CNN for Image Segmentation (Implementation in PyTorch)
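
The segmentation metrics above can be sketched on small label maps in pure Python. This is only an illustration: the nested-list mask format, the function names, and the 3x3 toy masks are assumptions, and a real notebook would more likely operate on tensors.

```python
def binary_iou_and_dice(pred_mask, true_mask):
    """IoU (Jaccard index) and Dice coefficient for two binary 0/1 masks."""
    inter = union = pred_sum = true_sum = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            inter += p & t
            union += p | t
            pred_sum += p
            true_sum += t
    # Two empty masks agree perfectly, so both scores are defined as 1.0 here.
    iou = inter / union if union else 1.0
    total = pred_sum + true_sum
    dice = 2 * inter / total if total else 1.0
    return iou, dice


def pixel_accuracy(pred_labels, true_labels):
    """Fraction of pixels whose predicted class equals the true class."""
    correct = total = 0
    for pred_row, true_row in zip(pred_labels, true_labels):
        for p, t in zip(pred_row, true_row):
            correct += p == t
            total += 1
    return correct / total


def mean_iou(pred_labels, true_labels, num_classes):
    """Average per-class IoU, skipping classes absent from both label maps."""
    ious = []
    for c in range(num_classes):
        pred_c = [[int(v == c) for v in row] for row in pred_labels]
        true_c = [[int(v == c) for v in row] for row in true_labels]
        if not any(map(any, pred_c)) and not any(map(any, true_c)):
            continue
        ious.append(binary_iou_and_dice(pred_c, true_c)[0])
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 3x3 semantic label maps with classes 0 (background) and 1 (object).
true_labels = [[0, 0, 1],
               [0, 1, 1],
               [0, 1, 1]]
pred_labels = [[0, 0, 1],
               [0, 0, 1],
               [0, 1, 1]]

object_iou, object_dice = binary_iou_and_dice(pred_labels, true_labels)
miou = mean_iou(pred_labels, true_labels, num_classes=2)
acc = pixel_accuracy(pred_labels, true_labels)
```

For binary masks the two overlap scores are tied by Dice = 2·IoU / (1 + IoU), so they always rank predictions the same way; Dice simply gives more weight to the overlap.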

merveenoyan commented 1 year ago

Hello @Skower! I think overall it looks good. Maybe we could have fewer ConvNet-based models (given the amount of resources already out there) and more transformer-based models. WDYT?

kausmeows commented 1 year ago

That makes sense @merveenoyan We can do that 🙌

johko commented 1 year ago

I agree with Merve - looks like super cool content.

Regarding the demo notebooks, feel free to base them on the models available in transformers; the docs already give pretty good examples.

But if you'd rather go with things like YOLO that is also fine with me 🙂

johko / computer-vision-course