johko / computer-vision-course

This repo is the home base of a community-driven course on Computer Vision with Neural Networks. Feel free to join us on the Hugging Face Discord: hf.co/join/discord
MIT License
389 stars, 126 forks

Common Pre-Trained Models (ResNet, etc.) Discussion #39

Closed: sezan92 closed this 3 months ago

sezan92 commented 9 months ago

Hello. This issue is for discussing the chapter "Common Pre-Trained Models".

My thoughts

In the following paragraphs, I am adding my thoughts. This will be finalized after discussion.

What do you think should be added?

The chapter assumes the reader knows the basics of CNNs fairly well. Now our job is to cover the major CNN-based architectures.

Architectures to be added

How would you like to explain?

Please let me know your thoughts

ATaylorAerospace commented 9 months ago

Hello! If the different layers of CNNs are covered in an earlier chapter, I think a short refresher on convolutional, pooling, and fully connected layers would be a great start to this chapter before the deep dive into the various CNN architectures.

ash-01xor commented 9 months ago

Hey @ATaylorAerospace,

> If the different layers of CNNs are covered in an earlier chapter, I think a short refresher review of Convolutional, Pooling and Fully connected layers might be a great start to this chapter before the deep dive into the various CNN architectures.

My team is handling the general architecture chapter, and we will take care of all these details. We have yet to finalize the content; once we do, we will open the issue for comments.

This chapter specifically, by @sezan92 and team, deals only with common pre-trained models.

Thanks

johko commented 9 months ago

Thank you for the outline @sezan92 🙂

> Architectures to be added
>
>   • Lenet ( first CNN)
>   • Alexnet (First one to use GPU)
>   • Vgg16/19 (First deep CNN)
>   • Resnet (Residual net helped )
>   • Google (inception model)
>   • Mobilenet (first optimized for mobile devices)

I like the historical approach, starting from LeNet and then going further. But for my liking it still feels a bit "too old". I like reflecting on where we come from, but I think people might also want to know which "modern" CNNs exist, because it is not all about ViTs these days. For this I think you should also add ConvNeXt (https://github.com/facebookresearch/ConvNeXt), as it is a good representative of current SOTA CNNs.

> I think the chapter should be diagram-heavy and if possible some implementations of the architectures.

Makes sense for architectures, even nicer would be animations, but I don't know how well that works within .mdx files.

sezan92 commented 9 months ago

@johko thank you for your reply. I am adding ConvNeXt then. I thought SOTA architectures would be added in the last chapter (modern architectures).

I have a question: how do you think we should proceed with the hands-on approach?

salgadev commented 9 months ago

+1 for the diagrams. I would suggest making these lessons more practical and keeping only maybe the 5 most popular CNNs, ranked by their usage on HF, with a small demo for each that explains its common use cases. Then they could segue into the transfer learning/fine-tuning lessons.

johko commented 9 months ago

> I thought SOTA architectures would be added in the last chapter (modern architectures)

The last chapter is more about experimental and really new architectures. Everything currently SOTA can be covered in the other chapters.

And I agree with @socd06 that it makes sense to focus on maybe 5 very popular CNNs.

I mentioned that I liked the historical approach but just now realized that it might overlap with the general CNN architecture chapter a bit. Which can be good, as long as it is not too much. I think they don't have an outline yet, but maybe @alperenunlu or someone else from the team can already give some info.

alperenunlu commented 9 months ago

> I think they don't have an outline yet, but maybe @alperenunlu or someone else from the team can already give some info.

@johko I recently added myself to the first chapter's contributors. I will look into this.

alperenunlu commented 9 months ago

> Architectures to be added
>
>   • Lenet ( first CNN)
>   • Alexnet (First one to use GPU)
>   • Vgg16/19 (First deep CNN)
>   • Resnet (Residual net helped )
>   • Google (inception model)
>   • Mobilenet (first optimized for mobile devices)

By the way, a few remarks and corrections on this list:

LeNet-5 would be more suitable (of course with different last layers).

AlexNet was not the first to use a GPU. Its importance is more nuanced, but it showed that CNNs can be powerful by winning ImageNet.

The name of the architecture is not Google; it's GoogLeNet.

sezan92 commented 9 months ago

> > Architectures to be added
> >
> >   • Lenet ( first CNN)
> >   • Alexnet (First one to use GPU)
> >   • Vgg16/19 (First deep CNN)
> >   • Resnet (Residual net helped )
> >   • Google (inception model)
> >   • Mobilenet (first optimized for mobile devices)
>
> By the way some things and mistakes about this list.
>
> LeNet-5 would be more suitable (of course with different last layers)
>
> Alexnet is not the first to use GPU. It's importance is more complex but showed that CNN's can be powerful by winning imagenet.
>
> The name of the architecture is not Google it's GoogLeNet.

@alperenunlu thank you, I will correct it. But I thought AlexNet was the first to use a GPU: https://sebastianraschka.com/faq/docs/first-cnn-gpu.html

Anyway, that is not the main point.

merveenoyan commented 9 months ago

Hello 👋 I agree with @johko on not including old ones. Except for that it sounds good.

ShamieCC commented 9 months ago

Hello, I just joined the team today. Could we also show code implementations of these models to demonstrate practical use cases?

sezan92 commented 9 months ago

I have a question for @johko and @merveenoyan: what about the implementations? As it is my first time, do we need to implement the models from scratch (using some framework), or do we show some use cases from model zoos?

I prefer to implement it from scratch. I need to know others' opinions.

sezan92 commented 9 months ago

> Hello I just joined the team today, we could also show code implementations of these models to show practical use cases?

I have some confusion regarding the implementation.

sezan92 commented 9 months ago

> Hello I just joined the team today, we could also show code implementations of these models to show practical use cases.

@ShamieCC I have pinged you on the Discord server. Please check.

sezan92 commented 9 months ago

I think the best would be

Each of these architectures is a kind of new milestone for CNNs.

What do you guys think? @johko @merveenoyan @alperenunlu

johko commented 9 months ago

Sorry for the late reply @sezan92 , I kinda missed it. I think the proposal is good to start with.

Mkrolick commented 7 months ago

Hey 👋, just wanted to add a follow-up question: is anyone working on explaining MobileNet? A key component of that paper (MobileNetV2 blocks) is used in an architecture in the common vision transformers section that I am writing.
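
For readers unfamiliar with the MobileNetV2 blocks mentioned here, the following is a rough sketch of why they are cheap, comparing weight counts for a standard convolution, a depthwise separable convolution, and a MobileNetV2-style inverted residual block. It is an illustration under stated assumptions, not the course's implementation: the function names are hypothetical helpers, biases and batch-norm parameters are ignored, and the expansion factor of 6 is assumed as a commonly cited default.

```python
# Illustrative weight counts only; biases and batch-norm parameters are ignored.
# All function names are hypothetical helpers, not part of any library.


def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Depthwise k x k conv (one filter per input channel) plus a 1x1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def inverted_residual_params(c_in: int, c_out: int, expansion: int = 6, k: int = 3) -> int:
    """MobileNetV2-style block: 1x1 expansion, k x k depthwise conv, 1x1 linear projection.

    The expansion factor is an assumption made for this sketch.
    """
    hidden = c_in * expansion
    expand = c_in * hidden        # 1x1 conv widening the channels
    depthwise = k * k * hidden    # per-channel spatial filtering
    project = hidden * c_out      # 1x1 conv narrowing the channels again
    return expand + depthwise + project


if __name__ == "__main__":
    c_in, c_out = 32, 64
    print("standard 3x3:       ", standard_conv_params(c_in, c_out))
    print("depthwise separable:", depthwise_separable_params(c_in, c_out))
    print("inverted residual:  ", inverted_residual_params(c_in, c_out))
```

With 32 input and 64 output channels, the depthwise separable layer needs far fewer weights than the standard 3x3 convolution. The inverted residual block spends most of its weights in the two 1x1 convolutions, since the spatial filtering happens per channel in the expanded space.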

sezan92 commented 7 months ago

@Mkrolick Interesting point. We can add MobileNet, but I think after this sprint.

Mkrolick commented 7 months ago

@sezan92 Sorry for not responding back! That sounds great. I can also add it in after the sprint if you'd like.