johko / computer-vision-course

This repo is the homebase of a community driven course on Computer Vision with Neural Networks. Feel free to join us on the Hugging Face discord: hf.co/join/discord

Unit 3 Chapter 3 contents: Transfer Learning and Fine-Tuning Vision Transformers #53

Closed Anindyadeep closed 2 months ago

Anindyadeep commented 8 months ago

Hey, sorry for the delayed issue. Together with our fellow collaborators (@alanahmet, @sezan92), we managed to get our curriculum out for Transfer Learning and Fine-tuning with Vision Transformers :hugs:

Chapter Layout

Let's discuss this further and iterate on the chapter content if needed.

merveenoyan commented 8 months ago

Hello 👋 I think this is very nice.

Transfer Learning vs Fine-tuning.

Isn't fine-tuning technically a subsection of transfer learning? Maybe consider framing it differently.

Removing the last layers and adding an MLP + additionally re-training a small percentage of the layers

Also, we remove the classifier head, add our own head, unfreeze the last layers that are responsible for feature extraction, and retrain; is this what you mean here?

When to use Fine-tuning and Transfer Learning

I feel like unless we're working on tabular data it's almost always beneficial, given that it reduces the amount of data we need and improves performance.

Maybe one thing you can add is knowledge distillation. I've recently experimented with it to add it to the transformers docs (will be released today) and observed a good amount of improvement, especially now that we have foundation models that are very good yet hard to fine-tune; they'd be useful for KD. Overall, good compilation!
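For reference, response-based distillation boils down to a loss roughly like the sketch below (the temperature and weighting values are just illustrative, and this is not the exact code from the docs):

```python
# Generic response-based knowledge distillation loss (sketch):
# the student matches the teacher's softened logits (KL term) while
# still being trained on the ground-truth labels (cross-entropy term).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term, scaled by T^2 as is conventional for KD
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```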

Anindyadeep commented 8 months ago

Thanks @merveenoyan for the awesome feedback.

Isn't fine-tuning technically a subsection of transfer learning? Maybe consider framing it differently.

Well, yes, that's right. However, there are nuances, and I have seen a lot of folks get confused and debate about this topic, Here and Here. So, in that subsection I want to give a clear idea of how each of them works. Let me know if that sounds reasonable, and also feel free to suggest better subsection titles :)

Also, we remove the classifier head, add our own head, unfreeze the last layers that are responsible for feature extraction, and retrain; is this what you mean here?

Yes, I wanted to write that in short :sweat_smile:; maybe I need to find better titles for the subsections.
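Concretely, that bullet is meant to cover something along the lines of this minimal sketch (assuming a ViT backbone from 🤗 transformers and a hypothetical 10-class head; the actual chapter code may differ):

```python
# Minimal sketch: freeze a pretrained ViT backbone, replace the classifier
# head with our own MLP, and optionally unfreeze the last encoder block.
import torch.nn as nn
from transformers import ViTModel

backbone = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

# Freeze the whole backbone first (pure feature extraction / transfer learning)
for param in backbone.parameters():
    param.requires_grad = False

# Optionally unfreeze the last encoder block (the "small percentage of layers")
for param in backbone.encoder.layer[-1].parameters():
    param.requires_grad = True

# New classification head for a hypothetical 10-class task
num_classes = 10
head = nn.Sequential(
    nn.Linear(backbone.config.hidden_size, 256),
    nn.GELU(),
    nn.Linear(256, num_classes),
)
```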

I feel like unless we're working on tabular data it's almost always beneficial, given that it reduces the amount of data we need and improves performance.

Yes, so should we include this subsection or remove it? You can also think of it as a section that covers best practices.

Maybe one thing you can add is knowledge distillation.

Yes, that would be awesome, but should that be included under fine-tuning and transfer learning, or should I create a different issue for it, since by definition they are dissimilar?

Thanks and let me know.

johko commented 8 months ago

Hey, thanks for the outline :hugs:

Here are my thoughts:

  • A small introduction to Knowledge Transfer. Why training vision models from scratch is not always the solution.
  • Transfer Learning vs Fine-tuning.

These are very fundamental parts and it is good to cover them. But as we will already have a Transfer Learning/Fine-Tuning section in the CNN chapter before this one, make sure to check with the team that is working on it (they haven't submitted an outline yet) to avoid too much overlap. I'd say focus on the differences when it comes to Vision Transformers, and you can also give a lot of room to the transformers (library) specific methods.

  • Transfer Learning in depth with 🤗 transformers / torch code.

    • Removing the last layer and adding an MLP
    • Removing the last layers and adding an MLP + additionally re-training a small percentage of the layers
  • Fine Tuning in depth with 🤗 transformers / torch code

    • General Fine-tuning (i.e., take a small transformer and fully fine-tune it)
    • Fine-tuning using PEFT (example: LoRA). This should include a small intro on Parameter Efficient Techniques. I really like that you are using transformers and PEFT here; the PEFT methods especially are something I have hardly seen covered in courses yet (see the sketch below). For the MLP part I would say the same as above: try to avoid overlap, best focus on transformer-specific attributes, and maybe point out some differences with CNNs, which would make a good connection to the former chapter.
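For the PEFT part, here is a rough sketch of what the LoRA example could look like (the checkpoint, target modules and hyperparameters are just illustrative, not a final recipe):

```python
# Rough sketch: LoRA fine-tuning of a ViT classifier with the 🤗 peft library.
from transformers import ViTForImageClassification
from peft import LoraConfig, get_peft_model

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=10
)

lora_config = LoraConfig(
    r=16,                               # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["query", "value"],  # attention projections inside the ViT blocks
    lora_dropout=0.1,
    modules_to_save=["classifier"],     # keep the freshly added head fully trainable
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()      # only a small fraction of the weights are trained
```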

In general it would be great to have some task-specific fine-tuning examples, which you can connect with your overall outline. Give an example for simple classification at one point, use a segmentation task at another, etc.
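For the simple classification case, a hypothetical end-to-end recipe (the dataset and hyperparameters below are placeholders for whatever task you pick) could look roughly like this:

```python
# Hypothetical full fine-tune of a ViT classifier with the 🤗 Trainer.
import torch
from datasets import load_dataset
from transformers import (AutoImageProcessor, ViTForImageClassification,
                          TrainingArguments, Trainer)

checkpoint = "google/vit-base-patch16-224-in21k"
processor = AutoImageProcessor.from_pretrained(checkpoint)
dataset = load_dataset("cifar10")  # placeholder dataset with "img"/"label" columns

def transform(batch):
    # Turn PIL images into pixel_values on the fly
    inputs = processor([img.convert("RGB") for img in batch["img"]], return_tensors="pt")
    inputs["labels"] = batch["label"]
    return inputs

prepared = dataset.with_transform(transform)

def collate_fn(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["labels"] for ex in examples]),
    }

model = ViTForImageClassification.from_pretrained(checkpoint, num_labels=10)

args = TrainingArguments(
    output_dir="vit-finetuned",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-5,
    remove_unused_columns=False,  # keep the raw "img" column for the transform
)

trainer = Trainer(model=model, args=args, data_collator=collate_fn,
                  train_dataset=prepared["train"], eval_dataset=prepared["test"])
trainer.train()
```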

I think it is important not to become too theoretical here, but to give the participants something they can run themselves and get great results with. After all, that is the magic of fine-tuning: getting good results with relatively little effort.

But I'm sure you've got this and it will be an awesome chapter :+1: :slightly_smiling_face:

Anindyadeep commented 7 months ago

Hey, it has been quite a while. Addressing all the feedback, we have come up with this structure and have started working accordingly. Let us know whether this looks right or not. We will also add the PRs that are open/merged in our repo and the main repo.

Contents

Contribution Guidelines

We are going to work on Shreyas's repository. Here is how we are going to structure it:

.
└── course/
    └── chapters/
        ├── chapter3/
        │   ├── Transfer Learning and Fine-tuning.mdx
        │   └── TransferLearningInDepth.ipynb
        ├── chapter-n/
        │   └── ...mdx
        └── _toctree.yml

Please assign yourselves. I would appreciate a response by the end of next week, so that we can provide the updated chapter contents to the admin team and keep them informed on the specifics.

CC: @shreydan @sezan92 @alanahmet
CC: @merveenoyan @johko

matthiasdroth commented 7 months ago

Hi there, I am "contributor 4" on Chapter 3 items

I would like to contribute by

  1. taking one task from the set (classification, object detection, segmentation) and training a vision transformer on it,
  2. doing almost the same again, but with a bigger model, training it using PEFT LoRA (plus showing how to save the adapter locally and how to load it again), and
  3. adding a demo on prompt tuning.

Is everybody OK with that?

@shreydan @sezan92 @alanahmet CC: @merveenoyan @johko

shreydan commented 7 months ago

@matthiasdroth hello, can you please reach out to us on Discord? All of the discussions regarding contributions are going on there; mention us in the official Discord channel: #cv-community-project. Thank you.

matthiasdroth commented 7 months ago

What are your handles on Discord? It seems only @sezan92 is the same as on GitHub.

Anindyadeep commented 7 months ago

What are your handles on Discord? It seems only @sezan92 is the same as on GitHub.

Uhm, it would be better if we take this discussion to the public Discord cv-community course channel.

matthiasdroth commented 7 months ago

Uhm, it would be better if we take this discussion to the public Discord cv-community course channel.

Sure. But how is this supposed to work without having your handles? Feel free to answer me on Discord via @matthiasdroth.