cfalholt closed this issue 2 months ago.
Hey @cfalholt,
thank you for the draft outline :hugs:
1. Introduction
Keep the part about different fine-tuning methods really short, as the main description should be in the first transfer learning chapter of the course (on CNNs). It is good to have a reminder here for people, but you don't need to be exhaustive.
I like the idea of introducing a task here already as an example, and you can also connect that to the dataset loading and inspecting.
- Zero-shot multi-modality
Good point and quite special for the multimodal models. We will also have a dedicated chapter on that, https://github.com/johko/computer-vision-course/issues/43, but I also see it as an important point here. You can focus on being really hands-on and give a lot of examples.
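To make the hands-on angle concrete, the mechanism behind CLIP-style zero-shot classification could be sketched like this (a toy example — the embeddings below are made up and stand in for the outputs of real image/text encoders; no actual model is loaded):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_classify(image_emb, label_embs):
    """Pick the label whose text embedding is closest to the image embedding.

    This is the core of CLIP-style zero-shot classification: no fine-tuning,
    just comparing an image against text prompts in a shared embedding space.
    """
    scores = {label: cosine_sim(image_emb, emb) for label, emb in label_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical embeddings standing in for encoder outputs.
image_emb = np.array([0.9, 0.1, 0.0])
label_embs = {
    "a photo of a cat": np.array([1.0, 0.0, 0.1]),
    "a photo of a dog": np.array([0.0, 1.0, 0.1]),
}

best, scores = zero_shot_classify(image_emb, label_embs)
print(best)  # the cat prompt scores highest for this toy image embedding
```

In the actual chapter this would of course use a real checkpoint and real prompts, but the shared-embedding-space idea is the same.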
- Full finetuning
Your question already states the biggest issue here - fine-tuning big multimodal models might be rather costly. But I would still not skip this section completely. Maybe don't use a multimodal model with an LLM connection as an example here, but rather something like LayoutLM or OWL-ViT, as they are potentially easier to train (I guess)
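Whatever model ends up in that section, the defining property of full fine-tuning is that *every* parameter gets updated on the downstream data, which is exactly where the cost comes from at multimodal scale. A toy sketch with a hypothetical linear "pretrained" model (plain NumPy gradient descent, standing in for a real training loop):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" linear model y = x @ W (hypothetical weights).
W = rng.normal(size=(3, 1))

# Downstream task: targets generated from a different weight vector.
X = rng.normal(size=(32, 3))
W_true = np.array([[1.0], [-2.0], [0.5]])
y = X @ W_true

lr = 0.1
for _ in range(200):
    pred = X @ W
    grad = X.T @ (pred - y) / len(X)  # gradient of mean squared error
    W -= lr * grad                    # full fine-tuning: *all* weights updated

print(np.round(W.ravel(), 2))  # converges close to the downstream weights
```

With a real multimodal model the same loop touches hundreds of millions or billions of weights, which motivates the PEFT section that follows.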
4. Parameter efficient fine tuning (PEFT)
Again, try to sync with the other Fine-Tuning teams (especially https://github.com/johko/computer-vision-course/issues/53) to avoid overlap, but apart from that the section sounds good
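For the PEFT section, the LoRA idea could be sketched in a few lines (shapes and init values here are illustrative, loosely following the usual LoRA defaults): the pretrained weight `W` stays frozen, and only two small low-rank factors are trained, so the trainable parameter count drops from `d_out * d_in` to `r * (d_in + d_out)`.

```python
import numpy as np

d_in, d_out, r, alpha = 64, 64, 4, 8

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable low-rank factor, small init
B = np.zeros((d_out, r))               # trainable low-rank factor, zero init

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return x @ (W + (alpha / r) * B @ A).T

full_params = W.size
lora_params = A.size + B.size
print(full_params, lora_params)  # 4096 512 -> ~8x fewer trainable parameters
```

Because `B` starts at zero, the adapted model is initially identical to the frozen one, which is part of why LoRA training is stable — that could be a nice talking point in the chapter.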
5. Final remarks :+1:
In general I think the Transfer Learning chapter (like the other ones before it) should mainly focus on practical applications and not so much on theory. A bit like the Jupyter notebooks from @NielsRogge here: https://github.com/NielsRogge/Transformers-Tutorials
Hope that helps :slightly_smiling_face:
Hello @johko! Thank you very much for your comments!
Keep the part about different fine-tuning methods really short, as the main description should be in the first transfer learning chapter of the course (on CNNs).
Yeah, we will try to keep this as a short reminder for people who are only interested in this part of the course.
We will also have a dedicated chapter on that, #43, but I also see it as an important point here. You can focus on being really hands-on and give a lot of examples.
I completely agree. We will need to make sure that our examples do not overlap with #43. However, judging by their outline, this should not be the case.
Your question already states the biggest issue here - fine-tuning big multimodal models might be rather costly. But I would still not skip this section completely. Maybe don't use a multimodal model with an LLM connection as an example here, but rather something like LayoutLM or OWL-ViT, as they are potentially easier to train (I guess)
A difficult question. On the one hand, full fine-tuning is the simplest way to train a model for a specific task, which is perfect for educational purposes. On the other hand, in practice it is usually more effective to use PEFT when training multimodal models.
What do you think about this? Should we add an example of full fine-tuning?
Again, try to sync with the other Fine-Tuning teams (especially #53) to avoid overlap, but apart from that the section sounds good
I think we will inevitably encounter overlaps, because there are not that many transfer learning methods specialized for specific types of models. The theoretical parts of our chapters will probably be quite similar, but the practical parts will differ due to differences in the models.
In general I think the Transfer Learning chapter (as the other ones before that) should mainly focus on practical applications and not so much on theory.
Absolutely agree!
I have also prepared a list of tasks and models that we can cover in this chapter:
Could you please share your feedback? Maybe something here is not worth covering, or, conversely, something else would be great to add.
Hey @minemile
sounds very good overall :slightly_smiling_face:
Regarding full fine-tuning, it really depends. If you can find a model and dataset that is cheap to run and good for educational purposes, this would be a great part and give participants a nice feeling of success. If you don't find anything good, don't try too hard. Cover the theory of full fine-tuning, but mention the difficulties and obstacles, and maybe use that as an introduction to why things like PEFT are so helpful now.
I also like the tasks and connected models, it feels like a good and expressive variety :hugs:
Hi CV course contributors, we would love to hear your feedback on the multimodal transfer learning section of the course. Here's the current general outline, along with some of the thoughts we've gathered as a team. What do you think of the following outline?
1. Introduction
2. Zero-shot multi-modality
3. Full finetuning
4. Parameter efficient fine tuning (PEFT)
5. Final remarks