sihyun-yu / REPA

Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
https://sihyun.me/REPA
MIT License
663 stars, 30 forks

Exploring alignment with multiple models? #8

Closed elicassion closed 1 month ago

elicassion commented 1 month ago

Hi @sihyun-yu,

Great work!

I've also been working on using pre-trained visual representations to supervise a model for a different task. Our work, Theia, aims to improve visual representations for robot learning. I also want to mention NVIDIA's work RADIO, which addresses vision problems.

Both approaches use multiple teacher models, and we found that this improves the representation a lot. I also noticed that you have implemented multi-teacher training in your codebase. Did you explore this aspect, and if so, would you mind sharing your observations?

sihyun-yu commented 1 month ago

Hi, thanks for your interest! We tried this in early experiments, but unfortunately we did not see any further improvement. Those experiments were very preliminary, though, so I can't say for sure that it isn't beneficial in our case. Exploring this would be an interesting direction! And thanks for pointing out these works; I will take a look.
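
For readers following this thread, the multi-teacher idea discussed above can be sketched roughly as follows. This is only an illustration, not the code from this repository: the function names `cosine_similarity` and `multi_teacher_alignment_loss` are hypothetical, the loss is assumed to be a negative cosine similarity averaged over tokens and then over teachers, and plain Python lists stand in for tensors so the snippet has no dependencies. It also assumes the student features have already been projected to each teacher's feature dimension and that no feature vector is all zeros.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_teacher_alignment_loss(student_feats, teacher_feats):
    """Average negative cosine similarity over tokens, then over teachers.

    student_feats[k][i] is the student feature for token i, projected
    into teacher k's feature space (hypothetical layout).
    teacher_feats[k][i] is teacher k's feature for the same token.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("need one set of student features per teacher")

    per_teacher_losses = []
    for s_k, t_k in zip(student_feats, teacher_feats):
        # Token-wise similarity between the student and this teacher.
        sims = [cosine_similarity(s, t) for s, t in zip(s_k, t_k)]
        per_teacher_losses.append(-sum(sims) / len(sims))

    # Every teacher contributes equally (an assumption of this sketch).
    return sum(per_teacher_losses) / len(per_teacher_losses)


# Two hypothetical teachers with different feature dimensions.
student = [[[1.0, 0.0], [0.0, 2.0]], [[1.0, 0.0, 0.0]]]
teacher = [[[2.0, 0.0], [0.0, 1.0]], [[0.0, 1.0, 0.0]]]
loss = multi_teacher_alignment_loss(student, teacher)
```

In this toy input the student matches the first teacher's directions exactly and is orthogonal to the second teacher, so the two per-teacher terms pull the average in different directions. Weighting the teachers unequally would be a natural variation, but nothing in this thread says which choice works better.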