Question about RAM - Githubissues

xinyu1205 / recognize-anything

Open-source and strong foundation image recognition models.

https://recognize-anything.github.io/

Apache License 2.0

Question about RAM #135

Open Davidyao99 opened 10 months ago

Davidyao99 commented 10 months ago

Hi, great project!

I am a little confused by the architecture. May I ask why we need the generation branch if we are only interested in image tagging? Can't we perform the same training and obtain a similar recognize-anything model by removing the entire generation branch?

Thank you!!

xinyu1205 commented 10 months ago

We found that the multi-task paradigm, which trains image tagging together with image-text generation, can boost tagging performance. However, this improvement is indeed minor.

Therefore, in the latest version, the RAM++ model, we adopt image-text alignment together with image tagging instead.
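
For readers following along, here is a minimal sketch of what a multi-task objective of this kind can look like: a tagging loss plus a weighted generation loss. It is not taken from the repository's training code. The function names, the `gen_weight` parameter, and the use of plain binary cross-entropy for tagging are all assumptions made for illustration.

```python
import math

EPS = 1e-12  # small constant to avoid log(0); illustrative choice


def tagging_loss(tag_probs, tag_labels):
    """Mean binary cross-entropy over tag categories (multi-label setting)."""
    total = 0.0
    for p, y in zip(tag_probs, tag_labels):
        total += -(y * math.log(p + EPS) + (1 - y) * math.log(1 - p + EPS))
    return total / len(tag_probs)


def generation_loss(token_probs):
    """Mean negative log-likelihood of the ground-truth caption tokens."""
    return -sum(math.log(p + EPS) for p in token_probs) / len(token_probs)


def multitask_loss(tag_probs, tag_labels, token_probs, gen_weight=1.0):
    """Hypothetical combined objective: tagging loss + weighted generation loss.

    Setting gen_weight to 0 corresponds to dropping the generation branch
    from the objective, as asked about in the question.
    """
    return tagging_loss(tag_probs, tag_labels) + gen_weight * generation_loss(token_probs)


if __name__ == "__main__":
    # Predicted probabilities for three tags, with the first two present in the image.
    tag_probs = [0.9, 0.8, 0.1]
    tag_labels = [1, 1, 0]
    # Probabilities the model assigned to each correct caption token.
    token_probs = [0.7, 0.6, 0.9]

    print("tagging only:", multitask_loss(tag_probs, tag_labels, token_probs, gen_weight=0.0))
    print("multi-task:  ", multitask_loss(tag_probs, tag_labels, token_probs, gen_weight=1.0))
```

In a setup like this, the generation term only affects tagging indirectly, through the parameters shared by the two branches, which is consistent with the gain being small.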