mu-cai / matryoshka-mm

Matryoshka Multimodal Models
https://matryoshka-mm.github.io/
Apache License 2.0

[Discussion] Section 3 (of the paper) should have pseudocode #1

Open dinhanhx opened 4 months ago

dinhanhx commented 4 months ago

Discussion

I know the paper is under review and will likely be revised. However, I think some pseudocode would be helpful. As written, the method is described in a few dense paragraphs, which makes it a bit hard to follow. Pseudocode would also help other people implement this technique in their own models.

mu-cai commented 4 months ago

Thanks for the suggestion! The core of M3 is here: https://github.com/mu-cai/matryoshka-mm/blob/main/llava/model/llava_arch.py#L147

Let me know if you have further questions!

dinhanhx commented 4 months ago

https://github.com/mu-cai/matryoshka-mm/blob/8ca825dd73d8a6d144574a541955ccd5640b6d9a/llava/model/llava_arch.py#L147-L157

What is the value range of matryoshka_vis_token_scale? From 1 to infinity? Or 0.0 to 1.0?

mu-cai commented 4 months ago

Hi, the range is shown here: https://github.com/mu-cai/matryoshka-mm/blob/main/scripts/v1_5/finetune.sh#L36
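
For anyone following the thread who wants something closer to pseudocode, here is a rough sketch of the general idea of producing coarser sets of visual tokens by average pooling over a square token grid. It is not the code from llava_arch.py. The function name `pool_visual_tokens`, the reading of the scale as a target token count, and the 576-token grid with its coarser counts are all assumptions made for illustration; check the linked lines and finetune.sh for the actual behavior and values.

```python
import math


def pool_visual_tokens(tokens, target_count):
    """Average-pool a square grid of token vectors down to target_count tokens.

    Hypothetical helper for illustration only (not from the repository).

    tokens: list of N feature vectors (lists of floats), stored row-major
            over a sqrt(N) x sqrt(N) grid.
    target_count: desired number of output tokens. It must be a perfect
            square whose side length divides the original grid side.
    """
    side = math.isqrt(len(tokens))
    if side * side != len(tokens):
        raise ValueError("token count must be a perfect square")

    out_side = math.isqrt(target_count)
    if out_side * out_side != target_count or out_side == 0 or side % out_side:
        raise ValueError(
            "target_count must be a square whose side divides the grid side"
        )

    k = side // out_side  # each output token averages a k x k window
    dim = len(tokens[0])
    pooled = []
    for r in range(out_side):
        for c in range(out_side):
            acc = [0.0] * dim
            for dr in range(k):
                for dc in range(k):
                    vec = tokens[(r * k + dr) * side + (c * k + dc)]
                    for d in range(dim):
                        acc[d] += vec[d]
            pooled.append([v / (k * k) for v in acc])
    return pooled


# Assumed example: a 24 x 24 grid (576 tokens) with 2-dim features,
# pooled to several coarser token counts.
full = [[float(i), 1.0] for i in range(576)]
nested = {n: pool_visual_tokens(full, n) for n in (576, 144, 36, 9, 1)}
```

Under these assumptions, each coarser set is derived from the same full-resolution tokens, so a single image encoding yields every granularity; whether the repository exposes the scale as a token count or in some other form is something the linked script should settle.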