Open tommarques56 opened 1 month ago
My guess would be that the architecture is different enough that this code would not work: https://github.com/sshh12/multi_token/blob/main/multi_token/language_models/mistral.py. You could potentially duplicate that file and add an instance for Mixtral based on the Hugging Face implementation.
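Roughly, a Mixtral file could mirror mistral.py by swapping the base classes. Here is a pure-Python sketch of that pattern; the `LMMMeta*` mixin names and the three-class layout are illustrative guesses rather than the library's actual API, and the stand-in base classes would really come from Hugging Face transformers (`MixtralConfig`, `MixtralModel`, `MixtralForCausalLM`):

```python
# Stand-ins for the Hugging Face Mixtral classes (in a real port,
# import these from transformers instead of defining them).
class MixtralConfig:
    model_type = "mixtral"

class MixtralModel:
    pass

class MixtralForCausalLM:
    pass

# Stand-ins for the library's multimodal mixins (hypothetical names,
# following the pattern of a base_model module).
class LMMMetaModel:
    pass

class LMMMetaForCausalLM:
    pass

# The duplicated file would define the same trio as mistral.py,
# with the Mistral bases swapped for their Mixtral counterparts.
class MixtralLMMConfig(MixtralConfig):
    model_type = "mixtral-lmm"

class MixtralLMMModel(LMMMetaModel, MixtralModel):
    config_class = MixtralLMMConfig

class MixtralLMMForCausalLM(MixtralForCausalLM, LMMMetaForCausalLM):
    config_class = MixtralLMMConfig
```

The key point is that the multimodal mixins are strapped onto whatever HF architecture you subclass, so supporting Mixtral is mostly a matter of re-declaring these classes against the Mixtral bases and registering the new `model_type`.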
Do you think it's just as good (or better) to fine-tune Mixtral directly, or to take a Mistral 7B fine-tuned for vision, build an MoE from it with tools like mergoo (please take a look at mergoo, because SSHH12 + mergoo could be a life changer), and then fine-tune the resulting MoE?
Hm, my guess would be that merging after training the modality projector wouldn't work (at least not out of the box with this library, just because of all the custom torch modules that get strapped onto the model). However, it should definitely be doable to take an existing merge and add the modality to it by adding that HF architecture, as I mentioned.
Hi, I just want to know if somebody has successfully trained a Mixtral, like the 8x7B? When I try, the output is random (unreadable).
Thanks!