Closed: catid closed this 2 months ago
Thank you for your interest. We believe our code should work with Mixtral with only minor modifications, since we change only the attention module. Here are the parts of the code that need to be checked:
We plan to release the code applied to Mixtral in the near future, so you can look forward to it!
Hello!
We now support the Mistral model! (See the setup instructions in the README.) We are not able to test Mixtral in our computing environment, but you can use src/arch/ccm_mistral.py as a reference and adapt it for Mixtral.
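To illustrate why an attention-only change should carry over, here is a minimal, dependency-free sketch. It is not the code in src/arch/ccm_mistral.py. Every class and function name below is a hypothetical stand-in, and the layer layout (an attention submodule next to a feed-forward submodule) is an assumption about how the two architectures are organised. The point is only that a patch touching the attention attribute leaves a dense or sparse feed-forward block untouched.

```python
# Toy stand-ins only: in a real model these would be torch.nn.Module subclasses.
# All names here are hypothetical and do not come from the repository.

class Attention:
    """Stand-in for the original attention module."""
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size


class PatchedAttention(Attention):
    """Stand-in for a modified attention module."""


class DenseMLP:
    """Stand-in for a dense feed-forward block (Mistral-like)."""


class SparseMoE:
    """Stand-in for a mixture-of-experts feed-forward block (Mixtral-like)."""


class DecoderLayer:
    """A decoder layer holding one attention and one feed-forward submodule."""
    def __init__(self, hidden_size, feed_forward):
        self.self_attn = Attention(hidden_size)
        self.feed_forward = feed_forward


def patch_attention(layers):
    """Swap each layer's attention in place; return how many layers were patched."""
    patched = 0
    for layer in layers:
        old = layer.self_attn
        # Rebuild the attention with the same size; the feed-forward is not touched.
        layer.self_attn = PatchedAttention(old.hidden_size)
        patched += 1
    return patched


mistral_like = [DecoderLayer(64, DenseMLP()) for _ in range(2)]
mixtral_like = [DecoderLayer(64, SparseMoE()) for _ in range(2)]

patch_attention(mistral_like)
patch_attention(mixtral_like)
```

In a real port, the things to verify would be that the Mixtral attention class takes the same constructor arguments and forward inputs as the Mistral one, which this toy does not check.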
Could you provide some guidance on how to extend your results to the Mixtral model?