snu-mllab / Context-Memory

PyTorch implementation of "Compressed Context Memory for Online Language Model Interaction" (ICLR'24)
https://arxiv.org/abs/2312.03414
MIT License

Supporting Mixtral model #2

Closed — catid closed this issue 2 months ago

catid commented 6 months ago

Could you provide some guidance on how to extend your results to the Mixtral model?

Janghyun1230 commented 6 months ago

Thank you for your interest. We believe our code will work with Mixtral as well with only minor modifications, since we only change the attention module. Here are the parts of the code that need to be checked:

  1. src/model.py
    • In this file, we attach LoRA to the model and expand the embedding. Modifications may be needed to adapt this part to the Mixtral model.
  2. src/arch/ccm_llama.py
    • Memory components need to be added to the model function. Specifically, you need to add compression parts and adjust positional encoding appropriately. We have marked the changed parts in the file.
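As a rough illustration of step 1, expanding the embedding means appending new rows (one per added compression token) to the model's embedding table. The sketch below is a toy, dependency-free version of that idea; the function name `expand_embeddings` and the mean-initialization choice are my own assumptions for illustration, not taken from the repository, where this would instead operate on the Hugging Face model's embedding module.

```python
# Toy sketch of step 1 (not from the repository): expand an embedding
# table with extra rows for new compression tokens. Initializing the
# new rows to the mean of existing rows is one common choice for new
# special tokens; the real code may initialize differently.

def expand_embeddings(table, n_comp_tokens):
    """Append n_comp_tokens new rows to the embedding table,
    each initialized to the mean of the existing rows."""
    dim = len(table[0])
    mean_row = [sum(row[d] for row in table) / len(table) for d in range(dim)]
    return table + [list(mean_row) for _ in range(n_comp_tokens)]

vocab = [[0.0, 2.0], [2.0, 0.0]]       # toy table: 2 tokens, dim 2
expanded = expand_embeddings(vocab, 2)  # add 2 compression-token rows
print(len(expanded), expanded[2])       # 4 [1.0, 1.0]
```

With a real Mixtral checkpoint, the analogous operation would also require resizing the tied output head, which is why this part of `src/model.py` may need model-specific changes.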

We plan to release the code applied to Mixtral in the near future, so you can look forward to it!

Janghyun1230 commented 4 months ago

Hello!

We now support the Mistral model! (Check out the setup in the README.) We are unable to test Mixtral in our computing environment, but you can refer to src/arch/ccm_mistral.py and adapt it for Mixtral.
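To give a sense of the positional-encoding adjustment mentioned in step 2 above: once a context has been compressed into a few memory slots, incoming tokens should be positioned right after those slots rather than after the original, uncompressed context. The sketch below is purely illustrative; the function name and signature are my own, not from `src/arch/ccm_mistral.py`.

```python
# Illustrative sketch (not from the repository) of the positional-id
# adjustment in step 2: a context of many tokens is compressed into
# mem_len memory slots, so the next new_len tokens take positions
# starting at mem_len instead of continuing from the original length.

def position_ids_after_compression(mem_len, new_len):
    """Position ids for new_len incoming tokens that follow
    mem_len compressed-memory slots."""
    return list(range(mem_len, mem_len + new_len))

# e.g. a 100-token context compressed into 8 slots; next 3 tokens:
print(position_ids_after_compression(8, 3))  # [8, 9, 10]
```

In the actual Mistral/Mixtral attention modules this adjustment would feed into the rotary position embeddings, which is why the marked sections of the `ccm_*` files are where Mixtral-specific changes would go.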