unslothai / unsloth

Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Add support for InternLM2.5 model #734

Closed · mf-skjung closed 1 week ago

mf-skjung commented 1 week ago

Hello unsloth team,

I'm trying to use the InternLM2.5 model (specifically internlm/internlm2_5-7b-chat) with unsloth, but I'm encountering a NotImplementedError. Could you please add support for this model?

Adding support for InternLM2.5 would be greatly appreciated as it's a powerful and efficient model that could benefit many users.

Thank you for considering this request!

danielhanchen commented 1 week ago

Do you know what architectural changes Intern did?

mf-skjung commented 1 week ago

Do you know what architectural changes Intern did?

While I don't have direct access to the implementation details, I can provide an overview based on the publicly available technical report (https://arxiv.org/pdf/2403.17297):


  1. InternLM2.5 interleaves the per-head Wq, Wk, and Wv weights within a single fused matrix, unlike the standard approach of stacking the three projections separately. This allows more flexible tensor-parallelism adjustments (see the sketch after this list).

  2. All InternLM2.5 models implement Grouped-Query Attention (GQA), which enables efficient processing of long contexts while keeping GPU memory usage low. This may differ from some other models.

  3. Consolidating the Wq, Wk, and Wv matrices this way resulted in a 5% speed increase during pre-training. This optimization may not be present in other models.

  4. Like LLaMA, InternLM2.5 uses RMSNorm instead of LayerNorm and employs the SwiGLU activation function.

  5. InternLM2.5 was trained on contexts up to 32k tokens long, and can handle even longer contexts (up to 200k tokens) through position-encoding extrapolation.
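
To make points 1 and 3 concrete, here is a minimal sketch of de-interleaving such a fused wqkv weight back into separate Wq, Wk, and Wv projections, which is essentially what "llamafication" does. The function name and exact row layout are assumptions based on the report's description, not unsloth or InternLM code:

  import torch

  def split_wqkv(wqkv, num_heads, num_kv_heads, head_dim):
      # Assumed row layout: one block per KV head, each block holding that
      # group's q heads, then its single k head, then its single v head.
      # wqkv shape: (num_kv_heads * (q_per_kv + 2) * head_dim, hidden_size)
      q_per_kv = num_heads // num_kv_heads
      hidden_size = wqkv.shape[-1]
      w = wqkv.view(num_kv_heads, q_per_kv + 2, head_dim, hidden_size)
      wq = w[:, :q_per_kv].reshape(num_heads * head_dim, hidden_size)  # queries
      wk = w[:, -2].reshape(num_kv_heads * head_dim, hidden_size)      # keys
      wv = w[:, -1].reshape(num_kv_heads * head_dim, hidden_size)      # values
      return wq, wk, wv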

hiyouga commented 1 week ago

InternLM2 can be llamafied: https://huggingface.co/chargoddard/internlm2-7b-llama. The same approach may also work for InternLM2.5.
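
A llamafied checkpoint is a standard LLaMA-architecture checkpoint, so it should load through unsloth's usual path. A minimal sketch, assuming unsloth's documented FastLanguageModel API and the repo id above:

  from unsloth import FastLanguageModel

  model, tokenizer = FastLanguageModel.from_pretrained(
      model_name = "chargoddard/internlm2-7b-llama",
      max_seq_length = 4096,   # illustrative; pick what your task needs
      load_in_4bit = True,     # 4-bit loading for QLoRA-style finetuning
  )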

danielhanchen commented 1 week ago

Oh if it can be llamafied - best to do that :)

mf-skjung commented 1 week ago

Thank you for the suggestion to use the llamafied version of InternLM2.5. I attempted to load the model as recommended, but unfortunately encountered an error. The model I referenced is https://huggingface.co/Downtown-Case/internlm2_5-7b-chat-1m-llamafied:

  File "/home/sk.jung/.local/lib/python3.10/site-packages/unsloth/models/llama.py", line 1221, in from_pretrained
    model = model_patcher.post_patch(model)
  File "/home/sk.jung/.local/lib/python3.10/site-packages/unsloth/models/llama.py", line 1452, in post_patch
    and (module.cos_cached.dtype != correct_dtype):
  File "/home/sk.jung/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LlamaDynamicNTKScalingRotaryEmbedding' object has no attribute 'cos_cached'

rwl4 commented 1 week ago

You can temporarily fix this by disabling RoPE scaling. Edit the config.json and set the value of "rope_scaling" to null.
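
For example, a small script along these lines (the local snapshot path is illustrative):

  import json
  from pathlib import Path

  # Point this at your locally downloaded copy of the checkpoint.
  cfg_path = Path("internlm2_5-7b-chat-1m-llamafied/config.json")
  cfg = json.loads(cfg_path.read_text())
  cfg["rope_scaling"] = None  # Python None serializes to JSON null
  cfg_path.write_text(json.dumps(cfg, indent=2))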

mf-skjung commented 1 week ago

Thank you for your guidance. The suggested fix worked perfectly. I appreciate your help in resolving this issue.