Closed mf-skjung closed 1 week ago
Do you know what architectural changes InternLM2.5 made?
While I don't have direct access to the implementation details, I can provide an overview based on the publicly available technical documentation (https://arxiv.org/pdf/2403.17297):
InternLM2.5 uses an interleaved layout for the Wq, Wk, and Wv matrices, unlike the standard stacking method. This allows tensor parallelism to be adjusted more flexibly.
All InternLM2.5 models implement Grouped-Query Attention (GQA), which enables efficient processing of long contexts while keeping GPU memory usage low. This may differ from some other models.
InternLM2.5 consolidates the Wq, Wk, and Wv matrices into a single projection, which resulted in a 5% speed increase during pre-training. This optimization may not be present in other models.
Like LLaMA, InternLM2.5 uses RMSNorm instead of LayerNorm and employs the SwiGLU activation function.
InternLM2.5 was trained on contexts up to 32k tokens long, with the ability to handle even longer contexts (up to 200k tokens) through position encoding extrapolation.
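To make the GQA point above concrete, here is a rough sketch of grouped-query attention in PyTorch. This is an illustration of the general technique, not InternLM2.5's actual implementation; the shapes and head counts are assumptions for the example.

```python
import torch

def grouped_query_attention(q, k, v):
    # q: (batch, n_heads, seq, head_dim)
    # k, v: (batch, n_kv_heads, seq, head_dim), with n_kv_heads < n_heads.
    # Each group of query heads shares one KV head, shrinking the KV cache
    # and keeping GPU memory low for long contexts.
    n_heads, head_dim = q.shape[1], q.shape[-1]
    n_kv_heads = k.shape[1]
    repeat = n_heads // n_kv_heads
    # Expand the shared KV heads to match the number of query heads.
    k = k.repeat_interleave(repeat, dim=1)
    v = v.repeat_interleave(repeat, dim=1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim ** 0.5, dim=-1)
    return attn @ v
```

With 8 query heads and 2 KV heads, only a quarter of the usual KV cache is stored, while the output shape matches standard multi-head attention.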
https://huggingface.co/chargoddard/internlm2-7b-llama InternLM2 can be llamafied; the same approach may also work for InternLM2.5.
Oh, if it can be llamafied, that's the best approach :)
Thank you for the suggestion to use the llamafied version of InternLM2.5. I attempted to load the model as recommended, but unfortunately encountered an error. The referenced model is https://huggingface.co/Downtown-Case/internlm2_5-7b-chat-1m-llamafied
File "/home/sk.jung/.local/lib/python3.10/site-packages/unsloth/models/llama.py", line 1221, in from_pretrained
model = model_patcher.post_patch(model)
File "/home/sk.jung/.local/lib/python3.10/site-packages/unsloth/models/llama.py", line 1452, in post_patch
and (module.cos_cached.dtype != correct_dtype):
File "/home/sk.jung/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LlamaDynamicNTKScalingRotaryEmbedding' object has no attribute 'cos_cached'
You can temporarily fix this by disabling RoPE scaling: edit the config.json and set the value of "rope_scaling" to null.
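As a minimal sketch of that workaround, the edit can be scripted. The directory path is an assumption; point it at wherever the model was downloaded locally.

```python
import json

def disable_rope_scaling(config_path):
    """Set "rope_scaling" to null in a model's config.json."""
    with open(config_path) as f:
        config = json.load(f)
    # None is serialized as null, disabling dynamic NTK RoPE scaling
    # (the source of the cos_cached AttributeError above).
    config["rope_scaling"] = None
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)

# Hypothetical local path; adjust to your snapshot directory.
# disable_rope_scaling("./internlm2_5-7b-chat-1m-llamafied/config.json")
```

Note this trades away the extended-context behavior the scaling provided, so it is a stopgap until proper support lands.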
Thank you for your guidance. The suggested fix worked perfectly. I appreciate your help in resolving this issue.
Hello unsloth team,
I'm trying to use the InternLM2.5 model (specifically internlm/internlm2_5-7b-chat) with unsloth, but I'm encountering a NotImplementedError. Could you please add support for this model?
Adding support for InternLM2.5 would be greatly appreciated as it's a powerful and efficient model that could benefit many users.
Thank you for considering this request!