unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
15.52k stars · 1.04k forks

unsloth-internLM 2.5 #767

Open rezzie-rich opened 2 months ago

rezzie-rich commented 2 months ago

Can we please get official support for InternLM-2.5?

I have seen a closed issue regarding that, #734. However, the model mentioned there may be broken, as it fails to load, for instance.

It would be great to get an official version from you since the model has a lot of potential due to its size and context window.

Additional question: does llamafying a model impose any licensing restrictions from Llama? If so, it would be hugely appreciated if the supported InternLM were not restricted by any Llama licensing agreement.

danielhanchen commented 1 month ago

Llamafying it won't cause license issues, since it's just a re-arrangement of modules. I'll try, but for now it's best to llama-fy it yourself.
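For anyone attempting the llama-fication themselves: besides renaming keys (e.g. `feed_forward.w1` → `mlp.gate_proj`), the main surgery is splitting InternLM2's fused `wqkv` attention weight into Llama-style `q_proj`/`k_proj`/`v_proj`. A minimal sketch of the split, assuming the grouped-per-KV-head row layout used in InternLM2's modeling code (query heads first, then one key and one value head per group); the toy dimensions below are illustrative, not the real 7B model's:

```python
import numpy as np

# Toy GQA dimensions (hypothetical, for illustration only):
head_dim = 4
num_kv_heads = 2
q_per_kv = 3                                   # query heads per KV head
hidden = head_dim * num_kv_heads * q_per_kv    # 24

# InternLM2 stores attention as one fused wqkv weight whose rows are
# grouped per KV head: [q_per_kv query heads, 1 key head, 1 value head].
rows = num_kv_heads * (q_per_kv + 2) * head_dim
wqkv = np.arange(rows * hidden, dtype=np.float32).reshape(rows, hidden)

def split_wqkv(wqkv, num_kv_heads, q_per_kv, head_dim):
    """Split a fused InternLM2-style wqkv into Llama-style q/k/v weights."""
    grouped = wqkv.reshape(num_kv_heads, q_per_kv + 2, head_dim, -1)
    q = grouped[:, :q_per_kv].reshape(-1, wqkv.shape[-1])   # all query heads
    k = grouped[:, q_per_kv].reshape(-1, wqkv.shape[-1])    # one K head/group
    v = grouped[:, q_per_kv + 1].reshape(-1, wqkv.shape[-1])  # one V head/group
    return q, k, v

q, k, v = split_wqkv(wqkv, num_kv_heads, q_per_kv, head_dim)
print(q.shape, k.shape, v.shape)  # (24, 24) (8, 24) (8, 24)
```

The real conversion also has to carry over the RoPE base, norms, and embedding/output weights; this only shows the attention split.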

rezzie-rich commented 1 month ago

Thank you, looking forward to it. If possible, the 1M-context version :D If not, 200k will work too.

Their benchmarks show it performs best up to a 200k context window before losing some quality.

rezzie-rich commented 1 month ago

Would it be too much to ask you to request a commercial-usage license for InternLM from the creators for the Unsloth version? They offer it for free upon request. If you obtain it, it becomes easier for anyone using the Unsloth version of InternLM, without needing to request it again.

danielhanchen commented 1 month ago

Apologies for the delay - hmm, I think it's the engineer themselves (i.e. yourself) who has to request it. We can request it for our own use, but I'm unsure about distributing it ourselves.

rezzie-rich commented 1 month ago

> we can request it for our own use, but unsure on distributing it through ourselves

Maybe that can be confirmed during the request, since llamafying it makes it architecturally a different model.

ethanc8 commented 1 month ago

I have llamafied InternLM2.5-7B, and tried to open it in Unsloth.

I get

    /usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py in LlamaAttention__init__(self, config, layer_idx)

    ValueError: Unknown RoPE scaling type dynamic

ethanc8 commented 1 month ago

This model has

  "rope_scaling": {
    "factor": 2.0,
    "type": "dynamic"
  },

in its config.json

ethanc8 commented 1 month ago

Open LLM Leaderboard also seems to be having trouble with its dynamic rope_scaling: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/862

ethanc8 commented 1 month ago

In the other closed issue, you mentioned that RoPE scaling can be disabled in order to finetune the model. I will try that.
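In case it helps others trying the same workaround: one way to disable it is to delete the `rope_scaling` block from the llamafied checkpoint's `config.json` before loading, so the model falls back to plain RoPE (likely at the cost of long-context quality). A small sketch; the helper name and the stand-in config values are mine, not from Unsloth or InternLM:

```python
import json
import os
import tempfile

def disable_rope_scaling(config_path):
    """Remove the rope_scaling block from a checkpoint's config.json,
    falling back to plain RoPE. Returns the removed entry, if any."""
    with open(config_path) as f:
        config = json.load(f)
    removed = config.pop("rope_scaling", None)
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return removed

# Demo on a minimal stand-in config (not the real 7B config):
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"rope_theta": 1000000.0,
               "rope_scaling": {"factor": 2.0, "type": "dynamic"}}, tmp)

print(disable_rope_scaling(tmp.name))        # {'factor': 2.0, 'type': 'dynamic'}
with open(tmp.name) as f:
    print(json.load(f).get("rope_scaling"))  # None
os.unlink(tmp.name)
```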

danielhanchen commented 1 month ago

Wait, what's "dynamic" RoPE scaling? The only accepted ones are linear RoPE scaling, NTK, YaRN, llama-3 type, etc.

ethanc8 commented 1 month ago

I actually have no idea, will probably need to read the internlm remote code: https://huggingface.co/internlm/internlm2_5-7b/blob/main/modeling_internlm2.py
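For what it's worth, HF transformers does accept a `"dynamic"` `rope_scaling` type, dynamic NTK-aware scaling: rather than applying a fixed factor, the RoPE base is grown on the fly once the sequence exceeds the trained context. A sketch of the base adjustment as I understand the transformers implementation; the numbers below are illustrative, not InternLM2.5's real config:

```python
def dynamic_ntk_base(base, factor, seq_len, max_pos, head_dim):
    """Dynamic NTK-aware RoPE base adjustment (as in HF transformers'
    dynamic rope_scaling): grow the base only past the trained context."""
    if seq_len <= max_pos:
        return base  # within trained context: plain RoPE, untouched base
    scale = (factor * seq_len / max_pos) - (factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))

# Illustrative values only (not the real InternLM2.5 config):
print(dynamic_ntk_base(1_000_000.0, 2.0, 4096, 8192, 128))   # 1000000.0
print(dynamic_ntk_base(1_000_000.0, 2.0, 16384, 8192, 128))  # > 1,000,000
```

If that is what InternLM2's remote code does, supporting it in Unsloth would mean recomputing the base per forward pass instead of rejecting the config.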

rezzie-rich commented 1 month ago

Good news: if Unsloth makes an InternLM version (both bf16 and int4), it can be released under the Apache-2.0 license, since the original model comes under Apache-2.0 for research purposes. Anyone using the Unsloth version would then only be bound by that model's license, since they aren't using the original model.

It would be highly appreciated if a 200k-context-window version of the model were released by Unsloth. This model has the best needle-in-a-haystack benchmark score of any model.

ethanc8 commented 1 month ago

The original model is not Apache-2.0, even for research purposes; only the inference code is. However, models are probably not copyrightable in the US. The best way to get it licensed under Apache-2.0 is to ask for a license.

rezzie-rich commented 1 month ago

> The original model is not Apache-2.0, even for research purposes; only the inference code is. However, models are probably not copyrightable in the US. The best way to get it licensed under Apache-2.0 is to ask for a license.

It's from the model card: "Open Source License: The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn."

ethanc8 commented 1 month ago

> The original model is not Apache-2.0, even for research purposes; only the inference code is. However, models are probably not copyrightable in the US. The best way to get it licensed under Apache-2.0 is to ask for a license.

> It's from the model card: "Open Source License: The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn."

That explicitly says that only the code is under Apache-2.0. The model weights are available under an unspecified license which prohibits commercial use, and you can get a commercial license by applying with the application form.

rezzie-rich commented 1 month ago

It clearly states that the license requirement is only for commercial use. Otherwise, it's open under Apache-2.0.

ethanc8 commented 1 month ago

It clearly says that only the code is licensed under Apache-2.0. Anyway, it would be best to contact them, as they have not revealed the details of their public license.