brucethemoose closed this issue 3 weeks ago
There are llamafied versions, they do seem to work as is with ExllamaV2. https://huggingface.co/chargoddard/internlm2-20b-llama
Does anyone know what the llamafication entails? It looks like the models just need their tensors renamed and the fused QKV tensor split into separate Q, K, and V tensors. But the safetensors files are substantially larger.
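For illustration, splitting a fused QKV projection into separate Q, K, and V matrices could look like the sketch below. This assumes the fused weight's rows are laid out contiguously as [Q | K | V] with GQA-style head counts; the real internlm2 checkpoint may interleave heads differently, so treat this as a toy example, not the actual conversion script.

```python
import numpy as np

def split_qkv(wqkv, num_q_heads, num_kv_heads, head_dim):
    """Split a fused QKV weight into separate Q, K, V matrices.

    Assumes rows are laid out contiguously as [Q | K | V]; the actual
    internlm2 layout is an assumption here.
    """
    q_rows = num_q_heads * head_dim
    kv_rows = num_kv_heads * head_dim
    q = wqkv[:q_rows]
    k = wqkv[q_rows:q_rows + kv_rows]
    v = wqkv[q_rows + kv_rows:]
    return q, k, v

# toy shapes: 4 query heads, 2 kv heads (GQA), head_dim 8, hidden 32
hidden = 32
w = np.random.randn((4 + 2 + 2) * 8, hidden)
q, k, v = split_qkv(w, num_q_heads=4, num_kv_heads=2, head_dim=8)
print(q.shape, k.shape, v.shape)  # (32, 32) (16, 32) (16, 32)
```

Renaming and splitting like this shouldn't change the total byte count, which is why the size difference is surprising.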
Are you sure you're looking at the right files? I've compared the original and llamafied models, and llamafied safetensors in total are about 60KB smaller than the original bin files.
This has kinda dropped off my radar, but one thing the custom code implements is its own RoPE scaling.
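For context, one common variant of custom RoPE scaling is dynamic NTK scaling, where the rotary base is enlarged once the sequence length exceeds the trained context window. Whether internlm2's modeling code uses exactly this formula is an assumption; this sketch just mirrors the widely used dynamic-NTK recipe:

```python
import numpy as np

def dynamic_ntk_inv_freq(dim, base, max_pos, seq_len, scaling_factor=1.0):
    """Recompute RoPE inverse frequencies with dynamic NTK scaling.

    When seq_len exceeds the trained context (max_pos), the base is
    scaled up so the rotary frequencies stretch to cover the longer
    window. This is the common dynamic-NTK formula; internlm2's exact
    variant is an assumption.
    """
    if seq_len > max_pos:
        base = base * (
            (scaling_factor * seq_len / max_pos) - (scaling_factor - 1)
        ) ** (dim / (dim - 2))
    return 1.0 / (base ** (np.arange(0, dim, 2) / dim))

# toy numbers: head_dim 128, base 10000, 4k trained context, 8k request
inv_freq = dynamic_ntk_inv_freq(dim=128, base=10000.0, max_pos=4096, seq_len=8192)
print(inv_freq.shape)  # (64,)
```

If the llamafied weights are loaded with a stock Llama rotary embedding, this custom scaling would be lost, which could matter for long-context use.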
It uses custom modeling/tokenizer code like Yi used to:
https://huggingface.co/internlm/internlm2-chat-20b
It may already work as-is, or might work with a simple repacking hack to "llamafy" it. Consider this a WIP tracking issue; I'm downloading it to test right now.