Open Orion-zhen opened 3 weeks ago
Yeah, the architecture isn't supported. There are a bunch of little things that would have to be updated, like how the EOS token is suddenly a list, scaled attention layers, and such. It's not high on the list of priorities at the moment. Not sure if the model is any good, or if it's any good without the multimodal capabilities, which wouldn't be supported anyway.
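The EOS-token change mentioned above could be absorbed by normalizing the config value up front. A minimal sketch, assuming a simple helper; the function names are illustrative and not exllamav2's actual code:

```python
def normalize_eos_token_ids(value):
    """Accept eos_token_id as either a single int or a list of ints,
    since newer configs (e.g. GLM-4's) store it as a list."""
    if isinstance(value, int):
        return [value]
    if isinstance(value, list) and all(isinstance(v, int) for v in value):
        return value
    raise TypeError(f"Value for eos_token_id is not of expected type: {value!r}")

def is_eos(token_id, eos_ids):
    """A generated token ends the sequence if it matches any EOS id."""
    return token_id in eos_ids
```

Downstream code then always works with a list, so a stop check becomes `is_eos(token, normalize_eos_token_ids(cfg_value))` regardless of which format the model's `config.json` uses.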
As THUDM claims, glm4-9b can outperform llama3-8b, so it might be worth a try. BTW, would it be possible to add multimodal support to exllamav2 in the future? I see that multimodal LLMs (VLMs) could be the next trend.
Multimodal is possible, of course, as is GLM4 in general, along with diffusion models, TTS, you name it. I just have to prioritize. But contributions are always welcome.
When I run `convert.py` with the command:

```
CUDA_VISIBLE_DEVICES=1 python convert.py -i /home/orion/ai/Models/glm4-9b -o ./tmp-file -cf /home/orion/ai/Models/glm4-9b-4-exl2 -r 256
```

it fails with the error:

```
TypeError: Value for eos_token_id is not of expected type <class 'int'>
```

It seems that the glm4 architecture hasn't been supported yet.

Steps to reproduce: just download the glm4-9b model and run `convert.py` as the README says.

Full console log: