thunlp / InfLLM

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

Qwen1.5-72B-Chat-GPTQ-Int4 #16

Closed: ChuanhongLi closed this issue 5 months ago

ChuanhongLi commented 5 months ago

Can InfLLM run the Qwen1.5-72B-Chat-GPTQ-Int4 model directly?

guyan364 commented 5 months ago

Hi, I tested with Qwen1.5-7B-Chat-GPTQ-Int4 and inference works. Just remove dtype=torch.bfloat16 when loading the model.
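The thread does not say which file holds the loading call, so the snippet below is only a hedged illustration of the change guyan364 describes, assuming the model is loaded through transformers' `AutoModelForCausalLM.from_pretrained` with a `torch_dtype=torch.bfloat16` argument (the model path and `device_map`/`trust_remote_code` settings here are placeholders):

```python
# Minimal sketch, not the exact InfLLM source: find the place where the model
# is loaded and drop the bfloat16 dtype for GPTQ-Int4 checkpoints.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen/Qwen1.5-7B-Chat-GPTQ-Int4"  # or the 72B GPTQ checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Original-style call (problematic for GPTQ-Int4: the weights are already
# quantized, so forcing a bfloat16 cast at load time can break loading):
# model = AutoModelForCausalLM.from_pretrained(
#     model_path, torch_dtype=torch.bfloat16, device_map="cuda"
# )

# Suggested call: omit torch_dtype and let the checkpoint's GPTQ
# quantization config drive how the weights are handled.
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="cuda"
)
```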

huliangbing commented 4 months ago

Hello! Which file should be modified? @guyan364