thunlp / InfLLM

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

Qwen1.5-72B-Chat-GPTQ-Int4 #16

Closed: ChuanhongLi closed this issue 5 months ago

ChuanhongLi commented 5 months ago

Can InfLLM run the Qwen1.5-72B-Chat-GPTQ-Int4 model directly?

guyan364 commented 5 months ago

Hi, I tested with Qwen1.5-7B-Chat-GPTQ-Int4 and inference works. Just remove dtype=torch.bfloat16 when loading the model.
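The thread does not say which file holds the loading call, so the snippet below is only a hedged illustration of the change guyan364 describes, assuming the model is loaded through transformers' `AutoModelForCausalLM.from_pretrained` with a `torch_dtype=torch.bfloat16` argument (the model path and `device_map`/`trust_remote_code` settings here are placeholders):

```python
# Minimal sketch, not the exact InfLLM source: find the place where the model
# is loaded and drop the bfloat16 dtype for GPTQ-Int4 checkpoints.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen/Qwen1.5-7B-Chat-GPTQ-Int4"  # or the 72B GPTQ checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Original-style call (problematic for GPTQ-Int4: the weights are already
# quantized, so forcing a bfloat16 cast at load time can break loading):
# model = AutoModelForCausalLM.from_pretrained(
#     model_path, torch_dtype=torch.bfloat16, device_map="cuda"
# )

# Suggested call: omit torch_dtype and let the checkpoint's GPTQ
# quantization config drive how the weights are handled.
model = AutoModelForCausalLM.from_pretrained(
    model_path, device_map="cuda"
)
```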

huliangbing commented 4 months ago

Hello! Which file should be modified? @guyan364