li-plus / chatglm.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4
MIT License
2.81k stars 327 forks

Inference with inputs of different text lengths sometimes fails with check failed (std::isfinite(next_token_logits[i])) nan/inf encountered at lm_logits[0] #333

Open leizhu1989 opened 5 days ago

leizhu1989 commented 5 days ago

os: ubuntu20.04 cuda: 118 + torch2.3 transformers: 4.41.0 models: glm4-9b-chat-1m, int8 quantized

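I have a very long piece of text, and I found that feeding in 1000 characters or 12000 characters raises the error below. I expect there are many other input lengths that fail the same way. Why does this happen?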

Traceback (most recent call last):
  File "/home/lili/project/chatglm_cpp/examples/yhy_test.py", line 81, in <module>
    main()
  File "/home/lili/project/chatglm_cpp/examples/yhy_test.py", line 62, in main
    for chunk in pipeline.chat(messages, **generation_kwargs):
  File "/root/anaconda3/envs/glm4-test/lib/python3.10/site-packages/chatglm_cpp/__init__.py", line 127, in _stream_chat
    for next_token_id in self._stream_generate_ids(input_ids=input_ids, gen_config=gen_config):
  File "/root/anaconda3/envs/glm4-test/lib/python3.10/site-packages/chatglm_cpp/__init__.py", line 115, in _stream_generate_ids
    next_token_id = self.model.generate_next_token(input_ids, gen_config, n_past, n_ctx)
RuntimeError: /tmp/pip-install-088ypyu1/chatglm-cpp_5f184e002ea14195945da430b48f59b5/chatglm.cpp:767 check failed (std::isfinite(next_token_logits[i])) nan/inf encountered at lm_logits[0]

li-plus commented 4 days ago

This was fixed in #322, but the fix has not been released yet. For now, you can install the Python package from the latest source.
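
For readers unfamiliar with the error message: the failing check, std::isfinite(next_token_logits[i]), rejects any logit that is NaN or infinite before the next token is sampled. The snippet below is a minimal Python sketch of that kind of guard, not the library's actual code; the function names first_non_finite and check_logits are hypothetical and exist only for illustration.

```python
import math


def first_non_finite(logits):
    """Return the index of the first NaN/inf logit, or None if all are finite."""
    for i, value in enumerate(logits):
        if not math.isfinite(value):
            return i
    return None


def check_logits(logits):
    """Raise an error in the style of the message in the traceback (illustrative only)."""
    bad_index = first_non_finite(logits)
    if bad_index is not None:
        raise RuntimeError(f"nan/inf encountered at lm_logits[{bad_index}]")


if __name__ == "__main__":
    check_logits([0.5, -1.25, 3.0])  # all finite: passes silently
    try:
        check_logits([float("nan"), 0.1])
    except RuntimeError as err:
        print(err)
```

If the check scans from index 0 as in this sketch, a failure at lm_logits[0] means the very first logit was already non-finite. That would be consistent with the whole output being invalid rather than a single bad value, although the traceback alone does not confirm this.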