Actual behavior
Using the glm4-chat model deployed locally on Xinference, RagFlow's search module produces errors. However, the replies generated by the chat and agent modules are fine; the problem occurs only in the search module. Is there a difference in how they interact with the model?
Expected behavior
No response
Steps to reproduce
1. Deploy the glm4-chat 9B model (PyTorch format, transformers engine) on Xinference
2. Connect it to RagFlow
3. The same problem does not appear in the chat interface, but it does appear in the search interface.
Additional information
Exception in thread Thread-4 (generate):
Traceback (most recent call last):
File "/root/anaconda3/envs/py310/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/root/anaconda3/envs/py310/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
result = self._sample(
File "/root/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 3020, in _sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
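The RuntimeError means the post-softmax probability tensor handed to `torch.multinomial` contains `inf`, `nan`, or a negative value. One common way this can happen (a hedged sketch of the failure mode, not RagFlow's or transformers' actual code) is overflow during temperature-scaled softmax: a very small temperature scales the logits until `exp()` overflows to `inf`, and normalizing `inf / inf` then yields `nan`:

```python
import math

def sample_probs(logits, temperature):
    """Temperature-scaled softmax that mimics float overflow:
    exp() saturates to inf instead of raising, like a float32 tensor would."""
    def exp_sat(x):
        try:
            return math.exp(x)
        except OverflowError:
            return float("inf")
    exps = [exp_sat(l / temperature) for l in logits]
    total = sum(exps)  # becomes inf once any entry overflows
    return [e / total for e in exps]

# 90.0 / 0.1 = 900, and exp(900) overflows to inf; inf / inf is nan,
# which is exactly the kind of tensor torch.multinomial rejects.
probs = sample_probs([90.0, 10.0, 5.0], temperature=0.1)
print(probs)  # first entry is nan, the rest collapse to 0.0
```

If this is the cause, raising the temperature or disabling sampling (greedy decoding) for the failing request should make the error disappear.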
Is there a difference in their logic for interacting with the model?
The system default LLM is used on the 'Search' page. For chat, the LLM is defined in the dialog settings; there is nothing special about how 'Search' calls the LLM.
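Since only the model selection differs between the two pages, one generic defensive option on the serving side is to sanitize the probability vector before sampling. The helper below is purely illustrative (it is not part of RagFlow, Xinference, or transformers): it zeroes out non-finite or negative entries and renormalizes the rest.

```python
import math

def sanitize_probs(probs):
    """Zero out non-finite or negative entries and renormalize.
    A generic guard sketch, not something these libraries do by default."""
    cleaned = [p if math.isfinite(p) and p >= 0 else 0.0 for p in probs]
    total = sum(cleaned)
    if total == 0:
        # Nothing usable survived; fall back to a uniform distribution.
        return [1.0 / len(probs)] * len(probs)
    return [p / total for p in cleaned]

# The nan entry is dropped and the remaining mass is renormalized to sum to 1.
print(sanitize_probs([float("nan"), 0.3, 0.1]))
```

A guard like this only masks the symptom, though; finding out why the search path produces non-finite logits (e.g. its sampling parameters on the system default LLM) is the real fix.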
Is there an existing issue for the same bug?
Branch name
main
Commit ID
1085
Other environment information
No response