Actual behavior
Using the glm4-chat model deployed locally on Xinference, RagFlow's search module produces errors. However, the replies generated by the chat and agent modules are fine; the problem occurs only in the search module. Is there a difference in how they interact with the model?
Expected behavior
No response
Steps to reproduce
1. Deploy the glm4-chat 9B model (PyTorch format, transformers engine) on Xinference
2. Connect it to RagFlow
3. The same problem does not appear in the chat interface, but it does appear in the search interface.
Additional information
Exception in thread Thread-4 (generate):
Traceback (most recent call last):
File "/root/anaconda3/envs/py310/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/root/anaconda3/envs/py310/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
result = self._sample(
File "/root/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 3020, in _sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
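The RuntimeError means the post-softmax probability tensor handed to `torch.multinomial` contains `inf`, `nan`, or a negative value. One common way this can happen (a hedged sketch of the failure mode, not RagFlow's or transformers' actual code) is overflow during temperature-scaled softmax: a very small temperature scales the logits until `exp()` overflows to `inf`, and normalizing `inf / inf` then yields `nan`:

```python
import math

def sample_probs(logits, temperature):
    """Temperature-scaled softmax that mimics float overflow:
    exp() saturates to inf instead of raising, like a float32 tensor would."""
    def exp_sat(x):
        try:
            return math.exp(x)
        except OverflowError:
            return float("inf")
    exps = [exp_sat(l / temperature) for l in logits]
    total = sum(exps)  # becomes inf once any entry overflows
    return [e / total for e in exps]

# 90.0 / 0.1 = 900, and exp(900) overflows to inf; inf / inf is nan,
# which is exactly the kind of tensor torch.multinomial rejects.
probs = sample_probs([90.0, 10.0, 5.0], temperature=0.1)
print(probs)  # first entry is nan, the rest collapse to 0.0
```

If this is the cause, raising the temperature or disabling sampling (greedy decoding) for the failing request should make the error disappear.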
Is there a difference in their logic for interacting with the model?
The system default LLM is used on the 'Search' page. For chat, the LLM is defined in the dialog settings; there is nothing special about how 'Search' calls the LLM.
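Since only the model selection differs between the two pages, one generic defensive option on the serving side is to sanitize the probability vector before sampling. The helper below is purely illustrative (it is not part of RagFlow, Xinference, or transformers): it zeroes out non-finite or negative entries and renormalizes the rest.

```python
import math

def sanitize_probs(probs):
    """Zero out non-finite or negative entries and renormalize.
    A generic guard sketch, not something these libraries do by default."""
    cleaned = [p if math.isfinite(p) and p >= 0 else 0.0 for p in probs]
    total = sum(cleaned)
    if total == 0:
        # Nothing usable survived; fall back to a uniform distribution.
        return [1.0 / len(probs)] * len(probs)
    return [p / total for p in cleaned]

# The nan entry is dropped and the remaining mass is renormalized to sum to 1.
print(sanitize_probs([float("nan"), 0.3, 0.1]))
```

A guard like this only masks the symptom, though; finding out why the search path produces non-finite logits (e.g. its sampling parameters on the system default LLM) is the real fix.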
Is there an existing issue for the same bug?
Branch name
main
Commit ID
1085
Other environment information
No response