infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
https://ragflow.io
Apache License 2.0

[Bug]: Using Xinference's locally deployed glm4-chat, RAGFlow's search module produces errors #2543

Open smileyee opened 4 days ago

smileyee commented 4 days ago

Is there an existing issue for the same bug?

Branch name

main

Commit ID

1085

Other environment information

No response

Actual behavior

Using a glm4-chat model deployed locally with Xinference, RAGFlow's search module produces errors. However, the replies generated by the chat and agent modules are fine; the problem occurs only in the search module. Is there a difference in their logic for interacting with the model?

Expected behavior

No response

Steps to reproduce

1. On Xinference, deploy the 9B glm4-chat model in pytorch format with the transformers engine (a hedged launch sketch follows this list).
2. Connect the model to RAGFlow.
3. Ask the same question in both interfaces: the problem does not appear in the chat interface, but it does appear in the search interface.
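
For context, a deployment matching step 1 could be created through Xinference's Python client roughly as follows. This is a sketch, not part of the original report: the endpoint URL and the exact `model_engine` string are assumptions about this particular setup.

```python
from xinference.client import Client

# Connect to a locally running Xinference server
# (default endpoint assumed; adjust to your deployment).
client = Client("http://localhost:9997")

# Launch glm4-chat in pytorch format with the transformers engine,
# mirroring step 1 above. Kwargs follow the Xinference client API;
# the engine name is an assumption for the PyTorch/transformers backend.
model_uid = client.launch_model(
    model_name="glm4-chat",
    model_engine="transformers",
    model_format="pytorch",
    size_in_billions=9,
)
print(model_uid)
```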

Additional information

```
Exception in thread Thread-4 (generate):
Traceback (most recent call last):
  File "/root/anaconda3/envs/py310/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/root/anaconda3/envs/py310/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/root/anaconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/root/anaconda3/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 3020, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
```
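
For readers unfamiliar with this error: it is raised by `torch.multinomial` when the probability tensor it receives is invalid. A minimal sketch of how that happens, assuming the logits picked up a NaN or Inf somewhere upstream (e.g. fp16 overflow or a degenerate sampling penalty); this is illustrative only, not RAGFlow or Xinference code:

```python
import torch

# If any logit is NaN or Inf, softmax propagates it into the entire
# probability tensor that transformers' _sample() feeds to multinomial.
logits = torch.tensor([1.0, float("nan"), 2.0])
probs = torch.softmax(logits, dim=-1)  # tensor([nan, nan, nan])

# torch.multinomial(probs, num_samples=1) would now raise:
# RuntimeError: probability tensor contains either inf, nan or element < 0

# A defensive check before sampling (hypothetical helper):
if not torch.isfinite(probs).all():
    print("bad probability tensor:", probs)  # this branch is taken here
```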

KevinHuSh commented 3 days ago

> Is there a difference in their logic for interacting with the model?

The system default LLM is used on the 'Search' page, while for chat the LLM is defined in the dialog settings. Beyond that, there is nothing special about how 'Search' calls the LLM.