[Open] nbahar-intel opened this issue 4 days ago
To use `response_gen` without getting the thoughts when using `as_chat_engine().stream_chat()`, you need to ensure that the `_is_function` flag is set to `False`. This flag is used to determine whether the response contains tool calls or other additional information. Here is a snippet of how this is handled:
```python
async def awrite_response_to_history(
    self,
    memory: BaseMemory,
) -> None:
    if self.achat_stream is None:
        raise ValueError(
            "achat_stream is None. Cannot asynchronously write to "
            "history without achat_stream."
        )

    try:
        final_text = ""
        async for chat in self.achat_stream:
            self._is_function = is_function(chat.message)
            self.aput_in_queue(chat.delta)
            final_text += chat.delta or ""
            if self._is_function is False:
                self._is_function_false_event.set()
        if self._is_function is not None:
            chat.message.content = final_text.strip()
            memory.put(chat.message)
    except Exception as e:
        logger.warning(f"Encountered exception writing response to history: {e}")

    self._is_done = True
    self._is_function_false_event.set()
    self._new_item_event.set()

@property
def response_gen(self) -> Generator[str, None, None]:
    while not self._is_done or not self._queue.empty():
        try:
            delta = self._queue.get(block=False)
            self._unformatted_response += delta
            yield delta
        except queue.Empty:
            continue
    self.response = self._unformatted_response.strip()

async def async_response_gen(self) -> AsyncGenerator[str, None]:
    while not self._is_done or not self._aqueue.empty():
        if not self._aqueue.empty():
            delta = self._aqueue.get_nowait()
            self._unformatted_response += delta
            yield delta
        else:
            await self._new_item_event.wait()
            self._new_item_event.clear()
    self.response = self._unformatted_response.strip()
```
In this code, the `_is_function` flag is checked, and if it is `False`, the `_is_function_false_event` is set, which ensures that only the response text is processed without additional thoughts or tool calls [1][2].
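The queue/event handshake in that snippet can be reproduced with stdlib primitives alone, which may make the gating easier to see. In this minimal sketch, a writer thread plays the role of `awrite_response_to_history` and the generator plays `response_gen`; the names here (`make_streaming_response`, `is_function`, `not_function_event`) are illustrative stand-ins, not the llama_index API:

```python
import queue
import threading

def make_streaming_response(chunks, is_function):
    """Hypothetical stand-in for the queue/event handshake above."""
    q = queue.Queue()
    done = threading.Event()
    not_function_event = threading.Event()

    def writer():
        for delta in chunks:
            q.put(delta)
            if not is_function:
                # Stream is a plain answer, not a tool call: unblock consumers.
                not_function_event.set()
        done.set()
        # Always unblock at the end, even for tool calls (the real code does
        # the same; the caller then checks the flag and skips tool-call text).
        not_function_event.set()

    threading.Thread(target=writer, daemon=True).start()

    def response_gen():
        # Block until we know this is not a tool call before yielding anything.
        not_function_event.wait()
        while not done.is_set() or not q.empty():
            try:
                yield q.get(block=False)
            except queue.Empty:
                continue

    return response_gen

gen = make_streaming_response(["Hello", ", ", "world"], is_function=False)
print("".join(gen()))  # -> Hello, world
```

The point mirrored from the original: consumers block on the event, so nothing is yielded to the caller until the stream is known not to be a function call.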
Technically it shouldn't be streaming the thoughts, but the code is a little complex to debug: https://github.com/run-llama/llama_index/blob/faa96cefb93b07c22d1a512e575b4a884a565cd4/llama-index-core/llama_index/core/agent/react/step.py#L625

Maybe it's worth building an agent more from scratch so that you have more control over the output:
General workflows walkthrough: https://docs.llamaindex.ai/en/stable/module_guides/workflow/#workflows
React Agent example: https://docs.llamaindex.ai/en/stable/examples/workflow/react_agent/
Function calling agent example with streaming: https://colab.research.google.com/drive/1wVCkvX7oQu1ZwrMSAyaJ8QyzHyfR0D_j?usp=sharing
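If you do go the from-scratch route, one simple way to get full control over the output is to post-filter the token stream yourself, buffering until the final-answer marker appears. A hedged sketch: the `Thought:`/`Answer:` markers follow the ReAct prompt format, but the helper itself is hypothetical, not part of llama_index:

```python
from typing import Iterable, Iterator

def stream_answer_only(deltas: Iterable[str], marker: str = "Answer:") -> Iterator[str]:
    """Buffer streamed deltas until `marker` appears, then yield only what follows.

    Hypothetical helper: it assumes the agent emits ReAct-style text such as
    "Thought: ...\nAnswer: ..." as a flat token stream.
    """
    buffer = ""
    found = False
    for delta in deltas:
        if found:
            yield delta
            continue
        buffer += delta
        idx = buffer.find(marker)
        if idx != -1:
            found = True
            tail = buffer[idx + len(marker):].lstrip()
            if tail:
                yield tail
    # If the marker never appears, nothing is yielded at all.

tokens = ["Thought: I nee", "d to think.\nAnsw", "er: 42 is", " the answer."]
print("".join(stream_answer_only(tokens)))  # -> 42 is the answer.
```

Because the marker can be split across deltas (as in `"Answ"` + `"er:"` above), the helper keeps buffering until the full marker is visible, then streams every later delta through untouched.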
Question Validation

Question

How can I use `response_gen` without getting the thoughts? In my example:

hey querying:

To clarify, I just want `stream_gen` to stream the answer.