Closed: cammoore54 closed this issue 1 year ago.
Update
I have tested on both my Intel MacBook Pro and an Ubuntu VM and I'm experiencing the same behaviour. I believe it could be related to OpenAI's async completion or to how the response is being parsed in chat_gpt_agent.py
+1 I'm experiencing this too.
My "workaround" is including specific instruction in the prompt "to always move the conversation forward" / "ask the next question" / etc. but I don't think this is a permanent solution, and it's makes the prompt larger than it needs to be.
Oddly (and if it helps), if I specifically respond with something like "OK? And now what?", it will continue as normal.
Where do you suspect this problem is?
When I test with the same prompt in OpenAI's playground, it always follows up with a question. This makes me think it is an error in OpenAI's API or in the way vocode is unpacking the tokens from the OpenAI stream.
I second this. This happens to us as well. It behaves correctly for us in the ChatGPT playground, always going to the next question.
The workaround for me is to use generate_responses=False (which does NOT use streaming) vs generate_responses=True (which uses streaming) in the agent config.
Curious to know if that helps? Not sure about a longer-term / better solution here...
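For reference, here's roughly where that flag goes; a minimal sketch assuming vocode's ChatGPTAgentConfig and its prompt_preamble field (names may vary by version):

from vocode.streaming.models.agent import ChatGPTAgentConfig

# generate_responses=False switches the agent to single-shot completions
# instead of the streaming path discussed in this thread
agent_config = ChatGPTAgentConfig(
    prompt_preamble="You are a golf course booking assistant.",
    generate_responses=False,
)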
Are you using 3.5 in the playground as well? It's likely just a prompting issue / model issue.
You can test the behavior by putting the exact transcript/prompt used with vocode in the playground (including interruptions, etc.).
I don't think there's an "issue" per se with how we're using the chat API.
@Kian1354 yep, we have been using the exact prompt in vocode and in the ChatGPT playground (using 3.5). Here's a Loom of a side-by-side: https://www.loom.com/share/27dcfe59f20345658b00d5d4b94ca560 (the chat continues to ask the four questions; the voice agent stops after two unless I 'remind' it to continue, as mentioned by @cammoore54. You can see the log in the terminal on the right, since you can't hear the audio through my headphones.)
@Kian1354 After digging a little deeper, I believe it is a bug with OpenAI's Python streaming library. The response from OpenAI sends a stop event at the end of the sentence, indicating that the completion is finished.
@tballenger I experience the same scenario. What's interesting is that in the playground they are streaming the response as well, but the response is different.
Output from openai_get_tokens in vocode/streaming/agent/utils.py:
{
  "id": "chatcmpl-7iWTsDgi96dVoqjvfpxlb9HktUBJj",
  "object": "chat.completion.chunk",
  "created": 1690845644,
  "model": "gpt-3.5-turbo-16k-0613",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": " know"
      },
      "finish_reason": null
    }
  ]
}
know
{
  "id": "chatcmpl-7iWTsDgi96dVoqjvfpxlb9HktUBJj",
  "object": "chat.completion.chunk",
  "created": 1690845644,
  "model": "gpt-3.5-turbo-16k-0613",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "."
      },
      "finish_reason": null
    }
  ]
}
.
{
  "id": "chatcmpl-7iWTsDgi96dVoqjvfpxlb9HktUBJj",
  "object": "chat.completion.chunk",
  "created": 1690845644,
  "model": "gpt-3.5-turbo-16k-0613",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
AI: Thank you for letting me know.
I also tried changing the completion call to openai.ChatCompletion.create(**chat_parameters) to rule out the async call as the cause, but the issue persists.
I've raised an issue in the openai-python repo here: https://github.com/openai/openai-python/issues/555
thanks for the update @cammoore54 and nice find! let's hope they address the bug quickly
No worries @tballenger! Can you comment on that ticket to help it gain momentum?
yep absolutely
I've commented on it too!
@tballenger Do you find your workaround to be sufficient in the interim, or are there any known negative implications from unsetting that flag? My initial tests seem to be OK, but I want to make sure I'm not missing anything obvious.
Thanks
@rjheeta I believe setting that flag bypasses the streaming functionality, so it seems a little slower in our tests, plus we don't think it supports functions, which are what the actions use. So we're only considering it as a short-term workaround; we hope the streaming Python library gets fixed quickly!
+1 I'm getting this too. I'm confident it's not a GPT-3.5 vs 4 difference, as I'm using 4 with vocode. Before I discovered vocode, I had an implementation of an ASR + agent + TTS application I'd written myself, and it's very verbose in its responses. But I put the same prompts into vocode and it gets terser and terser as the conversation goes on, finally just returning one-sentence answers. I worried that it was the max_tokens parameter in the OpenAI API call, but I commented that out. That may have made it better, but since the behavior is inconsistent, I'm not 100% sure. I still get lots of one-sentence responses later in the conversation, while the exact same original prompt on the exact same model, using streaming completions from OpenAI in my own implementation, never devolves into one-sentence responses.
@bjquinn can you share how you process openai streaming with your own code?
After doing some more testing, I have been able to get consistent results with streaming using this code:
import openai

messages = []

while True:
    message = input("User : ")
    if message:
        messages.append(
            {"role": "user", "content": message},
        )
    response = openai.ChatCompletion.create(
        messages=messages,
        max_tokens=256,
        temperature=1.0,
        model="gpt-3.5-turbo-16k-0613",
        stream=True
    )
    collected_chunks = []
    collected_messages = []
    # iterate through the stream of events
    for chunk in response:
        print(chunk)
        collected_chunks.append(chunk)  # save the event response
        chunk_message = chunk['choices'][0]['delta']  # extract the message delta
        collected_messages.append(chunk_message)  # save the message delta
    # join the deltas into the full reply and print it
    full_reply_content = ''.join([m.get('content', '') for m in collected_messages])
    print(f"Full conversation received: {full_reply_content}")
    messages.append(
        {"role": "assistant", "content": full_reply_content},
    )
@Kian1354 this is making me think that it may have to do with vocode's async implementation of unpacking the completion call
@cammoore54 Wow, great investigations!
Do you know where the corresponding response handling in vocode is? You're certainly more versed than I am, but I can try to take a look to see if I spot anything odd.
The corresponding response handling calls class ChatGPTAgent's generate_response method: https://github.com/vocodedev/vocode-python/blob/717514c4905d20a1252b94bf693f0badbff0cbd7/vocode/streaming/agent/chat_gpt_agent.py#L132
@bjquinn can you share how you process openai streaming with your own code?
Yes, see below. Let me know if this is what you were asking for:
async for chunk in await openai.ChatCompletion.acreate(
    model=model,
    messages=messages,
    stream=True,
    functions=functions
):
    content = chunk["choices"][0].get("delta", {}).get("content")
    # hacky logic to string together sentences and track ends of sentences
    # here; for each sentence, add it to "fullcontent"
messages.append({"role": "assistant", "content": fullcontent})
That's really it -- I do have some hacky sentence detection logic in the async for, and I kick off async requests to play.ht once I detect a sentence end, but I don't think any of that would affect whether I get full completions from openai or not.
Thanks @bjquinn.
I have tested in isolation with an async implementation using the below code, and I get the desired responses 100% of the time (the same as the playground). Therefore it must be something in vocode's implementation.
@ajar98 @Kian1354 Do you have the capacity to look into this? I am happy to support but am still familiarising myself with the codebase
import asyncio
import openai

messages = []

async def generate_response(messages):
    async for chunk in await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo-16k-0613",
        messages=messages,
        stream=True,
        functions=functions  # `functions` is defined elsewhere in the test script
    ):
        chunk_message = chunk['choices'][0]['delta']  # extract the message delta
        yield chunk_message

async def handle_convo():
    while True:
        message = input("User : ")
        if message:
            messages.append(
                {"role": "user", "content": message},
            )
        collected_messages = []
        async for item in generate_response(messages):
            print(item)
            collected_messages.append(item)
        # join the deltas into the full reply and print it
        full_reply_content = ''.join([m.get('content', '') for m in collected_messages])
        print(f"Full conversation received: {full_reply_content}")
        messages.append(
            {"role": "assistant", "content": full_reply_content},
        )

asyncio.run(handle_convo())
I have identified the problem. Vocode splits the OpenAI response on sentences in order to synthesize them as fast as possible. After something is spoken, Vocode adds the utterance to the transcript associated with the ChatGPT Agent. As a result, OpenAI's response gets added to the transcript but split apart by sentence. So, when the user sends another message and this transcript is reformatted and sent back to OpenAI to generate the next message, the previous assistant message is split.
For example, when recreating @cammoore54's example with temperature=0 and the gpt-3.5-turbo-16k-0613 model, this is what is sent to the OpenAI API when the user says "yep":
{'role': 'assistant', 'content': "Hello, I'm Tom from the golf course. How may I help you?"},
{'role': 'user', 'content': 'hey i want to book comp'},
{'role': 'assistant', 'content': 'Sure, I can help you with that.'},
{'role': 'assistant', 'content': 'Are you a member of our golf club?'},
{'role': 'user', 'content': 'yep'},
This is what should be sent (what you put into the OpenAI playground):
{'role': 'assistant', 'content': "Hello, I'm Tom from the golf course. How may I help you?"},
{'role': 'user', 'content': 'hey i want to book comp'},
{'role': 'assistant', 'content': 'Sure, I can help you with that. Are you a member of our golf club?'},
{'role': 'user', 'content': 'yep'},
This difference is the source of the problem. If the previous chat history contains only one-sentence responses, then future assistant messages will also be only one sentence long.
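To illustrate the idea of the fix (a sketch only, not vocode's actual patch), consecutive assistant messages can be merged back together before the transcript is formatted for the OpenAI API:

def merge_assistant_messages(messages):
    # collapse consecutive assistant entries into one message, so the
    # transcript sent back to OpenAI matches the original completion
    merged = []
    for m in messages:
        if merged and m["role"] == "assistant" and merged[-1]["role"] == "assistant":
            merged[-1]["content"] += " " + m["content"]
        else:
            merged.append(dict(m))
    return merged

Running this on the split transcript above yields the single assistant message 'Sure, I can help you with that. Are you a member of our golf club?'.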
Good catch finding this bug! The messages should definitely not be split apart when they are sent back to the OpenAI API.
Here are the differences from the OpenAI playground (both with temperature=0 and the gpt-3.5-turbo-16k-0613 model):
First screenshot: formatted properly, the second sentence is generated.
Second screenshot: formatted how Vocode currently does it, only one sentence is generated.
So, it seems like the OpenAI playground and the OpenAI Python library create the exact same response (testing with temperature=0). Also, setting the stream option or using the async vs sync API doesn't make a difference. The OpenAI Python issue https://github.com/openai/openai-python/issues/555 is probably not an issue after all: whether the second sentence is generated depends on how the messages are formatted.
We are currently working on a fix for this! Thanks :smile:!
Ah nice find! Thanks @HHousen. Seems so obvious now 😵💫.
Well done vocode team, love your product and your support!
@HHousen I tried the patch and it looks to work on my end!!
Nice! We're still working on it and might change how it's implemented in the next few hours, but good to know that it's currently working!
I have identified the problem. Vocode splits the OpenAI response on sentences in order to synthesize them as fast as possible.
@HHousen Thank you for the explanation here. I believe this also explains an intermittent problem I have been seeing on the synthesis side, namely that sometimes the assistant message passed to the synthesizer is split on the "." character that appears in decimal values (like currencies).
This ends up sounding very confusing to the user. The synthesized audio will be something like:
Your balance is two-hundred-fifty. (pause) Thirty-five...
Instead of:
Your balance is two-hundred-fifty and thirty-five.
This is a separate issue that I need to revalidate with the 0.1.111a3 pre-release. (I haven't seen it so far.)
Thanks again!
Yes, that's true, though I don't think this fix will solve that. See https://github.com/vocodedev/vocode-python/issues/338 for an issue I submitted that has other quirks about premature sentence ending detection. For now, if this is helpful to you, I simply asked GPT in the system prompt to spell out all dollar amounts, and that seems to work well.
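For the decimal case specifically, a boundary rule along these lines would avoid splitting inside numbers (a sketch, not vocode's actual sentence splitter):

import re

def split_sentences(text):
    # only treat ., !, or ? followed by whitespace as a sentence boundary,
    # so the "." inside "250.35" never triggers a split
    return [p for p in re.split(r"(?<=[.!?])\s+", text) if p]

print(split_sentences("Your balance is 250.35. Anything else?"))
# ['Your balance is 250.35.', 'Anything else?']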
After making minimal changes to the chat.py example to tailor it for a golf booking chatbot flow, the OpenAI completions stop consistently.
The below has been repeatable many times. When the conversation reaches this point, the system responds "Thank you for letting me know." but doesn't send the following sentence asking the user another question. The response stops; then, after I send an empty message to the system, I receive the above asyncio error and the system continues as normal.
EDIT: I should state that the completion stops at "Thank you for letting me know." 100% of the time, but the asyncio error only happens occasionally.
EPD-458