thinkall opened this issue 9 months ago
Good observation here. My initial intuition is that the model thinks the code being generated is part of a human-in-the-loop chat (which is expected for a chat-finetuned model), where the human will aggregate and execute the blocks. One solution might be to improve prompting (e.g., asking for full executable blocks of code, maybe with some few-shot examples?)
I recall that @afourney also noted a few cases where the order of code blocks had some impact on the results (e.g. installing deps before code)
Yes @victordibia that is issue #430. I think prompting can help, but the existing prompt is pretty strong. It states:
"The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user. If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes."
I think one problem is that the system prompt is right at the top of the conversation, and can be forgotten in longer conversations. Perhaps a floating system prompt would be better (moving it dynamically to right before the generation)
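The floating-system-prompt idea could be sketched roughly like this, assuming the common OpenAI-style chat message format (the function name and message shape are assumptions for illustration, not AutoGen's actual API):

```python
from typing import Dict, List


def with_floating_system_prompt(messages: List[Dict], system_prompt: str) -> List[Dict]:
    """Re-attach the system prompt as the last message before generation,
    so it is not forgotten deep in a long conversation.
    Sketch only: assumes OpenAI-style {"role": ..., "content": ...} dicts.
    """
    # Drop any existing system messages, then append the prompt at the end,
    # i.e., "right before the generation".
    rest = [m for m in messages if m.get("role") != "system"]
    return rest + [{"role": "system", "content": system_prompt}]
```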
Thanks @victordibia , @afourney , it seems that many weaker models will not follow the prompt as expected. It would be great if we could merge the code blocks into a single response in a post-processing step, but this is a non-trivial process.
What if, instead of executing code, we have the user proxy return a static message when the extracted block count is > 1 (and languages match). Something like "Please consolidate this into only one self-contained code block."
This would result in an extra call, but would use the LLMs coding abilities to hopefully correctly synthesize the code.
Sounds good! Maybe we can create a function to consolidate the code, and in the function, it actually calls the LLM to do it. In this way, we could save some token usage.
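The detection half of this could look something like the sketch below: if a reply contains more than one code block in the same language, return the static consolidation message instead of executing anything (illustrative only; the function name and regex are assumptions, not AutoGen's actual reply-hook API):

```python
import re
from typing import Optional

CONSOLIDATE_MSG = "Please consolidate this into only one self-contained code block."

# Matches markdown-fenced code blocks and captures their language tag.
CODE_BLOCK_RE = re.compile(r"```(\w*)\n(.*?)```", re.DOTALL)


def consolidation_reply(message: str) -> Optional[str]:
    """Return a static consolidation request when the assistant's reply
    contains multiple code blocks of the same language; otherwise None
    (meaning: fall through to normal code execution)."""
    blocks = CODE_BLOCK_RE.findall(message)
    langs = {lang for lang, _ in blocks}
    if len(blocks) > 1 and len(langs) == 1:
        return CONSOLIDATE_MSG
    return None
```

The extra round trip then relies on the LLM's own coding ability to synthesize the merged block, as suggested above.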
Is this solved by using the stateful jupyter code executor?
I get `AttributeError: 'WebSocket' object has no attribute 'send_text'`; the code is https://github.com/microsoft/autogen/blob/c4e570393db9d2c2d3058e271ca2e46473bd8074/autogen/coding/jupyter/jupyter_client.py#L139. Replacing `send_text` with `send` worked for me.
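Since `send_text` only exists on newer `websocket-client` releases (per the discussion below, it was added in 1.7.0) while `send` accepts a string on older releases too, a compatibility shim is one way to sketch the workaround (`send_ws_text` is a hypothetical helper name; `ws` is assumed to be a websocket-client `WebSocket` or anything with the same methods):

```python
def send_ws_text(ws, payload: str) -> None:
    """Send a text frame, working across websocket-client versions.

    websocket-client gained WebSocket.send_text in 1.7.0; on older
    releases, send() with a str payload does the same job.
    """
    if hasattr(ws, "send_text"):
        ws.send_text(payload)  # websocket-client >= 1.7.0
    else:
        ws.send(payload)  # older releases: send() handles str payloads
```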
@jackgerrits please see the above comment.
What version of `websocket-client` is installed?
`websocket-client` 1.6.4, `websockets` 12.0
Could you retry using `websocket-client==1.7.0`?
1.7.0 works well. I checked the code of websocket-client: `send_text` was added in 1.7.0. Since `send` works in our case as well, I'd suggest using `send` instead of `send_text`, or at least updating the extras in `setup.py`.
The current code execution logic has an issue when code is generated in several blocks and later blocks depend on the former ones.
For example:
The generated code is good, but the execution fails because the code blocks are executed separately. With the feedback `NameError: name 'sum_numbers' is not defined`, GPT-3.5-turbo can usually merge the blocks into one block, but models that are not as good as GPT-3.5 will not merge them and keep failing. In any case, it would be better to be able to correctly execute the blocks.
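The failure mode can be reproduced outside AutoGen: executing each block with fresh globals mimics running each extracted block as a separate script. The block contents below are assumptions reconstructed from the `sum_numbers` error, not the original example:

```python
# Hypothetical blocks: block 1 defines a function, block 2 uses it.
block1 = "def sum_numbers(a, b):\n    return a + b\n"
block2 = "result = sum_numbers(1, 2)\n"

# Executing each block in a fresh namespace, like separate subprocesses,
# loses the definition from block 1 and reproduces the NameError.
try:
    exec(block1, {})
    exec(block2, {})
except NameError as e:
    print(f"separate execution fails: {e}")

# Merging the blocks into one and executing once succeeds.
merged_ns = {}
exec(block1 + block2, merged_ns)
print("merged execution result:", merged_ns["result"])  # → 3
```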