thinkall opened this issue 9 months ago
Good observation here. My initial intuition is that the model thinks the code being generated is part of a human-in-the-loop chat (which is expected for a chat-finetuned model), where the human will aggregate and execute the blocks. One solution might be to improve prompting (e.g., asking for full executable blocks of code, maybe with some few-shot examples?)
I recall that @afourney also noted a few cases where the order of code blocks had some impact on the results (e.g. installing deps before code)
Yes @victordibia that is issue #430. I think prompting can help, but the existing prompt is pretty strong. It states:
"The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user. If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes."
I think one problem is that the system prompt is right at the top of the conversation, and can be forgotten in longer conversations. Perhaps a floating system prompt would be better (moving it dynamically to right before the generation)
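The floating-system-prompt idea could be sketched roughly like this, assuming the common OpenAI-style chat message format (the function name and message shape are assumptions for illustration, not AutoGen's actual API):

```python
from typing import Dict, List


def with_floating_system_prompt(messages: List[Dict], system_prompt: str) -> List[Dict]:
    """Re-attach the system prompt as the last message before generation,
    so it is not forgotten deep in a long conversation.
    Sketch only: assumes OpenAI-style {"role": ..., "content": ...} dicts.
    """
    # Drop any existing system messages, then append the prompt at the end,
    # i.e., "right before the generation".
    rest = [m for m in messages if m.get("role") != "system"]
    return rest + [{"role": "system", "content": system_prompt}]
```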
Thanks @victordibia , @afourney , it seems that many weaker models will not follow the prompt as expected. It would be great if we could merge the code blocks into a single response in a post-processing step, but this is a non-trivial process.
What if, instead of executing code, we have the user proxy return a static message when the extracted block count is > 1 (and languages match). Something like "Please consolidate this into only one self-contained code block."
This would result in an extra call, but would use the LLMs coding abilities to hopefully correctly synthesize the code.
Sounds good! Maybe we can create a function to consolidate the code, and in the function, it actually calls the LLM to do it. In this way, we could save some token usage.
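The detection half of this could look something like the sketch below: if a reply contains more than one code block in the same language, return the static consolidation message instead of executing anything (illustrative only; the function name and regex are assumptions, not AutoGen's actual reply-hook API):

```python
import re
from typing import Optional

CONSOLIDATE_MSG = "Please consolidate this into only one self-contained code block."

# Matches markdown-fenced code blocks and captures their language tag.
CODE_BLOCK_RE = re.compile(r"```(\w*)\n(.*?)```", re.DOTALL)


def consolidation_reply(message: str) -> Optional[str]:
    """Return a static consolidation request when the assistant's reply
    contains multiple code blocks of the same language; otherwise None
    (meaning: fall through to normal code execution)."""
    blocks = CODE_BLOCK_RE.findall(message)
    langs = {lang for lang, _ in blocks}
    if len(blocks) > 1 and len(langs) == 1:
        return CONSOLIDATE_MSG
    return None
```

The extra round trip then relies on the LLM's own coding ability to synthesize the merged block, as suggested above.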
Is this solved by using the stateful jupyter code executor?
I get `AttributeError: 'WebSocket' object has no attribute 'send_text'`; the code is https://github.com/microsoft/autogen/blob/c4e570393db9d2c2d3058e271ca2e46473bd8074/autogen/coding/jupyter/jupyter_client.py#L139. Replacing `send_text` with `send` worked for me.
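Since `send_text` only exists on newer `websocket-client` releases (per the discussion below, it was added in 1.7.0) while `send` accepts a string on older releases too, a compatibility shim is one way to sketch the workaround (`send_ws_text` is a hypothetical helper name; `ws` is assumed to be a websocket-client `WebSocket` or anything with the same methods):

```python
def send_ws_text(ws, payload: str) -> None:
    """Send a text frame, working across websocket-client versions.

    websocket-client gained WebSocket.send_text in 1.7.0; on older
    releases, send() with a str payload does the same job.
    """
    if hasattr(ws, "send_text"):
        ws.send_text(payload)  # websocket-client >= 1.7.0
    else:
        ws.send(payload)  # older releases: send() handles str payloads
```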
@jackgerrits please see the above comment.
What version of `websocket-client` is installed?
`websocket-client` 1.6.4, `websockets` 12.0
Could you retry using `websocket-client==1.7.0`?
1.7.0 works well. I checked the code of websocket-client: `send_text` was added in 1.7.0. Since `send` works in our case as well, I'd suggest using `send` instead of `send_text`, or at least updating the extras in `setup.py`.
The current code execution logic has an issue when code is generated in several blocks and later blocks depend on the former ones.
For example:
The generated code is good, but the execution fails because the code blocks are executed separately. With the feedback `NameError: name 'sum_numbers' is not defined`, GPT-3.5-turbo can usually merge the blocks into one block, but models that are not as good as GPT-3.5 will not merge them and keep failing. In any case, it would be better to be able to correctly execute the blocks.
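The failure mode can be reproduced outside AutoGen: executing each block with fresh globals mimics running each extracted block as a separate script. The block contents below are assumptions reconstructed from the `sum_numbers` error, not the original example:

```python
# Hypothetical blocks: block 1 defines a function, block 2 uses it.
block1 = "def sum_numbers(a, b):\n    return a + b\n"
block2 = "result = sum_numbers(1, 2)\n"

# Executing each block in a fresh namespace, like separate subprocesses,
# loses the definition from block 1 and reproduces the NameError.
try:
    exec(block1, {})
    exec(block2, {})
except NameError as e:
    print(f"separate execution fails: {e}")

# Merging the blocks into one and executing once succeeds.
merged_ns = {}
exec(block1 + block2, merged_ns)
print("merged execution result:", merged_ns["result"])  # → 3
```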