Incorrect Behavior in GroupChat Example

husseinmozannar commented 10 months ago

Bug in notebook example:

There is an issue in a notebook example (and one change that could improve it): https://github.com/microsoft/autogen/blob/main/notebook/agentchat_groupchat_vis.ipynb

TL;DR: The Coder ends up downloading the dataset from a different URL (without verifying that it is the same data!) instead of debugging, and the task is then deemed complete. This occurs because the User_proxy execution result does not include the partial execution trace (the output of the print(df.columns) statement), so the coder has no way to know how to debug.

Recommendation: Add partial execution results when user_proxy talks to chat_manager

Potential fix: on the main branch as of 10/25/2023 10am ET, update autogen/code_utils.py lines 326-335 from:

        if result.returncode:
            logs = result.stderr
            if original_filename is None:
                abs_path = str(pathlib.Path(filepath).absolute())
                logs = logs.replace(str(abs_path), "").replace(filename, "")
            else:
                abs_path = str(pathlib.Path(work_dir).absolute()) + PATH_SEPARATOR
                logs = logs.replace(str(abs_path), "")
        else:
            logs = result.stdout

to:

        if result.returncode:
            logs = result.stdout + "\n" + result.stderr  # LINE CHANGED
            if original_filename is None:
                abs_path = str(pathlib.Path(filepath).absolute())
                logs = logs.replace(str(abs_path), "").replace(filename, "")
            else:
                abs_path = str(pathlib.Path(work_dir).absolute()) + PATH_SEPARATOR
                logs = logs.replace(str(abs_path), "")
        else:
            logs = result.stdout

I haven't tested this properly, but I think it might work.
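To illustrate the idea outside of autogen, here is a minimal, self-contained sketch of the same behavior: run a script in a subprocess and, when it fails, return stdout followed by stderr rather than stderr alone. The helper name `run_and_collect_logs` and the toy snippet are hypothetical and only mimic the structure of the proposed change; this is not the actual code in code_utils.py.

```python
import subprocess
import sys
from typing import Tuple


def run_and_collect_logs(code: str, timeout: int = 30) -> Tuple[int, str]:
    """Run `code` in a fresh Python process and return (exit code, logs).

    Hypothetical helper: on failure the logs contain stdout AND stderr,
    so anything printed before the exception is preserved.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode:
        # Mirrors the proposed one-line change: keep partial output.
        logs = result.stdout + "\n" + result.stderr
    else:
        logs = result.stdout
    return result.returncode, logs


# Toy stand-in for the notebook scenario: print something useful, then fail.
snippet = (
    "print('columns: Weight, Horsepower(HP)')\n"
    "raise KeyError('Weight_in_lbs')\n"
)
exit_code, logs = run_and_collect_logs(snippet)
print(logs)
```

With stderr alone, the first printed line would be lost and only the KeyError traceback would reach the next agent.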

At some point, the coder produces the following code:

import requests
import io
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# URL to download the data
url = "https://raw.githubusercontent.com/uwdata/draco/master/data/cars.csv"

# Download the data using requests
response = requests.get(url)
response.raise_for_status()
content = response.content.decode('utf-8')
csv_file = io.StringIO(content)
df = pd.read_csv(csv_file)

# Print the fields in the dataset and the first few rows
print(df.columns)
print(df.head())

# Prepare the plot
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='Weight_in_lbs', y='Horsepower')

# Save the plot to a file
plt.savefig('weight_vs_horsepower.png')

# Show the plot
plt.show()

This code won't run because "Weight_in_lbs" is not a column in the df. BUT if the partial execution trace had been provided to the agent, which would include the names of the columns:

Index(['Name', 'Type', 'AWD', 'RWD', 'Retail Price', 'Dealer Cost',
       'Engine Size (l)', 'Cyl', 'Horsepower(HP)', 'City Miles Per Gallon',
       'Highway Miles Per Gallon', 'Weight', 'Wheel Base', 'Len', 'Width'],
      dtype='object')

it would easily have figured it out! BUT the user_proxy does not give it this information, only that an exception occurred because "Weight_in_lbs" is not valid.
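As a rough illustration of how much the column listing helps, the sketch below matches the failing name against the printed columns using only the standard library. The helper `suggest_column` and the 0.5 cutoff are assumptions made for this example; nothing like this exists in autogen, and an LLM agent would do the equivalent reasoning from the trace itself.

```python
import difflib
from typing import List, Optional

# Column names exactly as printed by print(df.columns) in the trace above.
COLUMNS: List[str] = [
    "Name", "Type", "AWD", "RWD", "Retail Price", "Dealer Cost",
    "Engine Size (l)", "Cyl", "Horsepower(HP)", "City Miles Per Gallon",
    "Highway Miles Per Gallon", "Weight", "Wheel Base", "Len", "Width",
]


def suggest_column(requested: str, available: List[str]) -> Optional[str]:
    """Return the available column most similar to `requested`, if any.

    Hypothetical helper; the cutoff of 0.5 is an arbitrary choice.
    """
    matches = difflib.get_close_matches(requested, available, n=1, cutoff=0.5)
    return matches[0] if matches else None


weight_column = suggest_column("Weight_in_lbs", COLUMNS)
horsepower_column = suggest_column("Horsepower", COLUMNS)
print(weight_column, horsepower_column)
```

Without the column listing, neither a person nor an agent has anything to match the failing name against.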

The conversation becomes repetitive for 2-3 rounds with the same error, effectively stuck in a loop.

Until the critic says:

Critic (to chat_manager):

I apologize for the ongoing confusion. After re-evaluating the problem, it has come to my attention that the dataset URL provided points to a newer version of the "cars" dataset, causing the column name discrepancies. The appropriate URL to use is https://raw.githubusercontent.com/vega/vega-datasets/gh-pages/data/cars.json.

This is an invalid URL that the coding agent then has to change, marking a clear break in communication!

The invalid URL is then changed to

url = "https://raw.githubusercontent.com/vega/vega-datasets/main/data/cars.json"

and the code runs and outputs a nice plot. But the user has no idea that the agents completely replaced the original, correct URL.

### Tasks
- [ ] edit code_utils.py

afourney commented 10 months ago

Fantastic bug report @husseinmozannar (and hello!)

cc @victordibia and @gagb who I believe worked on that notebook.

sonichi commented 9 months ago

Let's do some testing with it. I tried this before and it degraded performance in some cases (e.g., when the output is long). It could potentially be improved with the idea in #439, though.
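One possible way to limit the cost of long output, sketched here purely as an assumption (it is not necessarily what #439 proposes), is to cap how much stdout is forwarded alongside stderr. The function name `combine_logs`, the 2000-character limit, and the choice to keep the tail rather than the head are all hypothetical.

```python
def combine_logs(stdout: str, stderr: str, max_stdout_chars: int = 2000) -> str:
    """Join stdout and stderr, keeping only the tail of a long stdout.

    Hypothetical sketch: the limit and the keep-the-tail policy are
    assumptions, chosen because the last output is closest to the failure.
    """
    if len(stdout) > max_stdout_chars:
        stdout = "[... earlier output truncated ...]\n" + stdout[-max_stdout_chars:]
    return stdout + "\n" + stderr


# A very long stdout followed by a short error message.
long_stdout = "row\n" * 10_000
combined = combine_logs(long_stdout, "KeyError: 'Weight_in_lbs'")
print(len(long_stdout), len(combined))
```

Keeping the tail would drop early prints such as print(df.columns) when the output is long, so a real implementation might instead keep both the head and the tail.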

microsoft / autogen

Incorrect Behavior in GroupChat Example #418