run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: PandasQueryEngine cannot execute simple pandasframe code, giving error "NameError: name 'pd' is not defined" #14926

Closed tytung2020 closed 1 month ago

tytung2020 commented 1 month ago

Bug Description

NameError: name 'pd' is not defined


Version

llama-index-llms-openai - (0.1.27)

Steps to Reproduce

in colab:

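!pip install llama-index-llms-openai llama-index-experimental

import pandas as pd
from llama_index.experimental.query_engine import PandasQueryEngine
from llama_index.llms.openai import OpenAI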

llm = OpenAI(model="gpt-4o")

df = pd.read_csv("<some csv file with data>")

query_engine = PandasQueryEngine(df=df, verbose=True, llm=llm)

response = query_engine.query(
    "<some query about the data>",
)

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 1 month ago

To resolve the "NameError: name 'pd' is not defined" error when using the PandasQueryEngine in the llama-index-llms-openai package (version 0.1.27), ensure that the pandas library is imported and aliased as pd in your script. Here is an example of how you can do this:

import pandas as pd
from llama_index.experimental.query_engine import PandasQueryEngine
from llama_index.llms.openai import OpenAI

# Your code to read a CSV file into a Pandas DataFrame
df = pd.read_csv('<some csv file with data>')

llm = OpenAI(model="gpt-4o")

query_engine = PandasQueryEngine(df=df, verbose=True, llm=llm)

response = query_engine.query(
    '<some query about the data>',
)

print(response)

Make sure to include the import pandas as pd statement at the beginning of your script to avoid the NameError [1][2][3][4].


tytung2020 commented 1 month ago

No. The failing code is generated by the LlamaIndex pandas query engine and should be executed by it automatically, without requiring us to intervene in the code.

justbane commented 1 month ago

I have the same issue, and tracing it back to the cause has led me to line 33 in llama-index-experimental/llama_index/experimental/query_engine/pandas/output_parser.py

Specifically, the pd variable is restricted for the safe_eval method.

I would also add that this makes any queries involving dates completely unusable.

For instance, take the query "What month has the most streamed songs?"

The query engine generates these pandas commands:

df['Release Date'] = pd.to_datetime(df['Release Date'])
df['Release Month'] = df['Release Date'].dt.month
df['Release Month'].value_counts().idxmax()

Note the call to 'pd.': it fails every time.
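
To illustrate the failure mode described above, here is a minimal sketch of how generated code that references pd raises a NameError when the evaluation namespace does not expose that name. It uses plain exec/eval rather than the library's safe_exec/safe_eval, and everything in it is an assumption made for illustration: run_generated is a hypothetical helper, and pd_stub is a tiny stand-in for pandas (built on the standard library's datetime) so the sketch runs without third-party packages. It is not the actual code in output_parser.py.

```python
import ast
from datetime import datetime
from types import SimpleNamespace

# Hypothetical stand-in for pandas, so this runs with the stdlib only.
pd_stub = SimpleNamespace(to_datetime=datetime.fromisoformat)

# Shaped like LLM-generated code: statements first, a final expression last.
GENERATED = "parsed = pd.to_datetime(raw)\nparsed.month"


def run_generated(code, global_vars, local_vars):
    """Exec every statement except the last, then eval the last expression.

    Hypothetical helper for illustration; not the library's implementation.
    """
    tree = ast.parse(code)
    *body, last = tree.body
    if body:
        module = ast.Module(body=body, type_ignores=[])
        exec(compile(module, "<generated>", "exec"), global_vars, local_vars)
    if isinstance(last, ast.Expr):
        expr = ast.Expression(last.value)
        return eval(compile(expr, "<generated>", "eval"), global_vars, local_vars)
    # The last statement is not an expression, so there is nothing to return.
    module = ast.Module(body=[last], type_ignores=[])
    exec(compile(module, "<generated>", "exec"), global_vars, local_vars)
    return None


# Case 1: the namespace does not contain "pd", so the lookup fails.
try:
    run_generated(GENERATED, {"__builtins__": {}}, {"raw": "2024-07-15"})
except NameError as err:
    print("failed:", err)

# Case 2: exposing "pd" to the executed code lets the same code run.
month = run_generated(
    GENERATED, {"__builtins__": {}, "pd": pd_stub}, {"raw": "2024-07-15"}
)
print("month:", month)
```

The only difference between the two calls is whether the name pd is visible to the executed code, which matches the symptom reported here: the generated code itself is fine, but the namespace it runs in lacks pd. Note that emptying __builtins__ as done in this sketch is not a real sandbox.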