run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: NameError: name 'p' is not defined #15581

Open tytung2020 opened 3 months ago

tytung2020 commented 3 months ago

Bug Description

I use PandasQueryEngine to query a large database, but it produces this error:

Pandas Instructions:

df['gl'].apply(lambda x: x['Account']).value_counts()
Traceback (most recent call last):
File "c:\Users\tytun\anaconda3\envs\pulse_ai\Lib\site-packages\llama_index\experimental\query_engine\pandas\output_parser.py", line 54, in default_output_processor
output_str = str(safe_eval(module_end_str, global_vars, local_vars))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\tytun\anaconda3\envs\pulse_ai\Lib\site-packages\llama_index\experimental\exec_utils.py", line 159, in safe_eval
return eval(__source, _get_restricted_globals(__globals), __locals)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 1, in <module>
NameError: name 'p' is not defined

Strangely, 'p' does not appear anywhere in the Pandas Instructions above.
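A quick offline check suggests the instruction itself is valid pandas. The sample `gl` data below is made up, because the real dataset is not shown. The check also shows one way a lone `p` could reach `eval()`: the model wraps its answer in a Markdown fence tagged `python`, and only the first character of the extracted text gets evaluated. That mechanism is an assumption about the cause, not something confirmed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a "gl" column whose cells are dicts with an "Account" key.
df = pd.DataFrame({"gl": [{"Account": "A100"}, {"Account": "B200"}, {"Account": "A100"}]})

# The instruction printed under "Pandas Instructions" in the report.
instruction = "df['gl'].apply(lambda x: x['Account']).value_counts()"

# Evaluated on its own, the instruction runs fine.
counts = eval(instruction, {"np": np, "pd": pd}, {"df": df})
print(counts)

# Assumption: the LLM reply was wrapped in a Markdown code fence tagged "python".
FENCE = "`" * 3
raw_reply = f"{FENCE}python\n{instruction}\n{FENCE}"
inner = raw_reply.split(FENCE)[1]  # "python\ndf['gl']...\n"

# If only the first character of that text is evaluated, the source is just "p",
# which is defined in neither the globals nor the locals passed to eval().
error_message = None
try:
    eval(inner[0], {"np": np, "pd": pd}, {"df": df})
except NameError as err:
    error_message = str(err)
print(error_message)  # name 'p' is not defined
```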

Version

llama-index 0.11.0

Steps to Reproduce


from typing import Any, Dict

from llama_index.experimental.query_engine import PandasQueryEngine
from llama_index.llms.vertex import Vertex
import pandas as pd

def process_large_dataset_with_llamaindex(data: Dict[str, Any], query: str) -> str:
    """Process large dataset with LlamaIndex."""
    df = pd.DataFrame(data)
    vertex_gemini = Vertex(model="gemini-1.5-flash", temperature=0, additional_kwargs={})
    query_engine = PandasQueryEngine(df=df, verbose=True, llm=vertex_gemini)
    response = query_engine.query(query)
    return str(response)

result = ...  # placeholder: <some large database here>
user_query = "summarize the data"
llamaindex_response = process_large_dataset_with_llamaindex(result, user_query)

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 3 months ago

I found a similar issue that might be relevant to your problem. The issue titled "[Bug]: Pandas Output: There was an error running the output as Python code. Error message: name 'p' is not defined" is still open and was last updated on July 24, 2024. The suggested solution is to ensure that the plotting library (likely matplotlib.pyplot as plt) is imported and used correctly [1].

Additionally, the NameError: name 'p' is not defined occurs because the PandasQueryEngine in llama-index 0.11.0 is deprecated and raises a DeprecationWarning upon initialization. The provided code does not define or use a variable named 'p', and the error likely stems from attempting to use the deprecated PandasQueryEngine.

To resolve this issue, you should use the PandasQueryEngine from the llama-index-experimental package as indicated in the deprecation warning. Install the package and import the PandasQueryEngine from there:

pip install llama-index-experimental

Here is an updated version of your code:

from typing import Any, Dict

import pandas as pd
from llama_index.experimental.query_engine.pandas.pandas_query_engine import PandasQueryEngine
from llama_index.llms.vertex import Vertex

def process_large_dataset_with_llamaindex(data: Dict[str, Any], query: str) -> str:
    """Process large dataset with LlamaIndex."""
    df = pd.DataFrame(data)
    vertex_gemini = Vertex(model="gemini-1.5-flash", temperature=0, additional_kwargs={})
    query_engine = PandasQueryEngine(df=df, verbose=True, llm=vertex_gemini)
    response = query_engine.query(query)
    return str(response)

result = ...  # placeholder: <some large database here>
user_query = "summarize the data"
llamaindex_response = process_large_dataset_with_llamaindex(result, user_query)

This should resolve the NameError issue by using the updated PandasQueryEngine implementation [2][3].

To continue talking to Dosu, mention @dosu.

tytung2020 commented 3 months ago

I have tried your code and the same error still occurs. @dosu

dosubot[bot] commented 3 months ago

The NameError: name 'p' is not defined occurs because the variable p is not defined in the scope where it is being used. In the context of the PandasQueryEngine from the llama-index-experimental package, this error can happen if the generated Pandas code (which is executed using eval()) references a variable p that has not been defined.
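For context, the traceback points at `safe_eval(module_end_str, global_vars, local_vars)`, which suggests the final statement of the generated code is evaluated with `eval()` against two dictionaries. The sketch below shows that pattern. It uses plain `exec`/`eval` instead of the library's restricted `safe_exec`/`safe_eval`, and that substitution is an assumption made for illustration. The helper name `run_generated_code` is hypothetical. Any name missing from both dictionaries raises exactly this kind of `NameError`.

```python
import ast

import numpy as np
import pandas as pd


def run_generated_code(source: str, df: pd.DataFrame) -> str:
    """Hypothetical helper: exec every statement but the last, then eval the last one."""
    global_vars = {"np": np, "pd": pd}
    local_vars = {"df": df}
    tree = ast.parse(source)
    # Run the leading statements (e.g. assignments) for their side effects.
    head = ast.Module(body=tree.body[:-1], type_ignores=[])
    exec(ast.unparse(head), global_vars, local_vars)
    # Evaluate the final expression and return its string form.
    tail = ast.Module(body=tree.body[-1:], type_ignores=[])
    return str(eval(ast.unparse(tail), global_vars, local_vars))


df = pd.DataFrame({"x": [1, 2, 3]})
doubled = run_generated_code("total = df['x'].sum()\ntotal * 2", df)
print(doubled)  # 12

# A bare name that is in neither dictionary fails the same way as in the report.
error_message = None
try:
    run_generated_code("p", df)
except NameError as err:
    error_message = str(err)
print(error_message)
```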

To debug this issue, you can print the generated Pandas instructions before they are executed to see if there are any references to undefined variables. Here is a snippet from the _query method where you can add a print statement to debug the generated Pandas instructions:

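# Imports this method relies on (module level in the engine's source file)
from llama_index.core.base.response.schema import Response
from llama_index.core.schema import QueryBundle
from llama_index.core.utils import print_text

def _query(self, query_bundle: QueryBundle) -> Response: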
    """Answer a query."""
    context = self._get_table_context()

    pandas_response_str = self._llm.predict(
        self._pandas_prompt,
        df_str=context,
        query_str=query_bundle.query_str,
        instruction_str=self._instruction_str,
    )

    if self._verbose:
        print_text(f"> Pandas Instructions:\n" f"```\n{pandas_response_str}\n```\n")

    # Add this print statement to debug the generated Pandas instructions
    print(f"Generated Pandas Instructions: {pandas_response_str}")

    pandas_output = self._instruction_parser.parse(pandas_response_str)
    if self._verbose:
        print_text(f"> Pandas Output: {pandas_output}\n")

    response_metadata = {
        "pandas_instruction_str": pandas_response_str,
        "raw_pandas_output": pandas_output,
    }
    if self._synthesize_response:
        response_str = str(
            self._llm.predict(
                self._response_synthesis_prompt,
                query_str=query_bundle.query_str,
                pandas_instructions=pandas_response_str,
                pandas_output=pandas_output,
            )
        )
    else:
        response_str = str(pandas_output)

    return Response(response=response_str, metadata=response_metadata)

By adding the print statement, you can inspect the generated Pandas instructions and ensure that all variables used in the code are properly defined [1].
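When inspecting the output, printing the `repr()` of the raw response makes Markdown fences and newlines visible. If the response turns out to be fenced, one defensive option is to extract the whole fenced body as a string before evaluating it. The sketch below is hypothetical: `extract_code` is not a LlamaIndex function. It assumes the model puts a newline after the opening fence and an optional language tag.

```python
import re

FENCE = "`" * 3
# Opening fence, optional language tag, newline, then the (non-greedy) body up to the closing fence.
FENCED_BLOCK = re.compile(FENCE + r"[A-Za-z0-9_+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Hypothetical helper: return the body of the last fenced block, or the stripped text."""
    blocks = FENCED_BLOCK.findall(response)
    if blocks:
        # Index the list of matches and keep the whole string; indexing the
        # matched string itself with [0] would keep only its first character.
        return blocks[-1].strip()
    return response.strip()


raw = f"{FENCE}python\ndf['gl'].apply(lambda x: x['Account']).value_counts()\n{FENCE}"
print(repr(raw))  # shows the fence and the "python" tag explicitly
print(extract_code(raw))
```

Unfenced replies fall through to the second branch unchanged, so the helper is safe to apply to every response.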


Shubh789da commented 1 month ago

I am also facing the same issue. Sometimes, when I ask for the answer in text only, the error goes away, but mostly it does not. It's frustrating when working with multiple query engines. Please help.

yangxg commented 5 days ago

I am encountering the same issue. The PandasQueryEngine generates the correct code instruction, but running that code raises the error below. Please help, thanks!

> Pandas Output: There was an error running the output as Python code. Error message: name 'p' is not defined
Traceback (most recent call last):
  File "/home/yxg/.pyenv/versions/3.12.7/envs/llamaindex/lib/python3.12/site-packages/llama_index/experimental/query_engine/pandas/output_parser.py", line 61, in default_output_processor
    output_str = str(safe_eval(module_end_str, global_vars, local_vars))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yxg/.pyenv/versions/3.12.7/envs/llamaindex/lib/python3.12/site-packages/llama_index/experimental/exec_utils.py", line 159, in safe_eval
    return eval(__source, _get_restricted_globals(__globals), __locals)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
NameError: name 'p' is not defined