run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Pandas Query Engine Eval Problem with "*=" sign #15865

Open mohamedhazem24 opened 2 months ago

mohamedhazem24 commented 2 months ago

Bug Description

I asked the query engine to multiply a column by a specific constant, and it generated the correct code. Unfortunately, the LlamaIndex safe_eval function cannot execute it:

df.loc[df['product'].isin([1069104, 1069105]), 'score'] *= 10

This fails even though the code is valid Python syntax.
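The failure can be reproduced without pandas or LlamaIndex. Python's built-in eval() accepts only a single expression. An augmented assignment such as x *= 10 is a statement, so eval() rejects it with SyntaxError, while exec() runs it. The sketch below assumes that safe_eval ultimately passes the string to the built-in eval(), which its name and the traceback later in this thread suggest. It uses a plain list in place of a DataFrame:

```python
# Augmented assignment (*=) is a statement, not an expression.
namespace = {"scores": [1, 2, 3]}
source = "scores[0] *= 10"

try:
    eval(source, {}, namespace)  # eval() only accepts expressions
    eval_failed = False
except SyntaxError:
    eval_failed = True

exec(source, {}, namespace)  # exec() accepts statements, so this succeeds

print(eval_failed)          # True
print(namespace["scores"])  # [10, 2, 3]
```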

Version

latest version

Steps to Reproduce

Run this code after installing the required libraries and setting a Groq API key:

import pandas as pd
from llama_index.experimental.query_engine import PandasQueryEngine
from llama_index.llms.groq import Groq

groq_API = ""
llm = Groq(model="llama3-70b-8192", api_key=groq_API)

df = pd.DataFrame({
    'product': [1179160, 1066490, 1148126, 1069104, 1069105, 1160330, 1069098, 1077784, 1193369, 1179741],
    'score': [0.424654, 0.424509, 0.422207, 0.420455, 0.414603, 0.168784, 0.168749, 0.168738, 0.168703, 0.168684],
})
print(df)

query_engine = PandasQueryEngine(df=df, verbose=True, llm=llm)
print(query_engine.query('''
what I'm trying to achieve is to multiply certain score values corresponding to specific products by a constant. I have the products target of this multiplication in a list like this: [1069104, 1069105] (this is just a simplified example, in reality it would be more than two products) and my goal is to obtain this: Multiply scores corresponding to products 1069104 and 1069105 by 10
'''))

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 2 months ago

@mohamedhazem24 Hello there! I'm here to help you with any bugs, questions, or contributions.

To resolve the SyntaxError: invalid syntax when using the llama_index package's safe_eval function to execute the Pandas code df.loc[df['product'].isin([1069104, 1069105]), 'score'] *= 10, you need to modify the code to avoid the in-place operation (*=), which might not be handled correctly by the safe_eval function.

Here is an example of how you can modify the code:

output = "df.loc[df['product'].isin([1069104, 1069105]), 'score'] = df.loc[df['product'].isin([1069104, 1069105]), 'score'] * 10"

This modification assigns the result of the multiplication back to the DataFrame without using the in-place operation.
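Before relying on this rewrite, note that a plain assignment is also a statement. If the rewritten line is still the last line handed to eval(), it is rejected in the same way as *=. The sketch below uses a plain list instead of a DataFrame and assumes that the rewritten line reaches a bare eval() call:

```python
namespace = {"scores": [1, 2, 3]}
rewritten = "scores[0] = scores[0] * 10"  # same shape as the suggested rewrite

try:
    eval(rewritten, {}, namespace)
    assignment_ok_in_eval = True
except SyntaxError:
    # Plain assignment (=) is a statement too, so eval() rejects it.
    assignment_ok_in_eval = False

exec(rewritten, {}, namespace)  # exec() runs the same line without error

print(assignment_ok_in_eval)  # False
print(namespace["scores"])    # [10, 2, 3]
```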

Additionally, ensure that you are using Python version 3.9 or higher, as the default_output_processor function in the llama_index package requires it for proper execution. If you are using an older version of Python, the function will return the raw Python instructions as a string instead of executing them [1][2].

If you still encounter issues, you might need to debug the safe_eval and safe_exec functions to ensure they handle the code correctly. The default_output_processor function in the llama_index package's pandas/output_parser.py file provides a good reference for how the code is parsed and executed.
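To see where the generated line breaks, the sketch below mimics the split that default_output_processor appears to perform: it parses the whole output and then treats the last top-level statement on its own. The name module_end_str in the traceback further down points to this split. The sketch is a simplified re-creation for illustration, not the library's code. It uses ast.unparse, which exists only on Python 3.9 and newer:

```python
import ast

output = "df.loc[df['product'].isin([1069104, 1069105]), 'score'] *= 10"

# ast.parse() works in "exec" mode, so statements such as *= parse fine.
tree = ast.parse(output)
last = tree.body[-1]

# The last top-level statement is an augmented assignment, not an expression.
print(type(last).__name__)  # AugAssign

# Re-emit it as source (Python 3.9+) and compile it in "eval" mode, which is
# what eval() does with a string: this raises SyntaxError.
module_end_str = ast.unparse(ast.Module(body=[last], type_ignores=[]))
try:
    compile(module_end_str, "<module_end>", "eval")
    evaluable = True
except SyntaxError:
    evaluable = False

print(evaluable)  # False
```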

To continue talking to Dosu, mention @dosu.

mohamedhazem24 commented 2 months ago

@dosu But I cannot control the output to look like this: df.loc[df['product'].isin([1069104, 1069105]), 'score'] = df.loc[df['product'].isin([1069104, 1069105]), 'score'] * 10. Is there another way, and what changes do I need to make in the safe_eval method?

dosubot[bot] commented 2 months ago

To modify the code without using the in-place operation in the safe_eval method, you need to avoid using copy.deepcopy and instead create a new dictionary that combines ALLOWED_BUILTINS and __globals. Here are the changes you need to make:

  1. Replace the _get_restricted_globals function to avoid using copy.deepcopy.
  2. Update the safe_eval method to use the new _get_restricted_globals function.

Here is the modified code:

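from types import CodeType
from typing import Any, Dict, Mapping, Union

# ALLOWED_BUILTINS and _verify_source_safety are defined elsewhere in
# llama_index/experimental/exec_utils.py (not shown here).


def _get_restricted_globals(__globals: Union[dict, None]) -> Any:
    # Start from a new (shallow-copied) dict of the allowed builtins,
    # then add any caller-supplied globals on top.
    restricted_globals = {**ALLOWED_BUILTINS}
    if __globals:
        restricted_globals.update(__globals)
    return restricted_globals


def safe_eval(
    __source: Union[str, bytes, CodeType],
    __globals: Union[Dict[str, Any], None] = None,
    __locals: Union[Mapping[str, object], None] = None,
) -> Any:
    """
    eval within safe global context.
    """
    _verify_source_safety(__source)
    return eval(__source, _get_restricted_globals(__globals), __locals)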

This modification avoids the in-place operation by creating a new dictionary using dictionary unpacking ({**ALLOWED_BUILTINS}) and then updating it with __globals if provided. This approach ensures that the original ALLOWED_BUILTINS dictionary remains unchanged [1][2][3].
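The dictionary-unpacking claim can be checked in isolation. The sketch below uses a hypothetical stand-in for ALLOWED_BUILTINS; the real mapping in exec_utils.py is not reproduced here. It shows that {**ALLOWED_BUILTINS} creates a new top-level dictionary, so update() leaves the original untouched. It also shows that the way the globals are built has no effect on whether eval() accepts *=:

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for the module-level ALLOWED_BUILTINS mapping.
ALLOWED_BUILTINS: Dict[str, Any] = {"len": len, "sum": sum}


def get_restricted_globals(extra: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # New top-level dict; the values themselves are shared, not deep-copied.
    restricted = {**ALLOWED_BUILTINS}
    if extra:
        restricted.update(extra)
    return restricted


g = get_restricted_globals({"scores": [1, 2, 3]})
print("scores" in g, "scores" in ALLOWED_BUILTINS)  # True False

# How the globals dict is built does not change eval()'s grammar rules.
try:
    eval("scores[0] *= 10", g)
    accepted = True
except SyntaxError:
    accepted = False

print(accepted)  # False
```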


mohamedhazem24 commented 2 months ago

@dosu No, I am getting this error:

Pandas Output: There was an error running the output as Python code. Error message: invalid syntax (, line 1)
There was an error running the output as Python code. Error message: invalid syntax (, line 1)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/llama_index/experimental/query_engine/pandas/output_parser.py", line 54, in default_output_processor
    output_str = str(safe_exec(module_end_str, global_vars, local_vars))
  File "/usr/local/lib/python3.10/dist-packages/llama_index/experimental/exec_utils.py", line 159, in safe_eval
  File "", line 1
    df.loc[df['product'].isin([1069104, 1069105]), 'score'] *= 10

dosubot[bot] commented 2 months ago

To resolve the invalid syntax error when using the safe_eval method to execute the Pandas code df.loc[df['product'].isin([1069104, 1069105]), 'score'] *= 10, you should ensure that the code is properly parsed and executed. The provided context shows how the default_output_processor function processes and executes the code using safe_eval and safe_exec.

Here is how you can modify the code to ensure it is correctly executed:

  1. Parse the code to separate the assignment operation.
  2. Use safe_exec to execute the assignment operation.

Here is an example of how you can do it:

import pandas as pd
from llama_index.experimental.exec_utils import safe_exec

# Sample DataFrame
df = pd.DataFrame({
    'product': [1069104, 1069105, 1069106],
    'score': [1, 2, 3]
})

# Code to be executed
code = "df.loc[df['product'].isin([1069104, 1069105]), 'score'] *= 10"

# Split once at the first '*=' into the target (lhs) and the factor (rhs)
lhs, rhs = [part.strip() for part in code.split('*=', 1)]

# Rewrite as a plain assignment; parenthesize rhs so e.g. "*= 2 + 3" keeps its meaning
safe_exec(f"{lhs} = {lhs} * ({rhs})", {}, {'df': df})

# Verify the result: scores for products 1069104 and 1069105 become 10 and 20
print(df)

This works because safe_exec executes the source as statements (like the built-in exec), whereas safe_eval accepts only a single expression, so the assignment no longer triggers the invalid syntax error. The default_output_processor function from the llama_index.experimental.query_engine.pandas.output_parser module demonstrates how to handle such code execution safely and effectively [1][2].
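Splitting on the literal '*=' covers only that one operator, and it treats any occurrence of those characters as the operator. A more general alternative is to inspect the parsed code and use eval() only when the last statement is a bare expression, falling back to exec() otherwise. The helper name run_generated_code is hypothetical. For brevity it calls the plain built-in exec() and eval(), which provide no sandboxing. A real patch would route these calls through safe_exec and safe_eval so the existing safety checks still apply:

```python
import ast
from typing import Any, Dict, Optional


def run_generated_code(source: str, namespace: Dict[str, Any]) -> Optional[Any]:
    """Hypothetical helper: run generated code, evaluating the last line
    only when it is a bare expression. Uses exec()/eval() directly (unsafe)."""
    tree = ast.parse(source)
    if not tree.body:
        return None

    last = tree.body[-1]
    if isinstance(last, ast.Expr):
        # Run all preceding statements, then evaluate the final expression.
        head = ast.Module(body=tree.body[:-1], type_ignores=[])
        exec(compile(head, "<generated>", "exec"), namespace)
        tail = ast.Expression(body=last.value)
        return eval(compile(tail, "<generated>", "eval"), namespace)

    # The last line is a statement (e.g. x *= 10 or x = x * 10):
    # execute everything; there is no value to return.
    exec(compile(tree, "<generated>", "exec"), namespace)
    return None


ns = {"scores": [1, 2, 3]}
run_generated_code("scores[0] *= 10", ns)
print(ns["scores"])  # [10, 2, 3]
print(run_generated_code("scores[1] *= 10\nsum(scores)", ns))  # 33
```

With a DataFrame, the namespace would hold something like {"df": df}. When the output ends in a statement, the caller could return str(df) so that the modified frame is still shown.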
