run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Pandas Query Engine does not give a proper response (especially when exposed via an API; works well in a local Jupyter notebook) #14897

Open Nitheeswaran-E opened 1 month ago

Nitheeswaran-E commented 1 month ago

Bug Description

When the Pandas query engine is run in a Jupyter notebook, it gives a proper result, but it does not give a proper result when run as a single .py file.

Version

llama-index==0.10.50

Steps to Reproduce

Run the same code as a single .py file.

Relevant Logs/Tracebacks

(venv) PS C:\Users\Nitheeswaran\Desktop\csv> & c:/Users/Nitheeswaran/Desktop/csv/venv/Scripts/python.exe c:/Users/Nitheeswaran/Desktop/csv/s.py
Traceback (most recent call last):
  File "C:\Users\Nitheeswaran\Desktop\csv\venv\lib\site-packages\pandas\core\indexes\base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\\_libs\\hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\\_libs\\hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Time generated'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Nitheeswaran\Desktop\csv\venv\lib\site-packages\llama_index\experimental\query_engine\pandas\output_parser.py", line 54, in default_output_processor
    output_str = str(safe_eval(module_end_str, global_vars, local_vars))
  File "C:\Users\Nitheeswaran\Desktop\csv\venv\lib\site-packages\llama_index\experimental\exec_utils.py", line 159, in safe_eval
    return eval(__source, _get_restricted_globals(__globals), __locals)
  File "<string>", line 1, in <module>
  File "C:\Users\Nitheeswaran\Desktop\csv\venv\lib\site-packages\pandas\core\frame.py", line 4102, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\Nitheeswaran\Desktop\csv\venv\lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc
    raise KeyError(key) from err
KeyError: 'Time generated'
The earliest log entry was generated at [insert timestamp here] and the latest log entry was generated at [insert timestamp here].

dosubot[bot] commented 1 month ago

Hi, @Nitheeswaran-E! I'm here to help you with your issue.

To resolve the issue where the Pandas Query Engine works correctly in a Jupyter notebook but throws a KeyError when executed as a standalone .py file, ensure that the column names used in your queries exactly match the column names in your DataFrame. Check for leading or trailing spaces, differences in case, and special characters.
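As a rough illustration of the whitespace and special-character check, the sketch below cleans a list of column labels. The helper name clean_labels and the sample labels are hypothetical, and it assumes the stray characters are a UTF-8 byte-order mark or surrounding spaces, which may not be the cause in this issue.

```python
def clean_labels(labels):
    """Return labels with byte-order marks and surrounding whitespace removed."""
    return [str(label).replace("\ufeff", "").strip() for label in labels]


# Hypothetical labels as they might come out of a CSV export.
raw = ["\ufeffTime generated ", " Severity"]
cleaned = clean_labels(raw)
print(cleaned)  # ['Time generated', 'Severity']
```

With a real DataFrame, the result could be assigned back with df.columns = clean_labels(df.columns) before the DataFrame is passed to the query engine.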

In your case, the KeyError: 'Time generated' suggests that the column 'Time generated' might not exist in the DataFrame when running the standalone script. Verify that the DataFrame df in your standalone script is correctly defined and contains the column 'Time generated'.

You can add debugging statements to print the DataFrame columns before querying to confirm the DataFrame structure:

print(df.columns)

This will help you ensure that the DataFrame has the expected columns when running the standalone script [1].
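If the printed columns look right at a glance, a slightly more detailed check can show how the requested name differs from what is actually there. This is only a sketch: explain_missing_column is a hypothetical helper, not part of pandas or LlamaIndex, and it works on any list of labels such as list(df.columns).

```python
import difflib


def explain_missing_column(columns, wanted):
    """Describe whether `wanted` is among `columns`, and if not, what is close."""
    labels = [str(c) for c in columns]
    if wanted in labels:
        return f"{wanted!r} is present"
    # Compare again while ignoring case and surrounding whitespace.
    folded = {label.strip().casefold(): label for label in labels}
    hit = folded.get(wanted.strip().casefold())
    if hit is not None:
        return f"{wanted!r} is missing, but {hit!r} differs only by case or whitespace"
    # Fall back to fuzzy matching on the raw labels.
    near = difflib.get_close_matches(wanted, labels, n=3, cutoff=0.6)
    return f"{wanted!r} is missing; closest labels: {near!r}"


# Hypothetical column list with a trailing space and different capitalization.
print(explain_missing_column(["Time Generated ", "Severity"], "Time generated"))
```

Running the same check in both the notebook and the standalone script would show whether the two environments really load the same columns.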
