Closed amygbAI closed 4 months ago
It seems you're encountering issues with the `PandasQueryEngine` when using it with open-source LLMs like Mistral-7B and Llama-2-7b, particularly with handling complex queries involving tabular data. The challenges you've outlined, such as incorrect query handling and partial answers, might stem from the inherent limitations of the `PandasQueryEngine`, including its ability to process complex conditions, its security measures, and its flexibility in handling different types of queries.
To address the specific issues you've mentioned:
These problems suggest a need for a more robust handling mechanism within the `PandasQueryEngine` for complex queries, especially those requiring multiple conditions or aggregations.
Unfortunately, there doesn't seem to be benchmarking information available for the LlamaIndex framework with different LLMs, including open-source models. This makes it challenging to directly address compatibility or performance issues with specific LLMs like Llama-2-7b.
To better assist you, it would be helpful to have the exact error messages or incorrect outputs you're encountering. Additionally, more details about your specific use case, including the type of data, queries, and any particular performance or security considerations, would enable us to suggest more targeted solutions or workarounds.
For further reference, you can review the test cases for the `PandasQueryEngine` and the source code.
Please let me know if you need further clarification or have additional questions.
The above was a terrible summary by the bot .. I know for a fact that the problem is NOT with the query engine .. it's the LLMs which keep throwing random stuff the engine's way ( coz the engine is simply creating custom templates for different tasks and handing them over to the LLM )
@amygbAI open-source LLMs are, in general, terrible with structured outputs. Add complex reasoning on top of that, and it's not going to be a fun time.
You mentioned Llama2, which in my experience is one of the worst offenders.
Newer models are maybe getting better (command-r, codellama, etc.), but not really reliable for this task without finetuning
Anthropic, gemini, openai, mistral-large, etc. all work pretty well though. Of course, those are hosted models
Thank you for your response ..agreed ..just that getting any URL whitelisted ( for access to hosted models ) via any reasonably sized company's infosec team is a bureaucratic nightmare ..I'm hoping llama3 release might help ..if I do manage to fine-tune this for llama2 though, I shall add details to this thread
On Wed, Apr 10, 2024, 10:52 PM Logan @.***> wrote:
Anthropic, gemini, openai, mistral-large, etc. all work pretty well though. Of course, those are hosted models
Hello @amygbAI ! I am currently dealing with the same problem. I also found this langchain function but it has the same problem that it only works with GPT LLMs. I am thinking of using the Llama 3 LLM locally. Were you able to find a solution to your problems?
Hi @uzairlatif90 i did exactly what you are/were thinking of doing. Just deployed Llama3 ( instruct-8B ) ..now im working on quantizing it via llama cpp ..but llama3 is still a ways behind gpt4 so i am building a scaffolding framework around double checking its output. Basically have a set of pre defined query responses and see how the output compares. In my case it was ~80-85% accurate.
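The double-checking idea described above (run a fixed set of questions with known answers and see how often the model gets them right) can be sketched roughly as below. This is not the commenter's actual scaffolding: the names `normalise` and `accuracy_against_references` are hypothetical, `ask` stands for whatever function sends a question to the model and returns its text, and "expected answer appears in the reply" is an assumed, deliberately crude matching rule.

```python
def normalise(text):
    """Lower-case and collapse whitespace so trivial formatting differences don't count as misses."""
    return " ".join(str(text).lower().split())


def accuracy_against_references(ask, cases):
    """Run every (question, expected_answer) pair through `ask` and score the replies.

    `ask` is any callable taking a question string and returning the model's answer.
    A reply counts as correct when the expected answer appears somewhere in it
    (an assumed matching rule; stricter checks may be needed for numeric answers).
    Returns (fraction_correct, list_of_failed_(question, answer)_pairs).
    """
    if not cases:
        raise ValueError("need at least one reference case")
    hits = 0
    failures = []
    for question, expected in cases:
        answer = ask(question)
        if normalise(expected) in normalise(answer):
            hits += 1
        else:
            failures.append((question, answer))
    return hits / len(cases), failures
```

The failure list is returned alongside the score so the misses can be inspected by hand, which is usually more informative than the accuracy figure alone.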
Thank you so much for your reply @amygbAI ! Would you be able to share a snippet from your code on how exactly you deployed the Llama3 (instruct-8B) to analyse the dataframe with llamacpp? I have been struggling with that and I would be really grateful for your help.
@uzairlatif90 the llama cpp thing is a WIP :) ..like i mentioned in my comment, i am still working on it .. once im able to get it up and running i will most certainly share it here
@uzairlatif90 it was really simple in the end .. install ollama https://github.com/ollama/ollama/tree/main and `ollama pull llama3:instruct` and start their server using `ollama serve` ..ideally it starts by default ..then you can create your own API using a simple python requests like so
import requests

url = "http://localhost:11434/api/generate"
data = {
    "model": "llama3:instruct",
    "prompt": '''In the following code below extract the global variables and method definitions. Restrict answer to 2 lists. One with list of variables and the other with list of method definitions.\n govt_id_sync_dict = dict({"drivinglicence" : "DL",
"voterid" : "Voter ID",
"passport" : "Passport",
"aadhaar" : "Aadhaar",
"pan" : "PAN"})
def get_num_pages(pdf_path):
# Get num of pages '''
}
# send the request; the reply is streamed back piece by piece
response = requests.post(url, json=data, stream=True)
The only thing is that the response comes out in the form of individual tokens, so you will need to process it. Otherwise, quality wise there's a small degradation ( since this is a 4.7 GB quantized version of the 8B instruct model ) BUT it's still much much better than llama2 and almost as good as gpt3.5. Once you have this wrapper, instead of calling the PandasQueryEngine you can call this straightaway.
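The token-by-token response mentioned above can be stitched back together roughly as follows. This is a sketch, assuming Ollama's `/api/generate` endpoint streams one JSON object per line, each carrying a `response` text fragment and a `done` flag on the last one; `join_stream_chunks` and `generate` are hypothetical helper names, not part of Ollama or LlamaIndex.

```python
import json

# Default local Ollama endpoint, as used in the comment above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def join_stream_chunks(lines):
    """Concatenate the "response" fragments from a newline-delimited JSON stream."""
    parts = []
    for raw in lines:
        if not raw:  # skip blank keep-alive lines
            continue
        chunk = json.loads(raw)  # accepts both str and bytes
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # the final object marks the end of the reply
            break
    return "".join(parts)


def generate(prompt, model="llama3:instruct", url=OLLAMA_URL):
    """Send one prompt to a local Ollama server and return the full reply text."""
    import requests  # third-party; imported here so the parser above stays dependency-free

    payload = {"model": model, "prompt": prompt}
    with requests.post(url, json=payload, stream=True, timeout=300) as resp:
        resp.raise_for_status()
        return join_stream_chunks(resp.iter_lines())
```

With the server running, `generate("which item has the lowest price?")` would return the reply as one string. If streaming is not needed, adding `"stream": False` to the payload should make Ollama return a single JSON object instead, which avoids the stitching step.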
Question Validation
Question
Please find the code below:
`
import torch
from transformers import BitsAndBytesConfig
from llama_index.core.prompts import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM

# 4-bit NF4 quantization so the 7B model fits on a single GPU
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# hf_token is the Hugging Face access token, defined elsewhere
llama2llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    context_window=3900,
    model_kwargs={"quantization_config": quantization_config},
    tokenizer_kwargs={"token": hf_token},
)

import pandas as pd
from llama_index.core.query_engine import PandasQueryEngine

df = pd.DataFrame(
    {
        'Item': ['water bottle', 'snack box', 'diapers'],
        'Quantity': [2, 3, 4],
        'Price': [30, 40, 50],
        'Total Amount': [60, 120, 200]
    }
)
query_engine = PandasQueryEngine(df=df, llm=llama2llm, verbose=True)

response = query_engine.query(
    "please give me the name of the item which has the least quantity and price ?",
)
prompts = query_engine.get_prompts()
print(prompts["pandas_prompt"].template)
print(f"{response}")
`
Notes
i have tried Mistral-7B and llama2-7b ..all of them have serious issues with what i consider a critical component for enterprises. What most folks probably dont realize is that a TON of documents ( which have a lot of tabular information ) and spreadsheets present in shared folders of companies can ONLY be "LLM"ized by converting into a dataframe ( and bigger sets will need to go into the DB ) .. the sad truth is that this particular engine ONLY works with GPT and not the major open source LLM's.
I do understand that llama index has no responsibility whatsoever on this front but has someone benchmarked the major engines with something like llama2 ? not all of us would want to pump our data into paid LLMs and more importantly , in some cases, our smaller finetuned models are BETTER than GPT4 on narrow tasks. If there are any pointers , i would be really grateful. Here are some of the answers the above code gave me
PS - please forgive the formatting for the below
QUERY - give me the name of the item has the max qty?
`df['Total Amount'].max()` returns the maximum value in the 'Total Amount' column of the dataframe. `['Description']` returns the corresponding column name of the maximum value. Note: The `['Description']` part is optional, if you want to get the name of the column that corresponds to the maximum value, you can simply use `['Description']`.
*** wrong df query
QUERY - "which item has the lowest price?"
*** answers only a part of the query
QUERY - "please give me the name of the item which has the least quantity and price ?"
*** wrong df query
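For comparison, here is a sketch of the pandas expressions one would expect for the three queries above, run against the same sample dataframe. The variable names are illustrative, and the reading of "least quantity and price" (smallest Quantity, ties broken by smallest Price) is an assumption, since the query itself is ambiguous.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Item": ["water bottle", "snack box", "diapers"],
        "Quantity": [2, 3, 4],
        "Price": [30, 40, 50],
        "Total Amount": [60, 120, 200],
    }
)

# "give me the name of the item has the max qty?"
max_qty_item = df.loc[df["Quantity"].idxmax(), "Item"]

# "which item has the lowest price?"
lowest_price_item = df.loc[df["Price"].idxmin(), "Item"]

# "... the item which has the least quantity and price ?"
# Assumed reading: smallest Quantity first, then smallest Price as tie-breaker.
least_qty_price_item = df.sort_values(["Quantity", "Price"]).iloc[0]["Item"]

print(max_qty_item, lowest_price_item, least_qty_price_item, sep=" | ")
```

On this dataframe the three values are `diapers`, `water bottle` and `water bottle`, which gives a concrete reference to compare each model's generated query against.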