run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: PandasQueryEngine with open-source LLMs #12692

amygbAI closed this issue 4 months ago

amygbAI commented 5 months ago


Question

Please find below the code:

```python
import torch
from transformers import BitsAndBytesConfig
from llama_index.core.prompts import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llama2llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    context_window=3900,
    model_kwargs={"quantization_config": quantization_config},
    tokenizer_kwargs={"token": hf_token},  # hf_token: a Hugging Face access token, defined elsewhere
    device_map="auto",
)

import pandas as pd
from llama_index.core.query_engine import PandasQueryEngine

df = pd.DataFrame(
    {
        "Item": ["water bottle", "snack box", "diapers"],
        "Quantity": [2, 3, 4],
        "Price": [30, 40, 50],
        "Total Amount": [60, 120, 200],
    }
)
query_engine = PandasQueryEngine(df=df, llm=llama2llm, verbose=True)

response = query_engine.query(
    "please give me the name of the item which has the least quantity and price ?",
)
prompts = query_engine.get_prompts()
print(prompts["pandas_prompt"].template)
print(f"{response}")
```

Notes

I have tried Mistral-7B and Llama-2-7b; all of them have serious issues with what I consider a critical component for enterprises. What most folks probably don't realize is that a TON of documents (which carry a lot of tabular information) and spreadsheets sitting in companies' shared folders can ONLY be "LLM"-ized by converting them into a dataframe (and bigger sets will need to go into a DB). The sad truth is that this particular engine only works with GPT and not with the major open-source LLMs.

I do understand that LlamaIndex has no responsibility whatsoever on this front, but has someone benchmarked the major engines with something like Llama-2? Not all of us want to pump our data into paid LLMs, and more importantly, in some cases our smaller finetuned models are BETTER than GPT-4 on narrow tasks. If there are any pointers, I would be really grateful. Here are some of the answers the above code gave me.

PS - please forgive the formatting for the below

QUERY - give me the name of the item has the max qty?

Pandas Instructions: `df['Total Amount'].max()['Description']`

Explanation (from the LLM): "The ['Description'] part is optional, if you want to get the name of the column that corresponds to the maximum value, you can simply use ['Description']."

*** wrong df query

QUERY - "which item has the lowest price?"

Pandas Instructions: `print(df['Price'].min())`
30
Pandas Output: None

*** answers only a part of the query

QUERY - "please give me the name of the item which has the least quantity and price ?"

Pandas Instructions: `least_item = df['Quantity'].min()[df['Price'].min()]` `print(least_item)`

*** wrong df query
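
For contrast, correct pandas instructions for the three queries would look something like this (a sketch against the `df` defined above; expected values shown in comments):

```python
# "give me the name of the item has the max qty?"
df.loc[df['Quantity'].idxmax(), 'Item']   # 'diapers'

# "which item has the lowest price?"
df.loc[df['Price'].idxmin(), 'Item']      # 'water bottle'

# "name of the item which has the least quantity and price?"
# (here the same row happens to minimize both columns)
df.loc[(df['Quantity'] == df['Quantity'].min())
       & (df['Price'] == df['Price'].min()), 'Item']   # 'water bottle'
```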

dosubot[bot] commented 5 months ago

It seems you're encountering issues with the PandasQueryEngine when using it with open-source LLMs like Mistral-7B and Llama-2-7b, particularly with handling complex queries involving tabular data. The challenges you've outlined, such as incorrect query handling and partial answers, might stem from the inherent limitations of the PandasQueryEngine, including its ability to process complex conditions, its security measures, and its flexibility in handling different types of queries.

The specific problems you've shown suggest a need for a more robust handling mechanism within the PandasQueryEngine for complex queries, especially those requiring multiple conditions or aggregations.

Unfortunately, there doesn't seem to be benchmarking information available for the LlamaIndex framework with different LLMs, including open-source models. This makes it challenging to directly address compatibility or performance issues with specific LLMs like Llama-2-7b.

To better assist you, it would be helpful to have the exact error messages or incorrect outputs you're encountering. Additionally, more details about your specific use case, including the type of data, queries, and any particular performance or security considerations, would enable us to suggest more targeted solutions or workarounds.

For further reference, you can review the test cases for the PandasQueryEngine and the source code.

Please let me know if you need further clarification or have additional questions.

amygbAI commented 5 months ago

The above was a terrible summary by the bot. I know for a fact that the problem is NOT with the query engine; it's the LLMs that keep throwing random stuff the engine's way (because the engine simply creates custom templates for different tasks and hands them over to the LLM).
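
Since the engine just fills a template and hands it to the LLM, one lever that can help with weaker models is swapping the template itself via the `get_prompts`/`update_prompts` API that query engines expose. A sketch (the prompt wording here is illustrative; `{df_str}`, `{query_str}`, and `{instruction_str}` are the variables the default template prints):

```python
from llama_index.core.prompts import PromptTemplate

# Inspect the template the engine currently sends to the LLM
print(query_engine.get_prompts()["pandas_prompt"].template)

# Swap in a stricter prompt; tune the wording for your model
new_prompt = PromptTemplate(
    "You are working with a pandas dataframe in Python named `df`.\n"
    "This is the result of `print(df.head())`:\n{df_str}\n\n"
    "{instruction_str}\n"
    "Write a SINGLE python expression using `df` that answers: {query_str}\n"
    "Output only the expression, no explanation.\n"
)
query_engine.update_prompts({"pandas_prompt": new_prompt})
```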

logan-markewich commented 5 months ago

@amygbAI open-source LLMs are, in general, terrible with structured outputs. Add complex reasoning on top of that, and it's not going to be a fun time.

You mentioned Llama2, which in my experience is one of the worst offenders.

Newer models may be getting better (command-r, codellama, etc.), but they're not really reliable for this task without finetuning.

logan-markewich commented 5 months ago

Anthropic, Gemini, OpenAI, mistral-large, etc. all work pretty well, though. Of course, those are hosted models.

amygbAI commented 5 months ago

Thank you for your response 😊. Agreed; it's just that getting any URL whitelisted (for access to hosted models) by any reasonably sized company's infosec team is a bureaucratic nightmare. I'm hoping the Llama-3 release might help. If I do manage to fine-tune this for Llama-2, I shall add details to this thread.


uzairlatif90 commented 4 months ago

Hello @amygbAI! I am currently dealing with the same problem. I also found this langchain function, but it has the same problem that it only works with GPT LLMs. I am thinking of using the Llama 3 LLM locally. Were you able to find a solution to your problems?

amygbAI commented 4 months ago

Hi @uzairlatif90, I did exactly what you are/were thinking of doing: I deployed Llama-3 (8B-Instruct). Now I'm working on quantizing it via llama.cpp, but Llama-3 is still a ways behind GPT-4, so I am building a scaffolding framework around double-checking its output. Basically, I have a set of predefined query responses and see how the output compares, as sketched below. In my case it was ~80-85% accurate.
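
Roughly, the check is just a golden set of query/answer pairs replayed against the engine (all names here are illustrative, and the matching rule is plain substring containment):

```python
# Hypothetical golden set: known queries with their expected answers
golden_set = {
    "which item has the lowest price?": "water bottle",
    "give me the name of the item which has the max qty?": "diapers",
}

hits = 0
for query, expected in golden_set.items():
    response = query_engine.query(query)
    # Crude check: does the expected answer appear in the response text?
    if expected.lower() in str(response).lower():
        hits += 1

print(f"accuracy: {hits / len(golden_set):.0%}")
```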

uzairlatif90 commented 4 months ago

Thank you so much for your reply @amygbAI! Would you be able to share a snippet from your code showing how exactly you deployed Llama-3 (8B-Instruct) to analyse the dataframe with llama.cpp? I have been struggling with that and would be really grateful for your help.

amygbAI commented 4 months ago

@uzairlatif90 the llama.cpp piece is a WIP :) Like I mentioned in my comment, I am still working on it. Once I'm able to get it up and running I will most certainly share it here.

amygbAI commented 4 months ago

@uzairlatif90 it was really simple in the end. Install Ollama (https://github.com/ollama/ollama/tree/main), run `ollama pull llama3:instruct`, and start the server with `ollama serve` (ideally it starts by default). Then you can create your own API with a simple Python `requests` call like so:

```python
import requests

url = "http://localhost:11434/api/generate"
data = {
    "model": "llama3:instruct",
    "prompt": '''In the following code below extract the global variables and method definitions. Restrict answer to 2 lists. One with list of variables and the other with list of method definitions.
govt_id_sync_dict = dict({"drivinglicence" : "DL",
                          "voterid" : "Voter ID",
                          "passport" : "Passport",
                          "aadhaar" : "Aadhaar",
                          "pan" : "PAN"})

def get_num_pages(pdf_path):
    # Get num of pages ''',
}

# Ollama streams the generation back line by line, one token per JSON object
response = requests.post(url, json=data, stream=True)
```

The only thing is that the response comes out as individual tokens, so you will need to process it. Quality-wise there is a small degradation (since this is a 4.7 GB quantized version of the 8B-Instruct model), but it is still much, much better than Llama-2 and almost as good as GPT-3.5. Once you have this wrapper, instead of calling the PandasQueryEngine you can call this directly.
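
For assembling the streamed tokens, something along these lines works against Ollama's documented `/api/generate` streaming format (one JSON object per line, a `response` field carrying each token, and a final `done` flag):

```python
import json
import requests

def ollama_generate(prompt, model="llama3:instruct",
                    url="http://localhost:11434/api/generate"):
    """Stream a completion from a local Ollama server and return the full text."""
    resp = requests.post(url, json={"model": model, "prompt": prompt}, stream=True)
    resp.raise_for_status()
    chunks = []
    for line in resp.iter_lines():
        if not line:
            continue
        payload = json.loads(line)
        chunks.append(payload.get("response", ""))
        if payload.get("done"):  # final object carries done=true plus stats
            break
    return "".join(chunks)

print(ollama_generate("Say hello in one word."))
```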