
[Question]: Response Synthesizer only works with OpenAI? #12153

Closed: Aekansh-Ak closed this issue 1 day ago

Aekansh-Ak commented 3 months ago


Question

I just imported the response synthesizer and tried to use it, and I get this error:

```python
my_synthesizer = get_response_synthesizer(response_mode="compact")
```


```
Could not load OpenAI model. Using default LlamaCPP=llama2-13b-chat. If you intended to use OpenAI, please check your OPENAI_API_KEY. Original error: No API key found for OpenAI. Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization. API keys can be found or created at https://platform.openai.com/account/api-keys
```


```
Downloading url https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf to path /tmp/llama_index/models/llama-2-13b-chat.Q4_0.gguf
Error downloading model: [Errno 13] Permission denied: '/tmp/llama_index/models/llama-2-13b-chat.Q4_0.gguf'
Download incomplete. Removing partially downloaded file.
Traceback (most recent call last):
  File "/home/seceon/opensearch_lm_index/model.py", line 111, in
    response_synthesizer = get_response_synthesizer(
```

dosubot[bot] commented 3 months ago

Hey there, @Aekansh-Ak! 👋 Great to see you back with us. I remember our last adventure through the code. I'm diving into your issue now and will get back to you with some insights shortly. 🕵️‍♂️

logan-markewich commented 3 months ago

You need to either pass in the LLM as a kwarg:

```python
get_response_synthesizer(..., llm=llm)
```

or change the global defaults:

```python
from llama_index.core import Settings

Settings.llm = llm
```
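
For reference, a minimal sketch putting both options together (assuming a recent llama-index; `llm` here stands for whatever LLM object you have constructed):

```python
from llama_index.core import Settings, get_response_synthesizer

# Option 1: pass the LLM explicitly as a keyword argument
synthesizer = get_response_synthesizer(response_mode="compact", llm=llm)

# Option 2: set the global default once; later calls pick it up
Settings.llm = llm
synthesizer = get_response_synthesizer(response_mode="compact")
```
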
dosubot[bot] commented 3 months ago

Based on the issues you're facing, it seems like there are two main problems: the missing OpenAI API key and the permission denied error when downloading the model. Here's how you can address these:

1. **Missing API Key Error**: This error occurs because the OpenAI API key is not set. You can set the key in two ways (see the sketch after this list):

   - **Environment Variable**: Set the `OPENAI_API_KEY` environment variable to your OpenAI API key. The `validate_openai_api_key` function checks the environment variables for the key if it is not passed as a parameter.
   - **Direct Assignment**: Before calling any functions that require OpenAI API access, assign your API key directly to `openai.api_key`:

     ```python
     import openai
     openai.api_key = 'your_api_key_here'
     ```

     Replace `'your_api_key_here'` with your actual OpenAI API key.

2. **Permission Denied Error When Downloading the Model**: This is most likely a file-system permissions issue. Ensure that the user running the script can write to the download directory (here, `/tmp/llama_index/models/`). That may mean changing the directory's permissions, running the script as a user with the appropriate permissions, or saving to a different location that you own.
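
For the environment-variable route, a minimal sketch (the key value is a placeholder; the point is to set it before anything tries to reach OpenAI):

```python
import os

# Placeholder key; substitute your real one, or set it in the shell instead
os.environ["OPENAI_API_KEY"] = "sk-your-key-here"

from llama_index.core import get_response_synthesizer

my_synthesizer = get_response_synthesizer(response_mode="compact")
```
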

Regarding the `get_response_synthesizer` function: it is a factory that creates and returns a response synthesizer instance based on the parameters you pass. The `response_mode` parameter selects the synthesizer class; its valid values are defined by the `ResponseMode` enum, each of which maps to a different synthesizer class. If `response_mode` matches none of the expected values, the function raises a `ValueError` indicating an unknown mode.
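
For example, a sketch of selecting a mode through the enum (assuming the import paths used by recent llama-index-core releases):

```python
from llama_index.core import get_response_synthesizer
from llama_index.core.response_synthesizers import ResponseMode

# Equivalent to response_mode="compact"; ResponseMode lists the valid modes
synthesizer = get_response_synthesizer(response_mode=ResponseMode.COMPACT)

# A string that matches no ResponseMode member raises a ValueError
```
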

I hope this helps! If you have any other questions or need further clarification, feel free to ask.


Aekansh-Ak commented 3 months ago

My LLM is defined like this:

```python
pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer,
    max_new_tokens=4000, streamer=streamer, torch_dtype=torch.float16,
    device_map="auto", top_p=0.9, temperature=0.3,
    eos_token_id=tokenizer.eos_token_id,
)

llm = HuggingFacePipeline(pipeline=pipe)
```

```python
response_synthesizer = get_response_synthesizer(llm=llm)
```

```
TypeError: get_response_synthesizer() got an unexpected keyword argument 'llm'
```

logan-markewich commented 3 months ago

This is a LangChain LLM class, no? You need to wrap it with the LangChain wrapper:

https://docs.llamaindex.ai/en/stable/examples/llm/langchain.html
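
Something along these lines should work (a sketch assuming the `llama-index-llms-langchain` integration package is installed and `pipe` is the transformers pipeline from your snippet):

```python
from llama_index.llms.langchain import LangChainLLM

# Wrap the LangChain HuggingFacePipeline LLM so llama-index can consume it
llm = LangChainLLM(llm=HuggingFacePipeline(pipeline=pipe))

response_synthesizer = get_response_synthesizer(llm=llm)
```
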

Also, I think you have an older version of llama-index? That's definitely a valid kwarg in newer versions

Aekansh-Ak commented 3 months ago

I upgraded the llama-index version, and now I am getting this error:

File "/usr/local/lib/python3.10/site-packages/llama_index/core/indices/prompt_helper.py", line 148, in _get_available_context_size context_size_tokens = self.context_window - num_prompt_tokens - self.num_output TypeError: unsupported operand type(s) for -: 'tuple' and 'int'

Code:

```python
client = OpensearchVectorClient(
    endpoint, idx, dim=1024,
    embedding_field=embedding_field, text_field=text_field,
    use_ssl=False, verify_certs=False,
)

vector_store = OpensearchVectorStore(client)

model_id = "lmsys/vicuna-13b-v1.5"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto', torch_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

embeddings = HuggingFaceInstructEmbeddings(
    model_name="intfloat/multilingual-e5-large", model_kwargs={"device": "cpu"}
)

text_split = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=200)
Settings.text_splitter = LangchainNodeParser(text_split)

pipe = pipeline(
    "text-generation", model=model, tokenizer=tokenizer,
    max_new_tokens=4000, streamer=streamer, torch_dtype=torch.float16,
    device_map="auto", top_p=0.9, temperature=0.3,
    eos_token_id=tokenizer.eos_token_id,
)

llm = HuggingFacePipeline(pipeline=pipe)
llm = LangChainLLM(llm)

Settings.llm = llm
Settings.embed_model = embeddings
# Note: the trailing commas on the next three lines make each value a
# one-element tuple, e.g. (4096,) instead of 4096
Settings.context_window = 4096,
Settings.num_output = 256,
Settings.chunk_overlap_ratio = 0.1,
Settings.chunk_size_limit = None

vsi = VectorStoreIndex.from_vector_store(vector_store=vector_store)

query_engine = vsi.as_query_engine()
res = query_engine.query("query")
```
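
Assuming the trailing commas flagged in the snippet above are present in the real script, they are the likely root cause of the tuple TypeError: `Settings.context_window = 4096,` assigns the tuple `(4096,)`. Dropping the commas gives prompt_helper.py plain numbers to subtract:

```python
# No trailing commas, so each setting is a plain number rather than a tuple
Settings.context_window = 4096
Settings.num_output = 256
Settings.chunk_overlap_ratio = 0.1
Settings.chunk_size_limit = None
```
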

Aekansh-Ak commented 3 months ago

Okay, I solved the error; this needs to be changed in prompt_helper.py:

```python
context_size_tokens = self.context_window[0] - num_prompt_tokens - self.num_output[0]
```

Will test the response synthesizer now