run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Is there a way we can use select OpenAI Key dynamically? #7377

Closed rendyfebry closed 1 year ago

rendyfebry commented 1 year ago

Question

As per the documentation, the only way to inject an OpenAI key is through the environment variable OPENAI_API_KEY.

If I have multiple API keys and want to switch between them on the fly, what's the best way to do that?

dosubot[bot] commented 1 year ago

🤖

Hello,

Yes, there is a way to dynamically switch between multiple OpenAI API keys on the fly. The LlamaIndex framework allows for this by passing the api_key parameter during the initialization of an OpenAI object. Here's an example:

from llama_index.llms.openai import OpenAI

# Initialize with the first API key
openai_obj1 = OpenAI(api_key='sk-yourfirstapikey')

# Use the OpenAI object
# ...

# Initialize with the second API key
openai_obj2 = OpenAI(api_key='sk-yoursecondapikey')

# Use the second OpenAI object
# ...

This method allows you to switch API keys dynamically within your code. However, please be aware that this method is not thread-safe. If you are using multiple threads, you should ensure that the API key is set correctly in each thread.
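The per-key pattern above can be sketched generically. `FakeClient`, `CLIENTS`, and `pick_client` below are hypothetical stand-ins for illustration only, not LlamaIndex APIs; the point is that when each client instance carries its own key, a request just selects the right instance and no shared key state is mutated:

```python
# Hypothetical sketch: one client object per API key, selected per request.
# FakeClient stands in for an SDK wrapper such as llama_index's OpenAI class.
class FakeClient:
    def __init__(self, api_key: str):
        self.api_key = api_key  # each instance keeps its own key

    def complete(self, prompt: str) -> str:
        # A real client would call the API here; we just echo which key was used.
        return f"answered '{prompt}' with {self.api_key}"

# Build one client per key up front; no global key is mutated afterwards.
CLIENTS = {
    "tenant_a": FakeClient(api_key="sk-key-a"),
    "tenant_b": FakeClient(api_key="sk-key-b"),
}

def pick_client(tenant: str) -> FakeClient:
    """Select the client (and therefore the key) for this request."""
    return CLIENTS[tenant]
```

With this shape, `pick_client("tenant_b").complete("hi")` uses the second key without touching the client for `tenant_a`.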

Additionally, the LlamaIndex framework supports initializing multiple OpenAI objects with different API keys simultaneously, as can be seen from the __init__ method of the OpenAIEmbedding class. The api_key parameter is optional; if provided, it is validated and then added to the openai_kwargs dictionary that is used for OpenAI API calls. This means you can create multiple instances of the OpenAIEmbedding class, each with a different API key.

Here is the relevant code:

def __init__(
    self,
    mode: str = OpenAIEmbeddingMode.TEXT_SEARCH_MODE,
    model: str = OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002,
    deployment_name: Optional[str] = None,
    embed_batch_size: int = DEFAULT_EMBED_BATCH_SIZE,
    callback_manager: Optional[CallbackManager] = None,
    **kwargs: Any,
) -> None:
    validate_openai_api_key(
        kwargs.get("api_key", None), kwargs.get("api_type", None)
    )

    self._query_engine = get_engine(mode, model, _QUERY_MODE_MODEL_DICT)
    self._text_engine = get_engine(mode, model, _TEXT_MODE_MODEL_DICT)

    super().__init__(
        embed_batch_size=embed_batch_size,
        callback_manager=callback_manager,
        model_name=model,
        deployment_name=deployment_name,
        openai_kwargs=kwargs,
    )

In this code, api_key travels through **kwargs: whatever is provided at construction time ends up in the openai_kwargs dictionary that is passed to the superclass's constructor. Because the key is stored per instance, you can create multiple OpenAIEmbedding objects, each with a different API key.
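The kwargs plumbing described above can be shown with a minimal stand-in. `BaseEmbedding` and `Embedding` here are illustrative toy classes, not the real LlamaIndex classes; they only demonstrate how a key passed via **kwargs ends up stored per instance:

```python
from typing import Any

class BaseEmbedding:
    """Toy stand-in for the superclass that receives openai_kwargs."""
    def __init__(self, openai_kwargs: dict):
        # In the real class this dict is later handed to the OpenAI API.
        self.openai_kwargs = openai_kwargs

class Embedding(BaseEmbedding):
    """Toy stand-in for OpenAIEmbedding's kwargs forwarding."""
    def __init__(self, **kwargs: Any):
        # api_key, api_type, etc. ride along inside kwargs untouched.
        super().__init__(openai_kwargs=kwargs)

e1 = Embedding(api_key="sk-first")
e2 = Embedding(api_key="sk-second")
```

Each instance ends up with its own `openai_kwargs["api_key"]`, which is why two embedding objects with different keys can coexist.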

I hope this helps! Let me know if you have any other questions.

Best, Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

rendyfebry commented 1 year ago

Cool, thank you

rendyfebry commented 1 year ago

Actually, it doesn't work

logan-markewich commented 1 year ago

@rendyfebry What's the issue? Just tested locally myself, seems to work fine

(screenshot of the local test)

logan-markewich commented 1 year ago

A similar thing also works for embeddings

(screenshot of the embeddings test)

rendyfebry commented 1 year ago

Hi @logan-markewich

Thanks for testing that. When I run it standalone like that it works, but when I use it with the rest of the system it doesn't:

llm_predictor = LLMPredictor(llm=OpenAI(temperature=self.__llm_temperature,
                                        model=self.__llm_model_name,
                                        max_tokens=self.__llm_num_outputs,
                                        api_key="sk-..."))
service_ctx = ServiceContext.from_defaults(llm_predictor=llm_predictor,
                                           node_parser=node_parser,
                                           callback_manager=self.__get_llm_callback_manager())

index = VectorStoreIndex([], service_context=service_ctx, storage_context=storage_ctx)

rendyfebry commented 1 year ago

Found one problem here.

When ServiceContext is initialized without an embed_model, it calls resolve_embed_model, which constructs OpenAIEmbedding() with no arguments. As a result the custom api_key is not respected and it falls back to OPENAI_API_KEY.

https://github.com/jerryjliu/llama_index/blob/main/llama_index/embeddings/utils.py#L52

For now, I will pass the embed_model to ServiceContext myself, but it would be great if the api_key could be passed down the line automatically.
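The fallback just described can be sketched with stand-in names. `StubEmbedding` and this `resolve_embed_model` are hypothetical simplifications of the llama_index internals, only meant to show why the custom key is lost when no embed model is passed, and why passing one explicitly works around it:

```python
import os
from typing import Optional

# Simulate the environment-variable fallback described above.
os.environ["OPENAI_API_KEY"] = "sk-from-env"

class StubEmbedding:
    """Toy stand-in for OpenAIEmbedding's key handling."""
    def __init__(self, api_key: Optional[str] = None):
        # No explicit key -> fall back to the environment variable.
        self.api_key = api_key or os.environ["OPENAI_API_KEY"]

def resolve_embed_model(embed_model=None):
    # If the caller passes nothing, a default is built with no arguments,
    # so any custom key used for the LLM never reaches the embed model.
    return embed_model if embed_model is not None else StubEmbedding()

implicit = resolve_embed_model()                            # falls back to env key
explicit = resolve_embed_model(StubEmbedding("sk-custom"))  # the workaround
```

The `explicit` path is the workaround mentioned above: construct the embed model yourself with the desired key and hand it to ServiceContext.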

logan-markewich commented 1 year ago

Since the embed model and the LLM are completely separate abstractions, it's a bit tricky to pass the OpenAI key to both of them under the hood, given that the key is specific to OpenAI.

For now, I think having to initialize both the embed model and llm is a fair compromise for this use case :)