run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: Service context is not used in some cases #6620

Closed jma7889 closed 1 year ago

jma7889 commented 1 year ago

Bug Description

I think this bug is critical, at least for tree index users with a custom service context: the wrong model and model settings get used, which impacts both cost and result quality.

I use service_context to specify the model gpt-3.5-turbo, and I use a tree index. It works fine when tree_child_branch_factor is 1. However, when tree_child_branch_factor is 2, about half of the requests to the OpenAI API use the default "text-davinci-003" and the default temperature instead of the values in the service context, while the other half correctly use the settings from the service context.

Version

llama-index-0.6.35

Steps to Reproduce

The relevant code snippets:

...
        # Build a service context that pins gpt-3.5-turbo at a low temperature
        predictors = {
            ModelChoice.GPT35: LLMPredictor(llm=ChatOpenAI(temperature=0.01, model_name="gpt-3.5-turbo")),
...
        predictor = predictors[model_choice]
        chunk_size = chunk_sizes[model_choice]
        return ServiceContext.from_defaults(llm_predictor=predictor, chunk_size=chunk_size)
...
            # Build the tree index from the documents with that custom service context
            reader = SimpleDirectoryReader(input_dir=self.input_dir)
            documents = reader.load_data()
            index = GPTTreeIndex.from_documents(documents, service_context=service_context)
...
            # Wire the retriever into a query engine; note no service_context is passed here
            retriever = TreeSelectLeafRetriever(index=self.index, child_branch_factor=self.tree_child_branch_factor)
            self.query_engine = RetrieverQueryEngine(retriever=retriever)
...
        query_engine_manager.query_engine.query("78890 which model is it?")

Relevant Logs/Tracebacks

I used debug logging to confirm this; for example:

DEBUG:openai:api_version=None data='{"prompt":  ... ... "model": "text-davinci-003", "temperature": 0.0, "max_tokens": 2367, "top_p": 1, "frequency_penalty": 0, "presence_penalty": 0, "n": 1, "logit_bias": {}}' message='Post details'
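
A minimal way to surface these request logs, assuming the openai client emits them through the standard Python logging module (which the DEBUG:openai: prefix above suggests):

    import logging
    import sys

    # Send log output to stdout; keep the root logger at INFO to reduce noise
    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    # Raise only the openai logger to DEBUG so each outgoing request is printed,
    # including the "model" and "temperature" it actually carried
    logging.getLogger("openai").setLevel(logging.DEBUG)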

Update 1: the issue is worse than initially described. When tree_child_branch_factor is 1, the query calls the API 3 times; 2 of those calls used the model specified in the service context, but the last one used the default text-davinci-003 settings.

jma7889 commented 1 year ago

There is a workaround for this bug: use the global service context setter to override the default service context. See https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html
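
Concretely, the workaround looks something like this (a sketch against the 0.6.x API; set_global_service_context is the setter described in the linked guide):

    from langchain.chat_models import ChatOpenAI
    from llama_index import LLMPredictor, ServiceContext, set_global_service_context

    # Same custom context as in the repro, but installed globally so any
    # component that silently falls back to "the default" still picks it up
    predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.01, model_name="gpt-3.5-turbo"))
    set_global_service_context(ServiceContext.from_defaults(llm_predictor=predictor))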

Disiok commented 1 year ago

Hey @jma7889, thanks for surfacing this! The tree index code needs some love for sure. But yes, for now we are nudging folks toward the global configuration option to mitigate some pain points. Will try to improve this asap.

tyre commented 1 year ago

I have a proposed fix in #6911.

RetrieverQueryEngine now supports a service_context option, which we want regardless, so that's one option. Since retriever is the only required parameter to the initializer (which is how @jma7889 is using it), I've also added a fallback: the response's service context defaults to the retriever's, and the retriever's can either be set explicitly or fall back to its index's. This should make existing integrations work as (probably?) expected.
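
In other words, the intended resolution order is roughly the following (a sketch of the idea with hypothetical attribute names, not the literal code in #6911):

    # Sketch: explicit argument > retriever's context > the retriever's index's context
    def resolve_service_context(explicit_service_context, retriever):
        if explicit_service_context is not None:
            return explicit_service_context
        if retriever.service_context is not None:  # hypothetical attribute
            return retriever.service_context
        return retriever.index.service_context  # hypothetical attribute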

@Disiok I'm only now seeing your comment, so check out the PR in case we don't want to go down that road.