run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Add GithubLLM #15169

Closed Josephrp closed 3 months ago

Josephrp commented 4 months ago

Feature Description

We propose adding a new GithubLLM class to LlamaIndex. This custom LLM interface would allow users to interact with AI models hosted on GitHub's inference endpoint, with automatic fallback to Azure when rate limits are reached. Key features include:

  1. Seamless integration with GitHub-hosted AI models
  2. Automatic fallback to Azure when GitHub rate limits are reached
  3. Support for both completion and chat-based interactions
  4. Streaming support for both completion and chat responses
  5. Easy integration with the existing LlamaIndex ecosystem

The implementation would be similar to other custom LLMs in LlamaIndex, inheriting from the CustomLLM class and implementing the necessary methods (complete, stream_complete, chat, stream_chat, etc.).
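
To make this concrete, here is a rough, untested sketch of what such a class might look like, following the documented CustomLLM pattern; the helper methods, exception type, default model name, and fallback logic are illustrative assumptions, not a proposed final implementation:

from typing import Any

from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class GithubRateLimitError(Exception):
    """Hypothetical error signalling that the GitHub inference endpoint rate-limited us."""


class GithubLLM(CustomLLM):
    """Sketch only: call a GitHub-hosted model, fall back to Azure on rate limits."""

    model_name: str = "gpt-4o-mini"

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name=self.model_name)

    def _call_github(self, prompt: str) -> str:
        # Placeholder: a real implementation would call GitHub's inference endpoint here.
        raise GithubRateLimitError("rate limit reached")

    def _call_azure(self, prompt: str) -> str:
        # Placeholder: a real implementation would call the Azure endpoint here.
        return f"(azure fallback) echo: {prompt}"

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        try:
            text = self._call_github(prompt)
        except GithubRateLimitError:
            text = self._call_azure(prompt)
        return CompletionResponse(text=text)

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Streaming would follow the same GitHub-first, Azure-fallback pattern.
        yield self.complete(prompt, **kwargs)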

Reason

Currently, LlamaIndex does not have built-in support for GitHub's hosted AI models. Users who want to prototype with these models and potentially transition to Azure for production use don't have a straightforward way to do so within the LlamaIndex framework.

Value of Feature

Adding GithubLLM to LlamaIndex would provide several benefits:

  1. Prototyping: Developers can easily experiment with GitHub-hosted AI models for free, lowering the barrier to entry for AI development.
  2. Seamless Production Transition: The implementation allows for a smooth transition from prototyping to production by simply switching from a GitHub token to an Azure token.
  3. Rate Limit Management: Built-in rate limit handling and Azure fallback ensure that applications can continue functioning even when GitHub limits are reached.
  4. LlamaIndex Integration: Users can leverage all of LlamaIndex's powerful features while using GitHub-hosted models.
  5. Flexibility: Support for both completion and chat-based interactions, as well as streaming, provides flexibility for various use cases.

This feature would make LlamaIndex an even more comprehensive platform for AI development, from prototyping to production, and would align well with GitHub's efforts to provide accessible AI models to developers.

santiagxf commented 3 months ago

@Josephrp GitHub models are hosted in Azure, so you can use the LlamaIndex integration we already have there: https://docs.llamaindex.ai/en/stable/examples/llm/azure_inference/. You need to pass the model_name parameter, though. I will update the docs so this is properly shown, but it's already supported.

Bringing a GitHubLLM would be redundant work since the underlying client is the same.
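
For example, something along these lines should work against the GitHub models endpoint (the endpoint URL, model name, and environment variable below are illustrative, not prescriptive):

import os

from llama_index.llms.azure_inference import AzureAICompletionsModel

# A GitHub personal access token is used as the credential when targeting GitHub models.
llm = AzureAICompletionsModel(
    endpoint="https://models.inference.ai.azure.com",  # example endpoint
    credential=os.environ["GITHUB_TOKEN"],
    model_name="Meta-Llama-3.1-405B-Instruct",  # example model name
)

print(llm.complete("Hello!"))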

john0isaac commented 3 months ago

Following up on @santiagxf, I tried his suggestion here: https://github.com/leestott/azureai-x-arize/pull/1

Some functions work out of the box; others error because the model_name is not used.

Example of functions that work without changing anything other than specifying the model name at initialization:

Example of functions that do not work:

The error is the same for all of them. Traceback:

---------------------------------------------------------------------------
HttpResponseError                         Traceback (most recent call last)
Cell In[16], line 1
----> 1 summarize_query_engine = summary_index.as_query_engine(
      2     llm=llm,
      3     response_mode="tree_summarize",
      4     use_async=True,
      5 )

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/indices/base.py:411, in BaseIndex.as_query_engine(self, llm, **kwargs)
    404 retriever = self.as_retriever(**kwargs)
    405 llm = (
    406     resolve_llm(llm, callback_manager=self._callback_manager)
    407     if llm
    408     else llm_from_settings_or_context(Settings, self.service_context)
    409 )
--> 411 return RetrieverQueryEngine.from_args(
    412     retriever,
    413     llm=llm,
    414     **kwargs,
    415 )

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:110, in RetrieverQueryEngine.from_args(cls, retriever, llm, response_synthesizer, node_postprocessors, response_mode, text_qa_template, refine_template, summary_template, simple_template, output_cls, use_async, streaming, service_context, **kwargs)
     88 """Initialize a RetrieverQueryEngine object.".
     89 
     90 Args:
   (...)
    106 
    107 """
    108 llm = llm or llm_from_settings_or_context(Settings, service_context)
--> 110 response_synthesizer = response_synthesizer or get_response_synthesizer(
    111     llm=llm,
    112     service_context=service_context,
    113     text_qa_template=text_qa_template,
    114     refine_template=refine_template,
    115     summary_template=summary_template,
    116     simple_template=simple_template,
    117     response_mode=response_mode,
    118     output_cls=output_cls,
    119     use_async=use_async,
    120     streaming=streaming,
    121 )
    123 callback_manager = callback_manager_from_settings_or_context(
    124     Settings, service_context
    125 )
    127 return cls(
    128     retriever=retriever,
    129     response_synthesizer=response_synthesizer,
    130     callback_manager=callback_manager,
    131     node_postprocessors=node_postprocessors,
    132 )

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/factory.py:74, in get_response_synthesizer(llm, prompt_helper, service_context, text_qa_template, refine_template, summary_template, simple_template, response_mode, callback_manager, use_async, streaming, structured_answer_filtering, output_cls, program_factory, verbose)
     68     prompt_helper = service_context.prompt_helper
     69 else:
     70     prompt_helper = (
     71         prompt_helper
     72         or Settings._prompt_helper
     73         or PromptHelper.from_llm_metadata(
---> 74             llm.metadata,
     75         )
     76     )
     78 if response_mode == ResponseMode.REFINE:
     79     return Refine(
     80         llm=llm,
     81         callback_manager=callback_manager,
   (...)
     91         service_context=service_context,
     92     )

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/llms/azure_inference/base.py:282, in AzureAICompletionsModel.metadata(self)
    279 @property
    280 def metadata(self) -> LLMMetadata:
    281     if not self._model_name:
--> 282         model_info = self._client.get_model_info()
    283         if model_info:
    284             self._model_name = model_info.get("model_name", None)

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     92 span_impl_type = settings.tracing_implementation()
     93 if span_impl_type is None:
---> 94     return func(*args, **kwargs)
     96 # Merge span is parameter is set, but only if no explicit parent are passed
     97 if merge_span and not passed_in_parent:

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/ai/inference/_patch.py:660, in ChatCompletionsClient.get_model_info(self, **kwargs)
    653 """Returns information about the AI model.
    654 
    655 :return: ModelInfo. The ModelInfo is compatible with MutableMapping
    656 :rtype: ~azure.ai.inference.models.ModelInfo
    657 :raises ~azure.core.exceptions.HttpResponseError:
    658 """
    659 if not self._model_info:
--> 660     self._model_info = self._get_model_info(**kwargs)  # pylint: disable=attribute-defined-outside-init
    661 return self._model_info

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     92 span_impl_type = settings.tracing_implementation()
     93 if span_impl_type is None:
---> 94     return func(*args, **kwargs)
     96 # Merge span is parameter is set, but only if no explicit parent are passed
     97 if merge_span and not passed_in_parent:

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/ai/inference/_operations/_operations.py:558, in ChatCompletionsClientOperationsMixin._get_model_info(self, **kwargs)
    556         response.read()  # Load the body in memory and close the socket
    557     map_error(status_code=response.status_code, response=response, error_map=error_map)
--> 558     raise HttpResponseError(response=response)
    560 if _stream:
    561     deserialized = response.iter_bytes()

HttpResponseError: (no_model_name) No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.
Code: no_model_name
Message: No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.

I dug deeper into the codebase and found out that this is the reason for the error:

You call get_model_info() if the model_name is not set. I did set the model_name, and it works for some functions but not for others. I suspect it might not be passed correctly, but this looks like a bug for now rather than a new feature that needs to be implemented.

llama_index/llms/azure_inference/base.py, line 282:

    @property
    def metadata(self) -> LLMMetadata:
        if not self._model_name:
            model_info = self._client.get_model_info()
john0isaac commented 3 months ago

Workaround for now that seems to fix the previous errors; this is compatible with GitHub models.

@Josephrp someone needs to fix the init for the model_name

To create a chat model client use this code:

import os

from llama_index.llms.azure_inference import AzureAICompletionsModel

llm = AzureAICompletionsModel(
    endpoint=os.environ["AZURE_AI_ENDPOINT_URL"],
    credential=os.environ["AZURE_AI_ENDPOINT_KEY"],
    model_name=os.environ["AZURE_AI_MODEL_NAME"],
)
llm._model_name = os.environ["AZURE_AI_MODEL_NAME"]   # This is the fix

To create an embedding model client use this code; this issue seems to be affecting the chat model only.
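
A minimal sketch of that embedding client, assuming the AzureAIEmbeddingsModel class from llama-index-embeddings-azure-inference and the same environment variables as the chat client above:

import os

from llama_index.embeddings.azure_inference import AzureAIEmbeddingsModel

# Mirrors the chat client setup; class name and env vars are assumptions.
embed_model = AzureAIEmbeddingsModel(
    endpoint=os.environ["AZURE_AI_ENDPOINT_URL"],
    credential=os.environ["AZURE_AI_ENDPOINT_KEY"],
    model_name=os.environ["AZURE_AI_MODEL_NAME"],
)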

santiagxf commented 3 months ago

Hi @john0isaac! Thanks for helping with the debugging! I can put up a PR and fix it.

john0isaac commented 3 months ago

You are most welcome, @santiagxf. I was already working on a branch for your sample; the PR is now ready at your repo. If you want to fix this in LlamaIndex, please feel free to do so!

santiagxf commented 3 months ago

@john0isaac can you verify that you are using the latest version of the library? I checked, and I happened to have added a test for this case, and I see it passing.

https://github.com/santiagxf/llama_index/blob/729d5f2e76cca29284157c6fd8b55fb9953739dd/llama-index-integrations/llms/llama-index-llms-azure-inference/tests/test_llms_azure_inference.py#L19

john0isaac commented 3 months ago

@john0isaac can you verify that you are using the latest version of the library? I checked, and I happened to have added a test for this case, and I see it passing.

https://github.com/santiagxf/llama_index/blob/729d5f2e76cca29284157c6fd8b55fb9953739dd/llama-index-integrations/llms/llama-index-llms-azure-inference/tests/test_llms_azure_inference.py#L19

I am using the latest version. It seems weird that some functions work and others don't.

I now know why some work and others don't: if a function needs to check the metadata property, it triggers the broken code and errors; if a function doesn't need to read the value of the metadata property, it works as expected, hence the flaky behavior.

See the ones I listed above; they read the metadata.

Btw, it's starting to make sense to me: there are two attributes in the AzureAI class, one called model_name and another called _model_name, and you never assign the init's model_name to _model_name.

And in metadata you are checking _model_name, not model_name, so it makes sense that it errors, as _model_name was never set in the init.
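
A tiny, self-contained illustration of the suspected bug and fix, using a stand-in class rather than the real AzureAICompletionsModel:

class _ModelStub:
    """Stand-in for AzureAICompletionsModel, for illustration only."""

    def __init__(self, model_name=None):
        self.model_name = model_name
        # The suspected fix: without this assignment, metadata falls back to
        # get_model_info(), which the GitHub endpoint rejects with "no_model_name".
        self._model_name = model_name

    @property
    def metadata(self):
        if not self._model_name:
            raise RuntimeError("would call self._client.get_model_info() here")
        return self._model_name


assert _ModelStub(model_name="gpt-4o-mini").metadata == "gpt-4o-mini"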

santiagxf commented 3 months ago

Good point. That's the fix then. I need to wrap that in a try/except in case the endpoint doesn't support metadata retrieval. We will bring support to the GH endpoint soon, though. I'll follow up on it by tomorrow.
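
Roughly, the behavior I have in mind (names approximate, not the actual patch):

import logging

logger = logging.getLogger(__name__)


def resolve_model_name(client, model_name=None):
    """Prefer the explicitly configured model_name; only then try get_model_info(),
    and degrade gracefully if the endpoint doesn't support metadata retrieval."""
    if model_name:
        return model_name
    try:
        info = client.get_model_info()  # may raise HttpResponseError on some endpoints
        return info.get("model_name") or "unknown"
    except Exception:
        logger.warning("Endpoint does not support model metadata retrieval; using 'unknown'.")
        return "unknown"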

santiagxf commented 3 months ago

@john0isaac can you help me validate if the following branch solves the issue?

https://github.com/santiagxf/llama_index/tree/santiagxf/azure-ai-inference-gh

You can install with pip install git+https://github.com/santiagxf/llama_index.git@santiagxf/azure-ai-inference-gh#subdirectory=llama-index-integrations/llms/llama-index-llms-azure-inference

john0isaac commented 3 months ago

Sure. I'm not sure if you are doing this intentionally or not, but you never pass the initializer's model_name to the private _model_name.

If you just assign the value in the init, it's going to solve the issue. But sure, I will test what you did.

john0isaac commented 3 months ago

Using your branch, I receive these warnings, which I was also receiving when I didn't apply my workaround:

WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for event.span_id='AzureAICompletionsModel.complete-2c97c998-908b-494a-b91f-ddbce575792f', event.id_=UUID('15b072f0-94b7-4ae9-90da-63a3a903cc3c')
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for event.span_id='AzureAICompletionsModel.chat-691b82de-3e49-4084-83d2-b0d220b2fcea', event.id_=UUID('048349d3-b65f-4b94-bbc5-7c821ac53a41')
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for event.span_id='AzureAICompletionsModel.chat-691b82de-3e49-4084-83d2-b0d220b2fcea', event.id_=UUID('bb31a2eb-2842-4c0f-aa25-330193571b4c')
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for id_='AzureAICompletionsModel.chat-691b82de-3e49-4084-83d2-b0d220b2fcea'
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for event.span_id='AzureAICompletionsModel.complete-2c97c998-908b-494a-b91f-ddbce575792f', event.id_=UUID('ff6c9138-aced-4057-9956-4dc22752ee57')
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for id_='AzureAICompletionsModel.complete-2c97c998-908b-494a-b91f-ddbce575792f'

And I get these new errors resulting from the code that you changed:

{
    "name": "AttributeError",
    "message": "'ChatCompletionsClient' object has no attribute 'endpoint'",
    "stack": "---------------------------------------------------------------------------
HttpResponseError                         Traceback (most recent call last)
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/llms/azure_inference/base.py:288, in AzureAICompletionsModel.metadata(self)
    285 try:
    286     # Get model info from the endpoint. This method may not be supported by all
    287     # endpoints.
--> 288     model_info = self._client.get_model_info()
    289 except Exception:

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     93 if span_impl_type is None:
---> 94     return func(*args, **kwargs)
     96 # Merge span is parameter is set, but only if no explicit parent are passed

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/ai/inference/_patch.py:660, in ChatCompletionsClient.get_model_info(self, **kwargs)
    659 if not self._model_info:
--> 660     self._model_info = self._get_model_info(**kwargs)  # pylint: disable=attribute-defined-outside-init
    661 return self._model_info

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     93 if span_impl_type is None:
---> 94     return func(*args, **kwargs)
     96 # Merge span is parameter is set, but only if no explicit parent are passed

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/ai/inference/_operations/_operations.py:558, in ChatCompletionsClientOperationsMixin._get_model_info(self, **kwargs)
    557     map_error(status_code=response.status_code, response=response, error_map=error_map)
--> 558     raise HttpResponseError(response=response)
    560 if _stream:

HttpResponseError: (no_model_name) No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.
Code: no_model_name
Message: No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[17], line 1
----> 1 summarize_query_engine = summary_index.as_query_engine(
      2     llm=llm,
      3     response_mode=\"tree_summarize\",
      4     use_async=True,
      5 )

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/indices/base.py:411, in BaseIndex.as_query_engine(self, llm, **kwargs)
    404 retriever = self.as_retriever(**kwargs)
    405 llm = (
    406     resolve_llm(llm, callback_manager=self._callback_manager)
    407     if llm
    408     else llm_from_settings_or_context(Settings, self.service_context)
    409 )
--> 411 return RetrieverQueryEngine.from_args(
    412     retriever,
    413     llm=llm,
    414     **kwargs,
    415 )

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:110, in RetrieverQueryEngine.from_args(cls, retriever, llm, response_synthesizer, node_postprocessors, response_mode, text_qa_template, refine_template, summary_template, simple_template, output_cls, use_async, streaming, service_context, **kwargs)
     88 \"\"\"Initialize a RetrieverQueryEngine object.\".
     89 
     90 Args:
   (...)
    106 
    107 \"\"\"
    108 llm = llm or llm_from_settings_or_context(Settings, service_context)
--> 110 response_synthesizer = response_synthesizer or get_response_synthesizer(
    111     llm=llm,
    112     service_context=service_context,
    113     text_qa_template=text_qa_template,
    114     refine_template=refine_template,
    115     summary_template=summary_template,
    116     simple_template=simple_template,
    117     response_mode=response_mode,
    118     output_cls=output_cls,
    119     use_async=use_async,
    120     streaming=streaming,
    121 )
    123 callback_manager = callback_manager_from_settings_or_context(
    124     Settings, service_context
    125 )
    127 return cls(
    128     retriever=retriever,
    129     response_synthesizer=response_synthesizer,
    130     callback_manager=callback_manager,
    131     node_postprocessors=node_postprocessors,
    132 )

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/factory.py:74, in get_response_synthesizer(llm, prompt_helper, service_context, text_qa_template, refine_template, summary_template, simple_template, response_mode, callback_manager, use_async, streaming, structured_answer_filtering, output_cls, program_factory, verbose)
     68     prompt_helper = service_context.prompt_helper
     69 else:
     70     prompt_helper = (
     71         prompt_helper
     72         or Settings._prompt_helper
     73         or PromptHelper.from_llm_metadata(
---> 74             llm.metadata,
     75         )
     76     )
     78 if response_mode == ResponseMode.REFINE:
     79     return Refine(
     80         llm=llm,
     81         callback_manager=callback_manager,
   (...)
     91         service_context=service_context,
     92     )

File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/llms/azure_inference/base.py:291, in AzureAICompletionsModel.metadata(self)
    288     model_info = self._client.get_model_info()
    289 except Exception:
    290     logger.warning(
--> 291         f\"Endpoint '{self._client.endpoint}' does support model metadata retrieval. \"
    292         \"Failed to get model info for method `metadata()`.\"
    293     )
    294     self._model_name = \"unknown\"
    295     self._model_provider = \"unknown\"

AttributeError: 'ChatCompletionsClient' object has no attribute 'endpoint'"
}
Josephrp commented 3 months ago

Missed the action; I'll catch up on your branch, I hope.

santiagxf commented 3 months ago

@Josephrp @john0isaac I fixed the error. Can you please take a look again?

santiagxf commented 3 months ago

This issue has been fixed in the following PR: https://github.com/run-llama/llama_index/pull/15747. Please update to llama-index-llms-azure-inference>=0.2.2