@Josephrp GitHub models are hosted in Azure. Use the LlamaIndex integration we have there; it should be supported: https://docs.llamaindex.ai/en/stable/examples/llm/azure_inference/. You need to pass the parameter model_name, though. I will update the docs so it's properly displayed, but it's already supported.
Bringing a GitHubLLM would be redundant work since the underlying client is the same.
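For example, a minimal sketch along these lines should work against the GitHub models endpoint (the endpoint URL, the GITHUB_TOKEN variable, and the model name below are illustrative assumptions, not taken from this thread):

import os
from llama_index.llms.azure_inference import AzureAICompletionsModel

# Assumed values: GitHub models are served from this Azure-hosted endpoint and
# accept a GitHub token as the credential; the model name is just an example.
llm = AzureAICompletionsModel(
    endpoint="https://models.inference.ai.azure.com",
    credential=os.environ["GITHUB_TOKEN"],
    model_name="Meta-Llama-3.1-8B-Instruct",  # required on multi-model endpoints
)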
Following up on @santiagxf, I tried his suggestion here: https://github.com/leestott/azureai-x-arize/pull/1
Some functions work out of the box; others error because model_name is not specified.
Examples of functions that work without changing anything other than specifying the model name at initialization (see the snippet after this list):
AzureAICompletionsModel.complete
AzureAICompletionsModel.stream_complete
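For instance, assuming llm is an AzureAICompletionsModel created with model_name set (the prompt text is only illustrative), calls like these complete without error:

response = llm.complete("What is retrieval-augmented generation?")
print(response)

for chunk in llm.stream_complete("What is retrieval-augmented generation?"):
    print(chunk.delta, end="")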
Examples of functions that do not work:
RagDatasetGenerator.generate_questions_from_nodes
SummaryIndex.as_query_engine
The error is the same for all of them. Traceback:
---------------------------------------------------------------------------
HttpResponseError Traceback (most recent call last)
Cell In[16], line 1
----> 1 summarize_query_engine = summary_index.as_query_engine(
2 llm=llm,
3 response_mode="tree_summarize",
4 use_async=True,
5 )
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/indices/base.py:411, in BaseIndex.as_query_engine(self, llm, **kwargs)
404 retriever = self.as_retriever(**kwargs)
405 llm = (
406 resolve_llm(llm, callback_manager=self._callback_manager)
407 if llm
408 else llm_from_settings_or_context(Settings, self.service_context)
409 )
--> 411 return RetrieverQueryEngine.from_args(
412 retriever,
413 llm=llm,
414 **kwargs,
415 )
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:110, in RetrieverQueryEngine.from_args(cls, retriever, llm, response_synthesizer, node_postprocessors, response_mode, text_qa_template, refine_template, summary_template, simple_template, output_cls, use_async, streaming, service_context, **kwargs)
88 """Initialize a RetrieverQueryEngine object.".
89
90 Args:
(...)
106
107 """
108 llm = llm or llm_from_settings_or_context(Settings, service_context)
--> 110 response_synthesizer = response_synthesizer or get_response_synthesizer(
111 llm=llm,
112 service_context=service_context,
113 text_qa_template=text_qa_template,
114 refine_template=refine_template,
115 summary_template=summary_template,
116 simple_template=simple_template,
117 response_mode=response_mode,
118 output_cls=output_cls,
119 use_async=use_async,
120 streaming=streaming,
121 )
123 callback_manager = callback_manager_from_settings_or_context(
124 Settings, service_context
125 )
127 return cls(
128 retriever=retriever,
129 response_synthesizer=response_synthesizer,
130 callback_manager=callback_manager,
131 node_postprocessors=node_postprocessors,
132 )
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/factory.py:74, in get_response_synthesizer(llm, prompt_helper, service_context, text_qa_template, refine_template, summary_template, simple_template, response_mode, callback_manager, use_async, streaming, structured_answer_filtering, output_cls, program_factory, verbose)
68 prompt_helper = service_context.prompt_helper
69 else:
70 prompt_helper = (
71 prompt_helper
72 or Settings._prompt_helper
73 or PromptHelper.from_llm_metadata(
---> 74 llm.metadata,
75 )
76 )
78 if response_mode == ResponseMode.REFINE:
79 return Refine(
80 llm=llm,
81 callback_manager=callback_manager,
(...)
91 service_context=service_context,
92 )
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/llms/azure_inference/base.py:282, in AzureAICompletionsModel.metadata(self)
279 @property
280 def metadata(self) -> LLMMetadata:
281 if not self._model_name:
--> 282 model_info = self._client.get_model_info()
283 if model_info:
284 self._model_name = model_info.get("model_name", None)
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
92 span_impl_type = settings.tracing_implementation()
93 if span_impl_type is None:
---> 94 return func(*args, **kwargs)
96 # Merge span is parameter is set, but only if no explicit parent are passed
97 if merge_span and not passed_in_parent:
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/ai/inference/_patch.py:660, in ChatCompletionsClient.get_model_info(self, **kwargs)
653 """Returns information about the AI model.
654
655 :return: ModelInfo. The ModelInfo is compatible with MutableMapping
656 :rtype: ~azure.ai.inference.models.ModelInfo
657 :raises ~azure.core.exceptions.HttpResponseError:
658 """
659 if not self._model_info:
--> 660 self._model_info = self._get_model_info(**kwargs) # pylint: disable=attribute-defined-outside-init
661 return self._model_info
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
92 span_impl_type = settings.tracing_implementation()
93 if span_impl_type is None:
---> 94 return func(*args, **kwargs)
96 # Merge span is parameter is set, but only if no explicit parent are passed
97 if merge_span and not passed_in_parent:
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/ai/inference/_operations/_operations.py:558, in ChatCompletionsClientOperationsMixin._get_model_info(self, **kwargs)
556 response.read() # Load the body in memory and close the socket
557 map_error(status_code=response.status_code, response=response, error_map=error_map)
--> 558 raise HttpResponseError(response=response)
560 if _stream:
561 deserialized = response.iter_bytes()
HttpResponseError: (no_model_name) No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.
Code: no_model_name
Message: No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.
I dug deeper into the codebase and found the reason for the error:
You call get_model_info if the model_name is not set. I did set the model_name, and it works for some functions but not for others; I suspect it might not be passed through correctly, but for now this looks like a bug rather than a new feature that needs to be implemented.
llama_index/llms/azure_inference/base.py", line 282
@property
def metadata(self) -> LLMMetadata:
if not self._model_name:
model_info = self._client.get_model_info()
Here is a workaround for now that seems to fix the previous errors; it is compatible with GitHub models. @Josephrp, someone needs to fix the init for the model_name.
To create a chat model client, use this code:
llm = AzureAICompletionsModel(
endpoint=os.environ["AZURE_AI_ENDPOINT_URL"],
credential=os.environ["AZURE_AI_ENDPOINT_KEY"],
model_name=os.environ["AZURE_AI_MODEL_NAME"],
)
llm._model_name = os.environ["AZURE_AI_MODEL_NAME"] # This is the fix
To create an embedding model client, the standard initialization works without the workaround; this seems to be affecting the chat model only.
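As a sketch (assuming the companion llama-index-embeddings-azure-inference package and that it accepts the same endpoint/credential/model_name parameters):

import os
from llama_index.embeddings.azure_inference import AzureAIEmbeddingsModel

embed_model = AzureAIEmbeddingsModel(
    endpoint=os.environ["AZURE_AI_ENDPOINT_URL"],
    credential=os.environ["AZURE_AI_ENDPOINT_KEY"],
    model_name=os.environ["AZURE_AI_MODEL_NAME"],  # assumed to be accepted here too
)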
Hi @john0isaac! Thanks for helping with the debugging! I can put up a PR and fix it.
You are most welcome @santiagxf. I was already working on a branch for your sample; the PR is now ready at your repo. If you want to fix this in LlamaIndex, please feel free to do so!
@john0isaac can you verify that you are using the latest version of the library? I checked, and I happened to have added a test for this case and I see it passing.
I am using the latest version. It seemed weird that some functions work and others don't, but I now know why: if a function needs to read the metadata property, it triggers the broken code and errors; if it doesn't read the metadata property, it works as expected, hence the flaky behavior.
The ones I listed above read the metadata.
Btw, it's starting to make sense to me: there are two attributes in the AzureAI class, one called model_name and another called _model_name, and you never assign the init's model_name to _model_name.
And in metadata you check _model_name, not model_name, so it makes sense that it errors, since it was never set in the init.
Good point. That's the fix then. I need to wrap that in a try/except in case the endpoint doesn't support metadata retrieval. We will bring support to the GH endpoint soon, though. I'll follow up on it by tomorrow.
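A rough, hypothetical sketch of the shape of such a fix (not the actual patch): propagate the public model_name in __init__ and guard get_model_info() for endpoints that don't support metadata retrieval.

import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


class MetadataGuardSketch:
    """Illustrative only; not the library's actual class."""

    def __init__(self, client: Any, model_name: Optional[str] = None) -> None:
        self._client = client
        self._model_name = model_name  # propagate the public parameter up front

    @property
    def model_name(self) -> Optional[str]:
        if not self._model_name:
            try:
                # Some endpoints (e.g. GitHub models) reject this call.
                info = self._client.get_model_info()
                self._model_name = (info or {}).get("model_name")
            except Exception:
                logger.warning("Endpoint does not support model metadata retrieval.")
                self._model_name = "unknown"
        return self._model_name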
@john0isaac can you help me validate if the following branch solves the issue?
https://github.com/santiagxf/llama_index/tree/santiagxf/azure-ai-inference-gh
You can install with pip install git+https://github.com/santiagxf/llama_index.git@santiagxf/azure-ai-inference-gh#subdirectory=llama-index-integrations/llms/llama-index-llms-azure-inference
Sure. I'm not sure whether you are doing this intentionally or not, but you never pass the initializer's model_name to the private _model_name; if you just assign the value in the init, that's going to solve the issue. But sure, I will test what you did.
Using your branch, I receive these warnings, which I was also getting when I didn't apply my workaround:
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for event.span_id='AzureAICompletionsModel.complete-2c97c998-908b-494a-b91f-ddbce575792f', event.id_=UUID('15b072f0-94b7-4ae9-90da-63a3a903cc3c')
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for event.span_id='AzureAICompletionsModel.chat-691b82de-3e49-4084-83d2-b0d220b2fcea', event.id_=UUID('048349d3-b65f-4b94-bbc5-7c821ac53a41')
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for event.span_id='AzureAICompletionsModel.chat-691b82de-3e49-4084-83d2-b0d220b2fcea', event.id_=UUID('bb31a2eb-2842-4c0f-aa25-330193571b4c')
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for id_='AzureAICompletionsModel.chat-691b82de-3e49-4084-83d2-b0d220b2fcea'
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for event.span_id='AzureAICompletionsModel.complete-2c97c998-908b-494a-b91f-ddbce575792f', event.id_=UUID('ff6c9138-aced-4057-9956-4dc22752ee57')
WARNI [openinference.instrumentation.llama_index._handler] Open span is missing for id_='AzureAICompletionsModel.complete-2c97c998-908b-494a-b91f-ddbce575792f'
And I get these new errors resulting from the code that you changed:
{
"name": "AttributeError",
"message": "'ChatCompletionsClient' object has no attribute 'endpoint'",
"stack": "---------------------------------------------------------------------------
HttpResponseError Traceback (most recent call last)
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/llms/azure_inference/base.py:288, in AzureAICompletionsModel.metadata(self)
285 try:
286 # Get model info from the endpoint. This method may not be supported by all
287 # endpoints.
--> 288 model_info = self._client.get_model_info()
289 except Exception:
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
93 if span_impl_type is None:
---> 94 return func(*args, **kwargs)
96 # Merge span is parameter is set, but only if no explicit parent are passed
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/ai/inference/_patch.py:660, in ChatCompletionsClient.get_model_info(self, **kwargs)
659 if not self._model_info:
--> 660 self._model_info = self._get_model_info(**kwargs) # pylint: disable=attribute-defined-outside-init
661 return self._model_info
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/core/tracing/decorator.py:94, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
93 if span_impl_type is None:
---> 94 return func(*args, **kwargs)
96 # Merge span is parameter is set, but only if no explicit parent are passed
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/azure/ai/inference/_operations/_operations.py:558, in ChatCompletionsClientOperationsMixin._get_model_info(self, **kwargs)
557 map_error(status_code=response.status_code, response=response, error_map=error_map)
--> 558 raise HttpResponseError(response=response)
560 if _stream:
HttpResponseError: (no_model_name) No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.
Code: no_model_name
Message: No model specified in request. Please provide a model name in the request body or as a x-ms-model-mesh-model-name header.
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
Cell In[17], line 1
----> 1 summarize_query_engine = summary_index.as_query_engine(
2 llm=llm,
3 response_mode=\"tree_summarize\",
4 use_async=True,
5 )
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/indices/base.py:411, in BaseIndex.as_query_engine(self, llm, **kwargs)
404 retriever = self.as_retriever(**kwargs)
405 llm = (
406 resolve_llm(llm, callback_manager=self._callback_manager)
407 if llm
408 else llm_from_settings_or_context(Settings, self.service_context)
409 )
--> 411 return RetrieverQueryEngine.from_args(
412 retriever,
413 llm=llm,
414 **kwargs,
415 )
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/query_engine/retriever_query_engine.py:110, in RetrieverQueryEngine.from_args(cls, retriever, llm, response_synthesizer, node_postprocessors, response_mode, text_qa_template, refine_template, summary_template, simple_template, output_cls, use_async, streaming, service_context, **kwargs)
88 \"\"\"Initialize a RetrieverQueryEngine object.\".
89
90 Args:
(...)
106
107 \"\"\"
108 llm = llm or llm_from_settings_or_context(Settings, service_context)
--> 110 response_synthesizer = response_synthesizer or get_response_synthesizer(
111 llm=llm,
112 service_context=service_context,
113 text_qa_template=text_qa_template,
114 refine_template=refine_template,
115 summary_template=summary_template,
116 simple_template=simple_template,
117 response_mode=response_mode,
118 output_cls=output_cls,
119 use_async=use_async,
120 streaming=streaming,
121 )
123 callback_manager = callback_manager_from_settings_or_context(
124 Settings, service_context
125 )
127 return cls(
128 retriever=retriever,
129 response_synthesizer=response_synthesizer,
130 callback_manager=callback_manager,
131 node_postprocessors=node_postprocessors,
132 )
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/core/response_synthesizers/factory.py:74, in get_response_synthesizer(llm, prompt_helper, service_context, text_qa_template, refine_template, summary_template, simple_template, response_mode, callback_manager, use_async, streaming, structured_answer_filtering, output_cls, program_factory, verbose)
68 prompt_helper = service_context.prompt_helper
69 else:
70 prompt_helper = (
71 prompt_helper
72 or Settings._prompt_helper
73 or PromptHelper.from_llm_metadata(
---> 74 llm.metadata,
75 )
76 )
78 if response_mode == ResponseMode.REFINE:
79 return Refine(
80 llm=llm,
81 callback_manager=callback_manager,
(...)
91 service_context=service_context,
92 )
File ~/Developer/azureai-x-arize/.venv/lib/python3.10/site-packages/llama_index/llms/azure_inference/base.py:291, in AzureAICompletionsModel.metadata(self)
288 model_info = self._client.get_model_info()
289 except Exception:
290 logger.warning(
--> 291 f\"Endpoint '{self._client.endpoint}' does support model metadata retrieval. \"
292 \"Failed to get model info for method `metadata()`.\"
293 )
294 self._model_name = \"unknown\"
295 self._model_provider = \"unknown\"
AttributeError: 'ChatCompletionsClient' object has no attribute 'endpoint'"
}
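The new AttributeError comes from the warning message itself: ChatCompletionsClient does not expose an endpoint attribute, so the except block that should only log a warning raises instead. A hypothetical fragment of how the message could avoid that (self._endpoint is an assumed attribute stored on the LlamaIndex wrapper at construction time, not an existing field):

logger.warning(
    f"Endpoint '{self._endpoint}' does not support model metadata retrieval. "
    "Failed to get model info for method `metadata()`."
)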
Missed the action; I'll catch up on your branch, I hope.
@Josephrp @john0isaac I fixed the error. Can you please take a look again?
This issue has been fixed in the following PR: https://github.com/run-llama/llama_index/pull/15747. Please update to llama-index-llms-azure-inference>=0.2.2
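For reference, upgrading should be something like:

pip install -U "llama-index-llms-azure-inference>=0.2.2"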
Feature Description
We propose adding a new GithubLLM class to LlamaIndex. This custom LLM interface would allow users to interact with AI models hosted on GitHub's inference endpoint, with automatic fallback to Azure when rate limits are reached. The implementation would be similar to other custom LLMs in LlamaIndex, inheriting from the CustomLLM class and implementing the necessary methods (complete, stream_complete, chat, stream_chat, etc.).
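A hypothetical skeleton of what such a class might look like (the class does not exist yet; defaults and field names are illustrative, and the GitHub/Azure calls are left as placeholders):

from typing import Any

from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class GithubLLM(CustomLLM):
    """Sketch of the proposed GitHub-hosted model client with Azure fallback."""

    model_name: str = "gpt-4o-mini"  # illustrative default
    context_window: int = 8192
    num_output: int = 512

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # Call the GitHub inference endpoint here; fall back to Azure on rate limits.
        raise NotImplementedError("sketch only")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Same as complete(), but yielding CompletionResponse chunks with deltas.
        raise NotImplementedError("sketch only")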
Reason
Currently, LlamaIndex does not have built-in support for GitHub's hosted AI models. Users who want to prototype with these models and potentially transition to Azure for production use don't have a straightforward way to do so within the LlamaIndex framework.
Value of Feature
Adding GithubLLM to LlamaIndex would provide several benefits: it would make LlamaIndex an even more comprehensive platform for AI development, from prototyping to production, and would align well with GitHub's efforts to provide accessible AI models to developers.