weissenbacherpwc opened this issue 1 month ago
Hey @weissenbacherpwc, I've opened a PR.
Could you try my branch out and let me know if it fixes the issue?
pip install "git+https://github.com/langchain-ai/langchain.git@jacob/azure#subdirectory=libs/community"
Hi @jacoblee93, I tried installing your branch. It works now in that the response is returned as an AIMessage instead of a BaseMessage. However, when using it in an LCEL chain or LLMChain, the same error as described occurs.
I tried it with AzureMLOnlineEndpoint and with AzureMLChatOnlineEndpoint, without success.
It looks like there's an exported MistralChatContentFormatter - could you try instantiating and passing in that one?
Tried it out, thanks! However, it is still not working. Here is the code:
from langchain_community.chat_models.azureml_endpoint import AzureMLChatOnlineEndpoint
from langchain_community.llms.azureml_endpoint import ContentFormatterBase
from langchain_community.chat_models.azureml_endpoint import (
    AzureMLEndpointApiType,
    CustomOpenAIChatContentFormatter,
    MistralChatContentFormatter,
)
from langchain_core.messages import HumanMessage

chat = AzureMLChatOnlineEndpoint(
    endpoint_url="https://llm-host-westeurope-oqelx.westeurope.inference.ml.azure.com/score",
    endpoint_api_type=AzureMLEndpointApiType.dedicated,
    endpoint_api_key="",
    content_formatter=MistralChatContentFormatter(),
    # content_formatter=CustomOpenAIChatContentFormatter()
)
# Prints: UserWarning: `LlamaChatContentFormatter` will be deprecated in the future.
# Please use `CustomOpenAIChatContentFormatter` instead.

response = chat.invoke(
    [HumanMessage(content="Hallo, what's your name?")], max_tokens=3000
)
response
Here it already fails when invoking the LLM, which worked before with the CustomOpenAIChatContentFormatter:

ValueError: `api_type` AzureMLEndpointApiType.dedicated is not supported by this formatter
@jacoblee93 I might have found a solution to this. I added this code to the class MistralChatContentFormatter(LlamaChatContentFormatter) (from line 187 of azureml_endpoint.py):
elif api_type == AzureMLEndpointApiType.dedicated:
    request_payload = json.dumps(
        {
            "input_data": {
                "input_string": chat_messages,
                "parameters": model_kwargs,
            }
        }
    )
Here is the full class:
class MistralChatContentFormatter(LlamaChatContentFormatter):
    """Content formatter for `Mistral`."""

    def format_messages_request_payload(
        self,
        messages: List[BaseMessage],
        model_kwargs: Dict,
        api_type: AzureMLEndpointApiType,
    ) -> bytes:
        """Formats the request according to the chosen api"""
        chat_messages = [
            self._convert_message_to_dict(message) for message in messages
        ]
        if chat_messages and chat_messages[0]["role"] == "system":
            # Mistral OSS models do not explicitly support system prompts,
            # so we have to stash them in the first user prompt.
            chat_messages[1]["content"] = (
                chat_messages[0]["content"] + "\n\n" + chat_messages[1]["content"]
            )
            del chat_messages[0]
        if api_type == AzureMLEndpointApiType.realtime:
            request_payload = json.dumps(
                {
                    "input_data": {
                        "input_string": chat_messages,
                        "parameters": model_kwargs,
                    }
                }
            )
        elif api_type == AzureMLEndpointApiType.serverless:
            request_payload = json.dumps({"messages": chat_messages, **model_kwargs})
        elif api_type == AzureMLEndpointApiType.dedicated:
            request_payload = json.dumps(
                {
                    "input_data": {
                        "input_string": chat_messages,
                        "parameters": model_kwargs,
                    }
                }
            )
        else:
            raise ValueError(
                f"`api_type` {api_type} is not supported by this formatter"
            )
        return str.encode(request_payload)
With this, I can use the LLM in a chain and give the LLM a system prompt.
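For illustration, a minimal chain sketch with a system prompt (assuming chat is the AzureMLChatOnlineEndpoint configured above with the patched formatter):

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{question}"),
])

# `chat` is the AzureMLChatOnlineEndpoint configured with the patched formatter.
chain = prompt | chat | StrOutputParser()
print(chain.invoke({"question": "Hallo, what's your name?"}))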
Edit: but with this, streaming the LLM output in LangChain does not work:

chunks = []
for chunk in llm.stream("hello. tell me something about yourself"):
    chunks.append(chunk)
    print(chunk.content, end="|", flush=True)
Results in:
APIStatusError                            Traceback (most recent call last)
/Users/mweissenba001/Documents/GitHub/fastapi_rag_demo/test.ipynb Cell 16 line 2
      1 chunks=[]
----> 2 for chunk in llm.stream("hello. tell me something about yourself"):
      3     chunks.append(chunk)
      4     print(chunk.content, end="|", flush=True)

File ~/anaconda3/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py:375, in BaseChatModel.stream(self, input, config, stop, **kwargs)
    368 except BaseException as e:
    369     run_manager.on_llm_error(
    370         e,
    371         response=LLMResult(
    372             generations=[[generation]] if generation else []
    373         ),
    374     )
--> 375     raise e
    376 else:
    377     run_manager.on_llm_end(LLMResult(generations=[[generation]]))

File ~/anaconda3/lib/python3.11/site-packages/langchain_core/language_models/chat_models.py:355, in BaseChatModel.stream(self, input, config, stop, **kwargs)
    353 generation: Optional[ChatGenerationChunk] = None
    354 try:
--> 355     for chunk in self._stream(messages, stop=stop, **kwargs):
    356         if chunk.message.id is None:
...
(...)
   1027     stream_cls=stream_cls,
   1028 )

APIStatusError: Error code: 424 - {'detail': 'Not Found'}
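A possible stopgap until streaming is fixed (the 424 may mean the dedicated scoring endpoint exposes no streaming route, though that is only a guess): emulate .stream() with a single blocking call. This is just a sketch; pseudo_stream is a made-up helper, not a LangChain API:

def pseudo_stream(llm, text: str):
    # Hypothetical helper: does not stream tokens; yields the full answer
    # as one chunk so code written against .stream() keeps working.
    message = llm.invoke(text)
    yield message

for chunk in pseudo_stream(chat, "hello. tell me something about yourself"):
    print(chunk.content, end="|", flush=True)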
Error Message and Stack Trace (if applicable)
I think I have set up the right deployment type. Here is the full trace:
Description
Hi,
I set up Mixtral 8x22B on Azure AI/Machine Learning and now want to use it with LangChain. I have difficulties with the response format I am getting. For example, a ChatOpenAI response looks like this:
AIMessage(content='Hallo! Wie kann ich Ihnen helfen?', response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 8, 'total_tokens': 16}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='r')
This is how it looks when I am loading Mixtral 8x22B with AzureMLChatOnlineEndpoint:
BaseMessage(content='Hallo, ich bin ein deutscher Sprachassistent. Was kann ich für', type='assistant', id='run-23')
So with the Mixtral model the output has a different format (BaseMessage vs. AIMessage). How can I change this to make it work just like a ChatOpenAI model?
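As a workaround sketch (untested beyond the idea; to_ai_message is a hypothetical helper, not a library function), the generic BaseMessage could be coerced into an AIMessage with a small adapter:

from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.runnables import RunnableLambda

def to_ai_message(message: BaseMessage) -> AIMessage:
    # Hypothetical adapter: coerce the endpoint's generic BaseMessage
    # into the AIMessage type that ChatOpenAI returns.
    if isinstance(message, AIMessage):
        return message
    return AIMessage(content=message.content, id=message.id)

# `chat` is the AzureMLChatOnlineEndpoint instance from above.
normalized_chat = chat | RunnableLambda(to_ai_message)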
I further explored whether it works in a chain with a ChatPromptTemplate, without success. This results in

KeyError: 'output'

and

ValueError: Error while formatting response payload for chat model of type AzureMLEndpointApiType.dedicated. Are you using the right formatter for the deployed model and endpoint type?

See full trace above. In my application I want to easily switch between these two models (see the sketch below).
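Since the goal is switching between the two models, LCEL's configurable_alternatives could handle this. A minimal sketch, assuming chat_openai (ChatOpenAI) and chat_mixtral (AzureMLChatOnlineEndpoint) are already configured elsewhere:

from langchain_core.runnables import ConfigurableField

# Assumed: `chat_openai` and `chat_mixtral` are configured elsewhere.
model = chat_openai.configurable_alternatives(
    ConfigurableField(id="llm"),
    default_key="openai",
    mixtral=chat_mixtral,
)

# Pick the model per call without touching the rest of the chain:
model.with_config(configurable={"llm": "mixtral"}).invoke("Hallo!")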
Thanks in advance!
System Info
langchain                 0.2.6
langchain-chroma          0.1.0
langchain-community       0.2.6
langchain-core            0.2.10
langchain-experimental    0.0.49
langchain-groq            0.1.5
langchain-openai          0.1.7
langchain-postgres        0.0.3
langchain-text-splitters  0.2.1