langchain-ai / langchain

πŸ¦œπŸ”— Build context-aware reasoning applications
https://python.langchain.com
MIT License

api_base parameter not properly propagated in client calls in ChatLiteLLM #14338

Closed DavidLMS closed 11 months ago

DavidLMS commented 11 months ago

System Info

Who can help?

@hwchase17 @agola11

Information

Related Components

Reproduction

from langchain.chat_models import ChatLiteLLM
from langchain.schema import HumanMessage

# Initialize the ChatLiteLLM instance with a custom endpoint
chat_model = ChatLiteLLM(
    api_base="https://custom.endpoints.huggingface.cloud",
    model="huggingface/Intel/neural-chat-7b-v3-1",
)

# Make a call to LiteLLM
text = "What would be a good company name for a company that makes colorful socks?"
messages = [HumanMessage(content=text)]

print(chat_model(messages).content)

Error:

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/litellm/utils.py", line 4919, in handle_huggingface_chunk
    raise ValueError(chunk)
ValueError: {"error":"The model Intel/neural-chat-7b-v3-1 is too large to be loaded automatically (14GB > 10GB). Please use Spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints)."}

The same error occurs if the instance is created without api_base: chat_model = ChatLiteLLM(model="huggingface/Intel/neural-chat-7b-v3-1")

So the api_base parameter is not being properly propagated in client calls made by ChatLiteLLM.

Expected behavior

I would expect the ChatLiteLLM instance to correctly utilize the api_base parameter when making requests to the LiteLLM client. This should enable using models larger than the default size limit without encountering the error message about model size limits.
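For comparison, here is a minimal sketch of the behavior I would expect, using litellm directly (the endpoint URL is the placeholder from the reproduction above, and the response access pattern follows litellm's documented usage):

import litellm

# Passing api_base directly to litellm routes the request to the dedicated
# Inference Endpoint instead of the free Inference API (which enforces the
# automatic-loading size limit that produces the error above).
response = litellm.completion(
    model="huggingface/Intel/neural-chat-7b-v3-1",
    messages=[{"role": "user", "content": "Hello"}],
    api_base="https://custom.endpoints.huggingface.cloud",
)
print(response["choices"][0]["message"]["content"])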

Notably, if I explicitly add the api_base argument in chat_models/litellm.py on line 239 (e.g., return self.client.completion(api_base=self.api_base, **kwargs)), the problem is resolved. This suggests that the api_base argument is not being correctly passed through **kwargs.
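For context, this is roughly how that temporary patch sits inside completion_with_retry in chat_models/litellm.py (the wrapper below is paraphrased rather than copied verbatim from the file, so names may differ slightly):

def completion_with_retry(self, run_manager=None, **kwargs: Any) -> Any:
    """Use tenacity to retry the completion call."""
    retry_decorator = _create_retry_decorator(self, run_manager=run_manager)

    @retry_decorator
    def _completion_with_retry(**kwargs: Any) -> Any:
        # Temporary fix: pass api_base explicitly instead of relying on
        # it arriving through **kwargs.
        return self.client.completion(api_base=self.api_base, **kwargs)

    return _completion_with_retry(**kwargs)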

dosubot[bot] commented 11 months ago

πŸ€–

It's great to see that you've identified the issue and even found a potential solution by explicitly adding the api_base argument in the chat_models/litellm.py file. Your contribution would be valuable to the LangChain community. If you're willing to create a pull request with your proposed fix, it would be a fantastic addition to the framework. Thank you for considering contributing to LangChain!


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a πŸ‘ if this is helpful and πŸ‘Ž if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

DavidLMS commented 11 months ago

@dosu-bot I think the solution I have proposed should not be the definitive one; it is only a temporary fix, because api_base should be passed correctly through kwargs.

dosubot[bot] commented 11 months ago

πŸ€–

Thank you for bringing this issue to our attention. It seems like the api_base parameter is not being correctly propagated in the ChatLiteLLM class. This is likely causing the issue you're experiencing with the model loading error.

The api_base parameter in the ChatLiteLLM class is used to set the base URL for the API that the client will interact with. It is handled in the _client_params property method of the ChatLiteLLM class. Here is the relevant code snippet:

@property
def _client_params(self) -> Dict[str, Any]:
    """Get the parameters used for the openai client."""
    set_model_value = self.model
    if self.model_name is not None:
        set_model_value = self.model_name
    self.client.api_base = self.api_base
    self.client.organization = self.organization
    creds: Dict[str, Any] = {
        "model": set_model_value,
        "force_timeout": self.request_timeout,
    }
    return {**self._default_params, **creds}

In this method, self.client.api_base is set to self.api_base. This means that the api_base parameter of the ChatLiteLLM instance is used to set the api_base attribute of the client. This attribute is typically used to specify the base URL for the API that the client will interact with.
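As a sketch of one alternative, api_base could also be included in the per-call parameters built here, so it travels with the kwargs of each completion call rather than only living as an attribute on the client (an illustration only; this is not necessarily the change adopted in the eventual fix):

    creds: Dict[str, Any] = {
        "model": set_model_value,
        "force_timeout": self.request_timeout,
        "api_base": self.api_base,  # assumed addition for illustration
    }
    return {**self._default_params, **creds}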

However, it seems like the api_base parameter is not being passed to the qianfan.ChatCompletion client in the validate_environment method of the QianfanChatEndpoint class. This could be the root cause of the issue.

To resolve this, you can modify the validate_environment method to include the api_base parameter when initializing the qianfan.ChatCompletion client. Here's how you can do it:

@root_validator()
def validate_environment(cls, values: Dict) -> Dict:
    values["qianfan_ak"] = get_from_dict_or_env(
        values,
        "qianfan_ak",
        "QIANFAN_AK",
    )
    values["qianfan_sk"] = get_from_dict_or_env(
        values,
        "qianfan_sk",
        "QIANFAN_SK",
    )
    params = {
        "ak": values["qianfan_ak"],
        "sk": values["qianfan_sk"],
        "model": values["model"],
        "stream": values["streaming"],
        "api_base": values.get("api_base"),  # Add this line
    }
    if values["endpoint"] is not None and values["endpoint"] != "":
        params["endpoint"] = values["endpoint"]
    try:
        import qianfan

        values["client"] = qianfan.ChatCompletion(**params)
    except ImportError:
        raise ValueError(
            "qianfan package not found, please install it with "
            "`pip install qianfan`"
        )
    return values

This modification ensures that the api_base parameter is included when initializing the qianfan.ChatCompletion client, which should resolve the issue of it not being properly propagated in client calls.

Please try this solution and let us know if it resolves your issue. If you have any further questions or run into any other issues, feel free to ask.


DavidLMS commented 11 months ago

@dosu-bot You may be correct, but your proposed solution targets the wrong file: it modifies libs/langchain/langchain/chat_models/baidu_qianfan_endpoint.py, when it should modify libs/langchain/langchain/chat_models/litellm.py. Specifically, the code snippet to modify according to your previous instructions would be the following (a sketch of how your suggestion would map onto this method follows the snippet):

    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate api key, python package exists, temperature, top_p, and top_k."""
        try:
            import litellm
        except ImportError:
            raise ChatLiteLLMException(
                "Could not import google.generativeai python package. "
                "Please install it with `pip install google-generativeai`"
            )

        values["openai_api_key"] = get_from_dict_or_env(
            values, "openai_api_key", "OPENAI_API_KEY", default=""
        )
        values["azure_api_key"] = get_from_dict_or_env(
            values, "azure_api_key", "AZURE_API_KEY", default=""
        )
        values["anthropic_api_key"] = get_from_dict_or_env(
            values, "anthropic_api_key", "ANTHROPIC_API_KEY", default=""
        )
        values["replicate_api_key"] = get_from_dict_or_env(
            values, "replicate_api_key", "REPLICATE_API_KEY", default=""
        )
        values["openrouter_api_key"] = get_from_dict_or_env(
            values, "openrouter_api_key", "OPENROUTER_API_KEY", default=""
        )
        values["cohere_api_key"] = get_from_dict_or_env(
            values, "cohere_api_key", "COHERE_API_KEY", default=""
        )
        values["huggingface_api_key"] = get_from_dict_or_env(
            values, "huggingface_api_key", "HUGGINGFACE_API_KEY", default=""
        )
        values["together_ai_api_key"] = get_from_dict_or_env(
            values, "together_ai_api_key", "TOGETHERAI_API_KEY", default=""
        )
        values["client"] = litellm

        if values["temperature"] is not None and not 0 <= values["temperature"] <= 1:
            raise ValueError("temperature must be in the range [0.0, 1.0]")

        if values["top_p"] is not None and not 0 <= values["top_p"] <= 1:
            raise ValueError("top_p must be in the range [0.0, 1.0]")

        if values["top_k"] is not None and values["top_k"] <= 0:
            raise ValueError("top_k must be positive")

        return values
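Applying your suggested pattern to this method would look roughly like the lines below (a sketch only; the LITELLM_API_BASE environment-variable name is an assumption, and this is not necessarily the change adopted in the final fix):

        # Sketch: read api_base into values so it can later be forwarded
        # to litellm calls. LITELLM_API_BASE is an assumed variable name.
        values["api_base"] = get_from_dict_or_env(
            values, "api_base", "LITELLM_API_BASE", default=""
        )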

dosubot[bot] commented 11 months ago

πŸ€–

Thank you for the detailed report, DavidLMS. You're correct in your analysis that the api_base parameter is not being propagated correctly in the ChatLiteLLM class. This is indeed due to the api_base parameter not being passed to the litellm client in the validate_environment method.

To resolve this issue, you can modify the validate_environment method to include the api_base parameter when initializing the litellm client. Here's how you can do it:

def validate_environment(self):
    if self.client is None:
        self.client = litellm.LiteLLM(api_base=self.api_base)

This should ensure that the api_base parameter is correctly utilized when making requests to the LiteLLM client.

However, it's also important to note that the api_base parameter is set in the _client_params property, but this property is not used in the acompletion_with_retry and completion_with_retry methods where the API calls are made.

To fix this, you can modify these methods to use the _client_params property instead of directly using the client object. Here's how you can do it:

def _completion_with_retry(**kwargs: Any) -> Any:
    return self.client.completion(**self._client_params, **kwargs)

return _completion_with_retry(**kwargs)

And for the async version:

async def _completion_with_retry(**kwargs: Any) -> Any:
    return await self.client.acreate(**self._client_params, **kwargs)

return await _completion_with_retry(**kwargs)

This way, the api_base and other client parameters will be correctly propagated in the API calls.

Please try these modifications and let me know if the issue persists. If you have any further questions or need additional assistance, feel free to ask.
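For anyone wanting to check whether the parameter actually reaches the underlying call, here is a rough way to capture the kwargs that ChatLiteLLM forwards to litellm.completion (a local debugging sketch, not a test from the repository; it assumes the client used by ChatLiteLLM is the litellm module itself):

import litellm
from langchain.chat_models import ChatLiteLLM
from langchain.schema import HumanMessage

captured_kwargs = {}

def capturing_completion(**kwargs):
    # Record what ChatLiteLLM forwards, then abort before any real request.
    captured_kwargs.update(kwargs)
    raise RuntimeError("captured kwargs; aborting the real request")

litellm.completion = capturing_completion  # monkeypatch for inspection only

chat = ChatLiteLLM(
    api_base="https://custom.endpoints.huggingface.cloud",
    model="huggingface/Intel/neural-chat-7b-v3-1",
)
try:
    chat([HumanMessage(content="hi")])
except Exception:
    pass  # the aborted call raises; we only care about the captured kwargs

print("api_base forwarded to litellm:", captured_kwargs.get("api_base"))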


DavidLMS commented 11 months ago

I will try to make a Pull Request that solves the problem.

DavidLMS commented 11 months ago

Solved in PR #14393.