guardrails-ai / guardrails

Adding guardrails to large language models.
https://www.guardrailsai.com/docs
Apache License 2.0

[bug] Trouble calling guard without using openai or azure openai APIs #979

Open w8jie opened 1 month ago

w8jie commented 1 month ago

Describe the bug

I am trying to use the custom LLM wrapper so that I can add guardrails around an NVIDIA TensorRT-LLM (TRT-LLM) model. I do not wish to use the OpenAI/Azure OpenAI APIs for the guardrails call.

However, I'm met with the following error: openai.OpenAIError: Ambiguous use of module client; please set 'openai.api_type' or the 'OPENAI_API_TYPE' environment variable to 'openai' or 'azure'

To Reproduce

Here's my code:

import requests

# TRITON_SERVER_URL is the Triton generate endpoint built from env vars
# (the full definition appears in a later snippet in this thread)

def my_llm_api(prompt: str) -> str:
    """Custom LLM API wrapper.

    At least one of prompt, instruction or msg_history should be provided.

    Args:
        prompt (str): The prompt to be passed to the LLM API
        instruction (str): The instruction to be passed to the LLM API
        msg_history (list[dict]): The message history to be passed to the LLM API
        **kwargs: Any additional arguments to be passed to the LLM API

    Returns:
        str: The output of the LLM API
    """

    # Call your LLM API here

    # Prepare the input for the LLM API
    inputs = {
        "prompt": prompt
    }

    # Make a request to the Triton server
    response = requests.post(TRITON_SERVER_URL, json=inputs)
    response_data = response.json()

    # Extract the output from the response
    llm_output = response_data.get("outputs", {}).get("text", "")

    return llm_output

validated_response = guard(
    my_llm_api,
    prompt="Can you generate a list of 10 things that are not food?",
)
print(validated_response)

Expected behavior

Full error:

File "/usr/local/lib/python3.10/site-packages/guardrails/integrations/langchain/guard_runnable.py", line 15, in _validate
generate-1       |     response = self.guard.validate(input)
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 1080, in validate
generate-1       |     return self.parse(llm_output=llm_output, *args, **kwargs)
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 951, in parse
generate-1       |     return self._execute(  # type: ignore # streams are supported for parse
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 774, in _execute
generate-1       |     return guard_context.run(
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/utils/telemetry_utils.py", line 347, in wrapped_func
generate-1       |     return func(*args, **kwargs)
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 751, in __exec
generate-1       |     return self._exec(
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 805, in _exec
generate-1       |     api = get_llm_ask(llm_api, *args, **kwargs)
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/llm_providers.py", line 801, in get_llm_ask
generate-1       |     if llm_api == get_static_openai_create_func():
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/utils/openai_utils/v1.py", line 19, in get_static_openai_create_func
generate-1       |     return openai.completions.create
generate-1       |   File "/usr/local/lib/python3.10/site-packages/openai/_utils/_proxy.py", line 20, in __getattr__
generate-1       |     proxied = self.__get_proxied__()
generate-1       |   File "/usr/local/lib/python3.10/site-packages/openai/_utils/_proxy.py", line 55, in __get_proxied__
generate-1       |     return self.__load__()
generate-1       |   File "/usr/local/lib/python3.10/site-packages/openai/_module_client.py", line 60, in __load__
generate-1       |     return _load_client().completions
generate-1       |   File "/usr/local/lib/python3.10/site-packages/openai/__init__.py", line 294, in _load_client
generate-1       |     raise _AmbiguousModuleClientUsageError()
generate-1       | openai.OpenAIError: Ambiguous use of module client; please set `openai.api_type` or the `OPENAI_API_TYPE` environment variable to `openai` or `azure`

Library version: guardrails-ai==0.5.2

The documentation on using a custom LLM wrapper is a little sparse, and I don't understand why OpenAI is called even though I never told guardrails to use those APIs. I'd appreciate it if anyone can share a workaround they've found.
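
For what it's worth, the error message itself points at a possible stopgap: pinning the module-level OpenAI client type. That only silences the ambiguity check and doesn't explain why the OpenAI client is being touched at all, so treat the sketch below as a band-aid rather than a fix:

import os

# Stopgap suggested by the error text itself: tell the lazy module-level
# proxy which client flavour to build (the same effect as setting
# openai.api_type = "openai" in code). This does NOT remove the OpenAI
# dependency, and the client may still expect an OPENAI_API_KEY.
os.environ["OPENAI_API_TYPE"] = "openai"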

CalebCourier commented 1 month ago

@w8jie I'll take a look at this. Would you mind sharing your Guard setup with us? Specifically which validators you are using. A few of the validators use LLM reflection via LiteLLM and default to OpenAI's gpt-3.5-turbo unless otherwise specified. This is one scenario I could see the OpenAI call coming from. If it is something else we'll find it, but if you could share that information with us it would help. Thanks!
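
For context, a reflection-based validator typically routes that judge call through LiteLLM, so the default looks roughly like this (a sketch of the pattern, not guardrails' actual internals; the prompt is a placeholder):

import litellm

# Rough shape of an LLM-reflection check: ask a judge model to evaluate the
# output. Unless the validator is configured with a different model, LiteLLM
# routes "gpt-3.5-turbo" to OpenAI, which requires an OpenAI key and client.
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Does the following text contain profanity? ..."}],
)
print(response.choices[0].message.content)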

CalebCourier commented 1 month ago

I'm unable to reproduce this issue with the example provided, though looking at the error it appears that you are using the langchain integration which the included example does not show. Along with the guard setup, could you please provide a code sample that demonstrates how you are using guardrails when you encounter this error?

w8jie commented 1 month ago

@CalebCourier

Thanks for the explanation. I am currently using the ProfanityFree validator; would that fall under the LLM reflection scenario you described? I have yet to try other validators.

The code above is my attempt at running the Guard setup provided in the documentation. However, my goal is to use the LangChain LCEL integration (see ChatChain below). That fails for me as well, with the same error:

import os
from dotenv import load_dotenv
from prompt_templates import chat_template, qa_template
from langchain.schema.output_parser import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from llm_helper import llm
import requests
from guardrails.hub import ProfanityFree
from guardrails import Guard

# load env variables
load_dotenv()

guard = Guard().use(ProfanityFree, on_fail="exception")

def get_message_history(session_id: str) -> RedisChatMessageHistory:
    '''
    Get message history from Redis server

    Args:
    - session_id (str): input session_id key to retrieve correct chat history from database

    Returns:
    - RedisChatMessageHistory
    '''
    return RedisChatMessageHistory(
        session_id,
        url=f"redis://{os.getenv('REDIS_HOST')}:{os.getenv('REDIS_PORT')}/0"
    )

class LLMChain:
    def __init__(self, llm):
        self.llm = llm

class ChatChain(LLMChain):
    def __init__(self, llm, examples):
        super().__init__(llm)
        prompt = chat_template.format_chat_prompt(examples)
        custom_chain = (
            prompt
            | llm
            | StrOutputParser()
            | guard.to_runnable()
        )
        self.mem_chain = RunnableWithMessageHistory(
            custom_chain,
            get_message_history,
            input_messages_key="input",
            history_messages_key="history"
        )

    def get_chain(self):
        return self.mem_chain

And this is how my "local" TensorRT LLM is set up:

# imports assumed for this snippet; LocalLLM (the Triton wrapper) is not shown here
from langchain_openai import ChatOpenAI, AzureChatOpenAI

class LLMClient():
    '''
    Instantiates the LLM service selected by the LLM_TYPE env var
    '''
    def __init__(self, type):
        if type == "openai":
            self.llm = ChatOpenAI(openai_api_key=os.getenv("OPENAI_API_KEY"),
                                model=os.getenv("OPENAI_MODEL_NAME"))
        elif type == "azure-openai":
            self.llm = AzureChatOpenAI(
                azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
                openai_api_type=os.getenv("AZURE_OPENAI_API_TYPE"),
                openai_api_key=os.getenv("AZURE_OPENAI_API_KEY"),
                azure_deployment=os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"),
                openai_api_version=os.getenv("AZURE_OPENAI_API_VER")
            )
        elif type == "local":
            self.llm = LocalLLM(
                server_url=f"http://triton-models:{os.getenv('TRITON_API_PORT')}/v2/models/{os.getenv('TRITON_CHAT_MODEL_NAME')}/generate"
            )

    def get_llm(self):
        return self.llm

llm = LLMClient(os.getenv("LLM_TYPE")).get_llm()

After which, I am simply invoking the chain.
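
A minimal sketch of that invocation (the session id is a placeholder, and `examples` is whatever few-shot list gets passed into ChatChain):

chain = ChatChain(llm, examples).get_chain()

result = chain.invoke(
    {"input": "Can you generate a list of 10 things that are not food?"},
    # session_id is just a placeholder; RunnableWithMessageHistory uses it
    # to look up the Redis-backed message history
    config={"configurable": {"session_id": "demo-session"}},
)
print(result)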

CalebCourier commented 1 month ago

@w8jie Thanks for the extra information! I can confirm that the ProfanityFree validator does not use LLM reflection. I'll try to repro this issue with the example you provided and report back when I've found something.

CalebCourier commented 1 month ago

@w8jie I believe I found the cause of the error you are seeing and I've opened a Pull Request with a fix here: https://github.com/guardrails-ai/guardrails/pull/983

Thanks again for the extra information and for raising this issue!

w8jie commented 1 month ago

Hey @CalebCourier,

I waited for the PR to be merged so I could test the bug fix; however, I am still facing the same error.

Error message:

generate-1       | TRT_LLM URL: http://triton-models:8000/v2/models/ensemble/generate
generate-1       | Traceback (most recent call last):
generate-1       |   File "/app/guardrails_helper.py", line 71, in <module>
generate-1       |     validated_response = guard(
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 959, in __call__
generate-1       |     return trace_guard_execution(
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/telemetry/guard_tracing.py", line 181, in trace_guard_execution
generate-1       |     raise e
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/telemetry/guard_tracing.py", line 172, in trace_guard_execution
generate-1       |     result = _execute_fn(*args, **kwargs)
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 837, in _execute
generate-1       |     return guard_context.run(
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/telemetry/common.py", line 99, in wrapped_func
generate-1       |     return func(*args, **kwargs)
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 814, in __exec
generate-1       |     return self._exec(
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 871, in _exec
generate-1       |     api = get_llm_ask(llm_api, *args, **kwargs)
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/llm_providers.py", line 787, in get_llm_ask
generate-1       |     if llm_api == get_static_openai_create_func():
generate-1       |   File "/usr/local/lib/python3.10/site-packages/guardrails/utils/openai_utils/v1.py", line 16, in get_static_openai_create_func
generate-1       |     return openai.completions.create
generate-1       |   File "/usr/local/lib/python3.10/site-packages/openai/_utils/_proxy.py", line 20, in __getattr__
generate-1       |     proxied = self.__get_proxied__()
generate-1       |   File "/usr/local/lib/python3.10/site-packages/openai/_utils/_proxy.py", line 55, in __get_proxied__
generate-1       |     return self.__load__()
generate-1       |   File "/usr/local/lib/python3.10/site-packages/openai/_module_client.py", line 60, in __load__
generate-1       |     return _load_client().completions
generate-1       |   File "/usr/local/lib/python3.10/site-packages/openai/__init__.py", line 298, in _load_client
generate-1       |     raise _AmbiguousModuleClientUsageError()
generate-1       | openai._AmbiguousModuleClientUsageError: Ambiguous use of module client; please set `openai.api_type` or the `OPENAI_API_TYPE` environment variable to `openai` or `azure`

Code to reproduce:

from guardrails import Guard
from guardrails.hub import ProfanityFree

import os
import requests
import json
import litellm
from openai import AzureOpenAI
from dotenv import load_dotenv

# load env variables (same as in the chain setup above)
load_dotenv()

# Use the Guard with the validator
guard = Guard().use(ProfanityFree, on_fail="exception")

TRITON_SERVER_URL = f"http://triton-models:{os.getenv('TRITON_API_PORT')}/v2/models/{os.getenv('TRITON_CHAT_MODEL_NAME')}/generate"
print("TRT_LLM URL:", TRITON_SERVER_URL)

# Function that takes the prompt as a string and returns the LLM output as string
def my_llm_api(prompt: str) -> str:
    """Custom LLM API wrapper.

    At least one of prompt, instruction or msg_history should be provided.

    Args:
        prompt (str): The prompt to be passed to the LLM API
        instruction (str): The instruction to be passed to the LLM API
        msg_history (list[dict]): The message history to be passed to the LLM API
        **kwargs: Any additional arguments to be passed to the LLM API

    Returns:
        str: The output of the LLM API
    """

    # Call your LLM API here

    # Prepare the input for the LLM API
    inputs = {
        "prompt": prompt
    }

    # Make a request to the Triton server
    response = requests.post(TRITON_SERVER_URL, json=inputs)
    response_data = response.json()
    print("RESPONSE FROM TRT LLM")
    print(response_data)

    # Extract the output from the response
    llm_output = response_data.get("outputs", {}).get("text", "")

    return llm_output

validated_response = guard(
    my_llm_api,
    prompt="Can you generate a list of 10 things that are not food?",
)
print("VALIDATED RESPONSE FROM TRT LLM")
print(validated_response)

GR Versions:

I am now wondering whether TRT-LLM served from Triton works with guardrails-ai's custom LLM wrapper at all, or whether specific TRT-LLM builds are needed (I'm using llama3-8b-instruct).

I'd appreciate it if you could take a look at this further!

dtam commented 2 weeks ago

It looks like we are still falling back to the OpenAI client. Could you include the output of `pip show guardrails-ai` so we can confirm you're on the latest version?
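
If it's easier to grab from inside the running container, the same version info can be pulled programmatically (a quick sketch using only the standard library):

# Equivalent to `pip show guardrails-ai` for the version field,
# handy from inside the generate-1 container:
from importlib.metadata import version

print(version("guardrails-ai"))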