Open w8jie opened 1 month ago
@w8jie I'll take a look at this. Would you mind sharing your Guard setup with us? Specifically which validators you are using. A few of the validators use LLM reflection via LiteLLM and default to OpenAI's gpt-3.5-turbo unless otherwise specified. This is one scenario I could see the OpenAI call coming from. If it is something else we'll find it, but if you could share that information with us it would help. Thanks!
I'm unable to reproduce this issue with the example provided, though looking at the error it appears that you are using the langchain integration which the included example does not show. Along with the guard setup, could you please provide a code sample that demonstrates how you are using guardrails when you encounter this error?
@CalebCourier
Thanks for the explanation. I am currently using the ProfanityFree validator; would that fall under the LLM reflection case you mentioned? I have yet to try other validators.
The code I posted earlier is my attempt at running the Guard setup provided in the documentation. However, my goal is to use the LangChain LCEL integration (see ChatChain). This failed for me as well, with the same errors:
import os
from dotenv import load_dotenv
from prompt_templates import chat_template, qa_template
from langchain.schema.output_parser import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from llm_helper import llm
import requests
from guardrails.hub import ProfanityFree
from guardrails import Guard

# load env variables
load_dotenv()

guard = Guard().use(ProfanityFree, on_fail="exception")

def get_message_history(session_id: str) -> RedisChatMessageHistory:
    '''
    Get message history from Redis server
    Args:
    - session_id (str): input session_id key to retrieve correct chat history from database
    Returns:
    - RedisChatMessageHistory
    '''
    return RedisChatMessageHistory(
        session_id,
        url=f"redis://{os.getenv('REDIS_HOST')}:{os.getenv('REDIS_PORT')}/0"
    )

class LLMChain:
    def __init__(self, llm):
        self.llm = llm

class ChatChain(LLMChain):
    def __init__(self, llm, examples):
        super().__init__(llm)
        prompt = chat_template.format_chat_prompt(examples)
        custom_chain = (
            prompt
            | llm
            | StrOutputParser()
            | guard.to_runnable()
        )
        self.mem_chain = RunnableWithMessageHistory(
            custom_chain,
            get_message_history,
            input_messages_key="input",
            history_messages_key="history"
        )

    def get_chain(self):
        return self.mem_chain
And this is how my "local" TensorRT LLM is called:
class LLMClient():
    '''
    Instantiates LLM Service
    '''
    def __init__(self, type):
        if type == "openai":
            self.llm = ChatOpenAI(openai_api_key=os.getenv("OPENAI_API_KEY"),
                                  model=os.getenv("OPENAI_MODEL_NAME"))
        elif type == "azure-openai":
            self.llm = AzureChatOpenAI(
                azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
                openai_api_type=os.getenv("AZURE_OPENAI_API_TYPE"),
                openai_api_key=os.getenv("AZURE_OPENAI_API_KEY"),
                azure_deployment=os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"),
                openai_api_version=os.getenv("AZURE_OPENAI_API_VER")
            )
        elif type == "local":
            self.llm = LocalLLM(
                server_url=f"http://triton-models:{os.getenv('TRITON_API_PORT')}/v2/models/{os.getenv('TRITON_CHAT_MODEL_NAME')}/generate"
            )

    def get_llm(self):
        return self.llm

llm = LLMClient(os.getenv("LLM_TYPE")).get_llm()
After which, I am simply invoking the chain.
@w8jie Thanks for the extra information! I can confirm that the ProfanityFree validator does not use LLM reflection. I'll try to repro this issue with the example you provided and report back when I've found something.
@w8jie I believe I found the cause of the error you are seeing and I've opened a Pull Request with a fix here: https://github.com/guardrails-ai/guardrails/pull/983
Thanks again for the extra information and for raising this issue!
Hey @CalebCourier,
I waited for the PR to be merged so that I could test the fix; however, I am still facing the same errors.
Error message:
generate-1 | TRT_LLM URL: http://triton-models:8000/v2/models/ensemble/generate
generate-1 | Traceback (most recent call last):
generate-1 | File "/app/guardrails_helper.py", line 71, in <module>
generate-1 | validated_response = guard(
generate-1 | File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 959, in __call__
generate-1 | return trace_guard_execution(
generate-1 | File "/usr/local/lib/python3.10/site-packages/guardrails/telemetry/guard_tracing.py", line 181, in trace_guard_execution
generate-1 | raise e
generate-1 | File "/usr/local/lib/python3.10/site-packages/guardrails/telemetry/guard_tracing.py", line 172, in trace_guard_execution
generate-1 | result = _execute_fn(*args, **kwargs)
generate-1 | File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 837, in _execute
generate-1 | return guard_context.run(
generate-1 | File "/usr/local/lib/python3.10/site-packages/guardrails/telemetry/common.py", line 99, in wrapped_func
generate-1 | return func(*args, **kwargs)
generate-1 | File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 814, in __exec
generate-1 | return self._exec(
generate-1 | File "/usr/local/lib/python3.10/site-packages/guardrails/guard.py", line 871, in _exec
generate-1 | api = get_llm_ask(llm_api, *args, **kwargs)
generate-1 | File "/usr/local/lib/python3.10/site-packages/guardrails/llm_providers.py", line 787, in get_llm_ask
generate-1 | if llm_api == get_static_openai_create_func():
generate-1 | File "/usr/local/lib/python3.10/site-packages/guardrails/utils/openai_utils/v1.py", line 16, in get_static_openai_create_func
generate-1 | return openai.completions.create
generate-1 | File "/usr/local/lib/python3.10/site-packages/openai/_utils/_proxy.py", line 20, in __getattr__
generate-1 | proxied = self.__get_proxied__()
generate-1 | File "/usr/local/lib/python3.10/site-packages/openai/_utils/_proxy.py", line 55, in __get_proxied__
generate-1 | return self.__load__()
generate-1 | File "/usr/local/lib/python3.10/site-packages/openai/_module_client.py", line 60, in __load__
generate-1 | return _load_client().completions
generate-1 | File "/usr/local/lib/python3.10/site-packages/openai/__init__.py", line 298, in _load_client
generate-1 | raise _AmbiguousModuleClientUsageError()
generate-1 | openai._AmbiguousModuleClientUsageError: Ambiguous use of module client; please set `openai.api_type` or the `OPENAI_API_TYPE` environment variable to `openai` or `azure`
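The last line of the traceback names its own remedy: set `OPENAI_API_TYPE` (or `openai.api_type`) to `openai` or `azure`. As far as I can tell, the openai package raises this error when it sees credentials for both OpenAI and Azure OpenAI in the environment and has not been told which to use. The sketch below only silences that ambiguity. It does not explain why guardrails touches the OpenAI module client in the first place. `pin_openai_api_type` is a hypothetical helper name, and the assumption that the variable must be set before `openai` is first imported is mine, not something stated in the thread.

```python
import os


def pin_openai_api_type(api_type: str = "openai") -> str:
    """Set OPENAI_API_TYPE unless it is already set, and return the effective value.

    Assumption: call this before anything imports `openai`, because the
    library appears to read the variable when the module is first loaded.
    """
    if api_type not in ("openai", "azure"):
        raise ValueError("api_type must be 'openai' or 'azure'")
    # setdefault keeps an existing value, so an explicit deployment setting wins.
    return os.environ.setdefault("OPENAI_API_TYPE", api_type)


if __name__ == "__main__":
    print("OPENAI_API_TYPE =", pin_openai_api_type("openai"))
```

Calling the helper at the very top of guardrails_helper.py, before the guardrails import, would be the place to try it.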
Code to reproduce:
from guardrails import Guard
from guardrails.hub import ProfanityFree
import os
import requests
import json
import litellm
from openai import AzureOpenAI
from dotenv import load_dotenv

# Use the Guard with the validator
guard = Guard().use(ProfanityFree, on_fail="exception")

TRITON_SERVER_URL = f"http://triton-models:{os.getenv('TRITON_API_PORT')}/v2/models/{os.getenv('TRITON_CHAT_MODEL_NAME')}/generate"
print("TRT_LLM URL:", TRITON_SERVER_URL)

# Function that takes the prompt as a string and returns the LLM output as string
def my_llm_api(prompt: str) -> str:
    """Custom LLM API wrapper.
    At least one of prompt, instruction or msg_history should be provided.
    Args:
        prompt (str): The prompt to be passed to the LLM API
        instruction (str): The instruction to be passed to the LLM API
        msg_history (list[dict]): The message history to be passed to the LLM API
        **kwargs: Any additional arguments to be passed to the LLM API
    Returns:
        str: The output of the LLM API
    """
    # Call your LLM API here
    # Prepare the input for the LLM API
    inputs = {
        "prompt": prompt
    }
    # Make a request to the Triton server
    response = requests.post(TRITON_SERVER_URL, json=inputs)
    response_data = response.json()
    print("RESPONSE FROM TRT LLM")
    print(response_data)
    # Extract the output from the response
    llm_output = response_data.get("outputs", {}).get("text", "")
    return llm_output

validated_response = guard(
    my_llm_api,
    prompt="Can you generate a list of 10 things that are not food?",
)
print("VALIDATED RESPONSE FROM TRT LLM")
print(validated_response)
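The docstring in the reproduction lists instruction, msg_history and **kwargs, but the function itself only accepts prompt. Below is a minimal sketch of a wrapper whose signature matches that docstring. It is an illustration under stated assumptions, not the guardrails contract: the `text_output` key is my assumption about what a TRT-LLM ensemble returns from Triton's generate endpoint, the `post` parameter is a hypothetical hook added so the function can be exercised without a server, and the `{"prompt": ...}` payload is copied from the reproduction. The URL is the one printed in the log above.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional

# URL as printed in the log output; the thread builds it from env variables.
TRITON_SERVER_URL = "http://triton-models:8000/v2/models/ensemble/generate"


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON reply (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_text(response_data: dict) -> str:
    """Pull the generated text out of the reply, tolerating two shapes."""
    # Assumed shape 1: a top-level "text_output" field.
    text = response_data.get("text_output")
    if isinstance(text, str):
        return text
    # Shape 2: the nested {"outputs": {"text": ...}} used in the reproduction.
    outputs = response_data.get("outputs")
    if isinstance(outputs, dict) and isinstance(outputs.get("text"), str):
        return outputs["text"]
    return ""


def my_llm_api(
    prompt: Optional[str] = None,
    *,
    instruction: Optional[str] = None,
    msg_history: Optional[List[Dict[str, Any]]] = None,
    post: Callable[[str, dict], dict] = post_json,  # hypothetical test hook
    **kwargs: Any,
) -> str:
    """Custom LLM wrapper accepting the arguments named in the docstring."""
    if not (prompt or instruction or msg_history):
        raise ValueError("Provide at least one of prompt, instruction or msg_history.")
    # Flatten everything supplied into a single prompt string.
    parts: List[str] = []
    if instruction:
        parts.append(instruction)
    for message in msg_history or []:
        parts.append(str(message.get("content", "")))
    if prompt:
        parts.append(prompt)
    response_data = post(TRITON_SERVER_URL, {"prompt": "\n".join(parts)})
    return extract_text(response_data)
```

Printing the raw reply once, as the reproduction already does, is the quickest way to confirm which of the two shapes the server actually returns.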
GR Versions:
I am now wondering whether triton-llm works with guardrails-ai's LLM wrapper, or whether specific versions of trt-llm builds are needed (I'm using llama3-8b-instruct).
I'd appreciate it if you could take a further look at this!
It looks like we are still falling back to the OpenAI client. Could you include the output of pip show guardrails-ai so we can confirm you're on the latest version?
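If pip is awkward to run inside the container, the same information can be read from Python. This is a small sketch using only the standard library. `installed_version` is a hypothetical helper name, and the list of packages is simply those imported in the reproduction.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Packages that appear in the reproduction script's imports.
    for name in ("guardrails-ai", "openai", "litellm"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```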
Describe the bug
I am trying to use the custom LLM wrapper so that I can add guardrails to an NVIDIA TensorRT-LLM (TRT-LLM) model. I do not want to use OpenAI or Azure OpenAI for the guardrails call.
However I'm met with the following error:
openai.OpenAIError: Ambiguous use of module client; please set 'openai.api_type' or the 'OPENAI_API_TYPE' environment variable to 'openai' or 'azure'
To Reproduce Here's my code:
Expected behavior Full error:
Library version: guardrails-ai==0.5.2
The documentation on using a custom LLM wrapper was somewhat thin, and I don't understand why OpenAI is called even though I never asked to use those APIs. I'd appreciate it if anyone who has found a workaround could share it.