truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License
1.86k stars 162 forks

[BUG] Unable to get context_relevance_with_cot_reasons and groundedness_measure_with_cot_reasons when using Azure OpenAI #1274

Open harsh-garg98 opened 1 week ago

harsh-garg98 commented 1 week ago

Bug Description I am following the Azure OpenAI LangChain Quickstart Google Colab notebook (https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/expositional/models/azure_openai_langchain.ipynb) to set up TruLens using Azure OpenAI. I am at the 'Initialize Feedback Function(s)' stage and I am getting an error:

AttributeError: 'AzureOpenAI' object has no attribute 'context_relevance_with_cot_reasons'

According to the documentation, the AzureOpenAI class derives from the OpenAI class, which derives from the LLMProvider base class. The LLMProvider base class documentation page shows that it has the 'context_relevance_with_cot_reasons' and 'groundedness_measure_with_cot_reasons' methods.
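
As a sanity check, a minimal snippet like the following (standard Python introspection only; nothing here is specific to my setup) can confirm whether the installed class actually exposes these methods:

from trulens_eval.feedback.provider import AzureOpenAI

# Show the class hierarchy and check for the methods described in the docs.
print([cls.__name__ for cls in AzureOpenAI.__mro__])
print(hasattr(AzureOpenAI, "context_relevance_with_cot_reasons"))
print(hasattr(AzureOpenAI, "groundedness_measure_with_cot_reasons"))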

To Reproduce I am following the notebook and have not changed any of the code.

Expected behavior The expected behavior is to get the context relevance and groundedness scores.

Relevant Logs/Tracebacks Code:

import os

import numpy as np

from trulens_eval import Feedback
from trulens_eval.app import App
from trulens_eval.feedback.provider import AzureOpenAI

# Initialize AzureOpenAI-based feedback function collection class:
provider = AzureOpenAI(
    # Replace this with your azure deployment name
    deployment_name=os.getenv("DEPLOYMENT_NAME")
)

# Select the context to be used in feedback. The location of the context is app-specific.
context = App.select_context(rag_chain)

# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(provider.relevance, name="Answer Relevance").on_input_output()

# Question/statement relevance between question and each context chunk.
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

# groundedness of output on the context
f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())
    .on_output()
)

Error:

AttributeError                            Traceback (most recent call last)
Cell In[88], line 20
     16 f_qa_relevance = Feedback(provider.relevance, name="Answer Relevance").on_input_output()
     18 # Question/statement relevance between question and each context chunk.
     19 f_context_relevance = (
---> 20     Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
     21     .on_input()
     22     .on(context)
     23     .aggregate(np.mean)
     24 )
     26 # groundedness of output on the context
     27 f_groundedness = (
     28     Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
     29     .on(context.collect())
     30     .on_output()
     31 )

File ~/Desktop/litellm/langfuse-self-host/.venv/lib/python3.9/site-packages/pydantic/main.py:811, in BaseModel.__getattr__(self, item)
    808     return super().__getattribute__(item)  # Raises AttributeError if appropriate
    809 else:
    810     # this is the current error
--> 811     raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')

AttributeError: 'AzureOpenAI' object has no attribute 'context_relevance_with_cot_reasons'

Environment:

dosubot[bot] commented 1 week ago

Hey, @harsh-garg98! šŸ‘‹ I'm here to help you out with any bugs, questions, or contributions you have in mind. Let's squash those bugs together!

The AttributeError occurs because the methods context_relevance_with_cot_reasons and groundedness_measure_with_cot_reasons are not defined in the AzureOpenAI class. The notebook you are following defines a custom class Custom_AzureOpenAI that extends AzureOpenAI and includes a method context_relevance_with_cot_reasons_extreme, but it does not define the methods you are trying to use.

To resolve this issue, you need to either define the missing methods in the AzureOpenAI class or use the methods provided in the Custom_AzureOpenAI class. Here is an example of how you can define the missing methods in a custom class:

import os
from typing import Dict, Tuple

import numpy as np

from trulens_eval import Feedback
from trulens_eval.feedback import prompts
from trulens_eval.feedback.provider import AzureOpenAI

class Custom_AzureOpenAI(AzureOpenAI):
    def context_relevance_with_cot_reasons(self, question: str, context: str) -> Tuple[float, Dict]:
        system_prompt = prompts.CONTEXT_RELEVANCE_SYSTEM
        user_prompt = str.format(prompts.CONTEXT_RELEVANCE_USER, question=question, context=context)
        user_prompt = user_prompt.replace("RELEVANCE:", prompts.COT_REASONS_TEMPLATE)
        return self.generate_score_and_reasons(system_prompt, user_prompt)

    def groundedness_measure_with_cot_reasons(self, context: str) -> Tuple[float, Dict]:
        system_prompt = prompts.GROUNDEDNESS_SYSTEM
        user_prompt = str.format(prompts.GROUNDEDNESS_USER, context=context)
        user_prompt = user_prompt.replace("GROUNDEDNESS:", prompts.COT_REASONS_TEMPLATE)
        return self.generate_score_and_reasons(system_prompt, user_prompt)

# Initialize the custom class
provider = Custom_AzureOpenAI(deployment_name=os.getenv("DEPLOYMENT_NAME"))

# Define feedback functions
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())
    .on_output()
)

This should resolve the AttributeError and allow you to proceed with the notebook [1].

To continue talking to Dosu, mention @dosu.

harsh-garg98 commented 1 week ago

@dosu I have implemented the following:

import os

import numpy as np

from trulens_eval import Feedback
from trulens_eval.app import App
from trulens_eval.feedback import prompts
from trulens_eval.feedback.provider import AzureOpenAI

class Custom_AzureOpenAI(AzureOpenAI):
    def context_relevance_with_cot_reasons(self, question: str, context: str):
        system_prompt = prompts.CONTEXT_RELEVANCE_SYSTEM
        user_prompt = str.format(prompts.CONTEXT_RELEVANCE_USER, question=question, context=context)
        user_prompt = user_prompt.replace("RELEVANCE:", prompts.COT_REASONS_TEMPLATE)
        return self.generate_score_and_reasons(system_prompt, user_prompt)

    def groundedness_measure_with_cot_reasons(self, context: str):
        system_prompt = prompts.GROUNDEDNESS_SYSTEM
        user_prompt = str.format(prompts.GROUNDEDNESS_USER, context=context)
        user_prompt = user_prompt.replace("GROUNDEDNESS:", prompts.COT_REASONS_TEMPLATE)
        return self.generate_score_and_reasons(system_prompt, user_prompt)

# Initialize the custom class
provider = Custom_AzureOpenAI(deployment_name=os.getenv("DEPLOYMENT_NAME"))

context = App.select_context(rag_chain)

f_qa_relevance = Feedback(provider.relevance, name="Answer Relevance").on_input_output()

# Define feedback functions
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())
    .on_output()
)

I am still getting an error on the f_groundedness function. Error:

āœ… In Answer Relevance, input prompt will be set to __record__.main_input or `Select.RecordInput` .
āœ… In Answer Relevance, input response will be set to __record__.main_output or `Select.RecordOutput` .
āœ… In Context Relevance, input question will be set to __record__.main_input or `Select.RecordInput` .
āœ… In Context Relevance, input context will be set to __record__.app.first.steps__.context.first.get_relevant_documents.rets .
āœ… In Groundedness, input context will be set to __record__.main_output or `Select.RecordOutput` .
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[124], line 36
     27 # Define feedback functions
     28 f_context_relevance = (
     29     Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
     30     .on_input()
     31     .on(context)
     32     .aggregate(np.mean)
     33 )
     35 f_groundedness = (
---> 36     Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
     37     .on_output()
     38     .on(context)
     39 
     40 )

File ~/Desktop/litellm/langfuse-self-host/.venv/lib/python3.9/site-packages/trulens_eval/feedback/feedback.py:528, in Feedback.on(self, *args, **kwargs)
    525 new_selectors.update(kwargs)
    527 for path in args:
--> 528     argname = self._next_unselected_arg_name()
    529     new_selectors[argname] = path
    530     self._print_guessed_selector(argname, path)

File ~/Desktop/litellm/langfuse-self-host/.venv/lib/python3.9/site-packages/trulens_eval/feedback/feedback.py:461, in Feedback._next_unselected_arg_name(self)
...
    464         )
    466     return par_names[0]
    467 else:
TypeError: Feedback implementation <bound method Custom_AzureOpenAI.groundedness_measure_with_cot_reasons of Custom_AzureOpenAI(tru_class_info=__main__.Custom_AzureOpenAI, endpoint=Endpoint(openai), model_engine='gpt-4o-litellm', deployment_name='gpt-4o-litellm')> with signature (context: str) has no more inputs. Perhaps you meant to evalute it on App output only instead of app input and output?

dosubot[bot] commented 1 week ago

The TypeError you're encountering is due to the groundedness_measure_with_cot_reasons method expecting a single context argument, but the Feedback function is trying to pass multiple arguments to it. To resolve this, you should ensure that the Feedback function is correctly set up to pass only the required context argument.

Here's how you can adjust your code:

  1. Ensure that the groundedness_measure_with_cot_reasons method in your Custom_AzureOpenAI class accepts both source and statement arguments, as per the expected method signature.
  2. Update the Feedback function to correctly pass the context argument.

Here is the corrected implementation:

import os

import numpy as np

from trulens_eval import Feedback
from trulens_eval.app import App
from trulens_eval.feedback import prompts
from trulens_eval.feedback.provider import AzureOpenAI

class Custom_AzureOpenAI(AzureOpenAI):
    def context_relevance_with_cot_reasons(self, question: str, context: str):
        system_prompt = prompts.CONTEXT_RELEVANCE_SYSTEM
        user_prompt = str.format(prompts.CONTEXT_RELEVANCE_USER, question=question, context=context)
        user_prompt = user_prompt.replace("RELEVANCE:", prompts.COT_REASONS_TEMPLATE)
        return self.generate_score_and_reasons(system_prompt, user_prompt)

    def groundedness_measure_with_cot_reasons(self, source: str, statement: str):
        system_prompt = prompts.GROUNDEDNESS_SYSTEM
        user_prompt = str.format(prompts.GROUNDEDNESS_USER, source=source, statement=statement)
        user_prompt = user_prompt.replace("GROUNDEDNESS:", prompts.COT_REASONS_TEMPLATE)
        return self.generate_score_and_reasons(system_prompt, user_prompt)

# Initialize the custom class
provider = Custom_AzureOpenAI(deployment_name=os.getenv("DEPLOYMENT_NAME"))

context = App.select_context(rag_chain)

f_qa_relevance = Feedback(provider.relevance, name="Answer Relevance").on_input_output()

# Define feedback functions
f_context_relevance = (
    Feedback(provider.context_relevance_with_cot_reasons, name="Context Relevance")
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

f_groundedness = (
    Feedback(provider.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())
    .on_output()
)

This should resolve the TypeError and allow you to proceed with the notebook [1][2].

To continue talking to Dosu, mention @dosu.

harsh-garg98 commented 1 week ago

@dosu The above fix worked for me but there's something still breaking in further steps. Here's what I've implemented:

The following works fine:

import os

import numpy as np

from trulens_eval import Feedback, TruChain
from trulens_eval.app import App
from trulens_eval.feedback import prompts
from trulens_eval.feedback.provider import AzureOpenAI

class Custom_AzureOpenAI(AzureOpenAI):
    def context_relevance_with_cot_reasons(self, question: str, context: str):
        system_prompt = prompts.CONTEXT_RELEVANCE_SYSTEM
        user_prompt = str.format(
            prompts.CONTEXT_RELEVANCE_USER, question=question, context=context
        )
        user_prompt = user_prompt.replace("RELEVANCE:", prompts.COT_REASONS_TEMPLATE)
        return self.generate_score_and_reasons(system_prompt, user_prompt)

    def groundedness_measure_with_cot_reasons(self, source: str, statement: str):
        system_prompt = prompts.GROUNDEDNESS_SYSTEM
        user_prompt = str.format(
            prompts.GROUNDEDNESS_USER, source=source, statement=statement
        )
        user_prompt = user_prompt.replace("GROUNDEDNESS:", prompts.COT_REASONS_TEMPLATE)
        return self.generate_score_and_reasons(system_prompt, user_prompt)

    def style_check_professional(self, response: str):
        professional_prompt = str.format(
            "Please rate the professionalism of the following text on a scale from 0 to 10, where 0 is not at all professional and 10 is extremely professional: \n\n{}",
            response,
        )
        return self.generate_score(system_prompt=professional_prompt)

    def context_relevance_with_cot_reasons_extreme(
        self, question: str, context: str
    ):
        # remove scoring guidelines around middle scores
        system_prompt = prompts.CONTEXT_RELEVANCE_SYSTEM.replace(
            "- STATEMENT that is RELEVANT to most of the QUESTION should get a score of 5, 6, 7 or 8. Higher score indicates more RELEVANCE.\n\n",
            "",
        )
        user_prompt = str.format(
            prompts.CONTEXT_RELEVANCE_USER, question=question, context=context
        )
        user_prompt = user_prompt.replace("RELEVANCE:", prompts.COT_REASONS_TEMPLATE)

        return self.generate_score_and_reasons(system_prompt, user_prompt)

# Initialize the custom class
custom_azopenai = Custom_AzureOpenAI(deployment_name=os.getenv("DEPLOYMENT_NAME"))

context = App.select_context(rag_chain)

f_qa_relevance = Feedback(
    custom_azopenai.relevance, name="Answer Relevance"
).on_input_output()

# Define feedback functions
f_context_relevance = (
    Feedback(
        custom_azopenai.context_relevance_with_cot_reasons, name="Context Relevance"
    )
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

f_groundedness = (
    Feedback(custom_azopenai.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())
    .on_output()
)

f_context_relevance_extreme = (
    Feedback(
        custom_azopenai.context_relevance_with_cot_reasons_extreme,
        name="Context Relevance Extreme",
    )
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

f_professionalism = Feedback(
    custom_azopenai.style_check_professional, name="Professionalism"
).on_output()

tru_query_engine_recorder = TruChain(
    rag_chain,
    llm=custom_azopenai,
    app_id="LangChain_App1_AzureOpenAI",
    feedbacks=[
        f_qa_relevance,
        f_context_relevance,
        f_groundedness,
        f_context_relevance_extreme,
        f_professionalism,
    ],
)

The following throws an error (this code immediately follows the preceding block):

query = "What is most interesting about this essay?"
with tru_query_engine_recorder as recording:
    answer = rag_chain.invoke(query)
    print("query was:", query)
    print("answer was:", answer)

The error:

INFO:httpx:HTTP Request: POST https://litellm-demo-oai.openai.azure.com/openai/deployments/ada-trulens/embeddings?api-version=2024-02-15-preview "HTTP/1.1 200 OK"
WARNING:trulens_eval.feedback.provider.endpoint.base:Callback class OpenAICallback is registered for handling create but there are no endpoints waiting to receive the result.
INFO:httpx:HTTP Request: POST https://litellm-demo-oai.openai.azure.com/openai/deployments/gpt-4o-litellm/chat/completions?api-version=2024-02-15-preview "HTTP/1.1 200 OK"
WARNING:trulens_eval.feedback.provider.endpoint.base:Callback class OpenAICallback is registered for handling create but there are no endpoints waiting to receive the result.
INFO:alembic.runtime.migration:Context impl SQLiteImpl.
INFO:alembic.runtime.migration:Will assume non-transactional DDL.
INFO:trulens_eval.database.sqlalchemy_db:āœ… added record record_hash_6eb1662c07017a8563e2755ea8dec2e3
INFO:alembic.runtime.migration:Context impl SQLiteImpl.
INFO:alembic.runtime.migration:Will assume non-transactional DDL.
WARNING:trulens_eval.feedback.feedback:Feedback Function exception caught: Traceback (most recent call last):
  File "/Users/harshgarg/Desktop/litellm/langfuse-self-host/.venv/lib/python3.9/site-packages/trulens_eval/feedback/feedback.py", line 627, in run
    result_and_meta, part_cost = Endpoint.track_all_costs_tally(
  File "/Users/harshgarg/Desktop/litellm/langfuse-self-host/.venv/lib/python3.9/site-packages/trulens_eval/feedback/provider/endpoint/base.py", line 496, in track_all_costs_tally
    result, cbs = Endpoint.track_all_costs(
  File "/Users/harshgarg/Desktop/litellm/langfuse-self-host/.venv/lib/python3.9/site-packages/trulens_eval/feedback/provider/endpoint/base.py", line 477, in track_all_costs
    return Endpoint._track_costs(
  File "/Users/harshgarg/Desktop/litellm/langfuse-self-host/.venv/lib/python3.9/site-packages/trulens_eval/feedback/provider/endpoint/base.py", line 574, in _track_costs
    result: T = __func(*args, **kwargs)
  File "/var/folders/0p/5f_nkzcx0snc8nxcnmd20v840000gn/T/ipykernel_36085/1099205222.py", line 9, in context_relevance_with_cot_reasons
    system_prompt = prompts.CONTEXT_RELEVANCE_SYSTEM
AttributeError: module 'trulens_eval.feedback.prompts' has no attribute 'CONTEXT_RELEVANCE_SYSTEM'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
...
query was: What is most interesting about this essay?
answer was: The most interesting aspect of the essay is the author's consistent success in pursuing unprestigious work and turning it into something valuable. Despite initial perceptions of being marginal or lame, endeavors like Viaweb, Y Combinator, and writing online essays have proven effective. This approach encourages others to consider the potential in less conventional paths.
INFO:alembic.runtime.migration:Context impl SQLiteImpl.
INFO:alembic.runtime.migration:Will assume non-transactional DDL.
INFO:trulens_eval.database.sqlalchemy_db:šŸ›‘ feedback result Context Relevance Extreme FAILED feedback_result_hash_59a66e49b4d008f43f3bb2f282db2574
INFO:alembic.runtime.migration:Context impl SQLiteImpl.
INFO:alembic.runtime.migration:Will assume non-transactional DDL.
INFO:trulens_eval.database.sqlalchemy_db:āœ… feedback result Answer Relevance DONE feedback_result_hash_bdfc627d665df6ad1e23a1dacfb7cddf
INFO:httpx:HTTP Request: POST https://litellm-demo-oai.openai.azure.com/openai/deployments/gpt-4o-litellm/chat/completions?api-version=2024-02-15-preview "HTTP/1.1 200 OK"
WARNING:trulens_eval.feedback.provider.endpoint.base:Callback class OpenAICallback is registered for handling create but there are no endpoints waiting to receive the result.
INFO:alembic.runtime.migration:Context impl SQLiteImpl.
INFO:alembic.runtime.migration:Will assume non-transactional DDL.
INFO:trulens_eval.database.sqlalchemy_db:āœ… feedback result Professionalism DONE feedback_result_hash_b16e2c939b5bcda288580cdb0770f3af

There are two problems mentioned in the error messages posted above:

1.
INFO:httpx:HTTP Request: POST https://litellm-demo-oai.openai.azure.com/openai/deployments/ada-trulens/embeddings?api-version=2024-02-15-preview "HTTP/1.1 200 OK"
WARNING:trulens_eval.feedback.provider.endpoint.base:Callback class OpenAICallback is registered for handling create but there are no endpoints waiting to receive the result.
INFO:httpx:HTTP Request: POST https://litellm-demo-oai.openai.azure.com/openai/deployments/gpt-4o-litellm/chat/completions?api-version=2024-02-15-preview "HTTP/1.1 200 OK"
WARNING:trulens_eval.feedback.provider.endpoint.base:Callback class OpenAICallback is registered for handling create but there are no endpoints waiting to receive the result.

2.
AttributeError: module 'trulens_eval.feedback.prompts' has no attribute 'CONTEXT_RELEVANCE_SYSTEM'

Please help me resolve the above issues.

harsh-garg98 commented 1 week ago

Both issues still persist even after making the changes mentioned above. There seem to be some discrepancies between the documentation and the actual implementation of the library, specifically around the 'LLMProvider' base class and the 'trulens_eval.feedback.prompts' module: a few methods mentioned in the docs aren't actually present in the implementation (at least in the trulens_eval version mentioned above). There's also the issue with OpenAICallback; I'm not sure why that warning persists, since I am not directly implementing the OpenAICallback base class. I've verified that the environment variables are correctly set and named.
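
To illustrate the mismatch, a minimal check like this (standard library introspection only) prints the installed version and the prompt constants the installed module actually defines:

from importlib.metadata import version

from trulens_eval.feedback import prompts

# Which version of trulens_eval is actually installed vs. the docs being read?
print(version("trulens_eval"))

# Which prompt template constants does the installed module define?
print([name for name in dir(prompts) if name.isupper()])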

Please help me resolve the issues mentioned above. @andrewisplinghoff @piotrm0 @yisding @stokedout

sfc-gh-jreini commented 6 days ago

@dosu is wrong here, your code should be fine (and I've tested on my end successfully).

Can you share more of your code? Could there be a namespace conflict with TruLens' AzureOpenAI?

harsh-garg98 commented 5 days ago

> @dosu is wrong here, your code should be fine (and I've tested on my end successfully).
>
> Can you share more of your code? Could there be a namespace conflict with TruLens' AzureOpenAI?

I have made a copy of the google colab notebook and am running it as a .ipynb file. Is there a way I can share the .ipynb file? Or is there a specific block of code you want to have a look at?

I checked for namespace conflicts but there don't seem to be any.

from trulens_eval.feedback.provider import AzureOpenAI
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings

The above are the imports that mention AzureOpenAI, but all three are named differently, so ideally there shouldn't be any namespace conflicts.
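
To double-check for shadowing, standard introspection confirms which package each name actually resolves to:

from trulens_eval.feedback.provider import AzureOpenAI
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Each class should report the module it was imported from.
print(AzureOpenAI.__module__)
print(AzureChatOpenAI.__module__)
print(AzureOpenAIEmbeddings.__module__)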