truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License

error "name 'Bedrock' is not defined" when groundedness is calculated #804

Closed qiongw closed 7 months ago

qiongw commented 8 months ago

Hi,

I am using https://www.trulens.org/trulens_eval/langchain_quickstart/ as the reference to calculate answer relevance, context relevance, and groundedness within Databricks. I am able to get the answer relevance and context relevance scores, but not the groundedness score. Instead I get the error "name 'Bedrock' is not defined".

Does anyone have a clue how to fix it? Thank you in advance.

Kind regards, Qiong

joshreini1 commented 8 months ago

@qiongw - please share your code including how you are setting up your feedback providers.

qiongw commented 8 months ago

@joshreini1
I first used the same code as in https://www.trulens.org/trulens_eval/langchain_quickstart/ and got:

'relevance_with_cot_reasons' 1.0
'qs_relevance_with_cot_reasons' 0.4
'groundedness_measure' None

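Then I used the following code to check why 'groundedness_measure' is None:

import os

from trulens_eval import Feedback, Select, TruBasicApp
from trulens_eval.feedback import Groundedness, GroundTruthAgreement
from trulens_eval.feedback.provider.openai import AzureOpenAI as fAzureOpenAI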

os.environ["OPENAI_API_KEY"] = *

llm_args = { "deployment_name": , "azure_endpoint": , "api_key": , "api_version": , }

llm = fAzureOpenAI(**llm_args)

provider = llm

groundedness_golden_set = [{'query': '(CNN)Donald Sterling\'s racist remarks cost him an NBA team last year. But now it\'s his former female companion who has lost big. A Los Angeles judge has ordered V. Stiviano to pay back more than $2.6 million in gifts after Sterling\'s wife sued her. In the lawsuit, Rochelle "Shelly" Sterling accused Stiviano of targeting extremely wealthy older men. She claimed Donald Sterling used the couple\'s money to buy Stiviano a Ferrari, two Bentleys and a Range Rover, and that he helped her get a $1.8 million duplex. Who is V. Stiviano? Stiviano countered that there was nothing wrong with Donald Sterling giving her gifts and that she never took advantage of the former Los Angeles Clippers owner, who made much of his fortune in real estate. Shelly Sterling was thrilled with the court decision Tuesday, her lawyer told CNN affiliate KABC. "This is a victory for the Sterling family in recovering the $2,630,000 that Donald lavished on a conniving mistress," attorney Pierce O\'Donnell said in a statement. "It also sets a precedent that the injured spouse can recover damages from the recipient of these ill-begotten gifts." Stiviano\'s gifts from Donald Sterling didn\'t just include uber-expensive items like luxury cars. According to the Los Angeles Times, the list also includes a $391 Easter bunny costume, a $299 two-speed blender and a $12 lace thong. Donald Sterling\'s downfall came after an audio recording surfaced of the octogenarian arguing with Stiviano. In the tape, Sterling chastises Stiviano for posting pictures on social media of her posing with African-Americans, including basketball legend Magic Johnson. "In your lousy fing Instagrams, you don\'t have to have yourself with -- walking with black people," Sterling said in the audio first posted by TMZ. He also tells Stiviano not to bring Johnson to Clippers games and not to post photos with the Hall of Famer so Sterling\'s friends can see. 
"Admire him, bring him here, feed him, fk him, but don\'t put (Magic) on an Instagram for the world to have to see so they have to call me," Sterling said. NBA Commissioner Adam Silver banned Sterling from the league, fined him $2.5 million and pushed through a charge to terminate all of his ownership rights in the franchise. Fact check: Donald Sterling\'s claims vs. reality CNN\'s Dottie Evans contributed to this report.', 'response': "donald sterling , nba team last year . sterling 's wife sued for $ 2.6 million in gifts . sterling says he is the former female companion who has lost the . sterling has ordered v. stiviano to pay back $ 2.6 m in gifts after his wife sued . sterling also includes a $ 391 easter bunny costume , $ 299 and a $ 299 .", 'expected_score': 0.2}]

groundedness_openai = Groundedness(groundedness_provider=provider)
f_groundedness_openai = (
    Feedback(groundedness_openai.groundedness_measure_with_cot_reasons, name="Groundedness OpenAI GPT-3.5")
    .on_input()
    .on_output()
    .aggregate(groundedness_openai.grounded_statements_aggregator)
)

def wrapped_groundedness_openai(input, output):
    return f_groundedness_openai(input, output)[0]['full_doc_score']

ground_truth = GroundTruthAgreement(groundedness_golden_set)

f_mae = Feedback(ground_truth.mae, name = "Mean Absolute Error").on(Select.Record.calls[0].args.args[0]).on(Select.Record.calls[0].args.args[1]).on_output()

tru_wrapped_groundedness_openai = TruBasicApp(wrapped_groundedness_openai, app_id = "Groundedness OpenAI GPT-3.5", feedbacks=[f_mae])

for i in range(len(groundedness_golden_set)):
    source = groundedness_golden_set[i]["query"]
    response = groundedness_golden_set[i]["response"]
    with tru_wrapped_groundedness_openai as recording:
        tru_wrapped_groundedness_openai.app(source, response)

Then I got the error: NameError: name 'Bedrock' is not defined

joshreini1 commented 8 months ago

Hey @qiongw - can you upgrade to the latest trulens-eval (0.22.2)? This may be resolved

qiongw commented 8 months ago

@joshreini1 I upgraded to the latest trulens-eval (0.22.2) and I still get the same error message. See below:

NameError: name 'Bedrock' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-39a4c46b-19af-4f99-b2bf-cc6dbe4a066c/lib/python3.10/site-packages/trulens_eval/feedback/feedback.py", line 506, in run
    raise RuntimeError(
RuntimeError: Evaluation of groundedness_measure_with_cot_reasons failed on inputs: {'source': '[](https://s.turbifycdn.com/aah/paulgraham/essays-6.gif) \n' ' \n' '| ![What name 'Bedrock' is not defined.

joshreini1 commented 8 months ago

Can you run the following smaller test to try and isolate your feedback function?

groundedness_openai.groundedness_measure_with_cot_reasons("the capital of Zimbabwe is Harare", "Harare sits as the capital of Zimbabwe")

qiongw commented 8 months ago

@joshreini1 I checked: if I install only

%pip install trulens_eval==0.22.2
%pip install openai==1.3.7

then I can run the code without an error message.

But if I install all the packages I need

%pip install trulens_eval==0.22.2
%pip install llama_index==0.9.34
%pip install html2text>=2020.1.16
%pip install torch==2.1.2
%pip install sentence-transformers==2.2.2
%pip install trulens_eval==0.20.3
%pip install openai==1.3.7
%pip install langchain==0.1.4
%pip install chromadb==0.4.22
%pip install langchainhub==0.1.14
%pip install bs4==0.0.2
%pip install tiktoken==0.5.2
%pip install ragas==0.0.22

and then rerun the same code, I get the error message "name 'Bedrock' is not defined".
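
Note that the list above pins trulens_eval twice (0.22.2 first, then 0.20.3), so the later install would replace the earlier one. A minimal sketch for spotting such repeated pins follows. The helper name `conflicting_pins` is hypothetical, and the parsing only handles simple `name==version` strings, not full requirement syntax.

```python
from collections import defaultdict

# Pins copied from the comment above, in install order.
pins = [
    "trulens_eval==0.22.2",
    "llama_index==0.9.34",
    "html2text>=2020.1.16",
    "torch==2.1.2",
    "sentence-transformers==2.2.2",
    "trulens_eval==0.20.3",
    "openai==1.3.7",
    "langchain==0.1.4",
    "chromadb==0.4.22",
    "langchainhub==0.1.14",
    "bs4==0.0.2",
    "tiktoken==0.5.2",
    "ragas==0.0.22",
]


def conflicting_pins(requirements):
    """Return {package: [versions]} for packages pinned to more than one version.

    Hypothetical helper: only exact pins (name==version) are considered.
    """
    seen = defaultdict(list)
    for req in requirements:
        if "==" not in req:
            continue  # skip range specifiers such as html2text>=2020.1.16
        name, pinned = req.split("==", 1)
        # Treat underscores and hyphens as equivalent, as pip does.
        name = name.strip().lower().replace("_", "-")
        pinned = pinned.strip()
        if pinned not in seen[name]:
            seen[name].append(pinned)
    return {name: versions for name, versions in seen.items() if len(versions) > 1}


print(conflicting_pins(pins))
```

With the list from this comment, the only package reported is trulens-eval, with versions 0.22.2 and 0.20.3.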

piotrm0 commented 7 months ago

Hi; if you put all of those requirements on the same pip install line, does pip give you an error or warning about possible version incompatibility?
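
As a complementary check to pip's own output, the versions that actually ended up in the environment can be read back from package metadata. This is a sketch only, assuming Python 3.8 or later (the traceback above shows python3.10). The helper name `installed_version` is hypothetical, and the package names are taken from the install list earlier in the thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Package names taken from the install commands quoted in this thread.
for name in ("trulens_eval", "openai", "langchain", "llama_index"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this in the same notebook session as the failing code shows whether the expected trulens_eval version is the one being imported.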

qiongw commented 7 months ago

@piotrm0 Indeed, I get an error:

databricks-sdk 0.1.6 requires requests<2.29.0,>=2.28.1, but you have requests 2.31.0 which is incompatible.
botocore 1.27.28 requires urllib3<1.27,>=1.25.4, but you have urllib3 2.2.1 which is incompatible.

I will check, thanks.
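
The botocore conflict may be relevant, since Amazon Bedrock is an AWS service and AWS client libraries depend on botocore. The thread does not confirm the root cause, however. As a general illustration only (this is not trulens_eval's actual code, and the module name `hypothetical_aws_provider` is invented), the sketch below shows how a silently failed optional import can surface much later as this kind of NameError.

```python
# Hypothetical illustration: the module below does not exist, which stands in
# for an optional dependency whose import fails in a broken environment.
try:
    from hypothetical_aws_provider import Bedrock
except ImportError:
    pass  # failure is swallowed, so the name Bedrock is never bound


def uses_llm_provider(provider):
    # Building the tuple evaluates the name Bedrock, which raises NameError
    # at call time, far away from the import that actually failed.
    return isinstance(provider, (str, Bedrock))


try:
    uses_llm_provider(object())
except NameError as err:
    message = str(err)
    print(message)
```

In a pattern like this, the error appears only when the code path that references the missing name runs, which would be consistent with other feedback functions working while groundedness fails.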