truera / trulens

Evaluation and Tracking for LLM Experiments
https://www.trulens.org/
MIT License

Benchmarking the Snowflake arctic-instruct groundedness feedback function #1185

Closed by sfc-gh-dhuang 3 weeks ago

sfc-gh-dhuang commented 4 weeks ago

Items to add to release announcement:

Other details that are good to know but need not be announced: highlight the enterprise intelligence (efficiency and good alignment with the human golden set) on the groundedness eval.

Dataset reference: https://huggingface.co/datasets/mteb/summeval

sfc-gh-dhuang commented 3 weeks ago

Adding results for Mixtral-8x7B (mixture of experts) and Llama3-70B here as well.