Benchmarking Snowflake arctic-instruct feedback function of groundedness

truera / trulens

Evaluation and Tracking for LLM Experiments

https://www.trulens.org/

MIT License


Benchmarking Snowflake arctic-instruct feedback function of groundedness #1185

Closed: sfc-gh-dhuang closed this 3 weeks ago

sfc-gh-dhuang commented 4 weeks ago

Items to add to release announcement:

Heading: Benchmarking the Snowflake arctic-instruct groundedness feedback function against 500 entries of the SummEval test set

Other details that are good to know but need not be announced: the benchmark highlights arctic-instruct's enterprise intelligence (efficiency plus good alignment with the human golden set) on groundedness evaluation

dataset ref: https://huggingface.co/datasets/mteb/summeval
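For readers who want a feel for what "alignment with the human golden set" could mean in practice, here is a minimal sketch. It is not the code from this PR: the metrics (mean absolute error and Spearman rank correlation), the helper names, and the assumption that human ratings sit on a 1 to 5 scale and are rescaled to 0 to 1 are all illustrative choices, and the toy numbers are made up rather than benchmark results.

```python
from statistics import mean


def normalize_likert(score, low=1.0, high=5.0):
    """Rescale a rating from [low, high] to [0, 1] (assumed 1-5 scale)."""
    return (score - low) / (high - low)


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def mean_absolute_error(pred, gold):
    """Average absolute gap between model scores and golden scores."""
    return mean(abs(p - g) for p, g in zip(pred, gold))


if __name__ == "__main__":
    # Toy values for illustration only, not results from this benchmark.
    human = [normalize_likert(s) for s in [5, 3, 1, 4]]
    model = [0.9, 0.6, 0.1, 0.7]  # hypothetical feedback-function scores
    print("MAE:", mean_absolute_error(model, human))
    print("Spearman:", spearman(model, human))
```

A lower MAE and a Spearman value closer to 1 would indicate closer agreement with the human ratings; which metric the notebook actually reports should be checked there.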

review-notebook-app[bot] commented 4 weeks ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

sfc-gh-dhuang commented 3 weeks ago

Adding results for Mixtral-8x7b (mixture of experts) and Llama3-70B here as well.