vectara / hallucination-leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
https://vectara.com
Apache License 2.0
1.16k stars 45 forks

instance level metric outputs #6

Open cabreraalex opened 10 months ago

cabreraalex commented 10 months ago

This is fantastic work!

I was wondering if you all could release the instance-level outputs from the analysis. We'd love to visualize the results using Zeno.

simonhughes22 commented 10 months ago

Can you expand on what you mean by this? You can interact with the model right now on Hugging Face, if that helps: https://huggingface.co/vectara/hallucination_evaluation_model

cabreraalex commented 10 months ago

Ah, sorry, I meant the outputs of the hallucination evaluation model for each instance, e.g. a new column with the model's output in this file: https://github.com/vectara/hallucination-leaderboard/blob/main/leaderboard_summaries.csv

Would love to dive in and compare the summarization models more in depth, similar to reports we've published recently on the HF leaderboard and Whisper transcription models:

https://twitter.com/gneubig/status/1724872160144171104 https://twitter.com/a_a_cabrera/status/1722698009094529118
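
To make the request concrete, here is a rough sketch of what "a new column with the model's output" could look like. It is an illustration only, not the leaderboard's actual pipeline. The column names (`model`, `source`, `summary`, `hem_score`) and the sample rows are assumptions, not taken from the real `leaderboard_summaries.csv`. The `token_overlap` function is a hypothetical stand-in so the snippet runs offline; in practice it would be swapped for a call to the hallucination evaluation model.

```python
import csv
import io
from typing import Callable, Iterable


def add_score_column(
    rows: Iterable[dict],
    score_fn: Callable[[str, str], float],
    source_col: str = "source",    # assumed column name
    summary_col: str = "summary",  # assumed column name
    score_col: str = "hem_score",  # hypothetical name for the new column
) -> list[dict]:
    """Return copies of the rows with one extra per-instance score column."""
    scored = []
    for row in rows:
        new_row = dict(row)
        new_row[score_col] = score_fn(row[source_col], row[summary_col])
        scored.append(new_row)
    return scored


def token_overlap(source: str, summary: str) -> float:
    """Stand-in scorer (NOT the Vectara model): share of summary tokens found in the source."""
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    source_tokens = set(source.lower().split())
    return sum(t in source_tokens for t in summary_tokens) / len(summary_tokens)


# Tiny in-memory CSV standing in for leaderboard_summaries.csv (made-up rows).
sample_csv = io.StringIO(
    "model,source,summary\n"
    "model-a,the cat sat on the mat,the cat sat\n"
    "model-b,the cat sat on the mat,the dog barked\n"
)
scored_rows = add_score_column(csv.DictReader(sample_csv), token_overlap)

# Write the rows back out as CSV, now with the extra score column.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(scored_rows[0].keys()))
writer.writeheader()
writer.writerows(scored_rows)
print(out.getvalue())
```

With one score per row, the summaries can be sliced by model or by source document in a tool like Zeno instead of only comparing aggregate rates.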

amin2718 commented 8 months ago

Hello, I'm sorry I didn't see this earlier. Tragically, Simon passed away over Thanksgiving, and other members of the team are picking this up. We'll try to get the new column added soon.

cabreraalex commented 8 months ago

Oh no, I'm so sorry! Best wishes to the family and team. Of course, no rush at all!