vectara / hallucination-leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
https://vectara.com
Apache License 2.0

Claude 3.5 Sonnet New #77

Open sushantnair opened 1 month ago

sushantnair commented 1 month ago

Hello maintainers,

Could you please update the leaderboard to include the new Claude 3.5 Sonnet, released a few days ago?

Thanks.

ofermend commented 1 month ago

Hey @sushantnair - it's already updated. The results reflect the latest Claude 3.5 Sonnet.

sushantnair commented 1 month ago

Hey @ofermend,

Thanks a lot for the quick reply!

I'm honestly surprised that even the latest version of that model, listed as "Claude 3.5 Sonnet (New)" with the extra "New" in parentheses, still has a 4.5% hallucination rate!

Yet in my experience, its coding and debugging are outstanding. And the best perk of using Anthropic's models is the inherent safety that comes with them.

Anyways, thanks again!