vectara / hallucination-leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
https://vectara.com
Apache License 2.0
1.11k stars 42 forks

Google PaLM reference #8

Closed (suddhasatwabhaumik closed 6 months ago)

suddhasatwabhaumik commented 7 months ago

Hello.

I have the following requests regarding the Google PaLM models mentioned in this report.

  1. There are multiple versions of the Google PaLM models, offered as the text-bison models. The one used in this test is an older version, while much better versions of the same model are already available on Google Cloud Platform, for example for customer use. Please state specifically which model version was used for the comparisons, as is done for GPT-3 and GPT-4.

  2. Using the same evaluation model that the leaderboard uses, we ran tests on the latest text-bison model and found that it reaches up to 95% accuracy. Please rerun the relevant tests with the latest PaLM models and update this report.
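For readers who want to reproduce a figure like the 95% above, here is a minimal sketch of how an accuracy number could be derived from per-summary scores. It assumes each summary has already been scored by an evaluation model with a factual-consistency value between 0 and 1, and that a summary counts as consistent when its score is at least 0.5. The function name, the threshold, and the sample scores are illustrative assumptions and are not taken from this thread.

```python
from typing import Sequence, Tuple


def accuracy_and_hallucination_rate(
    scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (accuracy %, hallucination rate %) for a list of scores.

    Assumption: `scores` are factual-consistency scores in [0, 1], one per
    summary, and a score >= `threshold` means "no hallucination detected".
    The 0.5 default is a hypothetical choice for this sketch.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    consistent = sum(1 for s in scores if s >= threshold)
    accuracy = 100.0 * consistent / len(scores)
    return accuracy, 100.0 - accuracy


# Hypothetical scores for four summaries; three clear the threshold.
example_scores = [0.9, 0.8, 0.2, 0.7]
accuracy, hallucination_rate = accuracy_and_hallucination_rate(example_scores)
```

Comparing two model versions this way is only meaningful when both are scored on the same documents with the same evaluation model and threshold.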

Thank you!!

mbae26 commented 6 months ago

Thanks for your request.

We recently updated our leaderboard with newer versions of the PaLM models. We called each model using the model identifiers text-bison and chat-bison via the Vertex AI API, following the "Model Versions and Lifecycles" documentation.
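As a rough illustration of what calling a model by identifier through Vertex AI can look like, the sketch below uses the Vertex AI Python SDK's `TextGenerationModel`. It is not the leaderboard's actual code. The helper names, the prompt wording, the generation parameters, and the `002` version suffix in the usage line are assumptions made for this example; the thread does not say which version suffix, if any, was used.

```python
from typing import Optional


def versioned_model_id(base: str, version: Optional[str] = None) -> str:
    """Build a Vertex AI model identifier such as "text-bison@002".

    With no version, the bare identifier (e.g. "text-bison") is returned,
    which matches the identifiers named in this thread.
    """
    return f"{base}@{version}" if version else base


def summarize(document: str, model_id: str) -> str:
    """Ask a PaLM text model for a summary (hypothetical helper).

    Assumes the google-cloud-aiplatform package is installed and that
    vertexai.init(project=..., location=...) has already been called
    with valid credentials.
    """
    # Imported lazily so the identifier helper works without the SDK.
    from vertexai.language_models import TextGenerationModel

    model = TextGenerationModel.from_pretrained(model_id)
    # Prompt text and parameters are illustrative assumptions.
    prompt = f"Summarize the following passage using only facts it contains:\n\n{document}"
    response = model.predict(prompt, temperature=0.0, max_output_tokens=256)
    return response.text


# Recording the exact identifier alongside the results makes reruns comparable.
pinned_id = versioned_model_id("text-bison", "002")  # "002" is an example value
```

Pinning a version suffix rather than the bare identifier addresses the original request, since a bare identifier can resolve to a different model version over time.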

Please let us know if you have any additional requests or feedback.

Thank you!

amin2718 commented 6 months ago

@suddhasatwabhaumik as @mbae26 stated, we just added "Google Gemini Pro" and got 95.2% accuracy. I'm going to mark this issue as completed, but please let me know if you have further concerns.

Thanks for your comments.