arnocandel opened 1 year ago
h2ogpt-oasst1-falcon-40b.sharegpt.log
h2ogpt-oig-oasst1-falcon-40b.sharegpt.log
Related to https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard, but with more meaningful scores.
https://github.com/h2oai/h2ogpt/blob/ba6cad3207f8319b5c5f4b1e9099d7b909fdb661/generate.py#L1328-L1347
Ranked from best to worst, based on 500 evals using the test above, choosing the correct prompt type for each model and keeping everything else the same.
gpt3.5
junelee/wizard-vicuna-13b
openaccess-ai-collective/wizard-mega-13b
ehartford/WizardLM-13B-Uncensored
AlekseyKorshuk/vicuna-7b
TheBloke/stable-vicuna-13B-HF
ehartford/WizardLM-7B-Uncensored
h2ogpt-oasst1-512-20b
h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2
h2ogpt-oasst1-512-12b
FIXME - was much better with --num_beams=1
![df_scores_500_500_1234_False_h2ogpt-oasst1-512-12b_](https://github.com/h2oai/h2ogpt/assets/6147661/a26de787-8078-4fbd-bbd9-e8993cbcaeeb)
h2ogpt-oig-oasst1-512-12b
dolly-v2-12b
h2ogpt-oig-oasst1-512-6.9b
Lesson: WizardLM is great https://github.com/h2oai/h2ogpt/issues/96
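The ranking above comes from averaging per-prompt eval scores for each model. A minimal sketch of that aggregation step, with hypothetical placeholder scores (NOT the actual 500-eval results from this issue):

```python
from statistics import mean

# Hypothetical per-prompt scores keyed by model (placeholder values only,
# not the real eval outputs referenced in this issue).
scores = {
    "junelee/wizard-vicuna-13b": [0.82, 0.88, 0.85],
    "h2ogpt-oasst1-512-12b": [0.61, 0.66, 0.58],
    "dolly-v2-12b": [0.55, 0.60, 0.52],
}

# Rank models from best to worst by mean score across all eval prompts.
ranking = sorted(scores, key=lambda m: mean(scores[m]), reverse=True)
for model in ranking:
    print(f"{model}: {mean(scores[model]):.3f}")
```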