ShareGPT evals for various models

h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/

http://h2o.ai

Apache License 2.0


ShareGPT evals for various models #127

Open arnocandel opened 1 year ago

arnocandel commented 1 year ago

Related to https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard, but with more meaningful scores.

https://github.com/h2oai/h2ogpt/blob/ba6cad3207f8319b5c5f4b1e9099d7b909fdb661/generate.py#L1328-L1347

Results in order from best to worst, based on 500 evals with the test above. Only the prompt type was chosen to match each model; everything else was kept the same.

gpt3.5

df_scores_500_500_1234_True_gpt35_

junelee/wizard-vicuna-13b

df_scores_500_500_1234_False_wizard-vicuna-13b_

openaccess-ai-collective/wizard-mega-13b

df_scores_500_500_1234_False_wizard-mega-13b_

ehartford/WizardLM-13B-Uncensored

df_scores_500_500_1234_False_WizardLM-13B-Uncensored_

AlekseyKorshuk/vicuna-7b

df_scores_500_500_1234_False_vicuna-7b_

TheBloke/stable-vicuna-13B-HF

df_scores_500_500_1234_False_stable-vicuna-13B-HF_

ehartford/WizardLM-7B-Uncensored

df_scores_500_500_1234_False_WizardLM-7B-Uncensored_

h2ogpt-oasst1-512-20b

df_scores_500_500_1234_False_h2ogpt-oasst1-512-20b_

h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2

df_scores_500_500_1234_false_h2ogpt-gm-oasst1-en-2048-open-llama-7b-preview-300bt-v2__720

h2ogpt-oasst1-512-12b

FIXME: was much better with --num_beams=1

df_scores_500_500_1234_False_h2ogpt-oasst1-512-12b_

h2ogpt-oig-oasst1-512-12b

df_scores_500_500_1234_False_h2ogpt-oig-oasst1-512-12b_

dolly-v2-12b

df_scores_500_500_1234_False_dolly-v2-12b_

h2ogpt-oig-oasst1-512-6.9b

df_scores_500_500_1234_False_h2ogpt-oig-oasst1-512-6 9b_
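The ordering above could be reproduced from per-model score tables along these lines. This is a minimal sketch, not the code in generate.py. It assumes each model's evals yield one numeric score per prompt, that higher is better, and that models are ranked by mean score; the thread states none of these. The function name `rank_models` and the toy inputs are hypothetical.

```python
from statistics import mean


def rank_models(scores_by_model):
    """Return (model, mean_score) pairs sorted from best to worst.

    Assumption: higher score is better and the mean is the ranking statistic.
    Models with no scores are skipped rather than ranked.
    """
    means = {m: mean(s) for m, s in scores_by_model.items() if s}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not results from this issue.
    toy = {
        "model_a": [0.2, 0.4],
        "model_b": [0.9, 0.7],
        "model_c": [0.5, 0.5],
    }
    for name, avg in rank_models(toy):
        print(f"{name}: {avg:.3f}")
```

Ranking by the mean is only one option; the median or a win rate against a reference model would order the models differently when score distributions are skewed.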

Lesson: WizardLM is great (see https://github.com/h2oai/h2ogpt/issues/96).

arnocandel commented 1 year ago

h2ogpt-oasst1-falcon-40b.sharegpt.log

df_scores_500_500_1234_False_h2ogpt-oasst1-falcon-40b_

h2ogpt-oig-oasst1-falcon-40b.sharegpt.log

df_scores_500_500_1234_False_h2ogpt-oig-oasst1-falcon-40b_
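The `df_scores_…` labels in this thread follow a regular pattern, so they can be split into fields when collecting results. The sketch below is an illustration only. The thread does not say what the three numbers or the True/False flag mean: the two 500s plausibly relate to the "500 evals" mentioned above and 1234 looks like a random seed, but those readings, the field names, and the helper `parse_label` are assumptions.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the labels in this thread, not from the h2ogpt source.
LABEL_RE = re.compile(
    r"^df_scores_(\d+)_(\d+)_(\d+)_(true|false)_(.+?)_+(\d*)$",
    re.IGNORECASE,
)


class ScoreLabel(NamedTuple):
    count_a: int      # first number (meaning assumed, not documented here)
    count_b: int      # second number (meaning assumed)
    seed_guess: int   # third number, possibly a seed (assumption)
    flag: bool        # True/False field, case-insensitive in the labels
    model: str        # model name as written in the label
    suffix: str       # optional trailing digits, e.g. "720"; "" if absent


def parse_label(label: str) -> Optional[ScoreLabel]:
    """Split a df_scores label into fields, or return None if it does not match."""
    m = LABEL_RE.match(label.strip())
    if m is None:
        return None
    a, b, seed, flag, model, suffix = m.groups()
    return ScoreLabel(int(a), int(b), int(seed), flag.lower() == "true", model, suffix)


if __name__ == "__main__":
    print(parse_label("df_scores_500_500_1234_False_wizard-vicuna-13b_"))
```

Under these assumptions, every label in the thread differs only in the flag, the model name, and the optional trailing suffix, which is consistent with the statement that everything except the prompt type was kept the same.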