openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Evaluation on computer vision benchmarks #235

Open finitearth opened 1 year ago

finitearth commented 1 year ago

Are there plans to evaluate the vision modality of GPT-4? I am interested in how GPT-4 performs on classification tasks with zero- and few-shot learning, and how it compares to vision-only models. If the few-shot learning capabilities of LLMs translate to other modalities, this would be a real game changer.

Question out of curiosity: How was the vision modality incorporated? Maybe similar approaches can be taken for other modalities, such as audio or video? Would be an interesting open-source project for sure :)

MoreTore commented 1 year ago

I have an engineering exam bank of about 1000 questions with simple illustrations. The questions are already in JSONL format, but some of them can only be answered correctly with the image.

jwang47 commented 1 year ago

Currently our API doesn't support vision, but if that changes we'll definitely add support for it to this framework!