Holistic Evaluation of Language Models (HELM) is a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). The same framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
Currently, any OpenAIClient model that has VISION_LANGUAGE_MODEL_TAG cannot accept text-only prompts. This is because OpenAIClient previously assumed that every model was either a text model or a vision-language model (VLM), but not both. That assumption no longer holds: gpt-4-turbo-2024-04-09 and gpt-4o-2024-05-13 support both image and text inputs.
Example failure:
  File "/.../helm/src/helm/clients/openai_client.py", line 299, in make_request
    return self._make_chat_request(request)
  File "/.../helm/src/helm/clients/openai_client.py", line 170, in _make_chat_request
    cache_key = self._get_cache_key(raw_request, request)
  File "/.../helm/src/helm/clients/openai_client.py", line 64, in _get_cache_key
    assert request.multimodal_prompt is not None
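
One way to avoid this assertion is to branch on what the request actually contains rather than on the model's tag. The sketch below is illustrative only and is not the change made in this PR: `SketchRequest` and `get_cache_key` are hypothetical stand-ins, not HELM's real `Request` class or `OpenAIClient._get_cache_key`, and the shape of `multimodal_prompt` (a list of dicts) is assumed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class SketchRequest:
    """Hypothetical stand-in for a request object (not HELM's Request)."""

    model: str
    prompt: str = ""
    # Assumed shape: a list of content parts; None means a text-only request.
    multimodal_prompt: Optional[List[Dict[str, Any]]] = None


def get_cache_key(raw_request: Dict[str, Any], request: SketchRequest) -> Dict[str, Any]:
    """Build a cache key without assuming the model is text-only or VLM-only."""
    cache_key = dict(raw_request)  # copy so the caller's dict is not mutated
    if request.multimodal_prompt is not None:
        # Only multimodal requests contribute a multimodal entry to the key.
        cache_key["multimodal_prompt"] = request.multimodal_prompt
    return cache_key


# The same model name is used for a text-only and an image-plus-text request.
text_only = SketchRequest(model="gpt-4o-2024-05-13", prompt="Hello")
text_key = get_cache_key({"model": text_only.model, "prompt": text_only.prompt}, text_only)

with_image = SketchRequest(
    model="gpt-4o-2024-05-13",
    multimodal_prompt=[
        {"type": "text", "text": "Describe this image."},
        {"type": "image", "location": "cat.png"},  # hypothetical file name
    ],
)
image_key = get_cache_key({"model": with_image.model}, with_image)
```

Because the check is made per request, a model that carries the VLM tag can still serve a text-only prompt: the text-only key simply has no multimodal entry.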