Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
Created an AzureOpenAIClient class, which enables us to run HELM against OpenAI models hosted in Azure
Tested against the llm-benchmarking Azure OpenAI deployment (GPT-3.5-turbo)
Benchmarked the mc-defence-qa scenario against the Azure-hosted model to verify that the client works (full dataset, 238 samples)
A number of TODO items remain within the class to address before this work is complete
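The work summarised above can be sketched as follows. This is a hypothetical outline (names such as AzureConfig, load_azure_config, and AzureOpenAIClientSketch are illustrative, not the actual class from the PR), assuming the client wraps the `AzureOpenAI` class from the `openai` Python SDK (v1+) and reads the three environment variables listed in the how-to below:

```python
import os
from dataclasses import dataclass


@dataclass
class AzureConfig:
    """Holds the three settings the client needs (illustrative helper)."""
    api_key: str
    endpoint: str
    deployment: str


def load_azure_config() -> AzureConfig:
    """Read the environment variables the client expects; raises KeyError if unset."""
    return AzureConfig(
        api_key=os.environ["AZURE_OPENAI_KEY"],
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        deployment=os.environ["AZURE_DEPLOYMENT_NAME"],
    )


class AzureOpenAIClientSketch:
    """Sketch of a client that routes HELM requests to an Azure OpenAI deployment."""

    def __init__(self, config: AzureConfig, api_version: str = "2023-05-15"):
        # Imported lazily so the module loads without the `openai` package installed.
        from openai import AzureOpenAI

        self._deployment = config.deployment
        self._client = AzureOpenAI(
            api_key=config.api_key,
            azure_endpoint=config.endpoint,
            api_version=api_version,
        )

    def complete(self, prompt: str, max_tokens: int = 100) -> str:
        # Azure routes the request by deployment name, passed via `model`.
        response = self._client.chat.completions.create(
            model=self._deployment,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content
```

The real class additionally has to map HELM's internal request/response types onto these calls; that mapping is where the remaining TODO items live.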
How to use the AzureOpenAIClient to run HELM benchmarks:
Add model_deployments.yaml and model_metadata.yaml files to the prod_env/ directory (see the repo on the Azure VM for an example)
Export the following environment variables: AZURE_OPENAI_KEY, AZURE_OPENAI_ENDPOINT, AZURE_DEPLOYMENT_NAME
Run helm-run, ensuring that the model specified in run_entries.conf matches the model name specified in model_deployments.yaml (azure/gpt-35-turbo-0301 in my tests so far)
This will run the benchmarks as normal, using an OpenAI model hosted in Azure.
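The steps above can be sketched end to end as below. The YAML schema is approximated from HELM's documentation, and the class path, suite name, and flag values are illustrative assumptions, not the exact config from the repo on the Azure VM:

```shell
# 1. Minimal prod_env/model_deployments.yaml (schema approximated; adjust to
#    match the example in the repo on the Azure VM):
mkdir -p prod_env
cat > prod_env/model_deployments.yaml <<'EOF'
model_deployments:
  - name: azure/gpt-35-turbo-0301
    model_name: openai/gpt-3.5-turbo-0301
    tokenizer_name: openai/cl100k_base
    max_sequence_length: 4096
    client_spec:
      class_name: "helm.clients.azure_openai_client.AzureOpenAIClient"  # assumed module path
EOF

# 2. Credentials for the Azure deployment:
export AZURE_OPENAI_KEY="<your-key>"
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com"
export AZURE_DEPLOYMENT_NAME="gpt-35-turbo-0301"

# 3. Run HELM; the model in run_entries.conf must match the deployment
#    name above (azure/gpt-35-turbo-0301):
helm-run --conf-paths run_entries.conf --suite azure-test --max-eval-instances 238
```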