stanfordnlp / dspy

DSPy: The framework for programming—not prompting—language models
https://dspy.ai
MIT License
19.37k stars · 1.47k forks

Azure AI Studio Serverless as LM? #1573

Closed by sglickman 1 month ago

sglickman commented 1 month ago

Azure AI Studio allows certain models to be used as serverless deployments. In this mode, you have a target URI (e.g., https://Meta-Llama-3-1-405B-Instruct-cyy.eastus.models.ai.azure.com ) and a key, and actual consumption looks like this:

# Install the client library first: pip install azure-ai-inference
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

api_key = os.getenv("AZURE_INFERENCE_CREDENTIAL", '')
if not api_key:
  raise Exception("A key should be provided to invoke the endpoint")

client = ChatCompletionsClient(
    endpoint='https://Meta-Llama-3-1-405B-Instruct-cyy.eastus.models.ai.azure.com',
    credential=AzureKeyCredential(api_key)
)

model_info = client.get_model_info()
print("Model name:", model_info.model_name)
print("Model type:", model_info.model_type)
print("Model provider name:", model_info.model_provider_name)

payload = {
  "messages": [
    {
      "role": "user",
      "content": "I am going to Paris, what should I see?"
    },
    {
      "role": "assistant",
      "content": "Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world."
    },
    {
      "role": "user",
      "content": "What is so great about #1?"
    }
  ],
  "max_tokens": 4096,
  "temperature": 0.8,
  "top_p": 0.1,
  "presence_penalty": 0
}
response = client.complete(payload)

print("Response:", response.choices[0].message.content)
print("Model:", response.model)
print("Usage:")
print(" Prompt tokens:", response.usage.prompt_tokens)
print(" Total tokens:", response.usage.total_tokens)
print(" Completion tokens:", response.usage.completion_tokens)

Does DSPy support using something like this as the LM? I tried to follow the Custom LM Client directions but received the following error:

Cell In[15], line 100
     96 # generate_answer = dspy.ChainOfThought(BasicQA)
     98 predict = dspy.Predict("question -> answer")
--> 100 prediction = predict(question="Who scored the final goal in football world cup finals in 2014?")
    101 prediction.answer

File ~/.pyenv/versions/3.10.4/envs/preprocessor_env/envs/3.11/envs/dspy-tut/lib/python3.11/site-packages/dspy/predict/predict.py:99, in Predict.__call__(self, **kwargs)
     98 def __call__(self, **kwargs):
---> 99     return self.forward(**kwargs)

File ~/.pyenv/versions/3.10.4/envs/preprocessor_env/envs/3.11/envs/dspy-tut/lib/python3.11/site-packages/dspy/predict/predict.py:116, in Predict.forward(self, **kwargs)
    114 # If temperature is 0.0 but its n > 1, set temperature to 0.7.
    115 temperature = config.get("temperature")
--> 116 temperature = lm.kwargs["temperature"] if temperature is None else temperature
    117 num_generations = config.get("n") or lm.kwargs.get("n") or lm.kwargs.get("num_generations") or 1
    119 if (temperature is None or temperature <= 0.15) and num_generations > 1:

AttributeError: 'AzureLlamaClient' object has no attribute 'kwargs'

Full code below -- am I doing something wrong?

sglickman commented 1 month ago
import dspy
from dspy import LM
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from dotenv import dotenv_values, load_dotenv

load_dotenv()

config=dotenv_values(".env")
azure_endpoint = config['AZURE_LLAMA_31_8B_ENDPOINT']
azure_key = config['AZURE_LLAMA_31_8B_KEY']

class Llama31_8B(LM):
  def __init__(self):
    self.provider = "default"
    self.history = []
    self.api_key = azure_key
    self.base_url = azure_endpoint
    self.client = ChatCompletionsClient(
      endpoint=azure_endpoint,
      credential=AzureKeyCredential(azure_key)
    )

  def basic_request(self, prompt: str, **kwargs):
    kwargs = {**self.kwargs, **kwargs}

    data = {
      **kwargs,
      "messages": [
        {"role": "user", "content": prompt}
      ],
      "max_tokens": 4096,
      "temperature": 0.8,
      "top_p": 0.1,
      "presence_penalty": 0
    }
    response = self.client.complete(data)
    self.history.append({
      'prompt': prompt,
      'response': response,
      'kwargs': kwargs
    })
    return response

  def __call__(self, prompt, only_completed=True, return_sorted=False, **kwargs):
    response = self.request(prompt, **kwargs)
    return response

llama = Llama31_8B()
dspy.settings.configure(lm=llama)

predict = dspy.Predict("question -> answer")

prediction = predict(question="What is the capital of France?")
prediction.answer
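
[Editor's note] The traceback points at the likely cause: `Predict.forward` reads `lm.kwargs["temperature"]`, so the legacy custom-LM interface expects the subclass to define `self.kwargs` (normally populated by calling `super().__init__()`, which the `Llama31_8B.__init__` above skips). A minimal sketch of the failure mode, using a stand-in base class rather than DSPy's real one:

```python
# Stand-in for the legacy LM base class; the attribute names mirror what
# the traceback shows Predict.forward reading, but this is not DSPy code.
class BaseLM:
    def __init__(self, model):
        self.model = model
        # The base class is what populates self.kwargs.
        self.kwargs = {"temperature": 0.0, "max_tokens": 150, "n": 1}

class BrokenClient(BaseLM):
    def __init__(self):
        # No super().__init__() call, so self.kwargs is never set.
        self.history = []

class FixedClient(BaseLM):
    def __init__(self):
        super().__init__(model="azure-llama")  # populates self.kwargs
        self.history = []

# Predict.forward effectively does lm.kwargs["temperature"]:
def read_temperature(lm):
    return lm.kwargs["temperature"]

try:
    read_temperature(BrokenClient())
except AttributeError as e:
    print(e)  # 'BrokenClient' object has no attribute 'kwargs'

print(read_temperature(FixedClient()))  # 0.0
```

The same pattern would apply to the client above: either call `super().__init__()` in `__init__` or set `self.kwargs` explicitly before handing the instance to `dspy.settings.configure`.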
okhat commented 1 month ago

Thanks @sglickman! Check out the migration notebook: https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb
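
[Editor's note] The linked notebook covers DSPy's newer `dspy.LM` interface, which routes requests through LiteLLM, so a custom client subclass may no longer be needed. A hedged sketch of what configuration against a serverless Azure AI endpoint might look like; the `azure_ai/` provider prefix and model string are assumptions drawn from LiteLLM conventions, not verified against this endpoint, and no request is sent here:

```python
import os

# Endpoint and key from the original report; the key comes from the environment.
endpoint = "https://Meta-Llama-3-1-405B-Instruct-cyy.eastus.models.ai.azure.com"
api_key = os.getenv("AZURE_INFERENCE_CREDENTIAL", "")

try:
    import dspy

    lm = dspy.LM(
        "azure_ai/Meta-Llama-3.1-405B-Instruct",  # hypothetical model string
        api_base=endpoint,
        api_key=api_key,
    )
    dspy.configure(lm=lm)
except ImportError:
    lm = None  # dspy not installed; the block above shows intended usage only
```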