
OpenAI default plugin should support registering additional models #107

Closed: simonw closed this 1 year ago

simonw commented 1 year ago

While thinking about another issue, I realized that there's a limitation hard-coded into LLM here: https://github.com/simonw/llm/blob/3f1388a4e6c8b951bb7e2f8a5ac6b6e08e99b873/llm/default_plugins/openai_models.py#L13-L18
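
That block registers a fixed list of models, roughly like this (a reconstruction - the gpt-3.5-turbo aliases shown here are approximate):

    @hookimpl
    def register_models(register):
        register(Chat("gpt-3.5-turbo"), aliases=("3.5", "chatgpt"))
        register(Chat("gpt-3.5-turbo-16k"), aliases=("chatgpt-16k", "3.5-16k"))
        register(Chat("gpt-4"), aliases=("4", "gpt4"))
        register(Chat("gpt-4-32k"), aliases=("4-32k",))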

What if OpenAI releases a new model with a new name, as they did with gpt-4-0613? At the moment, there's no way to use that model in LLM without releasing a new version of the software (or writing a custom plugin).

simonw commented 1 year ago

Potential solution: the OpenAI default plugin could support a $USER_DIR/openai-extra-models.yml file which looks something like this:

- model_id: gpt-4-0613
  name: gpt-4-0613
  aliases: ["4-0613"]

It could then be extended to support external models with compatible APIs, like this:

- model_id: your-model
  name: your-model.bin
  OPENAI_API_BASE: "http://localhost:8080/"
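
Loading that file could be a small addition to register_models() - a minimal sketch, assuming a yaml dependency and the existing Chat class and register() hook:

    import yaml

    def register_extra_models(register, extra_path):
        # Read user-defined models from the YAML file; treat an empty
        # file as an empty list of models.
        with open(extra_path) as f:
            extra_models = yaml.safe_load(f) or []
        for model in extra_models:
            register(
                Chat(model["model_id"]),
                aliases=tuple(model.get("aliases", [])),
            )
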
simonw commented 1 year ago

Relevant code from the OpenAI CLI utility: https://github.com/openai/openai-python/blob/b82a3f7e4c462a8a10fa445193301a3cefef9a4a/openai/_openai_scripts.py#L62-L75

    openai.debug = True
    if args.api_key is not None:
        openai.api_key = args.api_key
    if args.api_base is not None:
        openai.api_base = args.api_base
    if args.organization is not None:
        openai.organization = args.organization
    if args.proxy is not None:
        openai.proxy = {}
        for proxy in args.proxy:
            if proxy.startswith('https'):
                openai.proxy['https'] = proxy
            elif proxy.startswith('http'):
                openai.proxy['http'] = proxy
simonw commented 1 year ago

I don't like how those look like global variables on the openai module, since I want to be able to use these APIs in a threaded web environment which might have multiple calls happening at the same time - changes made to openai.api_base must not affect other prompts running concurrently.

From this code it looks like there's a way to avoid that:

https://github.com/openai/openai-python/blob/b82a3f7e4c462a8a10fa445193301a3cefef9a4a/openai/api_requestor.py#L128-L145

class APIRequestor:
    def __init__(
        self,
        key=None,
        api_base=None,
        api_type=None,
        api_version=None,
        organization=None,
    ):
        self.api_base = api_base or openai.api_base
        self.api_key = key or util.default_api_key()
        self.api_type = (
            ApiType.from_str(api_type)
            if api_type
            else ApiType.from_str(openai.api_type)
        )
        self.api_version = api_version or openai.api_version
        self.organization = organization or openai.organization

Which led me to: https://github.com/openai/openai-python/blob/b82a3f7e4c462a8a10fa445193301a3cefef9a4a/openai/openai_object.py#L11-L39

class OpenAIObject(dict):
    api_base_override = None

    def __init__(
        self,
        id=None,
        api_key=None,
        api_version=None,
        api_type=None,
        organization=None,
        response_ms: Optional[int] = None,
        api_base=None,
        engine=None,
        **params,
    ):
        super(OpenAIObject, self).__init__()

        if response_ms is not None and not isinstance(response_ms, int):
            raise TypeError(f"response_ms is a {type(response_ms).__name__}.")
        self._response_ms = response_ms

        self._retrieve_params = params

        object.__setattr__(self, "api_key", api_key)
        object.__setattr__(self, "api_version", api_version)
        object.__setattr__(self, "api_type", api_type)
        object.__setattr__(self, "organization", organization)
        object.__setattr__(self, "api_base_override", api_base)
        object.__setattr__(self, "engine", engine)

And ChatCompletion is a subclass of a subclass of that, so I think I should be able to pass those arguments to the ChatCompletion constructor.

simonw commented 1 year ago

Yes, it looks like ChatCompletion.create() ends up here: https://github.com/openai/openai-python/blob/b82a3f7e4c462a8a10fa445193301a3cefef9a4a/openai/api_resources/abstract/engine_api_resource.py#L127-L151

    @classmethod
    def create(
        cls,
        api_key=None,
        api_base=None,
        api_type=None,
        request_id=None,
        api_version=None,
        organization=None,
        **params,
    ):
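
So per-prompt overrides can be passed directly to create(), leaving the module-level globals untouched - something like this (the model name and endpoint are illustrative values):

    import openai

    # Each call carries its own api_key/api_base, so concurrent requests
    # in other threads are unaffected (openai-python 0.x API).
    completion = openai.ChatCompletion.create(
        model="orca-mini-3b.ggmlv3",
        messages=[{"role": "user", "content": "Say hello"}],
        api_key="sk-...",                  # per-call key
        api_base="http://localhost:8080",  # per-call endpoint
    )
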
simonw commented 1 year ago

The structure of the stream of chunks that comes back from LocalAI isn't quite the same as the OpenAI API - it looks like this:

{
  "object": "chat.completion.chunk",
  "model": "orca-mini-3b.ggmlv3",
  "choices": [
    {
      "delta": {
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
{
  "object": "chat.completion.chunk",
  "model": "orca-mini-3b.ggmlv3",
  "choices": [
    {
      "delta": {
        "content": " Hello"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

Then at the end:

{
  "object": "chat.completion.chunk",
  "model": "orca-mini-3b.ggmlv3",
  "choices": [
    {
      "finish_reason": "stop",
      "delta": {}
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

This doesn't fit the expected shape, especially for this code: https://github.com/simonw/llm/blob/58d1f9291d44dd5fd64b904484172309795aa0a8/llm/default_plugins/openai_models.py#L202-L224

It's missing the id and created fields, and some records are missing the finish_reason key.

simonw commented 1 year ago

This almost works:

diff --git a/llm/default_plugins/openai_models.py b/llm/default_plugins/openai_models.py
index c6d74c5..002f899 100644
--- a/llm/default_plugins/openai_models.py
+++ b/llm/default_plugins/openai_models.py
@@ -8,6 +8,7 @@ from pydantic import field_validator, Field
 import requests
 from typing import List, Optional, Union
 import json
+import yaml

 @hookimpl
@@ -16,6 +17,21 @@ def register_models(register):
     register(Chat("gpt-3.5-turbo-16k"), aliases=("chatgpt-16k", "3.5-16k"))
     register(Chat("gpt-4"), aliases=("4", "gpt4"))
     register(Chat("gpt-4-32k"), aliases=("4-32k",))
+    # Load extra models
+    extra_path = llm.user_dir() / "extra-openai-models.yaml"
+    if not extra_path.exists():
+        return
+    with open(extra_path) as f:
+        extra_models = yaml.safe_load(f)
+    for model in extra_models:
+        model_id = model["model_id"]
+        aliases = model.get("aliases", [])
+        model_name = model["model_name"]
+        api_base = model.get("api_base")
+        register(
+            Chat(model_id, model_name=model_name, api_base=api_base),
+            aliases=aliases,
+        )

 @hookimpl
@@ -141,9 +157,11 @@ class Chat(Model):

             return validated_logit_bias

-    def __init__(self, model_id, key=None):
+    def __init__(self, model_id, key=None, model_name=None, api_base=None):
         self.model_id = model_id
         self.key = key
+        self.model_name = model_name
+        self.api_base = api_base

     def __str__(self):
         return "OpenAI Chat: {}".format(self.model_id)
@@ -169,13 +187,17 @@ class Chat(Model):
             messages.append({"role": "system", "content": prompt.system})
         messages.append({"role": "user", "content": prompt.prompt})
         response._prompt_json = {"messages": messages}
+        kwargs = dict(not_nulls(prompt.options))
+        if self.api_base:
+            kwargs["api_base"] = self.api_base
+        if self.key:
+            kwargs["api_key"] = self.key
         if stream:
             completion = openai.ChatCompletion.create(
-                model=prompt.model.model_id,
+                model=self.model_name or self.model_id,
                 messages=messages,
                 stream=True,
-                api_key=self.key,
-                **not_nulls(prompt.options),
+                **kwargs,
             )
             chunks = []
             for chunk in completion:
@@ -186,10 +208,10 @@ class Chat(Model):
             response.response_json = combine_chunks(chunks)
         else:
             completion = openai.ChatCompletion.create(
-                model=prompt.model.model_id,
+                model=self.model_name or self.model_id,
                 messages=messages,
-                api_key=self.key,
                 stream=False,
+                **kwargs,
             )
             response.response_json = completion.to_dict_recursive()
             yield completion.choices[0].message.content
@@ -209,11 +231,11 @@ def combine_chunks(chunks: List[dict]) -> dict:
                 role = choice["delta"]["role"]
             if "content" in choice["delta"]:
                 content += choice["delta"]["content"]
-            if choice["finish_reason"] is not None:
+            if choice.get("finish_reason") is not None:
                 finish_reason = choice["finish_reason"]

     return {
-        "id": chunks[0]["id"],
+        "id": chunks[0].get("id") or "no-id",
         "object": chunks[0]["object"],
         "model": chunks[0]["model"],
         "created": chunks[0]["created"],

I put this in /Users/simon/Library/Application Support/io.datasette.llm/extra-openai-models.yaml:

- model_id: orca-openai-compat
  model_name: orca-mini-3b.ggmlv3
  api_base: "http://localhost:8080"

Then ran this:

llm -m 'orca-openai-compat' 'Say hello in french'

And got back:

Hello!

To complete the request, I will need to practice speaking French for a bit and become familiar with common greetings. Additionally, it would be helpful to have a copy of the French language dictionary nearby to look up some common vocabulary words that may not be familiar to someone who is just starting out in learning a new language.Error: 'created'

french

simonw commented 1 year ago

The Error: 'created' is because the API response doesn't include the created field. Need to fix that.
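
Presumably the same .get() treatment that id and finish_reason got in the diff above - a sketch (falling back to the current time is just one option, and my own assumption):

    import time

    def chunk_created(chunk: dict) -> int:
        # LocalAI chunks omit "created"; fall back to the current timestamp.
        return chunk.get("created") or int(time.time())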

simonw commented 1 year ago

Idea: a command that hits the models API for a custom endpoint and writes out a cached file recording those models, so they can show up automatically as registered models.

Example from LocalAI:

curl http://localhost:8080/v1/models | jq
{
  "object": "list",
  "data": [
    {
      "id": "ggml-gpt4all-j",
      "object": "model"
    },
    {
      "id": "orca-mini-3b.ggmlv3",
      "object": "model"
    }
  ]
}
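
A rough sketch of that command using requests (the function name and cache location here are hypothetical):

    import json
    import pathlib
    import requests

    def cache_models(api_base: str, cache_path: pathlib.Path) -> list:
        # Fetch the model list from an OpenAI-compatible endpoint and
        # write the IDs to a cache file for later registration.
        data = requests.get(api_base.rstrip("/") + "/v1/models").json()
        model_ids = [m["id"] for m in data.get("data", [])]
        cache_path.write_text(json.dumps(model_ids, indent=2))
        return model_ids
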
simonw commented 1 year ago

Maybe it's OK to hit that endpoint every time the LLM command runs rather than messing around with caching.

I don't want to hit that endpoint URL on localhost every time I use llm for regular ChatGPT though.
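
One possible compromise, sketched here reusing the hypothetical cache_models() above: only query the endpoint when a custom api_base is configured, so the default OpenAI models never trigger a localhost request.

    import pathlib

    def discover_models(model_config: dict) -> list:
        # Skip discovery entirely unless a custom api_base is configured.
        api_base = model_config.get("api_base")
        if not api_base:
            return []
        return cache_models(api_base, pathlib.Path("models-cache.json"))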

simonw commented 1 year ago

Got this working.

In /Users/simon/Library/Application Support/io.datasette.llm/extra-openai-models.yaml:

- model_id: orca-openai-compat
  model_name: orca-mini-3b.ggmlv3
  api_base: "http://localhost:8080"

Then:

llm -m orca-openai-compat '3 names for a pet cow'
 I can do that! Here are three different names for a pet cow: 
1. Milo 2. Daisy 3. Max
llm -c '2 more with descriptions'
 Thank you for your prompt service! Here are two more options for a pet cow's name:

1. Lily - She's gentle and kind, just like a lily.
2. Thunder - He's strong and fierce, just like thunderstorms on a summer day.
llm logs -n 1
[
  {
    "id": "01h5d4nthj4ncdntz2ap56ffz5",
    "model": "orca-openai-compat",
    "prompt": "2 more with descriptions",
    "system": null,
    "prompt_json": {
      "messages": [
        {
          "role": "user",
          "content": "3 names for a pet cow"
        },
        {
          "role": "assistant",
          "content": " I can do that! Here are three different names for a pet cow: \n1. Milo 2. Daisy 3. Max"
        },
        {
          "role": "user",
          "content": "2 more with descriptions"
        }
      ]
    },
    "options_json": {},
    "response": " Thank you for your prompt service! Here are two more options for a pet cow's name:\n\n1. Lily - She's gentle and kind, just like a lily.\n2. Thunder - He's strong and fierce, just like thunderstorms on a summer day.",
    "response_json": {
      "content": " Thank you for your prompt service! Here are two more options for a pet cow's name:\n\n1. Lily - She's gentle and kind, just like a lily.\n2. Thunder - He's strong and fierce, just like thunderstorms on a summer day.",
      "role": "assistant",
      "finish_reason": "stop",
      "object": "chat.completion.chunk",
      "model": "orca-mini-3b.ggmlv3"
    },
    "conversation_id": "01h5d4my74mqyjxc24fhcf86ry",
    "duration_ms": 8729,
    "datetime_utc": "2023-07-15T16:03:17.655636",
    "conversation_name": "3 names for a pet cow",
    "conversation_model": "orca-openai-compat"
  }
]
simonw commented 1 year ago

Needs documentation and tests.

simonw commented 1 year ago

Documentation: https://llm.datasette.io/en/latest/other-models.html#adding-more-openai-models