Closed: simonw closed this issue 1 year ago
The thing I'm having trouble with here is what to call this.
I want a command similar to llm openai ...
but for this family of models.
Naming options:

- llm google
- llm vertex (does anyone know what Vertex is?)
- llm palm (but should this be palm2, and will there be non-palm models in here?)
- llm bison (again, not sure people understand the terminology)

There are a ton of other models available in the Vertex "model garden": https://console.cloud.google.com/vertex-ai/model-garden - T5-FLAN, Stable Diffusion, BLIP, all sorts of things.
It's so very confusing in there! Many of them don't seem to have HTTP API endpoints - some appear to be available only via a notebook interface.
I'm tempted to go with llm google purely for consistency with llm openai - then maybe llm anthropic can follow?
Maybe the vendors themselves are a distraction - the thing that matters is the model. I've kind of broken this already though by having GPT-4 as a -4 flag on what I initially called the chatgpt command.
For PaLM 2 itself I think the models available to me are text-bison, code-bison and code-gecko.

I'm going to land a llm palm2 command for the moment, then go back to this issue and reconsider:
This looks useful: https://developers.generativeai.google/api/rest/generativelanguage/models/list
curl https://generativelanguage.googleapis.com/v1beta2/models?key=$PALM_API_KEY
I currently get this:
{
  "models": [
    {
      "name": "models/chat-bison-001",
      "version": "001",
      "displayName": "Chat Bison",
      "description": "Chat-optimized generative language model.",
      "inputTokenLimit": 4096,
      "outputTokenLimit": 1024,
      "supportedGenerationMethods": [
        "generateMessage"
      ],
      "temperature": 0.25,
      "topP": 0.95,
      "topK": 40
    },
    {
      "name": "models/text-bison-001",
      "version": "001",
      "displayName": "Text Bison",
      "description": "Model targeted for text generation.",
      "inputTokenLimit": 8196,
      "outputTokenLimit": 1024,
      "supportedGenerationMethods": [
        "generateText"
      ],
      "temperature": 0.7,
      "topP": 0.95,
      "topK": 40
    },
    {
      "name": "models/embedding-gecko-001",
      "version": "001",
      "displayName": "Embedding Gecko",
      "description": "Obtain a distributed representation of a text.",
      "inputTokenLimit": 1024,
      "outputTokenLimit": 1,
      "supportedGenerationMethods": [
        "embedText"
      ]
    }
  ]
}
So no code-bison or code-gecko listed there.
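The curl call above can be sketched in Python too - a minimal, stdlib-only sketch (not code from the issue), assuming the API key is in a PALM_API_KEY environment variable:

```python
import json
from urllib.request import urlopen

# Same v1beta2 REST endpoint as the curl example above.
API_BASE = "https://generativelanguage.googleapis.com/v1beta2"


def list_models_url(api_key):
    # Build the same ?key= URL shape the curl example uses
    return f"{API_BASE}/models?key={api_key}"


def list_model_names(api_key):
    # Fetch the models list and pull out just the "name" fields
    with urlopen(list_models_url(api_key)) as response:
        return [m["name"] for m in json.load(response)["models"]]
```

Calling list_model_names(os.environ["PALM_API_KEY"]) should return the three models/... names shown in the JSON above.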
https://developers.generativeai.google/api/rest/generativelanguage/models/countMessageTokens can count tokens:
curl https://generativelanguage.googleapis.com/v1beta2/models/chat-bison-001:countMessageTokens?key=$PALM_API_KEY \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "prompt": {
      "messages": [
        {"content": "How many tokens?"},
        {"content": "For this whole conversation?"}
      ]
    }
  }'
{
  "tokenCount": 23
}
I'm just going to ship text-bison-001 for the moment. I'm going to call the command palm2.
Where should it get the API token from? The way I do it for OpenAI tokens right now is bad and needs fixing:
https://github.com/simonw/llm/blob/293f306a5e9855a0828f98293abbccf968810a6e/llm/cli.py#L162-L173
My code so far:
@cli.command()
@click.argument("prompt", required=False)
@click.option("-m", "--model", help="Model to use", default="text-bison-001")
@click.option("-n", "--no-log", is_flag=True, help="Don't log to database")
def palm2(prompt, model, no_log):
    "Execute a prompt against a PaLM 2 model"
    if prompt is None:
        # Read from stdin instead
        prompt = sys.stdin.read()
    api_key = get_vertex_api_key()
    # Use the selected model in the URL rather than hard-coding text-bison-001
    url = "https://generativelanguage.googleapis.com/v1beta2/models/{}:generateText?key={}".format(
        model, api_key
    )
    response = requests.post(
        url,
        json={"prompt": {"text": prompt}},
        headers={"Content-Type": "application/json"},
    )
    output = response.json()["candidates"][0]["output"]
    log(no_log, "vertex", None, prompt, output, model)
    click.echo(output)
if prompt is None:
    # Read from stdin instead
    prompt = sys.stdin.read()

This poses a problem with the UX. This is the reason why llm hangs indefinitely when no argument is passed. I suggested a fix for this in PR#19.
That's a deliberate design decision at the moment - it means you can run llm and then copy-and-paste text into your terminal. There are other common unix commands that work like this - cat and wc for example - so I'm not convinced it's a usability problem. Happy to hear further discussion around that though.
Since it's possible to detect this situation, perhaps a message to stderr reminding the user to type or paste in content and hit Ctrl+D when they are done would be appropriate? I've not seen any other commands that do that though.
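One way that stderr hint could work - a sketch of the suggestion, not code from the issue - is to only print the reminder when stdin is an interactive terminal, so piped input stays silent:

```python
import io
import sys


def read_prompt(stream=None):
    # Default to stdin; accepting a stream parameter keeps this testable
    stream = stream or sys.stdin
    if stream.isatty():
        # Only shown on an interactive terminal, never when input is piped
        print("Type or paste your prompt, then hit Ctrl+D", file=sys.stderr)
    return stream.read()
```

Because the hint goes to stderr, it would not pollute output when the command is used in a pipeline.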
That's a deliberate design decision at the moment - it means you can run llm and then copy-and-paste text into your terminal.
I'm not sure I understand. When I run llm, no matter what I paste in my terminal, it just keeps waiting - even if I type "say hello" and press Enter.
Oh... my bad! Okay, I hit Ctrl+D and it responded. It's not very intuitive - though I may be alone in that category.
As of now, I would find it better to print the helper message by default, as I suggested in my pull request. However, if an instruction can be shown to the user via stderr, it might solve what I believe is a UX issue.
Now that I've renamed llm chatgpt to llm prompt I'm going to try adding this model to that command instead, so you would use it like so:

llm -m palm2 "Five surprising names for a wise owl"

I'll support -m text-bison as well.
In that new prototype branch:
% llm 'Two names for a beaver' -m palm2
Daggett and Dasher
Figuring out chat mode for Vertex/PaLM2 is proving hard.
https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/1?project=cloud-vision-ocr-382418 talks about "PaLM 2 for Chat".
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/chat.py seems to be the most relevant example code:
from vertexai.preview.language_models import ChatModel, InputOutputTextPair


def science_tutoring(temperature: float = 0.2) -> None:
    chat_model = ChatModel.from_pretrained("chat-bison@001")
    parameters = {
        "temperature": temperature,  # Temperature controls the degree of randomness in token selection.
        "max_output_tokens": 256,  # Token limit determines the maximum amount of text output.
        "top_p": 0.95,  # Tokens are selected from most probable to least until the sum of their probabilities equals the top_p value.
        "top_k": 40,  # A top_k of 1 means the selected token is the most probable among all tokens.
    }
    chat = chat_model.start_chat(
        context="My name is Miles. You are an astronomer, knowledgeable about the solar system.",
        examples=[
            InputOutputTextPair(
                input_text="How many moons does Mars have?",
                output_text="The planet Mars has two moons, Phobos and Deimos.",
            ),
        ],
    )
    response = chat.send_message(
        "How many planets are there in the solar system?", **parameters
    )
    print(f"Response from Model: {response.text}")
    return response
I think this is where vertexai comes from: https://github.com/googleapis/python-aiplatform - pip install google-cloud-aiplatform
Buried deep in a class hierarchy, this looks like the code that actually constructs the JSON to call the API: https://github.com/googleapis/python-aiplatform/blob/c60773a7db8ce7a59d2cb5787dc90937776c0b8f/vertexai/language_models/_language_models.py#L697-L824
The API call then goes through this code:
prediction_response = self._model._endpoint.predict(
    instances=[prediction_instance],
    parameters=prediction_parameters,
)

I have not yet tracked down that self._model._endpoint.predict() method.
Some useful hints in https://github.com/google/generative-ai-docs/blob/main/site/en/tutorials/chat_quickstart.ipynb - including that PaLM 2 has a "context" concept which appears to be the same thing as an OpenAI system prompt:
reply = palm.chat(context="Speak like Shakespeare.", messages='Hello')
print(reply.last)
Hello there, my good fellow! How fares thee this day?
reply = palm.chat(
context="Answer everything with a haiku, following the 5/7/5 rhyme pattern.",
messages="How's it going?"
)
print(reply.last)
I am doing well
I am learning and growing
Every day is new
Based on that example notebook, I'm going to ditch the terminology "Vertex" and "PaLM 2" and just call it "PaLM". (They never released an API for PaLM 1.)

I'm also going to move my code out of the experimental plugin and into a llm-palm package which depends on google-generativeai.
Got this out of the debugger, after this:
import google.generativeai as palm

kwargs = {"messages": self.prompt.prompt}
if self.prompt.system:
    kwargs["context"] = self.prompt.system
response = palm.chat(**kwargs)
last = response.last
(Pdb) pprint(response.to_dict())
{'candidate_count': None,
'candidates': [{'author': '1',
'content': 'Here are three names for a dog:\n'
'\n'
'1. **Bailey** is a popular name for both male and '
'female dogs. It is of English origin and means '
'"bailiff" or "steward." Bailey is a friendly and '
'loyal dog that is always up for a good time.\n'
'2. **Luna** is a Latin name that means "moon." It '
'is a popular name for female dogs, but it can '
'also be used for male dogs. Luna is a beautiful '
'and intelligent dog that is always curious about '
'the world around her.\n'
'3. **Max** is a German name that means '
'"greatest." It is a popular name for male dogs, '
'but it can also be used for female dogs. Max is a '
'strong and courageous dog that is always willing '
'to protect his family.\n'
'\n'
'These are just a few of the many great names that '
'you could choose for your new dog. When choosing '
"a name, it is important to consider your dog's "
'personality and appearance. You should also '
'choose a name that you will be happy saying for '
'many years to come.'}],
'context': '',
'examples': [],
'messages': [{'author': '0', 'content': 'three names for a dog'},
{'author': '1',
'content': 'Here are three names for a dog:\n'
'\n'
'1. **Bailey** is a popular name for both male and '
'female dogs. It is of English origin and means '
'"bailiff" or "steward." Bailey is a friendly and '
'loyal dog that is always up for a good time.\n'
'2. **Luna** is a Latin name that means "moon." It '
'is a popular name for female dogs, but it can also '
'be used for male dogs. Luna is a beautiful and '
'intelligent dog that is always curious about the '
'world around her.\n'
'3. **Max** is a German name that means "greatest." '
'It is a popular name for male dogs, but it can also '
'be used for female dogs. Max is a strong and '
'courageous dog that is always willing to protect '
'his family.\n'
'\n'
'These are just a few of the many great names that '
'you could choose for your new dog. When choosing a '
"name, it is important to consider your dog's "
'personality and appearance. You should also choose '
'a name that you will be happy saying for many years '
'to come.'}],
'model': 'models/chat-bison-001',
'temperature': None,
'top_k': None,
'top_p': None}
That library doesn't have streaming support yet, issues here:
The context (PaLM's version of the system prompt) isn't very strongly held to:
Here's GPT-4:
% llm -m 4 'three names for a pet pelican, succinctly' --system 'translate to french'
trois noms pour un pélican domestique, succinctement
PaLM messes that one up:
% llm -m palm 'three names for a pet pelican, succinctly' --system 'translate to french'
{'model': 'models/chat-bison-001', 'context': 'translate to french', 'examples': [], 'messages': [{'author': '0', 'content': 'three names for a pet pelican, succinctly'}, {'author': '1', 'content': 'Here are three names for a pet pelican:\n\n* Pete\n* Piper\n* Peanut'}], 'temperature': None, 'candidate_count': None, 'top_p': None, 'top_k': None, 'candidates': [{'author': '1', 'content': 'Here are three names for a pet pelican:\n\n* Pete\n* Piper\n* Peanut'}]}
Here are three names for a pet pelican:
* Pete
* Piper
* Peanut
% llm -m palm 'three names for a pet pelican, succinctly' --system 'you are a bot that translates everything to french'
{'model': 'models/chat-bison-001', 'context': 'you are a bot that translates everything to french', 'examples': [], 'messages': [{'author': '0', 'content': 'three names for a pet pelican, succinctly'}, {'author': '1', 'content': 'Here are three names for a pet pelican:\n\n* Pete\n* Piper\n* Peanut'}], 'temperature': None, 'candidate_count': None, 'top_p': None, 'top_k': None, 'candidates': [{'author': '1', 'content': 'Here are three names for a pet pelican:\n\n* Pete\n* Piper\n* Peanut'}]}
Here are three names for a pet pelican:
* Pete
* Piper
* Peanut
This example from the PaLM example notebook does work though:
% llm -m palm 'Hello' --system 'Speak like Shakespeare'
{'model': 'models/chat-bison-001', 'context': 'Speak like Shakespeare', 'examples': [], 'messages': [{'author': '0', 'content': 'Hello'}, {'author': '1', 'content': 'Hello there, my good fellow! How fares thee on this fine day?'}], 'temperature': None, 'candidate_count': None, 'top_p': None, 'top_k': None, 'candidates': [{'author': '1', 'content': 'Hello there, my good fellow! How fares thee on this fine day?'}]}
Hello there, my good fellow! How fares thee on this fine day?
This was surprising:
llm -m palm --system "Translate to french" "I like pelicans a lot"
{'model': 'models/chat-bison-001', 'context': 'Translate to french', 'examples': [], 'messages': [{'author': '0', 'content': 'I like pelicans a lot'}, None], 'temperature': None, 'candidate_count': None, 'top_p': None, 'top_k': None, 'candidates': []}
I got back a None where I expected a message - and response.last returned None too.
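Given that empty-candidates case, a caller probably shouldn't assume response.last is always a string. A defensive sketch (the helper name is my own), working from the response.to_dict() shape shown above:

```python
# Pull the model's reply out of a palm.chat() response dict, tolerating the
# empty-candidates case shown above instead of crashing on a missing message.
def extract_last_reply(response_dict):
    candidates = response_dict.get("candidates") or []
    if not candidates:
        return None  # the model returned nothing - caller must handle this
    return candidates[0]["content"]
```

For the surprising example above, this returns None rather than raising, which at least lets the CLI print a useful error instead of a traceback.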
Extracted this out to a separate plugin: https://github.com/simonw/llm-palm
Closing this issue - future work will happen there instead.
I have access now. I managed to get an API key I can use with the text-bison-001 model via Google Vertex AI.
https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/text-bison
The API call looks like this: