Closed: simonw closed this issue 1 year ago
The thing I'm having trouble with here is what to call this.
I want a command similar to llm openai ...
but for this family of models.
Naming options:

- llm google
- llm vertex (does anyone know what Vertex is?)
- llm palm (but should this be palm2, and will there be non-palm models in here?)
- llm bison (again, not sure people understand the terminology)

There are a ton of other models available in the Vertex "model garden": https://console.cloud.google.com/vertex-ai/model-garden - T5-FLAN, Stable Diffusion, BLIP, all sorts of things.
It's so very confusing in there! Many of them don't seem to have HTTP API endpoints - some appear to be available only via a notebook interface.
I'm tempted to go with llm google purely for consistency with llm openai - then maybe llm anthropic can follow?
Maybe the vendors themselves are a distraction - the thing that matters is the model. I've kind of broken this already though by having GPT-4 as a -4 flag on what I initially called the chatgpt command.
For PaLM 2 itself I think the models available to me are text-bison, code-bison and code-gecko.

I'm going to land a llm palm2 command for the moment, then go back to this issue and reconsider:
This looks useful: https://developers.generativeai.google/api/rest/generativelanguage/models/list
curl https://generativelanguage.googleapis.com/v1beta2/models?key=$PALM_API_KEY
I currently get this:
{
  "models": [
    {
      "name": "models/chat-bison-001",
      "version": "001",
      "displayName": "Chat Bison",
      "description": "Chat-optimized generative language model.",
      "inputTokenLimit": 4096,
      "outputTokenLimit": 1024,
      "supportedGenerationMethods": [
        "generateMessage"
      ],
      "temperature": 0.25,
      "topP": 0.95,
      "topK": 40
    },
    {
      "name": "models/text-bison-001",
      "version": "001",
      "displayName": "Text Bison",
      "description": "Model targeted for text generation.",
      "inputTokenLimit": 8196,
      "outputTokenLimit": 1024,
      "supportedGenerationMethods": [
        "generateText"
      ],
      "temperature": 0.7,
      "topP": 0.95,
      "topK": 40
    },
    {
      "name": "models/embedding-gecko-001",
      "version": "001",
      "displayName": "Embedding Gecko",
      "description": "Obtain a distributed representation of a text.",
      "inputTokenLimit": 1024,
      "outputTokenLimit": 1,
      "supportedGenerationMethods": [
        "embedText"
      ]
    }
  ]
}
So no code-bison or code-gecko listed there.
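The curl call above can be sketched in Python too - a minimal, stdlib-only sketch (not code from the issue), assuming the API key is in a PALM_API_KEY environment variable:

```python
import json
from urllib.request import urlopen

# Same v1beta2 REST endpoint as the curl example above.
API_BASE = "https://generativelanguage.googleapis.com/v1beta2"


def list_models_url(api_key):
    # Build the same ?key= URL shape the curl example uses
    return f"{API_BASE}/models?key={api_key}"


def list_model_names(api_key):
    # Fetch the models list and pull out just the "name" fields
    with urlopen(list_models_url(api_key)) as response:
        return [m["name"] for m in json.load(response)["models"]]
```

Calling list_model_names(os.environ["PALM_API_KEY"]) should return the three models/... names shown in the JSON above.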
https://developers.generativeai.google/api/rest/generativelanguage/models/countMessageTokens can count tokens:
curl https://generativelanguage.googleapis.com/v1beta2/models/chat-bison-001:countMessageTokens?key=$PALM_API_KEY \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "prompt": {
      "messages": [
        {"content": "How many tokens?"},
        {"content": "For this whole conversation?"}
      ]
    }
  }'
{
  "tokenCount": 23
}
I'm just going to ship text-bison-001 for the moment. I'm going to call the command palm2.
Where should it get the API token from? The way I do it for OpenAI tokens right now is bad and needs fixing:
https://github.com/simonw/llm/blob/293f306a5e9855a0828f98293abbccf968810a6e/llm/cli.py#L162-L173
My code so far:
@cli.command()
@click.argument("prompt", required=False)
@click.option("-m", "--model", help="Model to use", default="text-bison-001")
@click.option("-n", "--no-log", is_flag=True, help="Don't log to database")
def palm2(prompt, model, no_log):
    "Execute a prompt against a PaLM 2 model"
    if prompt is None:
        # Read from stdin instead
        prompt = sys.stdin.read()
    api_key = get_vertex_api_key()
    # Use the selected model in the URL rather than hard-coding text-bison-001
    url = "https://generativelanguage.googleapis.com/v1beta2/models/{}:generateText?key={}".format(
        model, api_key
    )
    response = requests.post(
        url,
        json={"prompt": {"text": prompt}},
        headers={"Content-Type": "application/json"},
    )
    output = response.json()["candidates"][0]["output"]
    log(no_log, "vertex", None, prompt, output, model)
    click.echo(output)
if prompt is None:
    # Read from stdin instead
    prompt = sys.stdin.read()

This poses a problem with the UX. This is the reason why llm hangs indefinitely when no argument is passed. I suggested a fix for this in PR#19.
That's a deliberate design decision at the moment - it means you can run llm and then copy-and-paste text into your terminal. There are other common unix commands that work like this - cat and wc for example - so I'm not convinced it's a usability problem. Happy to hear further discussion around that though.
Since it's possible to detect this situation, perhaps a message to stderr reminding the user to type or paste in content and hit Ctrl+D when they are done would be appropriate? I've not seen any other commands that do that though.
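One way that stderr hint could work - a sketch of the suggestion, not code from the issue - is to only print the reminder when stdin is an interactive terminal, so piped input stays silent:

```python
import io
import sys


def read_prompt(stream=None):
    # Default to stdin; accepting a stream parameter keeps this testable
    stream = stream or sys.stdin
    if stream.isatty():
        # Only shown on an interactive terminal, never when input is piped
        print("Type or paste your prompt, then hit Ctrl+D", file=sys.stderr)
    return stream.read()
```

Because the hint goes to stderr, it would not pollute output when the command is used in a pipeline.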
That's a deliberate design decision at the moment - it means you can run llm and then copy-and-paste text into your terminal.
I'm not sure I understand. When I run llm, no matter what I paste in my terminal, it just keeps waiting - even if I type "say hello" and press Enter.
Oh... my bad! Okay, I hit Ctrl+D and it responded. It's not very intuitive - though I may be alone in that category.
As of now, I would find it better to print the helper message by default, as I suggested in my pull request. However, if an instruction can be shown to the user via stderr, it might solve what I believe is a UX issue.
Now that I've renamed llm chatgpt to llm prompt I'm going to try adding this model to that command instead, so you would use it like so:

llm -m palm2 "Five surprising names for a wise owl"

I'll support -m text-bison as well.
In that new prototype branch:
% llm 'Two names for a beaver' -m palm2
Daggett and Dasher
Figuring out chat mode for Vertex/PaLM2 is proving hard.
https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/1?project=cloud-vision-ocr-382418 talks about "PaLM 2 for Chat".
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/chat.py seems to be the most relevant example code:
from vertexai.preview.language_models import ChatModel, InputOutputTextPair


def science_tutoring(temperature: float = 0.2) -> None:
    chat_model = ChatModel.from_pretrained("chat-bison@001")
    parameters = {
        "temperature": temperature,  # Temperature controls the degree of randomness in token selection.
        "max_output_tokens": 256,  # Token limit determines the maximum amount of text output.
        "top_p": 0.95,  # Tokens are selected from most probable to least until the sum of their probabilities equals the top_p value.
        "top_k": 40,  # A top_k of 1 means the selected token is the most probable among all tokens.
    }
    chat = chat_model.start_chat(
        context="My name is Miles. You are an astronomer, knowledgeable about the solar system.",
        examples=[
            InputOutputTextPair(
                input_text="How many moons does Mars have?",
                output_text="The planet Mars has two moons, Phobos and Deimos.",
            ),
        ],
    )
    response = chat.send_message(
        "How many planets are there in the solar system?", **parameters
    )
    print(f"Response from Model: {response.text}")
    return response
I think this is where vertexai comes from: https://github.com/googleapis/python-aiplatform - pip install google-cloud-aiplatform
Buried deep in a class hierarchy, this looks like the code that actually constructs the JSON to call the API: https://github.com/googleapis/python-aiplatform/blob/c60773a7db8ce7a59d2cb5787dc90937776c0b8f/vertexai/language_models/_language_models.py#L697-L824
The API call then goes through this code:
prediction_response = self._model._endpoint.predict(
    instances=[prediction_instance],
    parameters=prediction_parameters,
)

I have not yet tracked down that self._model._endpoint.predict() method.
Some useful hints in https://github.com/google/generative-ai-docs/blob/main/site/en/tutorials/chat_quickstart.ipynb - including that PaLM 2 has a "context" concept which appears to be the same thing as an OpenAI system prompt:
reply = palm.chat(context="Speak like Shakespeare.", messages='Hello')
print(reply.last)
Hello there, my good fellow! How fares thee this day?
reply = palm.chat(
context="Answer everything with a haiku, following the 5/7/5 rhyme pattern.",
messages="How's it going?"
)
print(reply.last)
I am doing well
I am learning and growing
Every day is new
Based on that example notebook, I'm going to ditch the terminology "Vertex" and "PaLM 2" and just call it "PaLM". (They never released an API for PaLM 1.)

I'm also going to move my code out of the experimental plugin and into a llm-palm package which depends on google-generativeai.
Got this out of the debugger, after this:
import google.generativeai as palm

kwargs = {"messages": self.prompt.prompt}
if self.prompt.system:
    kwargs["context"] = self.prompt.system
response = palm.chat(**kwargs)
last = response.last
(Pdb) pprint(response.to_dict())
{'candidate_count': None,
'candidates': [{'author': '1',
'content': 'Here are three names for a dog:\n'
'\n'
'1. **Bailey** is a popular name for both male and '
'female dogs. It is of English origin and means '
'"bailiff" or "steward." Bailey is a friendly and '
'loyal dog that is always up for a good time.\n'
'2. **Luna** is a Latin name that means "moon." It '
'is a popular name for female dogs, but it can '
'also be used for male dogs. Luna is a beautiful '
'and intelligent dog that is always curious about '
'the world around her.\n'
'3. **Max** is a German name that means '
'"greatest." It is a popular name for male dogs, '
'but it can also be used for female dogs. Max is a '
'strong and courageous dog that is always willing '
'to protect his family.\n'
'\n'
'These are just a few of the many great names that '
'you could choose for your new dog. When choosing '
"a name, it is important to consider your dog's "
'personality and appearance. You should also '
'choose a name that you will be happy saying for '
'many years to come.'}],
'context': '',
'examples': [],
'messages': [{'author': '0', 'content': 'three names for a dog'},
{'author': '1',
'content': 'Here are three names for a dog:\n'
'\n'
'1. **Bailey** is a popular name for both male and '
'female dogs. It is of English origin and means '
'"bailiff" or "steward." Bailey is a friendly and '
'loyal dog that is always up for a good time.\n'
'2. **Luna** is a Latin name that means "moon." It '
'is a popular name for female dogs, but it can also '
'be used for male dogs. Luna is a beautiful and '
'intelligent dog that is always curious about the '
'world around her.\n'
'3. **Max** is a German name that means "greatest." '
'It is a popular name for male dogs, but it can also '
'be used for female dogs. Max is a strong and '
'courageous dog that is always willing to protect '
'his family.\n'
'\n'
'These are just a few of the many great names that '
'you could choose for your new dog. When choosing a '
"name, it is important to consider your dog's "
'personality and appearance. You should also choose '
'a name that you will be happy saying for many years '
'to come.'}],
'model': 'models/chat-bison-001',
'temperature': None,
'top_k': None,
'top_p': None}
That library doesn't have streaming support yet, issues here:
The context (PaLM's version of the system prompt) isn't very strongly held to:
Here's GPT-4:
% llm -m 4 'three names for a pet pelican, succinctly' --system 'translate to french'
trois noms pour un pélican domestique, succinctement
PaLM messes that one up:
% llm -m palm 'three names for a pet pelican, succinctly' --system 'translate to french'
{'model': 'models/chat-bison-001', 'context': 'translate to french', 'examples': [], 'messages': [{'author': '0', 'content': 'three names for a pet pelican, succinctly'}, {'author': '1', 'content': 'Here are three names for a pet pelican:\n\n* Pete\n* Piper\n* Peanut'}], 'temperature': None, 'candidate_count': None, 'top_p': None, 'top_k': None, 'candidates': [{'author': '1', 'content': 'Here are three names for a pet pelican:\n\n* Pete\n* Piper\n* Peanut'}]}
Here are three names for a pet pelican:
* Pete
* Piper
* Peanut
% llm -m palm 'three names for a pet pelican, succinctly' --system 'you are a bot that translates everything to french'
{'model': 'models/chat-bison-001', 'context': 'you are a bot that translates everything to french', 'examples': [], 'messages': [{'author': '0', 'content': 'three names for a pet pelican, succinctly'}, {'author': '1', 'content': 'Here are three names for a pet pelican:\n\n* Pete\n* Piper\n* Peanut'}], 'temperature': None, 'candidate_count': None, 'top_p': None, 'top_k': None, 'candidates': [{'author': '1', 'content': 'Here are three names for a pet pelican:\n\n* Pete\n* Piper\n* Peanut'}]}
Here are three names for a pet pelican:
* Pete
* Piper
* Peanut
This example from the PaLM example notebook does work though:
% llm -m palm 'Hello' --system 'Speak like Shakespeare'
{'model': 'models/chat-bison-001', 'context': 'Speak like Shakespeare', 'examples': [], 'messages': [{'author': '0', 'content': 'Hello'}, {'author': '1', 'content': 'Hello there, my good fellow! How fares thee on this fine day?'}], 'temperature': None, 'candidate_count': None, 'top_p': None, 'top_k': None, 'candidates': [{'author': '1', 'content': 'Hello there, my good fellow! How fares thee on this fine day?'}]}
Hello there, my good fellow! How fares thee on this fine day?
This was surprising:
llm -m palm --system "Translate to french" "I like pelicans a lot"
{'model': 'models/chat-bison-001', 'context': 'Translate to french', 'examples': [], 'messages': [{'author': '0', 'content': 'I like pelicans a lot'}, None], 'temperature': None, 'candidate_count': None, 'top_p': None, 'top_k': None, 'candidates': []}
I got back a None where I expected a message - and response.last returned None too.
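Given that empty-candidates case, a caller probably shouldn't assume response.last is always a string. A defensive sketch (the helper name is my own), working from the response.to_dict() shape shown above:

```python
# Pull the model's reply out of a palm.chat() response dict, tolerating the
# empty-candidates case shown above instead of crashing on a missing message.
def extract_last_reply(response_dict):
    candidates = response_dict.get("candidates") or []
    if not candidates:
        return None  # the model returned nothing - caller must handle this
    return candidates[0]["content"]
```

For the surprising example above, this returns None rather than raising, which at least lets the CLI print a useful error instead of a traceback.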
Extracted this out to a separate plugin: https://github.com/simonw/llm-palm
Closing this issue - future work will happen there instead.
I have access now. I managed to get an API key I can use with the text-bison-001 model via Google Vertex AI.
https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/text-bison
The API call looks like this: