Closed: simonw closed this 1 year ago
I'll do this:

```
llm models list --options
```

And introspect the `Options` class.
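llm's `Options` classes are pydantic models, so the listing can be built by walking each model's fields and pulling out the name, type and description. A minimal sketch of the idea, assuming pydantic v2's `model_fields` API (llm at the time targeted pydantic v1, where you'd read `__fields__` instead; the fields below are illustrative):

```python
from typing import Optional

from pydantic import BaseModel, Field


class Options(BaseModel):
    # Two example fields - the real class has more
    temperature: Optional[float] = Field(
        None, description="What sampling temperature to use, between 0 and 2."
    )
    max_tokens: Optional[int] = Field(
        None, description="Maximum number of tokens to generate"
    )


def describe_options(options_class):
    # Build one "name: type - description" line per option
    lines = []
    for name, field in options_class.model_fields.items():
        type_name = getattr(field.annotation, "__name__", str(field.annotation))
        line = f"{name}: {type_name}"
        if field.description:
            line += f" - {field.description}"
        lines.append(line)
    return lines


for line in describe_options(Options):
    print(line)
```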
I got this working:
```
OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
Markov: markov
  length: int
  delay: float
PaLM 2: chat-bison-001 (aliases: palm, palm2)
gpt4all: orca-mini-3b - Orca (Small), 1.80GB download, needs 4GB RAM (installed)
gpt4all: ggml-gpt4all-j-v1 - Groovy, 3.53GB download, needs 8GB RAM (installed)
gpt4all: orca-mini-7b - Orca, 3.53GB download, needs 8GB RAM (installed)
gpt4all: ggml-replit-code-v1-3b - Replit, 4.84GB download, needs 4GB RAM (installed)
gpt4all: ggml-vicuna-13b-1 - Vicuna (large), 7.58GB download, needs 16GB RAM (installed)
gpt4all: nous-hermes-13b - Hermes, 7.58GB download, needs 16GB RAM (installed)
gpt4all: ggml-model-gpt4all-falcon-q4_0 - GPT4All Falcon, 3.78GB download, needs 8GB RAM
gpt4all: ggml-vicuna-7b-1 - Vicuna, 3.92GB download, needs 8GB RAM
gpt4all: ggml-wizardLM-7B - Wizard, 3.92GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-base - MPT Base, 4.52GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-instruct - MPT Instruct, 4.52GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-chat - MPT Chat, 4.52GB download, needs 8GB RAM
gpt4all: orca-mini-13b - Orca (Large), 6.82GB download, needs 16GB RAM
gpt4all: GPT4All-13B-snoozy - Snoozy, 7.58GB download, needs 16GB RAM
gpt4all: ggml-nous-gpt4-vicuna-13b - Nous Vicuna, 7.58GB download, needs 16GB RAM
gpt4all: ggml-stable-vicuna-13B - Stable Vicuna, 7.58GB download, needs 16GB RAM
gpt4all: wizardLM-13B-Uncensored - Wizard Uncensored, 7.58GB download, needs 16GB RAM
Mpt30b: mpt30b (aliases: mpt)
  verbose: <class 'bool'>
```
I don't like that it outputs the same help text multiple times though. I'm going to have it output the detailed descriptions only once per `Options` class.
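That dedupe step can be sketched like this (a hypothetical helper, not the project's actual code): track which `Options` classes have already been rendered, and include descriptions only the first time each class is seen.

```python
# Hypothetical sketch: render full option descriptions only for the first
# model whose Options class we encounter; later models sharing the same
# class get the compact "name: type" form.
def render_options(options_class, seen_classes, fields):
    """fields maps option name -> (type_name, description)."""
    first_time = options_class not in seen_classes
    seen_classes.add(options_class)
    lines = []
    for name, (type_name, description) in fields.items():
        line = f"  {name}: {type_name}"
        if first_time and description:
            line += f" - {description}"
        lines.append(line)
    return lines


# Example: two models sharing one (stand-in) Options class
class ChatOptions:
    pass


fields = {"temperature": ("float", "What sampling temperature to use")}
seen = set()
print(render_options(ChatOptions, seen, fields)[0])  # with description
print(render_options(ChatOptions, seen, fields)[0])  # compact second time
```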
Tests are failing now: `docs/usage.md` is detected as changed by `cog` on Python 3.8 for some reason.

Could be related to plugin order. It shouldn't be though; I would expect this order to be the same each time with only the `openai` default plugin installed: https://github.com/simonw/llm/blob/18f34b5df25f20afceaa6a85dbd55d2b6dd47cb3/llm/__init__.py#L49-L56

Could be the order of this bit: https://github.com/simonw/llm/blob/18f34b5df25f20afceaa6a85dbd55d2b6dd47cb3/llm/cli.py#L381
Actually the problem was something else. I copied the generated text to my local environment, ran a diff and got this:
````diff
diff --git a/docs/usage.md b/docs/usage.md
index 8e76010..dfc4c16 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -98,53 +98,53 @@ cog.out("```\n{}\n```".format(result.output))
 OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
-  temperature: float
+  temperature: Union[float, NoneType]
     What sampling temperature to use, between 0 and 2. Higher values like
     0.8 will make the output more random, while lower values like 0.2 will
     make it more focused and deterministic.
-  max_tokens: int
+  max_tokens: Union[int, NoneType]
     Maximum number of tokens to generate
-  top_p: float
+  top_p: Union[float, NoneType]
     An alternative to sampling with temperature, called nucleus sampling,
     where the model considers the results of the tokens with top_p
     probability mass. So 0.1 means only the tokens comprising the top 10%
     probability mass are considered. Recommended to use top_p or
     temperature but not both.
-  frequency_penalty: float
+  frequency_penalty: Union[float, NoneType]
     Number between -2.0 and 2.0. Positive values penalize new tokens based
     on their existing frequency in the text so far, decreasing the model's
     likelihood to repeat the same line verbatim.
-  presence_penalty: float
+  presence_penalty: Union[float, NoneType]
     Number between -2.0 and 2.0. Positive values penalize new tokens based
     on whether they appear in the text so far, increasing the model's
     likelihood to talk about new topics.
-  stop: str
+  stop: Union[str, NoneType]
     A string where the API will stop generating further tokens.
   logit_bias: Union[dict, str, NoneType]
     Modify the likelihood of specified tokens appearing in the completion.
 OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
-  temperature: float
-  max_tokens: int
-  top_p: float
-  frequency_penalty: float
-  presence_penalty: float
-  stop: str
+  temperature: Union[float, NoneType]
+  max_tokens: Union[int, NoneType]
+  top_p: Union[float, NoneType]
+  frequency_penalty: Union[float, NoneType]
+  presence_penalty: Union[float, NoneType]
+  stop: Union[str, NoneType]
   logit_bias: Union[dict, str, NoneType]
 OpenAI Chat: gpt-4 (aliases: 4, gpt4)
-  temperature: float
-  max_tokens: int
-  top_p: float
-  frequency_penalty: float
-  presence_penalty: float
-  stop: str
+  temperature: Union[float, NoneType]
+  max_tokens: Union[int, NoneType]
+  top_p: Union[float, NoneType]
+  frequency_penalty: Union[float, NoneType]
+  presence_penalty: Union[float, NoneType]
+  stop: Union[str, NoneType]
   logit_bias: Union[dict, str, NoneType]
 OpenAI Chat: gpt-4-32k (aliases: 4-32k)
-  temperature: float
-  max_tokens: int
-  top_p: float
-  frequency_penalty: float
-  presence_penalty: float
-  stop: str
+  temperature: Union[float, NoneType]
+  max_tokens: Union[int, NoneType]
+  top_p: Union[float, NoneType]
+  frequency_penalty: Union[float, NoneType]
+  presence_penalty: Union[float, NoneType]
+  stop: Union[str, NoneType]
   logit_bias: Union[dict, str, NoneType]
````
So clearly on Python 3.8 this bit of code has a different output: https://github.com/simonw/llm/blob/18f34b5df25f20afceaa6a85dbd55d2b6dd47cb3/llm/cli.py#L382-L384
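My guess (an assumption, not verified against the actual fix) is that `Optional` field annotations stringify differently on Python 3.8, so one way to get stable output is to collapse `Optional[X]` down to `X` before rendering the type name. A stdlib-only sketch:

```python
# Sketch (not the project's actual fix): collapse Optional[X] to X so the
# rendered type name is the same on every Python version.
import typing


def type_label(annotation):
    if typing.get_origin(annotation) is typing.Union:
        args = [a for a in typing.get_args(annotation) if a is not type(None)]
        if len(args) == 1:
            annotation = args[0]  # Optional[X] -> X
    return getattr(annotation, "__name__", str(annotation))


print(type_label(typing.Optional[float]))  # float
print(type_label(int))                     # int
```

Multi-type unions like `Union[dict, str, NoneType]` are left alone by this helper, which matches how `logit_bias` renders above.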
To test locally I ran:

```
pyenv install 3.8.17
```

Then waited for that to compile. Then:

```
~/.pyenv/versions/3.8.17/bin/python -m venv /tmp/pvenv
source /tmp/pvenv/bin/activate
pip install -e '.[test]'
/tmp/pvenv/bin/cog --check docs/usage.md
```

And to rewrite it:

```
/tmp/pvenv/bin/cog -r docs/usage.md
```
Extracted a TIL: https://til.simonwillison.net/python/quick-testing-pyenv
Now documented here: https://llm.datasette.io/en/latest/usage.html#listing-available-models - including a cog-powered example.
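That cog-powered example presumably takes roughly this shape, reconstructed from the `cog.out("```\n{}\n```".format(result.output))` line visible in the diff hunk header above (treat the exact markers and imports as an approximation of the real block in `docs/usage.md`):

````markdown
<!-- [[[cog
import cog
from click.testing import CliRunner
from llm.cli import cli
result = CliRunner().invoke(cli, ["models", "list", "--options"])
cog.out("```\n{}\n```".format(result.output))
]]] -->
<!-- [[[end]]] -->
````

Running `cog -r` executes the Python between the markers and writes the captured CLI output into the document; `cog --check` fails if the embedded output is stale.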
This might be part of `llm models list` or may be something else.

Follows: #53