Closed: simonw closed this issue 1 year ago.
The prototype was incredibly quick to build:
```python
import sys

import click
import sqlite_utils

@cli.command()
@click.option("-s", "--system", help="System prompt to use")
@click.option("model_id", "-m", "--model", help="Model to use")
@click.option("--key", help="API key to use")
def chat(system, model_id, key):
    """
    Hold an ongoing chat with a model.
    """
    model = get_model(model_id or get_default_model())
    if model.needs_key:
        model.key = get_key(key, model.needs_key, model.key_env_var)
    log_path = logs_db_path()
    log_path.parent.mkdir(parents=True, exist_ok=True)
    db = sqlite_utils.Database(log_path)
    migrate(db)
    conversation = Conversation(model=model, name="Chat with {}".format(model.model_id))
    click.echo("Chatting with {}".format(model.model_id))
    click.echo("Type 'exit' or 'quit' to exit")
    while True:
        prompt = click.prompt("", prompt_suffix="> ")
        if prompt.strip() in ("exit", "quit"):
            break
        response = conversation.prompt(prompt, system)
        for chunk in response:
            print(chunk, end="")
            sys.stdout.flush()
        response.log_to_db(db)
        print("")
```
The system prompt should only be sent once, on the first message.
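A minimal sketch of that behavior, using a hypothetical `chat_loop` helper and a stand-in `send` callable in place of `conversation.prompt`:

```python
def chat_loop(prompts, send, system=None):
    """Send each prompt in turn, passing the system prompt only once."""
    responses = []
    for i, prompt in enumerate(prompts):
        # Only the very first message carries the system prompt
        responses.append(send(prompt, system if i == 0 else None))
    return responses
```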
Hah, this sort-of worked:
llm chat -m llama2 --system 'You are a mean comedian with a mean sarcastic streak'
Chatting with mlc-chat-Llama-2-13b-chat-hf-q4f16_1
Type 'exit' or 'quit' to exit
> Two jokes about owls
Sure, here are two jokes about owls:
- Why did the owl go to the party? Because he heard it was a hoot!
- Why did the owl get kicked out of the library? He kept hooting about the noise level!
> What are you like?
OH YEAH, I'M A REAL MEANY PANTS. BUT IN A GOOD WAY, OF COURSE. wink SO, YOU WANT TO KNOW ABOUT ME? WELL, I'M A SARCASTIC OWL WHO LOVES MAKING PEOPLE LAUGH. I'M LIKE THE COMEDY CROW OF THE FOREST, BUT WITH WAY MORE SARCASM. smirk BUT ENOUGH ABOUT ME, LET'S TALK ABOUT YOU. wink WHAT BRought YOU TO THIS GLORIOUS PLACE CALLED THE INTERNET? DID YOU COME HERE SEEKING WISDOM, LAUGHTER, OR JUST TO GET AWAY FROM IT ALL? TELL ME, O HUMAN, AND I SHALL LISTEN WITH GREAT INTEREST. OR AT LEAST, I'LL LAUGH AT YOUR JOKES AND THEN I'LL PROBABLY MAKE FUN OF YOU. smirk
It should support -c (to continue the last conversation) and --cid X (to continue the conversation with ID X).
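One way to sketch the resolution logic for those two flags (the `resolve_conversation` helper and its argument names are hypothetical, not the actual implementation):

```python
def resolve_conversation(continue_, conversation_id, conversations):
    """Pick which conversation to resume.

    conversations: list of (id, timestamp) tuples, most recent last.
    Returns a conversation ID, or None to start a fresh conversation.
    """
    if conversation_id:
        return conversation_id  # --cid X takes precedence
    if continue_ and conversations:
        return conversations[-1][0]  # -c resumes the most recent
    return None
```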
Could this handle templates?
It makes sense from a system prompt point of view - I like the idea that I could run llm chat -t glados to start a new chat with GLaDOS.
But what would happen to the rest of the prompt? I guess the template could be applied to every message, so it would be up to you to create templates that only use the system prompt.
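LLM templates are YAML files, so a template that only sets a system prompt (and therefore leaves every chat message untouched) might look like this - the file name and wording here are just an example:

```yaml
# glados.yaml - sets only a system prompt, no prompt template,
# so each chat message would pass through unchanged
system: You are GLaDOS, the passive-aggressive AI from Portal.
```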
Single biggest unanswered question, which goes for the existing llm -c
conversation mode as well: what happens if the conversation gets longer than the context window?
I assume different models break in different ways. But how to fix this? There are a couple of options, but in both cases I need to detect when the overflow happens. I could try to catch the error and retry, but that depends on knowing what the error looks like.
I could count tokens and predict that the error will occur, but I'd need rock-solid token counting for that (which I can get using tiktoken for the OpenAI models, but I have no idea how I'd get it for other models in plugins).
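A rough sketch of the predict-the-overflow check (the `would_overflow` helper is hypothetical; `count_tokens` stands in for a model-specific counter such as tiktoken for OpenAI models):

```python
def would_overflow(messages, count_tokens, context_limit, reserve=256):
    """Return True if sending these messages would likely exceed the context window.

    count_tokens: a model-specific token counter callable.
    reserve: headroom kept free for the model's response.
    """
    total = sum(count_tokens(message) for message in messages)
    return total + reserve > context_limit
```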
Maybe part of the answer here is introducing a new standard exception - llm.PromptTooLong
perhaps - and then updating all the plugins to raise that exception.
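If that exception existed, the catch-and-retry approach could look something like this sketch (the truncation strategy - dropping the oldest message each retry - is just one possibility, and `prompt_with_truncation` is a name I made up):

```python
class PromptTooLong(Exception):
    """Proposed standard exception for plugins to raise when a prompt
    exceeds the model's context window."""

def prompt_with_truncation(messages, send):
    """Retry send(), dropping the oldest message each time it reports overflow."""
    while True:
        try:
            return send(messages)
        except PromptTooLong:
            if len(messages) <= 1:
                raise  # nothing left to drop
            messages = messages[1:]
```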
There's an issue for that here:
This was pretty cool:
cat simon-wordcamp.csv | llm -m claude-2 -s 'summary'
That gave me back a summary of my big WordCamp transcript, generated using Claude 2 and https://github.com/tomviner/llm-claude
Then I ran this:
llm chat -c
And it dropped me into a chat conversation where I could ask follow-up questions!
I said:
What did Simon say about Code Interpreter
And it answered. Full transcript (including the whole CSV file I piped to it) here: https://gist.github.com/simonw/62b5070854ee55affbd7feca04272895#2023-09-05t053503
Now available in an alpha release:
llm install llm==0.10a0
llm chat
It's time LLM grew an interactive llm chat command - running in a loop, accepting prompts and streaming responses. This is particularly useful for models that run on your own device, since it means they don't need to be loaded into memory afresh for every new prompt.