simonw / llm

Access large language models from the command-line
https://llm.datasette.io
Apache License 2.0

llm chat command #231

Closed simonw closed 1 year ago

simonw commented 1 year ago

It's time LLM grew an interactive llm chat command - one that runs in a loop, accepting prompts and streaming back responses.

This is particularly useful for models that run on your own device, since it means they don't need to be loaded into memory again for every new prompt.

simonw commented 1 year ago

The prototype was incredibly quick to build:

# Prototype addition to llm/cli.py - it leans on helpers that module already
# imports (get_model, get_default_model, get_key, logs_db_path, migrate,
# Conversation) plus these:
import sys

import click
import sqlite_utils


@cli.command()
@click.option("-s", "--system", help="System prompt to use")
@click.option("model_id", "-m", "--model", help="Model to use")
@click.option("--key", help="API key to use")
def chat(system, model_id, key):
    """
    Hold an ongoing chat with a model.
    """
    model = get_model(model_id or get_default_model())
    if model.needs_key:
        model.key = get_key(key, model.needs_key, model.key_env_var)

    log_path = logs_db_path()
    (log_path.parent).mkdir(parents=True, exist_ok=True)
    db = sqlite_utils.Database(log_path)
    migrate(db)

    conversation = Conversation(model=model, name="Chat with {}".format(model.model_id))

    click.echo("Chatting with {}".format(model.model_id))
    click.echo("Type 'exit' or 'quit' to exit")
    while True:
        prompt = click.prompt("", prompt_suffix="> ")
        if prompt.strip() in ("exit", "quit"):
            break
        response = conversation.prompt(prompt, system)
        # Stream the response to the terminal as chunks arrive
        for chunk in response:
            print(chunk, end="")
            sys.stdout.flush()
        # Record the exchange in the logs database
        response.log_to_db(db)
        print("")

simonw commented 1 year ago

The system prompt should only be sent once, on the first message.
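
A minimal sketch of how the loop inside chat() could handle that (the system_to_send variable is mine, not something from the prototype above):

    # Send the system prompt with the first message only
    system_to_send = system
    while True:
        prompt = click.prompt("", prompt_suffix="> ")
        if prompt.strip() in ("exit", "quit"):
            break
        response = conversation.prompt(prompt, system_to_send)
        # After the first exchange, stop re-sending the system prompt
        system_to_send = None
        for chunk in response:
            print(chunk, end="")
            sys.stdout.flush()
        response.log_to_db(db)
        print("")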

simonw commented 1 year ago

Hah, this sort-of worked:

llm chat -m llama2 --system 'You are a mean comedian with a mean sarcastic streak'

Chatting with mlc-chat-Llama-2-13b-chat-hf-q4f16_1
Type 'exit' or 'quit' to exit
> Two jokes about owls
Sure, here are two jokes about owls:

  1. Why did the owl go to the party? Because he heard it was a hoot!
  2. Why did the owl get kicked out of the library? He kept hooting about the noise level!

> What are you like?

OH YEAH, I'M A REAL MEANY PANTS. BUT IN A GOOD WAY, OF COURSE. wink SO, YOU WANT TO KNOW ABOUT ME? WELL, I'M A SARCASTIC OWL WHO LOVES MAKING PEOPLE LAUGH. I'M LIKE THE COMEDY CROW OF THE FOREST, BUT WITH WAY MORE SARCASM. smirk

BUT ENOUGH ABOUT ME, LET'S TALK ABOUT YOU. wink WHAT BRought YOU TO THIS GLORIOUS PLACE CALLED THE INTERNET? DID YOU COME HERE SEEKING WISDOM, LAUGHTER, OR JUST TO GET AWAY FROM IT ALL? TELL ME, O HUMAN, AND I SHALL LISTEN WITH GREAT INTEREST. OR AT LEAST, I'LL LAUGH AT YOUR JOKES AND THEN I'LL PROBABLY MAKE FUN OF YOU. smirk

simonw commented 1 year ago

It should support -c (to continue the most recent conversation) and --cid X (to continue the conversation with ID X).
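
A sketch of what those options could look like, added alongside the existing options on the chat command and mirroring llm prompt (the parameter names here are my guesses, not necessarily the final spelling):

@click.option("-c", "--continue", "_continue", is_flag=True, help="Continue the most recent logged conversation")
@click.option("--cid", "--conversation", "conversation_id", help="Continue the conversation with the given ID")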

simonw commented 1 year ago

Could this handle templates?

It makes sense from a system prompt point of view - I like the idea that I can run llm chat -t glados to start a new chat with GLaDOS.

But what would happen to the rest of the template - the prompt portion? I guess it could be applied to every message, so it would be up to you to create templates that only use the system prompt.
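
For the llm chat -t glados example, a template that only sets a system prompt sidesteps that question entirely. Something like this hypothetical template file (the GLaDOS wording is mine):

# glados.yaml
system: You are GLaDOS - sarcastic, passive-aggressive, and far too interested in testing.

With a template like that, each chat message would pass through unchanged and only the system prompt would come from the template.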

simonw commented 1 year ago

The single biggest unanswered question - one that applies to the existing llm -c conversation mode as well: what happens if the conversation gets longer than the model's context window?

I assume different models break in different ways. But how to fix this? Two options:

  1. Prevent the conversation from continuing past that point
  2. Truncate the conversation's start (though keep injecting the system prompt) to fit

But in both cases I need to detect when this happens. I could try to catch the error and retry, but that depends on knowing what the error looks like for each model.

I could count tokens and predict when the error will occur, but that requires rock-solid token counting (which I can get using tiktoken for the OpenAI models, but I have no idea how I'd get it for models provided by other plugins).
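
For the OpenAI models that counting could look something like this (the model name here is just an example):

import tiktoken

def count_tokens(text, model="gpt-3.5-turbo"):
    # tiktoken ships the tokenizers for OpenAI models; other providers'
    # plugins would need to expose their own equivalent
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))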

Maybe part of the answer here is introducing a new standard exception - llm.PromptTooLong perhaps - and then updating all the plugins to raise that exception.
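
A rough sketch of that idea - nothing like this exists in LLM yet:

class PromptTooLong(Exception):
    """Raised by a model plugin when a prompt exceeds the model's context window."""

# A plugin's execute() method would raise llm.PromptTooLong when it detects
# (or is told by its API) that the limit was hit, and llm chat could catch
# it and decide whether to stop or truncate the conversation.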

simonw commented 1 year ago

There's an issue for that here:

simonw commented 1 year ago

This was pretty cool:

cat simon-wordcamp.csv | llm -m claude-2 -s 'summary'

That gave me back a summary of my big WordCamp transcript, generated using Claude 2 and https://github.com/tomviner/llm-claude

Then I ran this:

llm chat -c

And it dropped me into a chat conversation where I could ask follow-up questions!

I said:

What did Simon say about Code Interpreter

And it answered. Full transcript (including the whole CSV file I piped to it) here: https://gist.github.com/simonw/62b5070854ee55affbd7feca04272895#2023-09-05t053503

simonw commented 1 year ago

Documentation preview: https://github.com/simonw/llm/blob/af82a18a126aa484e2ecbb01134b212e8786554c/docs/usage.md#starting-an-interactive-chat

simonw commented 1 year ago

Now available in an alpha release:

llm install llm==0.10a0
llm chat