simonw / llm-replicate

LLM plugin for models hosted on Replicate

Support for chat models and conversations #8

Closed · simonw closed this issue 1 year ago

simonw commented 1 year ago

https://replicate.com/a16z-infra/llama13b-v2-chat is LLaMA v2.

llm replicate add a16z-infra/llama13b-v2-chat --alias llama2 
llm -m llama2 "say hello"

Outputs:

to your new favorite drink. The classic cocktail, reimagined. Introducing the Espresso Martini, a modern twist on a timeless favorite. Made with rich, smooth espresso, velvety vodka, and a hint of sweetness, this cocktail is sure to satisfy your caffeine cravings and your taste buds. At 4.5% ABV, our Espresso Martini is the perfect drink for any time of day. Whether you're looking for a morning pick-me-up or an after-dinner treat, this cocktail is sure to impress. So why wait? Order now and experience the perfect blend of coffee and cocktail.

The Replicate docs say you should structure the prompt like this to get proper chat behaviour:

User: Write a story in the style of James Joyce. The story should be about a trip to the Irish countryside in 2083, to see the beautiful scenery and robots.
Assistant:

Need a way to do this with llm-replicate, similar to how llm-gpt4all does it: https://github.com/simonw/llm-gpt4all/blob/01d8ccf0dadeb934fbee9f3d647d4bcd8bb0ad1f/llm_gpt4all.py#L85-L91
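For reference, a minimal sketch of that approach, assuming llm's usual plugin conventions (a conversation object whose responses carry the prior prompt and text); the helper name is illustrative, not the plugin's actual code:

def build_chat_prompt(prompt, conversation=None):
    # Illustrative sketch only: replay earlier exchanges as User:/Assistant:
    # turns, then leave a trailing "Assistant:" for the model to complete.
    parts = []
    if conversation is not None:
        for prev in conversation.responses:
            parts.append(f"User: {prev.prompt.prompt}")
            parts.append(f"Assistant: {prev.text()}")
    parts.append(f"User: {prompt.prompt}")
    parts.append("Assistant:")
    return "\n".join(parts)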

simonw commented 1 year ago

I'm going to implement this as a flag you pass to llm replicate add:

llm replicate add a16z-infra/llama13b-v2-chat --alias llama2 --chat

The --chat option will be recorded in models.json and will cause the model to use the User: ...\nAssistant: prompt format.

I'll add a way to set a custom prompt format too, but not for the first release of this.
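As a rough sketch of how the flag could be persisted (the real models.json schema and registration code may differ; field names here are assumptions):

import json
import pathlib

def add_model(models_path, model, version, aliases, chat=False):
    # Hypothetical sketch of `llm replicate add ... --chat` recording the flag
    # alongside the model entry, so prompt formatting can be chosen at run time.
    path = pathlib.Path(models_path)
    models = json.loads(path.read_text()) if path.exists() else []
    models.append({
        "model": model,
        "version": version,
        "aliases": aliases,
        "chat": chat,
    })
    path.write_text(json.dumps(models, indent=2))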

simonw commented 1 year ago

I can do a variant on this test:

https://github.com/simonw/llm-replicate/blob/b0311696aee53c69a35b736f3024d7816bd446e3/tests/test_replicate.py#L12-L28

Using this trick:

(Pdb) mock_client.run.call_args_list
[call('replicate/flan-t5-xl:7a216605843d87f5426a10d2cc6940485a232336ed04d655ef86b91e020e9210', input={'prompt': 'say hi'})]
(Pdb) mock_client.run.call_args_list[0].args, mock_client.run.call_args_list[0].kwargs
(('replicate/flan-t5-xl:7a216605843d87f5426a10d2cc6940485a232336ed04d655ef86b91e020e9210',), {'input': {'prompt': 'say hi'}})
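A sketch of a test built around that trick; execute_chat() below is an illustrative stand-in for however the plugin actually invokes the client:

from unittest.mock import MagicMock

def test_chat_prompt_reaches_replicate():
    # Mirror the pdb session above: hand the code a MagicMock client, then
    # inspect call_args_list to check the exact prompt that was sent.
    mock_client = MagicMock()

    def execute_chat(client, model_version, user_prompt):
        # Stand-in for the plugin's execute path when --chat is set.
        prompt = f"User: {user_prompt}\nAssistant:"
        return client.run(model_version, input={"prompt": prompt})

    execute_chat(mock_client, "a16z-infra/llama13b-v2-chat:version-id", "say hi")

    call = mock_client.run.call_args_list[0]
    assert call.args == ("a16z-infra/llama13b-v2-chat:version-id",)
    assert call.kwargs == {"input": {"prompt": "User: say hi\nAssistant:"}}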