simonw / llm-replicate

LLM plugin for models hosted on Replicate
Apache License 2.0

Allow replicate model to be used in non-chat mode #16

Open cmungall opened 1 year ago

cmungall commented 1 year ago

llm -m replicate-a16z-infra-llama13b-v2-chat "ten names for a pet pelican"

generates a reasonable response, but if I hop over to my Replicate dashboard I see that the actual prompt that was issued is:

User: ten names for a pet pelican
Assistant:

I believe this happens when chat mode is set.
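For illustration, here is a minimal sketch of what that chat-mode wrapping likely looks like. The function name and flag are my guesses, not the plugin's actual code; the template is inferred from the dashboard output above:

```python
def build_prompt(user_prompt: str, chat_mode: bool = True) -> str:
    """Hypothetical sketch: wrap the raw prompt in a chat template
    when chat mode is on, otherwise pass it through verbatim."""
    if chat_mode:
        # Matches what I see in the Replicate dashboard
        return f"User: {user_prompt}\nAssistant:"
    return user_prompt
```

What I'm asking for is effectively a way to reach the `chat_mode=False` path from the `llm` command line.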

I am finding that when I experiment with Llama 2 via the Replicate dashboard, this form of prompt doesn't work well for my use case (JSON extraction using examples derived via RAG), especially with the 13b model. 70b is a bit better, but the trailing Assistant: still seems to confound it.

I am not sure whether this is an issue with this plugin or with how it's instantiated in the host llm package, but either way I can't figure out how to pass my prompt through directly.