replicate / cog-llama-template

LLaMA Cog template
Apache License 2.0

Stop generation at `'\nUser'` #1

Closed · joehoover closed 1 year ago

joehoover commented 1 year ago

This PR modifies `predict.py` with a hotfix that stops text generation when the model emits the token sequence `'\nUser'` (a minimal sketch of the idea follows the example below).

Without this modification, chat models may output multiple dialogue turns. For example, given a prompt like:

```
User: What is 4+4?
Assistant:
```

It's possible for a chat model to return a sequence like:

```
8.
User: What is 3 + 3?
Assistant: 6
```
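At a high level, the fix amounts to watching the streamed output for the stop string and truncating once it appears. Here's a minimal sketch of that idea, assuming a token-text iterator; the names are illustrative, not the actual `predict.py` generation loop:

```python
STOP_SEQUENCE = "\nUser"

def stream_until_stop(token_texts, stop=STOP_SEQUENCE):
    """Yield generated text chunks, halting once `stop` appears.

    Buffers the trailing len(stop) - 1 characters so a stop sequence
    split across token boundaries is still caught.
    """
    buffer = ""
    for text in token_texts:
        buffer += text
        idx = buffer.find(stop)
        if idx != -1:
            if idx:
                yield buffer[:idx]  # emit text up to the stop sequence
            return
        # Everything except the last len(stop) - 1 chars cannot be the
        # start of a split stop sequence, so it is safe to emit.
        safe = len(buffer) - (len(stop) - 1)
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    if buffer:
        yield buffer

# For the example above, the extra dialogue turn is dropped:
chunks = ["8", ".", "\n", "User", ": What is 3 + 3?", "\n", "Assistant", ": 6"]
assert "".join(stream_until_stop(chunks)) == "8."
```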

In general, users want chat models to be restricted to single dialogue turns, though it's worth noting that there are use cases for generating multiple dialogue turns, such as training data generation.

Accordingly, it would be better to implement full support for user-specified multi-token stop sequences. For now, though, I think this implementation will suffice.
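For reference, one possible shape for that fuller support is the Hugging Face `transformers` `StoppingCriteria` hook. This is a sketch under assumed names (`model`, `tokenizer`, `prompt`), not code from this PR:

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSequences(StoppingCriteria):
    """Stop generation once any user-specified stop string appears."""

    def __init__(self, tokenizer, stop_sequences, prompt_len):
        self.tokenizer = tokenizer
        self.stop_sequences = stop_sequences
        self.prompt_len = prompt_len  # skip the prompt when checking

    def __call__(self, input_ids: torch.LongTensor, scores, **kwargs) -> bool:
        # Decode only the newly generated tokens and look for stop strings.
        generated = self.tokenizer.decode(input_ids[0, self.prompt_len:])
        return any(s in generated for s in self.stop_sequences)

# Hypothetical usage:
# inputs = tokenizer(prompt, return_tensors="pt").input_ids
# criteria = StoppingCriteriaList(
#     [StopOnSequences(tokenizer, ["\nUser"], prompt_len=inputs.shape[1])]
# )
# output = model.generate(inputs, stopping_criteria=criteria, max_new_tokens=256)
```

Note that the decoded output would still need the trailing stop string trimmed, since the criteria fires only after the sequence has already been emitted.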