This PR modifies predict.py with a hotfix that stops text generation if the model generates the token sequence '\nUser'.
Without this modification, chat models may output multiple dialogue turns. For example, given a prompt like:
```
User: What is 4+4?
Assistant:
```
It's possible for a chat model to return a sequence like:
```
8.
User: What is 3 + 3?
Assistant: 6
```
In general, users want chat models restricted to a single dialogue turn. That said, there are legitimate use cases for generating multiple dialogue turns, such as training data generation.
Accordingly, it would be better to implement full support for user-specified multi-token stop sequences; for now, though, I think this implementation will suffice.
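To illustrate the idea, here is a minimal sketch of the stopping logic. This is an assumption about the approach, not the actual predict.py change: the real hotfix may check the stop sequence incrementally inside the generation loop rather than post-processing the decoded text, and the function name `truncate_at_stop` is hypothetical.

```python
def truncate_at_stop(text: str, stop_sequences=("\nUser",)) -> str:
    """Truncate generated text at the earliest occurrence of any stop
    sequence.

    Hypothetical sketch of the hotfix described in this PR; the stop
    sequence '\\nUser' matches the start of a new dialogue turn.
    """
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# With the multi-turn output from the example above, everything after
# the first turn is discarded:
generated = "8.\nUser: What is 3 + 3?\nAssistant: 6"
print(truncate_at_stop(generated))  # prints "8."
```

Generalizing `stop_sequences` to accept arbitrary user-supplied strings is essentially the full multi-token stop-sequence support mentioned below.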