Closed by flatsiedatsie 6 months ago
Could you confirm whether you're using <|end|> after each turn in the prompt?
If so, another possible cause is a penalty_repeat setting that is too high, which would prevent the LLM from repeating the <|end|> token.
An alternative is to use the new abortSignal to force a stop when the output contains a specific string. For example:
onNewToken: (_token, _piece, currentText, { abortSignal }) => {
  if (currentText.includes("<|end|>") || currentText.includes("<|user|>") || currentText.includes("<|assistant|>")) {
    updateResponse(currentText.replace("<|end|>", "").replace("<|user|>", "").replace("<|assistant|>", ""));
    abortSignal();
  } else {
    updateResponse(currentText);
  }
}
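The handler above removes only the first occurrence of each marker and keeps any text generated after it. A minimal sketch of a stricter variant, which cuts the output at the earliest marker, could look like the following. `truncateAtStopMarker` and `STOP_MARKERS` are hypothetical names for illustration and are not part of wllama.

```javascript
// Hypothetical helper (not a wllama API): cut the text at the earliest stop marker.
const STOP_MARKERS = ["<|end|>", "<|user|>", "<|assistant|>"];

function truncateAtStopMarker(text, markers = STOP_MARKERS) {
  let cut = -1;
  for (const marker of markers) {
    const idx = text.indexOf(marker);
    // Remember the smallest index at which any marker appears.
    if (idx !== -1 && (cut === -1 || idx < cut)) cut = idx;
  }
  return cut === -1
    ? { text, stopped: false }
    : { text: text.slice(0, cut), stopped: true };
}

// Possible use inside the handler from the thread (updateResponse is the
// poster's own function and is assumed to exist):
//   const { text, stopped } = truncateAtStopMarker(currentText);
//   updateResponse(text);
//   if (stopped) abortSignal();
```

Because the check runs on the accumulated text, a marker that arrives split across several tokens is still caught once it is complete, although its first characters may be shown briefly before that.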
Yes, using the abort signal is exactly what I have done. One thing I noticed there (though I'm not sure yet): the abort signal doesn't seem to toggle the signalling variable back to off?
I'm using Transformers.js to generate the prompt strings. This is what it generates:
<s><|user|>
What's the difference between red and green apples?<|end|>
<|assistant|>
and this is what it looks like if I ask it again (after interrupting it too):
<s><|user|>
What's the difference between red and green apples?<|end|>
<|assistant|>
The primary difference between red and green apples lies in their color, taste, and variety.
1. Color: Red apples are typically red in color, while green apples have a greenish hue.
2. Taste: The taste of<|end|>
<|user|>
What's the difference between red and green apples?<|end|>
<|assistant|>
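For reference, the two prompt strings above follow a simple pattern that can be reproduced by hand. This is only a sketch of the format as it appears in this thread, not the Transformers.js chat-template API. `buildPhi3Prompt` is a hypothetical name, and the trailing newline after the final <|assistant|> is an assumption based on the second example.

```javascript
// Hypothetical helper: rebuild the Phi-3 style prompt shown in the thread.
// messages: array of { role: "user" | "assistant", content: string }
function buildPhi3Prompt(messages) {
  let prompt = "<s>";
  for (const { role, content } of messages) {
    // Each turn is "<|role|>\n" + content + "<|end|>\n".
    prompt += `<|${role}|>\n${content}<|end|>\n`;
  }
  // Leave the prompt open so the model writes the next assistant turn.
  return prompt + "<|assistant|>\n";
}

const example = buildPhi3Prompt([
  { role: "user", content: "What's the difference between red and green apples?" },
]);
```

Note that an interrupted assistant turn also gets <|end|> appended when it is fed back in, as in the second prompt above.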
another possible cause is having a too-high penalty_repeat configuration
I haven't set that variable, so I assume it's using a default value for that.
I think the problem may be that Phi-3 (and some other new models) do not use the EOS token to stop generation. They use EOG (end of generation) tokens instead. llama.cpp added support for that a while ago, so a binding should also be added to wllama in the next version.
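To illustrate the difference, here is a toy sketch of the two stop checks. The token ids are made-up placeholders and the function names are hypothetical; this is not how llama.cpp or wllama expose the check.

```javascript
// Illustrative ids only; real values come from the model's vocabulary.
const EOS_TOKEN = 2;                // the single end-of-sequence token
const EOG_TOKENS = new Set([2, 7]); // EOS plus an end-of-turn token such as <|end|>

// Old-style check: stops only on EOS, so an end-of-turn token slips through.
const stopsOnEosOnly = (token) => token === EOS_TOKEN;

// EOG-style check: stops on any token marked as end of generation.
const stopsOnAnyEog = (token) => EOG_TOKENS.has(token);
```

With the first check, a model that ends its turn with <|end|> keeps generating and the marker shows up in the output text, which matches the behavior described in this issue.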
awesome.
If I see a stop token in the text output of the model, does that mean it 'slipped past' llama.cpp somehow? Does that mean I have a setting wrong somewhere? Or is this normal/common, and should I just keep an eye out for such tokens regardless, and try to interrupt the generation process when it happens?
Example:
"What is the difference between red and green apples?" on Phi3 mini 128K:
.. And there I interrupted it.