The line numbers in your stack trace line up with v0.1.3 or earlier, and the prefix filter was updated in v0.1.4+ to take a list of prefixes rather than a string.
I think you have an old version of ExLlamaV2 installed and you're using examples from a later version of the repo.
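For reference, the change looks roughly like this (a sketch, not the exact diff; `model` and `tokenizer` stand in for already-loaded ExLlamaV2 objects):

```python
from exllamav2.generator.filters import ExLlamaV2PrefixFilter

# v0.1.3 and earlier: a single prefix string
prefix_filter = ExLlamaV2PrefixFilter(model, tokenizer, "{")

# v0.1.4+: a list of acceptable prefixes
prefix_filter = ExLlamaV2PrefixFilter(model, tokenizer, ["{", " {"])
```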
Hi, I'd like to ask about the prefix feature: is there any technical benefit to using it instead of just appending the prefix to the end of my prompt?
The reason I'm asking is that my existing code currently just appends the prefix to the prompt, and I'm trying to see whether there's any benefit to retrofitting it to use the new feature.
Mainly it's because the JSON filter needs to select `{` as its first token. So if you add the opening bracket to the prompt, the filter doesn't know that it's meant to start one level deep in the JSON schema. I.e. if you supply a prompt like `And the answer, in JSON format, is: {`, the filter still has to constrain what follows to fit the JSON schema, producing something like `{"answer":"no"}` and duplicating the opening bracket.
On the other hand, if you don't use the prefix constraint, LMFE will allow whitespace before the first opening bracket, since that's still technically valid JSON: `{"answer":"no"}` and `\n\n\n\n\n {"answer":"no"}` both satisfy the same schema. I believe LMFE only allows up to a certain number of whitespace characters (?), but the amount you want is probably zero, and the prefix filter is a way of ensuring that.
You can also use it in other ways with longer prefixes or whatever, but in any case the point is that it imposes a constraint on top of the JSON filter, which adding text to the prompt wouldn't achieve.
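To make that concrete, here's a rough sketch of stacking the two filters, assuming the lm-format-enforcer integration used in the repo's JSON examples (import paths and constructor signatures have shifted between versions, and `model`, `tokenizer`, `generator`, and `prompt` are assumed to be set up already):

```python
from exllamav2.generator import ExLlamaV2Sampler
from exllamav2.generator.filters import ExLlamaV2PrefixFilter
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.exllamav2 import ExLlamaV2TokenEnforcerFilter

# Hypothetical schema matching the {"answer": ...} example above
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

gen_settings = ExLlamaV2Sampler.Settings()
gen_settings.filters = [
    # LMFE keeps every sampled token inside the JSON schema
    ExLlamaV2TokenEnforcerFilter(JsonSchemaParser(schema), tokenizer),
    # The prefix filter pins the first character to "{", ruling out
    # the leading whitespace LMFE would otherwise permit
    ExLlamaV2PrefixFilter(model, tokenizer, "{"),
]

output = generator.generate_simple(prompt, gen_settings, 200)
```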
Of course the prefix filter can be used on its own, too. If you want to chat with multiple bots but let the model decide whose turn it is to speak, you could use something like:
```python
filters = [ExLlamaV2PrefixFilter(model, tokenizer, [bot + ": " for bot in bots])]
```
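For instance, a minimal end-to-end sketch (bot names are hypothetical, and `model`, `tokenizer`, `generator`, and `chat_history` are assumed to exist already):

```python
from exllamav2.generator import ExLlamaV2Sampler
from exllamav2.generator.filters import ExLlamaV2PrefixFilter

bots = ["Alice", "Bob", "Charlie"]

settings = ExLlamaV2Sampler.Settings()
# Output must start with one of "Alice: ", "Bob: ", "Charlie: " --
# the model picks the branch, i.e. it decides who speaks next
settings.filters = [ExLlamaV2PrefixFilter(model, tokenizer, [b + ": " for b in bots])]

turn = generator.generate_simple(chat_history, settings, 200)
```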
Possibilities are endless. [:
The example given for JSON output is not working. The only modification was changing the model from Mistral to Qwen2/Llama3.
OUTPUT: