Closed: hvisser closed this issue 7 months ago
There's another odd difference: testing from the command line, the prompt is tokenized to 19 tokens, while the same prompt on my Android device tokenizes to 51 tokens, and the special tokens aren't tokenized as single tokens either. Maybe that is the source of the issue?
Hey @hvisser the Java binding isn't based on the newest version of llama.cpp. I think back then some assumptions about the tokenizer were hard-coded and that's why it might be incompatible with phi-2. I'll have a look later today.
I think I've found the issue: https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L257 sets the "special" flag to true when tokenizing the prompt. If I update the tokenize function in java-llama to set that flag to true as well, the prompt is tokenized correctly and the output is also correct. So I think that should be set to true? That flag was introduced in llama.cpp 3 months ago, but I guess it depends on the model used whether it has an effect. I'll shoot you a PR if you want ;)
I'm using your library with phi-2 on an Android device (after updating the llama.cpp version). I've noticed that generation seems to ignore or skip end-of-stream tokens somehow. For example, here's the output from llama.cpp itself:

prompt:
output:

When using jllama it looks like this:
Note that the text includes the end-of-stream token, and the start token for the next user prompt, which the model is self-generating ;)
Looking at the llama.cpp source, it seems to stop here: https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L896, and there's a similar condition in jllama here: https://github.com/kherud/java-llama.cpp/blob/master/src/main/cpp/jllama.cpp#L831, but since I'm not super familiar with these bindings and llama.cpp, I haven't figured out what's different about that condition.
Debugging this some more, it seems like the token generated when it emits the angled bracket of the "faulty" <|im_end> is not the end-of-stream token, but the token for <.