kherud / java-llama.cpp

Java Bindings for llama.cpp - A Port of Facebook's LLaMA model in C/C++
MIT License

Completion seems to ignore EOS token #45

Closed: hvisser closed this issue 7 months ago

hvisser commented 7 months ago

I'm using your library with phi-2 on an Android device (after updating the llama.cpp version). I've noticed that generation seems to ignore or skip the end-of-stream token somehow. For example, here's the output from llama.cpp itself:

prompt:

<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant

output:

<|im_start|>system
You are a helpful assistant
<|im_start|>user
hello
<|im_start|>assistant
Hello! How can I assist you today? [end of text]

When using jllama, the output looks like this:

<|im_start|>system
You are a helpful assistant
<|im_start|>user
hello
<|im_start|>assistant
Hello! How can I assist you today?<|im_end|>
<|im_start|>user
[more output]

Note that the text includes the end token, as well as the start token for the next user prompt, which the model is generating by itself ;)

Looking at the llama.cpp source, it seems to stop here: https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L896. There's a similar condition in jllama here: https://github.com/kherud/java-llama.cpp/blob/master/src/main/cpp/jllama.cpp#L831, but since I'm not super familiar with these bindings or with llama.cpp, I haven't figured out what's different about that condition.

Debugging this some more, it seems that the token generated when the "faulty" <|im_end|> is emitted is not the end-of-stream token, but the token for the plain < character.
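
For context, both stop conditions boil down to comparing the newly sampled token against the model's end-of-stream id. A simplified sketch (not the exact code in main.cpp or jllama.cpp; the names are illustrative) shows why the check never fires when the special tokens are split into plain text:

// Simplified sketch of the kind of stop condition used in both places above.
// If "<|im_end|>" was tokenized as plain text, the sampled tokens around it
// are '<', '|', 'im', ... and this comparison never matches, so generation
// keeps going past the end of the answer.
if (new_token_id == llama_token_eos(model)) {
    break;  // end of stream reached, stop generating
}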

hvisser commented 7 months ago

There's another odd difference: testing from the command line, the prompt is tokenized into 19 tokens, but when I run the same prompt on my Android device it's tokenized into 51 tokens, and the special tokens aren't tokenized as special tokens either. Maybe that is the source of the issue?
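
The difference is consistent with how special tokens are handled during tokenization. Roughly, against the llama.cpp C API of that time (a sketch only; the actual wrapper in jllama.cpp may look different):

#include "llama.h"

#include <algorithm>
#include <string>
#include <vector>

// Sketch: tokenize the same text with or without parsing of special tokens.
// With special = false, "<|im_end|>" is split into ordinary text tokens
// ('<', '|', 'im', ...), which would explain why the same prompt comes out
// much longer (51 vs. 19 tokens in the test above).
static std::vector<llama_token> tokenize(const llama_model * model,
                                         const std::string & text,
                                         bool add_bos,
                                         bool special) {
    // Upper bound on the token count: one token per byte plus an optional BOS.
    std::vector<llama_token> tokens(text.size() + (add_bos ? 1 : 0));
    const int n = llama_tokenize(model, text.c_str(), (int) text.size(),
                                 tokens.data(), (int) tokens.size(),
                                 add_bos, special);
    tokens.resize(std::max(n, 0));
    return tokens;
}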

kherud commented 7 months ago

Hey @hvisser, the Java binding isn't based on the newest version of llama.cpp. I think back then some assumptions about the tokenizer were hard-coded, which might be why it's incompatible with phi-2. I'll have a look later today.

hvisser commented 7 months ago

I think I've found the issue: https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L257 sets the "special" flag to true when tokenizing the prompt. If I update the tokenize function in java-llama to set that flag to true as well, the prompt is tokenized correctly and the output is correct too, so I think it should be set to true? The flag was introduced in llama.cpp 3 months ago, but I guess whether it has an effect depends on the model used. I'll shoot you a PR if you want ;)
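
For reference, the change boils down to passing true for that flag when the prompt is tokenized, roughly like this (sketched against the common llama_tokenize helper; the exact call site and variable names in jllama.cpp may differ):

// Sketch of the fix discussed above (variable names are illustrative).
// Passing true for the "special" parameter makes the tokenizer map
// <|im_start|> / <|im_end|> to their single special-token ids, so the EOS
// check during generation can actually match.
std::vector<llama_token> prompt_tokens =
    ::llama_tokenize(ctx, params.prompt, add_bos, /*special=*/ true);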