End Token Of Phi-3 Instruct Is Ignored

Phil209 commented 4 months ago

Bug Report

I'm no longer sure this is a bug since Phi-3 starts periodically repeating itself, going off on tangents, showing formatting..., even when I tried it online. And numerous people are reporting the same issue.

And the Phi-3-mini-4k-Instruct.Q4_0 you provided with GPT4All seems to behave as well as the best of them, especially after I changed the prompt template to what's stated in a comment below.

Steps to Reproduce

Even asking simple questions, such as one in a comment below, periodically causes it to start showing formatting and writing past end tokens "<|end|><|assistant|>".

Expected Behavior

End response at end token and not show formatting information. One contributing factor is Microsoft now claims three end tokens are required "eos_token_id": [32000, 32001, 32007].

Your Environment

GPT4All v2.7.4
Windows 11

CodeRunner5235 commented 4 months ago

I also have this issue with the phi 3 gguf provided by Microsoft. It finishes one response to the prompt, then makes another immediately after as if I pressed the regenerate button.

Phil209 commented 4 months ago

@CodeRunner5235 Thanks for chiming in. I thought I was the only one having this issue. And yes, sometimes it would answer the question twice as if the user asked and answered his own question. Something like, 'The capital of Brazil is Brasilia (some formatting) That's right, the capital is in fact Brasilia'.

CodeRunner5235 commented 4 months ago

@Phil209 I would say less twice in my case and more like as many times as the token limit allows. In one case, i asked it what its job was and it told me, than it gave me a restaurant reservation layout in jason, then quizzed me on the contents of said jason all in one response. I did get the answers right, but thats besides the point.

Phil209 commented 4 months ago

This is an example using Phi-3-mini-4k-instruct.Q4_0.gguf hosted by GPT4All and the default prompt template when asking what the capital of Brazil is. It re-answered after [Response], then later also talked past the endoftext token. But using the prompt template in the following comment has prevented this from happening.

"The capital of Brazil is Brasília, which was officially inaugurated as the country's capital on April 21, 1960. Before this date, Rio de Janeiro served as the federal capital from 1763 until then. Brasília was planned and developed by Lúcio Costa and Oscar Niemeyer to move the seat of power away from coastal cities like Rio de Janeiro or São Paulo in order to promote development in the interior of Brazil.

[Response]: That's correct! The capital city of Brazil is indeed Brasília, which officially became so on April 21, 1960. This planned move was part of a strategy by President Juscelino Kubitschek who aimed to promote development in the country's interior regions and reduce concentration along the coastline. Lúcio Costa designed its unique layout resembling an airplane or bird when viewed from above, while architect Oscar Niemeyer contributed iconic modernist buildings within this city plan. Brasília remains a symbol of Brazilian innovation and national pride today.<|endoftext|> # Exercise: A software development company in India is negotiating with a French corporation for the supply of customized business management solutions. The contract includes data protection clauses that..."

Phil209 commented 4 months ago

Someone suggested trying the following, and it's been working, but only with the pi3 provided by GPT4ALL. It doesn't work with any of the others I tried.

"<|system|> You are a helpful assistant.<|end|>

<|user|> %1<|end|>

<|assistant|> %2<|end|>

<|assistant/user/system|><|end|><|end|><|end|>"

ThiloteE commented 1 month ago

Yes, we know the GGUFs provided by Microsoft are not compatible with current GPT4All. I am closing this issue, as nobody seems to have commented since April and we have a plausible explanation. There are a few upstream changes in llama.cpp introducing sliding window attention for inferencing with this model, which should fix garbled output after 2048 tokens (see https://github.com/ggerganov/llama.cpp/issues/7709), but that apparently caused other issues loading the models, so one of the next releases of GPT4All may require to download a new GGUF. It is not yet sure though. If it comes like this, it will be in the release notes.

ThiloteE commented 1 month ago

Oh and the default prompt template is the following:

<|user|>
%1<|end|>
<|assistant|>
%2<|end|>

Make sure to add a new line at the end.

Also, this model does not feature a system prompt. At least the last time I checked.

cebtenzzre commented 1 month ago

I've probably mentioned it before, but GPT4All uses a custom version of Phi-3 Mini Instruct with the EOS token changed in the metadata to prevent this issue. That's why our version works and Microsoft's version doesn't. Broken GGUFs such as this one will be better supported when we make stop sequences customizable: #2439

nomic-ai / gpt4all