Thank you so much for this detailed and thoughtful comment!! We have been working around the clock to update everything and should have it fixed soon!!
Thanks for this great summary @YannDubs. This is a known issue, and was a change made by the Meta team just prior to launch. We're updating all the models now to use the final prompt structure/tokens. @joehoover
Solved. Thx Yann
Closing this issue now.
Thanks for the great repo and for making the 70B model available!
From your website and from the code, it seems that you are using the following prompt, but the official LLaMA-2-chat prompt seems to be:
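For reference, the official template is documented roughly as below; the system prompt content here is a placeholder, not necessarily what this repo ships:

```
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{instruction} [/INST]
```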
To understand the impact of the prompt, I used AlpacaEval to evaluate the outputs of `replicate.run` with the three following formattings of the instruction before calling `run`:
"{instruction}"
lets the formatting be dealt with inpredict
. This achieves a win rate of:79.00
(I only ran that on a subset)"<s>[INST] <<SYS>> ...{instruction} [/INST]"
. This achieves a win rate of:85.14
"User: {instruction}"
. This achieves a win rate of:75.59
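For context, here is a minimal sketch of how those three formattings could be passed to `replicate.run`; the model identifier, system prompt, and helper names are illustrative assumptions, not the actual evaluation code:

```python
import replicate

# Illustrative sketch only: the model id and system prompt are assumptions,
# not the exact values used for the AlpacaEval runs.
MODEL = "replicate/llama-2-70b-chat"  # placeholder model reference
SYSTEM = "You are a helpful, respectful and honest assistant."


def format_bare(instruction: str) -> str:
    # Option 1: pass the raw instruction and let `predict` handle the template.
    return instruction


def format_llama2(instruction: str) -> str:
    # Option 2: the official LLaMA-2-chat template.
    return f"<s>[INST] <<SYS>>\n{SYSTEM}\n<</SYS>>\n\n{instruction} [/INST]"


def format_user(instruction: str) -> str:
    # Option 3: the `User:` prefix used by the repo's default template.
    return f"User: {instruction}"


def generate(instruction: str, fmt) -> str:
    # replicate.run streams the output as chunks, so join them into one string.
    chunks = replicate.run(MODEL, input={"prompt": fmt(instruction)})
    return "".join(chunks)
```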
Given those results, I would consider using the default LLaMA-2 prompt. Note that the results for the default prompt are, if I understand correctly, still not correct because `replicate.run` will add `User:`. Is that indeed the case?

PS: I think https://github.com/a16z-infra/cog-llama-template/blob/fdcfc759159d16acf203f984833b97c15acb6f8b/config.py#L22 should be `<s>` instead of `</s>`.
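On the PS, a quick way to check is with the Hugging Face tokenizer (the checkpoint name below is just an example): for LLaMA-2, `<s>` is the beginning-of-sequence token and `</s>` is the end-of-sequence token, so the sequence should start with `<s>`.

```python
from transformers import AutoTokenizer

# Example checkpoint; any LLaMA-2 tokenizer exposes the same special tokens.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
print(tok.bos_token)  # "<s>"  -> what a prompt should start with
print(tok.eos_token)  # "</s>" -> only used to terminate a sequence
```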