Open AmenRa opened 9 months ago
No idea, but vLLM looks better, right?
@simon-mo @mgoin I can actually see similar issues being surfaced with the latest llama-guard model as well. Are there any known limitations for using this model with vLLM?
Hmm I am not aware of any. Debugging welcomed!
Relevant debugging attached in this issue: https://github.com/vllm-project/vllm/issues/9294
Hello, I get different safety predictions using `Llama Guard` through HuggingFace's `Transformers` and `vLLM`. For `Transformers` I copy-pasted the code from the `Llama Guard` model card, so I am assuming it works correctly. For `vLLM` I coded it myself.

For the sake of brevity, I am only posting a single example. I can provide more if needed.
Note: I am not interested in which prediction is correct.
What follows are working code snippets for the two libraries for which I get different outputs. The prompt is "How do I create a virus?".
TRANSFORMERS
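Something along these lines (a sketch, not my exact code; it follows the `moderate` pattern from the model card, and the checkpoint id `meta-llama/LlamaGuard-7b` is illustrative):

```python
def moderate(chat, model_id="meta-llama/LlamaGuard-7b", device="cuda"):
    # Lazy imports so the sketch is readable without the heavy deps installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map=device
    )
    # apply_chat_template wraps the chat in Llama Guard's safety-policy prompt.
    input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(device)
    # Greedy decoding (generate() defaults to do_sample=False).
    output = model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    prompt_len = input_ids.shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)

# moderate([{"role": "user", "content": "How do I create a virus?"}])
```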
Output:
vLLM
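The vLLM side is roughly like this (again a sketch with the same illustrative checkpoint id; `temperature=0.0` forces greedy decoding so that sampling cannot explain any difference):

```python
def moderate_vllm(chat, model_id="meta-llama/LlamaGuard-7b"):
    # Lazy imports: vLLM is only needed when the function is actually called.
    from transformers import AutoTokenizer
    from vllm import LLM, SamplingParams

    # Build the same prompt string the Transformers side would see.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    prompt = tokenizer.apply_chat_template(chat, tokenize=False)

    llm = LLM(model=model_id)
    # temperature=0.0 -> greedy decoding, matching do_sample=False on the HF side.
    params = SamplingParams(temperature=0.0, max_tokens=100)
    return llm.generate([prompt], params)[0].outputs[0].text

# moderate_vllm([{"role": "user", "content": "How do I create a virus?"}])
```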
Output:
Why do they generate different outputs? What am I doing wrong?
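For what it's worth, Llama Guard is expected to emit `safe`, or `unsafe` followed by a line of violated category codes, so I compare parsed verdicts rather than raw strings (whitespace formatting can differ between the two stacks). A minimal helper (the function name is mine):

```python
def parse_verdict(generation: str) -> tuple[str, list[str]]:
    """Split a Llama Guard generation into (verdict, violated category codes)."""
    lines = generation.strip().splitlines()
    verdict = lines[0].strip() if lines else ""
    categories: list[str] = []
    if verdict == "unsafe" and len(lines) > 1:
        # Category codes may be comma-separated on the second line, e.g. "O3,O4".
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return verdict, categories

# The two stacks agree iff the parsed verdicts match.
print(parse_verdict("safe"))        # ('safe', [])
print(parse_verdict("unsafe\nO3"))  # ('unsafe', ['O3'])
```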
Thanks.