llamaguard3 identifies code as 'violent crime'

visagansanthanam-unisys commented 1 month ago

I am trying to use llamaguard3 for a use case, and I see that the model classifies any source code as unsafe (violent crime). Is this expected behavior?

EricMichaelSmith commented 4 weeks ago

Hi @visagansanthanam-unisys, can you give us other examples of this? No, this is not expected behavior.

visagansanthanam-unisys commented 3 weeks ago

@EricMichaelSmith here are some more examples. However, the 8B model (llamaguard3:latest) seems to be working fine.

kplawiak commented 3 weeks ago

Hi @visagansanthanam-unisys, the two models (Llama Guard 1B and 8B) differ in their training data and underlying base models. Specifically, the 1B model was not trained on the Code Interpreter Abuse category, which can lead to limitations on code input. For more information on the training process and model limitations, please refer to the Llama Guard 3 1B model card (https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/1B/MODEL_CARD.md) and the Llama Guard 3 8B model card (https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/8B/MODEL_CARD.md). Additionally, we recommend checking out the Llama Guard documentation (https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-3/) for more examples (e.g., how to format the input before passing it to the model).
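For anyone comparing the two models on the same inputs, a small helper for reading the verdict may be useful. The sketch below is not taken from the model cards or the documentation; it assumes the model replies with `safe`, or with `unsafe` followed by a line of comma-separated hazard codes such as `S1`. The names `parse_guard_output`, `describe`, and `CATEGORY_NAMES` are hypothetical, and the mapping lists only the two categories discussed in this thread.

```python
# Partial, illustrative mapping of hazard codes to names. Only the two
# categories mentioned in this thread are listed; consult the model cards
# for the full taxonomy supported by each model size.
CATEGORY_NAMES = {
    "S1": "Violent Crimes",
    "S14": "Code Interpreter Abuse",
}


def parse_guard_output(raw: str):
    """Split a Llama Guard style verdict into (is_safe, [category codes]).

    Assumes the first non-empty line is 'safe' or 'unsafe', and that an
    'unsafe' verdict may be followed by a comma-separated list of codes.
    """
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty model output")

    verdict = lines[0].lower()
    if verdict == "safe":
        return True, []
    if verdict == "unsafe":
        codes = []
        if len(lines) > 1:
            codes = [c.strip() for c in lines[1].split(",") if c.strip()]
        return False, codes
    raise ValueError(f"unexpected verdict: {lines[0]!r}")


def describe(codes):
    """Translate category codes to names, falling back to the raw code."""
    return [CATEGORY_NAMES.get(code, f"{code} (see model card)") for code in codes]


if __name__ == "__main__":
    is_safe, codes = parse_guard_output("\n\nunsafe\nS1")
    print(is_safe, describe(codes))
```

Logging the parsed codes for the same code snippet under both the 1B and 8B models makes it easier to report exactly which category each one assigns.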

meta-llama / PurpleLlama

llamaguard3 identifies code as 'violent crime' #63