meta-llama / PurpleLlama

Set of tools to assess and improve LLM security.

llamaguard3 identifies code as 'violent crime' #63

Open visagansanthanam-unisys opened 1 month ago

visagansanthanam-unisys commented 1 month ago

I am trying to use llamaguard3 for a use case, and I see that the model classifies any source code as unsafe (violent crime). Is this expected behavior?

image

EricMichaelSmith commented 4 weeks ago

Hi @visagansanthanam-unisys, can you give us other examples of this? No, this is not expected behavior.

visagansanthanam-unisys commented 3 weeks ago

@EricMichaelSmith here are some more examples: image image. However, I see that the 8B model (llamaguard3:latest) seems to be working fine: image

kplawiak commented 3 weeks ago

Hi @visagansanthanam-unisys, the two models (Llama Guard 3 1B and 8B) differ in both their training data and their underlying base models. Specifically, the 1B model was not trained on the code interpreter category, so its classifications of code input can be unreliable. For more information on the training process and model limitations, please refer to the Llama Guard 3 1B model card (https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/1B/MODEL_CARD.md) and the Llama Guard 3 8B model card (https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/8B/MODEL_CARD.md). Additionally, we recommend checking out the Llama Guard documentation (https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-3/) for more examples (e.g., how to format the input before passing it to the model).
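
For anyone comparing the two models on the same inputs, it may help to decode the raw verdict into a readable category name. Below is a minimal sketch, assuming the model replies with `safe`, or with `unsafe` followed by a line of comma-separated hazard codes such as `S1`. The helper name `parse_guard_output` and the returned dict layout are hypothetical (not part of any Llama Guard API), and the code-to-name table is copied from the published taxonomy, so check it against the model cards linked above.

```python
# Hazard codes from the Llama Guard 3 taxonomy (verify against the model cards).
# Assumption: S14 applies to the 8B model only, since the 1B model was not
# trained on the code interpreter category.
HAZARD_CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
    "S14": "Code Interpreter Abuse",
}


def parse_guard_output(raw: str) -> dict:
    """Hypothetical helper: turn raw Llama Guard text into a small dict."""
    # Drop blank lines and surrounding whitespace the model may emit.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty model output")

    verdict = lines[0].lower()
    if verdict == "safe":
        return {"safe": True, "categories": []}
    if verdict != "unsafe":
        raise ValueError(f"unexpected verdict: {lines[0]!r}")

    # The second line, if present, holds comma-separated codes like "S1,S14".
    codes = [c.strip() for c in lines[1].split(",")] if len(lines) > 1 else []
    names = [HAZARD_CATEGORIES.get(c, f"unknown ({c})") for c in codes if c]
    return {"safe": False, "categories": names}


if __name__ == "__main__":
    # The verdict reported in this issue for plain source code on the 1B model.
    print(parse_guard_output("\n\nunsafe\nS1"))
```

Logging the decoded result for the same prompt from both the 1B and 8B models makes it easier to spot inputs, such as plain source code, where only the 1B model returns an unsafe verdict.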