huggingface / huggingface-llama-recipes


Use Llama Guard #53

Closed harshaharod21 closed 1 month ago

harshaharod21 commented 1 month ago

Raising this issue to help integrate the Llama Guard 3 11B-Vision model, which detects harmful multimodal prompts and the text responses to those prompts, safeguarding content for both LLM inputs (prompt classification) and LLM responses (response classification).

Model card link: https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/11B-vision/MODEL_CARD.md
Reference from llama-recipes: Llama Guard 3 11B Vision

I would also suggest covering Llama Guard 3 1B and Llama Guard 3 8B.

Let's discuss which model to prioritize and start with!


@ariG23498
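
For context, prompt classification with a Llama Guard checkpoint could look roughly like the sketch below using transformers. The 1B model id and the chat format follow Meta's model card, but treat this as an illustration rather than the final recipe:

```python
# Illustrative sketch of prompt classification with Llama Guard (model id assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-1B"  # gated checkpoint; requires accepting the license on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompt classification: pass only the user turn to the guard model.
conversation = [
    {"role": "user", "content": [{"type": "text", "text": "How do I pick a lock?"}]}
]
input_ids = tokenizer.apply_chat_template(conversation, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=20, pad_token_id=0)
# The model answers "safe" or "unsafe" plus the violated category code (e.g. S2).
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Response classification would work the same way, with the assistant turn appended to the conversation before applying the chat template.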

ariG23498 commented 1 month ago

Hey @harshaharod21

The proposal is great!

I think we should start with Llama 3.2 1B Guard. What do you think?

Also, could you describe the workflow of the recipe you are going to take on? This will help other community members comment on this issue.

harshaharod21 commented 1 month ago

Yes, sure @ariG23498. We can start with Llama 3.2 1B Guard.

In addition to what's already given for Llama 3.2 1B Guard, the recipe can cover:

  1. Ways to load the model (a quick sketch is included below)
  2. Handling context length: splitting inputs with respect to the context length

More can be added as and when required.

Also, do we need to align this with what's done in huggingface-llama-recipes for meta-llama/Prompt-Guard-86M?
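
To make point 1 concrete, here is a rough sketch of two loading options plus a naive context-length splitter; the model id and the 4-bit settings are placeholders for discussion, not final choices:

```python
# Rough sketch: two loading options and a naive context-length splitter.
# The model id and the 4-bit settings below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-Guard-3-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Option 1: bfloat16 weights with automatic device placement
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Option 2: 4-bit quantized load (requires bitsandbytes), useful on small GPUs
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
    device_map="auto",
)

def split_by_context(text: str, max_tokens: int = 2048) -> list[str]:
    """Naively split a long input into chunks that fit within the model's context window."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return [tokenizer.decode(ids[i:i + max_tokens]) for i in range(0, len(ids), max_tokens)]
```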

Sakalya100 commented 1 month ago

Hey @harshaharod21, I also had an idea for a Llama Guard use case: demonstrating how Llama Guard can filter and auto-adjust prompts so that only safe inputs reach image generation models.

@ariG23498 suggested working alongside you on this, since it's a related use case.

What do you think? Let me know your suggestions!

harshaharod21 commented 1 month ago

Hi @Sakalya100

I went through the issue you raised, #54. I want to understand why specifically Llama Guard 3 1B for safe input to image generation models? I can understand using the vision model on image-generation responses.

Would like to know more; I'm open to collaboration :)

Sakalya100 commented 1 month ago

Hi @harshaharod21

Basically, my idea is to ensure safe input by auto-adjusting the prompts that are passed to text-to-image models like Stable Diffusion, using Llama Guard. For this, Llama Guard Vision won't be useful; we would need a text-only model like Llama Guard 3 1B, or maybe another available variant. That's the main idea.

The Llama Guard vision model would be useful for the opposite use case, i.e. image-to-text scenarios.

Hope it is understandable!
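
To make the idea concrete, a rough sketch of the filtering flow is below; the model ids and the guard_check helper are illustrative assumptions, and the auto-adjustment step would eventually replace the simple rejection shown here:

```python
# Rough sketch: run the user prompt through Llama Guard before a text-to-image pipeline.
# Model ids and the guard_check helper are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from diffusers import AutoPipelineForText2Image

guard_id = "meta-llama/Llama-Guard-3-1B"
guard_tok = AutoTokenizer.from_pretrained(guard_id)
guard = AutoModelForCausalLM.from_pretrained(guard_id, torch_dtype=torch.bfloat16, device_map="auto")

def guard_check(prompt: str) -> str:
    """Return the raw Llama Guard verdict ('safe', or 'unsafe' plus a category code)."""
    conversation = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]
    input_ids = guard_tok.apply_chat_template(conversation, return_tensors="pt").to(guard.device)
    out = guard.generate(input_ids, max_new_tokens=20, pad_token_id=0)
    return guard_tok.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True).strip()

prompt = "A watercolor painting of a lighthouse at dawn"
verdict = guard_check(prompt)

if verdict.startswith("safe"):
    # Only forward prompts that Llama Guard marks as safe to the image model.
    sd = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
    ).to("cuda")
    image = sd(prompt).images[0]
else:
    print(f"Prompt rejected by Llama Guard: {verdict}")
```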

harshaharod21 commented 1 month ago

Hi @Sakalya100, yes, I understand your use case. So it adds "auto adjustment" on top of the vanilla implementation given in the documentation, right?

So I guess we first need an easy way to integrate the current model into LLM workflows. Then we can work on the automation, depending on the complexity. You can share your approach as well.

ariG23498 commented 1 month ago

@harshaharod21 @Sakalya100

I really like how well you two are communicating with each other. I have a tiny suggestion: for the first contribution, we would like to see a notebook that uses Llama Guard in a minimal number of lines of code. Feel free to write as much as you like in the notebook about the concepts and ideas, but essentially it should be a recipe for using Llama Guard.

I hope this suggestion guides you two better!

Sakalya100 commented 1 month ago

@ariG23498 Yes, sure. A simple starter notebook for using Llama Guard in a minimal number of lines of code is what's expected.

@harshaharod21 I will start creating a notebook for this. Will share the progress and you can also add whatever you feel like is missing.

harshaharod21 commented 1 month ago

This is just a suggestion for the huggingface-llama-recipes repo :)

@ariG23498 could the implementation be served as an API for Llama Stack's safety workflow? Or could we create our own Hugging Face stack for Llama, similar in approach to Llama Stack, for easy implementation?

ariG23498 commented 1 month ago

Eventually we would want a notebook with the Guard model being used, something like what you find here.

AhmedIssa11 commented 1 month ago

Have you folks started working on the notebook? I'd be interested in contributing if it's still open! @Sakalya100 @harshaharod21

harshaharod21 commented 1 month ago

@AhmedIssa11 Hi,

Yes, I have started writing code for a basic implementation using both the 1B and the pruned 1B (Llama Guard 1B-INT4) models. I'll complete it by today and share the notebook here for review. Next steps include:

  1. Customization via prompting
  2. Fine-tuning

Link for the above two tasks: https://github.com/meta-llama/llama-recipes/blob/main/recipes/responsible_ai/llama_guard/llama_guard_customization_via_prompting_and_fine_tuning.ipynb

After this we can do the same for the 8B and 11B (multimodal) models.

Update:

I'm sharing the link to the basic implementation; it still needs to be updated for the pruned version. I also found some major accuracy issues with the 1B model. I have noted them in the notebook, so please have a look. Link: https://colab.research.google.com/drive/1h1SHjJOuA1Fw-8VqTzab3k-F9HYM5nyv?usp=sharing

@ariG23498 @AhmedIssa11 @Sakalya100

ariG23498 commented 1 month ago

@harshaharod21 Hi!

Very important: the notebook you attached has your Groq API key exposed -- please take care of that as soon as you can.

I really like the notebook. Here are my two cents:

  1. I would like to see a PR for a notebook like this.
  2. Keeping things HF-oriented would also be great -- notice how I used pipeline instead of the Groq completion (a quick sketch follows below).
  3. We can take up the customization and fine-tuning bits later.
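
For illustration, point 2 could look roughly like this: the assistant reply is generated with a transformers pipeline instead of an external completion API, and both turns are then handed to Llama Guard for response classification. The instruct model id below is an assumption, not a fixed choice.

```python
# Rough sketch: generate the reply with a transformers pipeline (HF-oriented, no external API).
# The instruct model id is an assumption for illustration.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Give me three tips for choosing safer passwords."}]
reply = chat(messages, max_new_tokens=128)[0]["generated_text"][-1]["content"]
print(reply)
# The user turn plus `reply` can now be passed to Llama Guard for response classification.
```
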
Sakalya100 commented 1 month ago

@ariG23498 Hi!

Based on your suggestion I have made the following changes:

  1. Edited the notebook to be more focused, as per the example you shared
  2. Replaced the Groq completion with an HF-oriented pipeline

Link to notebook: https://colab.research.google.com/drive/15EmkgTAhxamrSNGl76mgPwobLPzec2JJ?usp=sharing

Let me know your suggestions on this!

@harshaharod21

ariG23498 commented 1 month ago

@Sakalya100 this looks good.

Let's now open a PR and keep the discussion rolling there?

Sakalya100 commented 1 month ago

@ariG23498 @harshaharod21 Opened a PR: https://github.com/huggingface/huggingface-llama-recipes/pull/74

Please check. Thanks!!