huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

StableDiffusionSafetyChecker ignores `attn_implementation` load kwarg #8957

Open jambayk opened 1 month ago

jambayk commented 1 month ago

Describe the bug

transformers added SDPA and FlashAttention-2 (FA2) support for the CLIP model in https://github.com/huggingface/transformers/pull/31940. It now initializes the vision model as shown at https://github.com/huggingface/transformers/blob/85a1269e19af022e04bc2aad82572cd5a9e8cdd9/src/transformers/models/clip/modeling_clip.py#L1143.

However, StableDiffusionSafetyChecker uses https://github.com/huggingface/diffusers/blob/2c25b98c8ea74cfb5ec56ba49cc6edafef0b26af/src/diffusers/pipelines/stable_diffusion/safety_checker.py#L41 instead, so its vision model is always initialized with SDPA attention, regardless of the `attn_implementation` passed to `from_pretrained`.
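
To illustrate the pattern described above, here is a minimal toy sketch. It does not use transformers or diffusers: every class here (`ToyConfig`, `ToyVisionConfig`, `ToyVisionModel`, `DirectInitChecker`, `ForwardingChecker`) is a hypothetical stand-in, and the "sdpa" default is an assumption made only to mirror the behavior reported in this issue. It shows how a parent model that builds its sub-model with a plain constructor call can drop the requested attention implementation, while one that forwards it explicitly keeps it.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class ToyVisionConfig:
    # None means "nothing requested"; the model then falls back to its default.
    attn_implementation: Optional[str] = None


@dataclass
class ToyConfig:
    vision_config: ToyVisionConfig = field(default_factory=ToyVisionConfig)
    # Stands in for the value a loader would record from the user's load kwarg.
    attn_implementation: Optional[str] = None


class ToyVisionModel:
    DEFAULT_ATTN = "sdpa"  # assumed default, chosen to match the report above

    def __init__(self, config: ToyVisionConfig):
        self.attn = config.attn_implementation or self.DEFAULT_ATTN

    @classmethod
    def from_config(cls, config: ToyVisionConfig, attn_implementation=None):
        # Build from a copy of the sub-config with the requested value applied.
        return cls(replace(config, attn_implementation=attn_implementation))


class DirectInitChecker:
    """Plain constructor call: the parent's requested implementation is lost."""

    def __init__(self, config: ToyConfig):
        self.vision_model = ToyVisionModel(config.vision_config)


class ForwardingChecker:
    """Forwards the parent's requested implementation to the sub-model."""

    def __init__(self, config: ToyConfig):
        self.vision_model = ToyVisionModel.from_config(
            config.vision_config,
            attn_implementation=config.attn_implementation,
        )


config = ToyConfig(attn_implementation="eager")
print(DirectInitChecker(config).vision_model.attn)  # sdpa
print(ForwardingChecker(config).vision_model.attn)  # eager
```

Whether the actual fix in diffusers should take this forwarding shape is an open question for the PR; the sketch only mirrors the difference between the two linked lines.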

Reproduction

from diffusers.pipelines.stable_diffusion.safety_checker import StableDiffusionSafetyChecker

model = StableDiffusionSafetyChecker.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    subfolder="safety_checker",
    attn_implementation="eager",
)
print(type(model.vision_model.vision_model.encoder.layers[0].self_attn))

Expected transformers.models.clip.modeling_clip.CLIPAttention but got transformers.models.clip.modeling_clip.CLIPSdpaAttention.

Logs

No response

System Info

diffusers 0.29.0, transformers 4.43.1

Who can help?

@sayakpaul @dn

sayakpaul commented 1 month ago

Thanks for bringing this up. CLIP FA2 and SDPA support is very recent. If you want to open a PR to fix this, we are happy to guide you.