huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Error using IP Adapter with enable_attention_slicing() #7263

Closed ballenvironment closed 4 months ago

ballenvironment commented 8 months ago

Describe the bug

The error occurs when an IP Adapter is loaded and pipeline.enable_attention_slicing() is called afterward. Generation only works if attention slicing is not enabled.

Reproduction

import os

# `pipeline` is assumed to be a Stable Diffusion pipeline created earlier in the script
ip_adapter_path = os.path.join(os.path.dirname(__file__), "ipadapter")
pipeline.load_ip_adapter(ip_adapter_path, subfolder="models",
                         weight_name="ip-adapter_sd15.bin", local_files_only=True)
pipeline.set_ip_adapter_scale(0.5)
pipeline.enable_attention_slicing()
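For completeness, a self-contained sketch of the reproduction, assuming the public runwayml/stable-diffusion-v1-5 checkpoint and the h94/IP-Adapter weights in place of the local ipadapter folder, a placeholder PIL image as the IP Adapter input, and a CUDA device (the original report is on macOS, where a different device would be used):

import torch
from PIL import Image
from diffusers import AutoPipelineForText2Image

# Hub checkpoints and a placeholder image stand in for the local files above.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipeline.set_ip_adapter_scale(0.5)
pipeline.enable_attention_slicing()

ip_image = Image.new("RGB", (512, 512))  # placeholder reference image
image = pipeline(
    prompt="a photo of a cat",
    ip_adapter_image=ip_image,
    num_inference_steps=20,
).images[0]  # the AttributeError below is raised here on affected versions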

Logs

Exception in thread Thread-595 (generate):
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/name/Documents/diffusers-test/generate.py", line 529, in generate
    image = generator(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 971, in __call__
    noise_pred = self.unet(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_condition.py", line 1219, in forward
    sample, res_samples = downsample_block(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/diffusers/models/unets/unet_2d_blocks.py", line 1274, in forward
    hidden_states = attn(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/diffusers/models/transformers/transformer_2d.py", line 403, in forward
    hidden_states = block(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/diffusers/models/attention.py", line 373, in forward
    attn_output = self.attn2(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 525, in forward
    return self.processor(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 1645, in __call__
    hidden_states.shape if encoder_hidden_states is None else encoder_hidden_states.shape
AttributeError: 'tuple' object has no attribute 'shape'

System Info

Who can help?

No response

yiyixuxu commented 8 months ago

Hi! Can you make sure you are using the most recent version of diffusers? I cannot reproduce your error.

This is what I used:

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipeline.enable_attention_slicing()
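For reference, a quick way to check which release is installed and upgrade if it is behind the latest:

import diffusers
print(diffusers.__version__)  # compare with the latest release before re-testing
# upgrade with: pip install -U diffusers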

sayakpaul commented 8 months ago

Also, FWIW, attention slicing shouldn't really be used with PyTorch 2.0 on cards like the 4090, because we're already using scaled_dot_product_attention() there.
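As a rough illustration of that advice (a sketch, not from the thread), slicing can be gated on whether scaled_dot_product_attention is available:

import torch

# On PyTorch >= 2.0 diffusers already dispatches to scaled_dot_product_attention,
# so slicing mainly helps on older PyTorch or very memory-constrained GPUs.
if not hasattr(torch.nn.functional, "scaled_dot_product_attention"):
    pipeline.enable_attention_slicing()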

crapthings commented 7 months ago

Loading the IP Adapter failed:

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

If I change torch_dtype=torch.float16 to float32 or remove it, it works fine, but it is very slow. What's wrong?

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[6], line 106
    101 renderWidth, renderHeight = rounded_size(input_image.width, input_image.height)
    103 # if seed is not None:
    104 #     _generator = torch.Generator(device = 'cuda').manual_seed(seed)
--> 106 prompt_embeds = compel.build_conditioning_tensor(prompt)
    107 negative_prompt_embeds = compel.build_conditioning_tensor(negative_prompt)
    109 output_image = inpainting(
    110     image = input_image.convert('RGB'),
    111     mask_image = mask_image,
   (...)
    124     padding_mask_crop = 32
    125 ).images[0]

File ~/.local/lib/python3.10/site-packages/compel/compel.py:112, in Compel.build_conditioning_tensor(self, text)
    107 """
    108 Build a conditioning tensor by parsing the text for Compel syntax, constructing a Conjunction, and then
    109 building a conditioning tensor from that Conjunction.
    110 """
    111 conjunction = self.parse_prompt_string(text)
--> 112 conditioning, _ = self.build_conditioning_tensor_for_conjunction(conjunction)
    114 if self.requires_pooled:
    115     pooled = self.conditioning_provider.get_pooled_embeddings([text], device=self.device)

File ~/.local/lib/python3.10/site-packages/compel/compel.py:186, in Compel.build_conditioning_tensor_for_conjunction(self, conjunction)
    184 empty_conditioning = None
    185 for i, p in enumerate(conjunction.prompts):
--> 186     this_conditioning, this_options = self.build_conditioning_tensor_for_prompt_object(p)
    187     options.update(this_options)  # this is not a smart way to do this but 🤷‍
    188     weight = conjunction.weights[i]

File ~/.local/lib/python3.10/site-packages/compel/compel.py:218, in Compel.build_conditioning_tensor_for_prompt_object(self, prompt)
    216         return cac_args.original_conditioning, { 'cross_attention_control': cac_args }
    217     else:
--> 218         return self._get_conditioning_for_flattened_prompt(prompt), {}
    220 raise ValueError(f"unsupported prompt type: {type(prompt).__name__}")

File ~/.local/lib/python3.10/site-packages/compel/compel.py:282, in Compel._get_conditioning_for_flattened_prompt(self, prompt, should_return_tokens)
    280 fragments = [x.text for x in prompt.children]
    281 weights = [x.weight for x in prompt.children]
--> 282 return self.conditioning_provider.get_embeddings_for_weighted_prompt_fragments(
    283     text_batch=[fragments], fragment_weights_batch=[weights],
    284     should_return_tokens=should_return_tokens, device=self.device)

File ~/.local/lib/python3.10/site-packages/compel/embeddings_provider.py:120, in EmbeddingsProvider.get_embeddings_for_weighted_prompt_fragments(self, text_batch, fragment_weights_batch, should_return_tokens, device)
    106 for fragments, weights in zip(text_batch, fragment_weights_batch):
    107 
    108     # First, weight tokens in individual fragments by scaling the feature vectors as requested (effectively
   (...)
    117 
    118     # handle weights >=1
    119     tokens, per_token_weights, mask = self.get_token_ids_and_expand_weights(fragments, weights, device=device)
--> 120     base_embedding = self.build_weighted_embedding_tensor(tokens, per_token_weights, mask, device=device)
    122     # this is our starting point
    123     embeddings = base_embedding.unsqueeze(0)

File ~/.local/lib/python3.10/site-packages/compel/embeddings_provider.py:357, in EmbeddingsProvider.build_weighted_embedding_tensor(self, token_ids, per_token_weights, attention_mask, device)
    352 chunk_start_index = 0
    353 empty_token_ids = torch.tensor([self.tokenizer.bos_token_id] +
    354                                [self.tokenizer.eos_token_id] +
    355                                [self.tokenizer.pad_token_id] * (self.max_token_count - 2),
    356                                dtype=torch.int, device=device).unsqueeze(0)
--> 357 empty_z = self._encode_token_ids_to_embeddings(empty_token_ids)
    358 weighted_z = None
    360 chunk_size = self.max_token_count

File ~/.local/lib/python3.10/site-packages/compel/embeddings_provider.py:390, in EmbeddingsProvider._encode_token_ids_to_embeddings(self, token_ids, attention_mask)
    386 def _encode_token_ids_to_embeddings(self, token_ids: torch.Tensor,
    387                                     attention_mask: Optional[torch.Tensor]=None) -> torch.Tensor:
    388     needs_hidden_states = (self.returned_embeddings_type == ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NORMALIZED or
    389                            self.returned_embeddings_type == ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED)
--> 390     text_encoder_output = self.text_encoder(token_ids,
    391                                             attention_mask,
    392                                             output_hidden_states=needs_hidden_states,
    393                                             return_dict=True)
    394     if self.returned_embeddings_type is ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED:
    395         penultimate_hidden_state = text_encoder_output.hidden_states[-2]

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:806, in CLIPTextModel.forward(self, input_ids, attention_mask, position_ids, output_attentions, output_hidden_states, return_dict)
    787 r"""
    788 Returns:
    789 
   (...)
    802 >>> pooled_output = outputs.pooler_output  # pooled (EOS token) states
    803 ```"""
    804 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
--> 806 return self.text_model(
    807     input_ids=input_ids,
    808     attention_mask=attention_mask,
    809     position_ids=position_ids,
    810     output_attentions=output_attentions,
    811     output_hidden_states=output_hidden_states,
    812     return_dict=return_dict,
    813 )

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:711, in CLIPTextTransformer.forward(self, input_ids, attention_mask, position_ids, output_attentions, output_hidden_states, return_dict)
    707 if attention_mask is not None:
    708     # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
    709     attention_mask = _prepare_4d_attention_mask(attention_mask, hidden_states.dtype)
--> 711 encoder_outputs = self.encoder(
    712     inputs_embeds=hidden_states,
    713     attention_mask=attention_mask,
    714     causal_attention_mask=causal_attention_mask,
    715     output_attentions=output_attentions,
    716     output_hidden_states=output_hidden_states,
    717     return_dict=return_dict,
    718 )
    720 last_hidden_state = encoder_outputs[0]
    721 last_hidden_state = self.final_layer_norm(last_hidden_state)

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:638, in CLIPEncoder.forward(self, inputs_embeds, attention_mask, causal_attention_mask, output_attentions, output_hidden_states, return_dict)
    630     layer_outputs = self._gradient_checkpointing_func(
    631         encoder_layer.__call__,
    632         hidden_states,
   (...)
    635         output_attentions,
    636     )
    637 else:
--> 638     layer_outputs = encoder_layer(
    639         hidden_states,
    640         attention_mask,
    641         causal_attention_mask,
    642         output_attentions=output_attentions,
    643     )
    645 hidden_states = layer_outputs[0]
    647 if output_attentions:

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:379, in CLIPEncoderLayer.forward(self, hidden_states, attention_mask, causal_attention_mask, output_attentions)
    367 """
    368 Args:
    369     hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
   (...)
    375         returned tensors for more detail.
    376 """
    377 residual = hidden_states
--> 379 hidden_states = self.layer_norm1(hidden_states)
    380 hidden_states, attn_weights = self.self_attn(
    381     hidden_states=hidden_states,
    382     attention_mask=attention_mask,
    383     causal_attention_mask=causal_attention_mask,
    384     output_attentions=output_attentions,
    385 )
    386 hidden_states = residual + hidden_states

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1511, in Module._wrapped_call_impl(self, *args, **kwargs)
   1509     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1510 else:
-> 1511     return self._call_impl(*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1520, in Module._call_impl(self, *args, **kwargs)
   1515 # If we don't have any hooks, we want to skip the rest of the logic in
   1516 # this function, and just call forward.
   1517 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1518         or _global_backward_pre_hooks or _global_backward_hooks
   1519         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1520     return forward_call(*args, **kwargs)
   1522 try:
   1523     result = None

File ~/.local/lib/python3.10/site-packages/torch/nn/modules/normalization.py:201, in LayerNorm.forward(self, input)
    200 def forward(self, input: Tensor) -> Tensor:
--> 201     return F.layer_norm(
    202         input, self.normalized_shape, self.weight, self.bias, self.eps)

File ~/.local/lib/python3.10/site-packages/torch/nn/functional.py:2546, in layer_norm(input, normalized_shape, weight, bias, eps)
   2542 if has_torch_function_variadic(input, weight, bias):
   2543     return handle_torch_function(
   2544         layer_norm, (input, weight, bias), input, normalized_shape, weight=weight, bias=bias, eps=eps
   2545     )
-> 2546 return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

sayakpaul commented 7 months ago

Are you on CPU?
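For context, that error usually means fp16 weights are being executed on the CPU, where LayerNorm has no half-precision kernel. A minimal sketch of the usual workaround, assuming the same text-to-image pipeline as in the example above: request float16 only when a CUDA device is available and fall back to float32 otherwise.

import torch
from diffusers import AutoPipelineForText2Image

# Half-precision kernels (including LayerNorm) are generally unavailable on CPU,
# so only request float16 when the model will actually run on CUDA.
if torch.cuda.is_available():
    pipe = AutoPipelineForText2Image.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
else:
    pipe = AutoPipelineForText2Image.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
    )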

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

sayakpaul commented 4 months ago

Closing because of inactivity.