REDACT\venv\Lib\site-packages\diffusers\models\attention_processor.py:1584: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  hidden_states = F.scaled_dot_product_attention(
Traceback (most recent call last):
  File "REDACT\test.py", line 35, in <module>
    pipe(**args)
  File "REDACT\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\pipelines\controlnet_sd3\pipeline_stable_diffusion_3_controlnet.py", line 912, in __call__
    control_image = self.vae.encode(control_image).latent_dist.sample()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl.py", line 258, in encode
    return self.tiled_encode(x, return_dict=return_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\models\autoencoders\autoencoder_kl.py", line 363, in tiled_encode
    tile = self.quant_conv(tile)
           ^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not callable
Describe the bug
VAE tiling works for SD3 with power-of-2 image sizes, but not with any other alignment.
The mentioned issues with VAE tiling are due to vae/config.json having:
Which causes the method used here:
https://github.com/huggingface/diffusers/blob/589931ca791deb8f896ee291ee481070755faa26/src/diffusers/models/autoencoders/autoencoder_kl.py#L363
And Here:
https://github.com/huggingface/diffusers/blob/589931ca791deb8f896ee291ee481070755faa26/src/diffusers/models/autoencoders/autoencoder_kl.py#L412
to be None.
Perhaps at the moment, the model is simply not entirely compatible with the tiling in AutoEncoderKL, as the state dict does not possess the keys post_quant_conv.bias, quant_conv.weight, post_quant_conv.weight, quant_conv.bias.
Is this intended?
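To make the failure mode concrete, here is a minimal sketch using a stand-in class rather than the real diffusers code. `TinyVAE`, its `has_quant_conv` flag, and both `encode_tile_*` methods are hypothetical names made up for illustration. The sketch only assumes what the traceback shows: the tiled path calls `self.quant_conv(tile)` while that attribute is None. The guarded variant is one possible way to avoid the error, not necessarily how the library should fix it.

```python
class TinyVAE:
    """Stand-in for an autoencoder. NOT the diffusers AutoencoderKL class."""

    def __init__(self, has_quant_conv: bool):
        # Hypothetical flag: the layer is only created when enabled,
        # otherwise the attribute is left as None (as in the report).
        self.quant_conv = (lambda tile: [v * 2 for v in tile]) if has_quant_conv else None

    def encode_tile_unguarded(self, tile):
        # Same shape as the failing line in the traceback:
        # the attribute is called without checking whether it exists.
        return self.quant_conv(tile)

    def encode_tile_guarded(self, tile):
        # Possible guard: skip the layer entirely when it is absent.
        if self.quant_conv is not None:
            tile = self.quant_conv(tile)
        return tile


def demo():
    vae = TinyVAE(has_quant_conv=False)
    try:
        vae.encode_tile_unguarded([1.0, 2.0])
        unguarded = "ok"
    except TypeError as exc:
        unguarded = str(exc)
    guarded = vae.encode_tile_guarded([1.0, 2.0])
    return unguarded, guarded


if __name__ == "__main__":
    print(demo())
```

With the layer absent, the unguarded call raises the same TypeError as in the logs, while the guarded call passes the tile through unchanged.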
Reproduction
Logs
System Info
Windows
diffusers 0.29.2
Who can help?
@yiyixuxu @sayakpaul @DN6 @asomoza