huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

SD3 latents error 'tuple' object has no attribute 'to' issue on stable_diffusion_3/pipeline_stable_diffusion_3.py #8545

Open s9anus98a opened 5 months ago

s9anus98a commented 5 months ago

Script to reproduce:

image = pipe(prompt="", negative_prompt="", guidance_scale=1.,
             num_inference_steps=20, latents=inv_latents)

Printed value of inv_latents:


(tensor([[[[-0.8428, -0.8203, -0.8325,  ..., -0.8325, -0.8706, -0.8521],
           [-0.8511, -0.7632, -0.8467,  ..., -0.8008, -0.8745, -0.7905],
           [-0.8369, -0.8174, -0.8281,  ..., -0.8462, -0.8765, -0.8521],
           ...,
           [-0.8110, -0.7686, -0.8145,  ..., -0.7793, -0.8638, -0.7783],
           [-0.8770, -0.8442, -0.8525,  ..., -0.8672, -0.8745, -0.8862],
           [-0.8252, -0.7676, -0.8247,  ..., -0.7769, -0.8599, -0.7729]],

          [[ 0.2629,  0.2347,  0.2681,  ...,  0.2496,  0.2795,  0.2344],
           [ 0.2366,  0.1709,  0.2303,  ...,  0.1785,  0.2410,  0.1726],
           [ 0.2622,  0.2383,  0.2644,  ...,  0.2512,  0.2844,  0.2454],
           ...,
           [ 0.2448,  0.1589,  0.2306,  ...,  0.1731,  0.2385,  0.1768],
           [ 0.2634,  0.2157,  0.2786,  ...,  0.2510,  0.2832,  0.2466],
           [ 0.2473,  0.1547,  0.2256,  ...,  0.1705,  0.2375,  0.1667]],

          [[-0.4492, -0.3923, -0.4451,  ..., -0.3835, -0.4512, -0.3704],
           [-0.5542, -0.5742, -0.5522,  ..., -0.5449, -0.5347, -0.5562],
           [-0.4590, -0.3870, -0.4697,  ..., -0.3794, -0.4258, -0.3694],
           ...,
           [-0.5405, -0.5596, -0.5337,  ..., -0.5293, -0.5293, -0.5337],
           [-0.4358, -0.3608, -0.4524,  ..., -0.3442, -0.4343, -0.3508],
           [-0.5317, -0.5728, -0.5352,  ..., -0.5469, -0.5273, -0.5386]],

          ...,

          [[ 1.6729,  1.7832,  1.6689,  ...,  1.7676,  1.6572,  1.7568],
           [ 1.8535,  1.8691,  1.8604,  ...,  1.8613,  1.8760,  1.8320],
           [ 1.6816,  1.7891,  1.6729,  ...,  1.7812,  1.6787,  1.7559],
           ...,
           [ 1.8281,  1.8496,  1.8311,  ...,  1.8311,  1.8359,  1.8086],
           [ 1.6758,  1.7832,  1.6523,  ...,  1.7500,  1.6406,  1.7402],
           [ 1.8066,  1.8389,  1.8398,  ...,  1.8281,  1.8359,  1.8008]],

          [[ 1.6436,  1.6270,  1.6504,  ...,  1.6289,  1.6533,  1.6240],
           [ 1.7275,  1.6943,  1.7227,  ...,  1.6904,  1.7334,  1.6826],
           [ 1.6406,  1.6191,  1.6465,  ...,  1.6240,  1.6553,  1.6162],
           ...,
           [ 1.7012,  1.6816,  1.7080,  ...,  1.6553,  1.7100,  1.6465],
           [ 1.6611,  1.6299,  1.6641,  ...,  1.6289,  1.6572,  1.6367],
           [ 1.7070,  1.6729,  1.7031,  ...,  1.6494,  1.7197,  1.6436]],

          [[ 1.5859,  1.5527,  1.5752,  ...,  1.5586,  1.5547,  1.5557],
           [ 1.5332,  1.6621,  1.5391,  ...,  1.6641,  1.4902,  1.6748],
           [ 1.5674,  1.5596,  1.5684,  ...,  1.5479,  1.5439,  1.5547],
           ...,
           [ 1.5156,  1.6367,  1.5176,  ...,  1.6611,  1.4619,  1.6660],
           [ 1.5566,  1.5576,  1.5693,  ...,  1.5518,  1.5439,  1.5420],
           [ 1.5137,  1.6328,  1.5205,  ...,  1.6514,  1.4697,  1.6602]]]],
        device='cuda:0', dtype=torch.float16),)


yiyixuxu commented 5 months ago

it's because we expect latents to be a tensor https://github.com/huggingface/diffusers/blob/f96e4a16adb4c31bab4c0a3d0d145ed2b086ecb0/src/diffusers/pipelines/stable_diffusion_3/pipeline_stable_diffusion_3.py#L635C9-L635C16
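The message in the issue title can be reproduced without loading the pipeline at all. The sketch below is only an illustration: a one-element tuple holding a plain list stands in for the printed inv_latents value (which is wrapped in parentheses with a trailing comma), and the `.to("cpu")` call stands in for the device move the pipeline performs on the latents it is given.

```python
# Stand-in for inv_latents: a one-element tuple, like the printed value above.
# The inner list is a placeholder for the real float16 CUDA tensor.
inv_latents = ([0.1, 0.2, 0.3],)

try:
    # A tensor would have a .to() method; a tuple does not.
    inv_latents.to("cpu")
except AttributeError as err:
    message = str(err)

print(message)
```

The printed message names 'tuple' as the offending type, which points at the container rather than at the tensor inside it.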

s9anus98a commented 5 months ago


So it should output a latent tensor if output_type='latent' is set.

inv_latents is from here:


inv_latents = pipe(prompt="", negative_prompt="", guidance_scale=1.,
                   width=input_img.shape[-1], height=input_img.shape[-2],
                   output_type='latent', return_dict=False,
                   num_inference_steps=num_steps, latents=latents)

asomoza commented 5 months ago

Hi, if you use output_type="latent" the pipeline returns a tensor.

Can you please post a reproducible code snippet with all the relevant parts of what you're doing? It's hard for us to help if you only provide parts of it.

s9anus98a commented 5 months ago

Hi @asomoza @yiyixuxu, here's the full reproducible code:

SD3 DDIM Inversion https://colab.research.google.com/drive/1B0qGpwsEjpOm3xx_LzraHWYehTTgB8AL

asomoza commented 5 months ago

I took a look at your code. Since you're using return_dict=False in the first generation, the pipeline returns a tuple rather than a bare tensor, so for the second one you'll need to pass the latents like this:

image = pipe(prompt="", negative_prompt="", guidance_scale=1.,
             num_inference_steps=20, latents=inv_latents[0])
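If the same script switches between return_dict=True and return_dict=False, a small guard can make the second call less fragile. The helper below is a hypothetical sketch and not part of diffusers: the name unwrap_latents is made up for illustration, and it assumes that the output object returned with return_dict=True exposes its result on an `images` attribute.

```python
def unwrap_latents(pipe_output):
    """Return the latents from a pipeline result, whatever container holds them.

    Hypothetical helper for illustration; not a diffusers API.
    """
    # return_dict=False: the result is a tuple whose first item is the output.
    if isinstance(pipe_output, (tuple, list)):
        if not pipe_output:
            raise ValueError("pipeline output is empty")
        return pipe_output[0]
    # return_dict=True (assumption): the output object carries it on `.images`.
    if hasattr(pipe_output, "images"):
        return pipe_output.images
    # Anything else is assumed to be the tensor itself and is passed through.
    return pipe_output
```

With this in place, the second call could pass latents=unwrap_latents(inv_latents) and behave the same whether the first call returned a tuple or an output object.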