rinongal / textual_inversion

Error generating image when num_vectors_per_token>1 #115

Open xiankgx opened 2 years ago

xiankgx commented 2 years ago

Hi, I tried to generate images using the learned embeddings, but I ran into the following error during generation. I used the personalized_style version of the PersonalizedBase dataset for training.

Traceback (most recent call last):
  File "scripts/txt2img.py", line 145, in <module>
    uc = model.get_learned_conditioning(opt.n_samples * [""])
  File "/workspace/textual_inversion/ldm/models/diffusion/ddpm.py", line 594, in get_learned_conditioning
    c = self.cond_stage_model.encode(c, embedding_manager=self.embedding_manager)
  File "/workspace/textual_inversion/ldm/modules/encoders/modules.py", line 124, in encode
    return self(text, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/textual_inversion/ldm/modules/encoders/modules.py", line 119, in forward
    z = self.transformer(tokens, return_embeddings=True, embedding_manager=embedding_manager)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/textual_inversion/ldm/modules/x_transformer.py", line 615, in forward
    x = embedding_manager(x, embedded_x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/textual_inversion/ldm/modules/embedding_manager.py", line 101, in forward
    embedded_text[placeholder_idx] = placeholder_embedding
RuntimeError: shape mismatch: value tensor of shape [4, 1280] cannot be broadcast to indexing result of shape [0, 1280]
rinongal commented 2 years ago

You need to update the number of vectors in the inference config (configs/stable-diffusion/v1-inference.yaml / configs/latent-diffusion/txt2img-1p4B-eval_with_tokens.yaml) to match what you used in training.

It looks like you're using LDM with 4 vectors, so you'll need to add num_vectors_per_token: 4 under the params of the personalization_config block in the latent-diffusion yaml.
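
The relevant block should end up looking roughly like this (a sketch only; the other params shown are the usual defaults from the repo's finetune configs and may differ in your file, so the key line is the last one):

    personalization_config:
      target: ldm.modules.embedding_manager.EmbeddingManager
      params:
        placeholder_strings: ["*"]
        initializer_words: ["sculpture"]
        per_image_tokens: false
        num_vectors_per_token: 4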

xiankgx commented 2 years ago

> You need to update the number of vectors in the inference config (configs/stable-diffusion/v1-inference.yaml / configs/latent-diffusion/txt2img-1p4B-eval_with_tokens.yaml) to match what you used in training.
>
> It looks like you're using LDM with 4 vectors, so you'll need to add num_vectors_per_token: 4 under the params of the personalization_config block in the latent-diffusion yaml.

I figured it out, but I'm thinking perhaps the better solution would be to save the embedding manager's configuration together with the embeddings and restore it when the embedding state dict is loaded, roughly as in the sketch below. Anyway, thank you for your lovely work and quick response.
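
Just a sketch of the idea, not the repo's actual save/load code: it assumes the manager keeps its learned vectors in string_to_token_dict / string_to_param_dict as in ldm/modules/embedding_manager.py, and that it can be rebuilt with the stored vector count before the vectors are assigned back.

    import torch

    def save_embedding_checkpoint(embedding_manager, num_vectors_per_token, path):
        # store the manager's config next to the learned vectors, so inference
        # no longer depends on the yaml matching the training run
        torch.save({
            "string_to_token": embedding_manager.string_to_token_dict,
            "string_to_param": embedding_manager.string_to_param_dict,
            "num_vectors_per_token": num_vectors_per_token,
        }, path)

    def load_embedding_checkpoint(embedding_manager, path):
        ckpt = torch.load(path, map_location="cpu")
        # the manager should be built (or reconfigured) with this value before
        # the learned vectors are assigned back to it
        num_vectors = ckpt.get("num_vectors_per_token", 1)
        embedding_manager.string_to_token_dict = ckpt["string_to_token"]
        embedding_manager.string_to_param_dict = ckpt["string_to_param"]
        return num_vectors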

rinongal commented 2 years ago

You're absolutely correct, but this requires a bit of delving into the initialization code of LDM, which I did not want to do :) If you make that change for yourself and want to submit it as a PR, I'd be happy to merge it.

Fun-Cry commented 1 year ago

Hi @rinongal, I was using stable diffusion v1, but whenever I change num_vectors_per_token to more than 1, the following error pops up: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn. Is there a way to avoid this?

rinongal commented 1 year ago

@Fun-Cry Hey. Just to make sure: Are you using the implementation from our repo? Did you change anything? And are you still using the regular prompts?

Also: Which config are you using? v1-finetune or v1-finetune_unfrozen?

Fun-Cry commented 1 year ago

@rinongal Hi, thanks for replying. I'm using this exact repo with the v1-finetune config. I just cloned it again today, so the content should be the same except for num_vectors_per_token (I also changed the batch size to 1 due to GPU problems), but I still got the runtime error. I don't know what regular prompts are. Does that affect the result?

LilyDaytoy commented 1 year ago

I also encounter RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn when trying to implement this on stable diffusion v1. I even get this error when setting num_vectors_per_token to 1. Is there anything I need to take care of? Can anyone help me with this, please?

LilyDaytoy commented 1 year ago

Hi! I also want to try this on sdm_v1, and even when I set num_vectors_per_token=1 I get this runtime error. In ddpm.py I have already set all of the cond_stage_model's and the UNet's params to requires_grad = False (along with all the eval() / disabled_train steps), set the embedding_manager's params to requires_grad = True, and set the optimizer's params to only the embedding_manager's params. Why do I still get this gradient error? Is there anything else I need to take care of for gradients in the code?
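
For reference, this is the kind of sanity check I'm running (a hypothetical snippet, assuming model is the loaded LatentDiffusion instance and that the fork exposes embedding_manager and get_learned_conditioning as in the stock code):

    # model: loaded LatentDiffusion instance (assumption); "*" is the placeholder token
    model.train()
    c = model.get_learned_conditioning(["a photo of *"])
    print(c.requires_grad)  # should be True; False means the graph was cut before the loss

    for name, p in model.embedding_manager.named_parameters():
        print(name, p.requires_grad)  # the placeholder embeddings should all require grad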

> Hi @rinongal, I was using stable diffusion v1, but whenever I change num_vectors_per_token to more than 1, the following error pops up: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn. Is there a way to avoid this?

WuTao-CS commented 1 year ago

> Hi! I also want to try this on sdm_v1, and even when I set num_vectors_per_token=1 I get this runtime error. In ddpm.py I have already set all of the cond_stage_model's and the UNet's params to requires_grad = False (along with all the eval() / disabled_train steps), set the embedding_manager's params to requires_grad = True, and set the optimizer's params to only the embedding_manager's params. Why do I still get this gradient error? Is there anything else I need to take care of for gradients in the code?

> Hi @rinongal, I was using stable diffusion v1, but whenever I change num_vectors_per_token to more than 1, the following error pops up: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn. Is there a way to avoid this?

Hi! I have the same problem as you. Have you solved it? Thank you!