pixeli99 / SVD_Xtend

Stable Video Diffusion Training Code and Extensions.

Question for the encoder_hidden_states #28

Open: WayneML opened this issue 6 months ago

WayneML commented 6 months ago

When I run the script, I find that encoder_hidden_states is all zeros.

WayneML commented 6 months ago

    if args.conditioning_dropout_prob is not None:
        random_p = torch.rand(bsz, device=latents.device, generator=generator)
        # Sample masks for the edit prompts.
        prompt_mask = random_p < 2 * args.conditioning_dropout_prob
        prompt_mask = prompt_mask.reshape(bsz, 1, 1)
        # Final text conditioning.
        null_conditioning = torch.zeros_like(encoder_hidden_states)
        encoder_hidden_states = torch.where(
            prompt_mask, null_conditioning.unsqueeze(1), encoder_hidden_states.unsqueeze(1))
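The block above zeroes a sample's conditioning whenever its uniform draw falls below 2 * conditioning_dropout_prob, so all-zero encoder_hidden_states on some training steps would be expected. A minimal, self-contained sketch of the same masking, assuming encoder_hidden_states has shape (bsz, dim) and a dropout probability of 0.1 (both illustrative; drop_conditioning is a hypothetical helper, not a function from the repo):

```python
import torch


def drop_conditioning(encoder_hidden_states, dropout_prob, generator=None):
    """Zero out the conditioning for a random subset of samples.

    Hypothetical helper following the quoted block: a sample's conditioning
    is replaced with zeros when its uniform draw falls below 2 * dropout_prob.
    Assumes encoder_hidden_states has shape (bsz, dim).
    """
    bsz = encoder_hidden_states.shape[0]
    # One uniform draw in [0, 1) per sample; shape (bsz,), i.e. (1,) when bsz == 1.
    random_p = torch.rand(bsz, device=encoder_hidden_states.device, generator=generator)
    # Per-sample boolean mask, reshaped to (bsz, 1, 1) so it broadcasts over the last two dims.
    prompt_mask = (random_p < 2 * dropout_prob).reshape(bsz, 1, 1)
    null_conditioning = torch.zeros_like(encoder_hidden_states)
    # Result has shape (bsz, 1, dim); masked samples are all zeros.
    dropped = torch.where(prompt_mask, null_conditioning.unsqueeze(1),
                          encoder_hidden_states.unsqueeze(1))
    return dropped, random_p


if __name__ == "__main__":
    g = torch.Generator().manual_seed(0)
    # Batch size 1: random_p is a one-element tensor and the mask still broadcasts.
    out, p = drop_conditioning(torch.ones(1, 1024), dropout_prob=0.1, generator=g)
    print(p.shape, out.shape, bool((out == 0).all()))

    # Large batch: the zeroed fraction should be close to 2 * 0.1 = 0.2.
    out, _ = drop_conditioning(torch.ones(100_000, 4), dropout_prob=0.1, generator=g)
    zeroed = (out == 0).all(dim=-1).float().mean().item()
    print(f"fraction zeroed: {zeroed:.3f}")
```

With a fixed generator seed the runs are reproducible; the large-batch run shows that roughly 20% of samples lose their conditioning when the dropout probability is 0.1.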

I found something strange in this code block. When the batch size is 1, random_p = torch.rand(bsz, device=latents.device, generator=generator) always returns a one-dimensional tensor with a single element, so prompt_mask = random_p < 2 * args.conditioning_dropout_prob ends up as a single boolean value rather than a list of booleans (one per sample) before it is reshaped with prompt_mask.reshape(bsz, 1, 1) and passed to torch.where.

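To check the batch-size-1 case deterministically, the sketch below passes hand-picked draws instead of calling torch.rand. apply_prompt_mask and the embedding size of 8 are illustrative assumptions, not part of the repo. It shows that with bsz == 1 the mask is still a (1, 1, 1) boolean tensor that torch.where broadcasts over the embedding, the same way a per-sample mask does for larger batches.

```python
import torch


def apply_prompt_mask(encoder_hidden_states, random_p, dropout_prob):
    """Apply the quoted masking step to a given vector of uniform draws.

    Hypothetical helper: random_p is passed in rather than sampled so the
    outcome is deterministic. Assumes encoder_hidden_states is (bsz, dim)
    and random_p is (bsz,).
    """
    bsz = encoder_hidden_states.shape[0]
    # Boolean tensor of shape (bsz, 1, 1), even when bsz == 1.
    prompt_mask = (random_p < 2 * dropout_prob).reshape(bsz, 1, 1)
    null_conditioning = torch.zeros_like(encoder_hidden_states)
    # torch.where broadcasts the mask against (bsz, 1, dim) embeddings.
    out = torch.where(prompt_mask, null_conditioning.unsqueeze(1),
                      encoder_hidden_states.unsqueeze(1))
    return prompt_mask, out


# bsz == 1: one draw below 2 * 0.1 (dropped) and one above it (kept).
for draw in (0.05, 0.9):
    mask, out = apply_prompt_mask(torch.ones(1, 8), torch.tensor([draw]), dropout_prob=0.1)
    print(draw, tuple(mask.shape), mask.item(), tuple(out.shape), bool((out == 0).all()))

# bsz == 3: each sample is masked independently.
mask, _ = apply_prompt_mask(torch.ones(3, 8), torch.tensor([0.05, 0.5, 0.15]), dropout_prob=0.1)
print(mask.flatten().tolist())  # → [True, False, True]
```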
WayneML commented 6 months ago

Also, is this still meant for the image-to-video task? The code looks like it was written for text-to-image.

pixeli99 commented 6 months ago

Hi, I didn't quite understand what you meant. Are you asking why the encoder_hidden_states need to be replaced with zeros?

mmxbc1223 commented 5 months ago

Can the encoder_hidden_states be replaced with a text embedding for text-to-video tasks?