**Open** · WayneML opened this issue 6 months ago
```python
if args.conditioning_dropout_prob is not None:
    random_p = torch.rand(bsz, device=latents.device, generator=generator)
    prompt_mask = random_p < 2 * args.conditioning_dropout_prob
    prompt_mask = prompt_mask.reshape(bsz, 1, 1)
    # Final text conditioning.
    null_conditioning = torch.zeros_like(encoder_hidden_states)
    encoder_hidden_states = torch.where(
        prompt_mask, null_conditioning.unsqueeze(1), encoder_hidden_states.unsqueeze(1))
```
I found something strange in this code block: `random_p = torch.rand(bsz, device=latents.device, generator=generator)` always makes `random_p` one-dimensional, so when you choose a batch size of 1 it holds a single value, and `prompt_mask` becomes a single boolean rather than a list of booleans:

```python
prompt_mask = random_p < 2 * args.conditioning_dropout_prob
prompt_mask = prompt_mask.reshape(bsz, 1, 1)
null_conditioning = torch.zeros_like(encoder_hidden_states)
encoder_hidden_states = torch.where(
    prompt_mask, null_conditioning.unsqueeze(1), encoder_hidden_states.unsqueeze(1))
```
Also, is this still for the image-to-video task? It looks like it is used for text-to-image.
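A minimal standalone sketch reproducing what I see (the dropout probability, sequence length, and hidden size below are made-up values, not the script's actual settings):

```python
import torch

# Assumed dummy values for illustration; not the training script's real config.
bsz = 1
conditioning_dropout_prob = 0.1
generator = torch.Generator().manual_seed(0)

# Dummy conditioning of shape (bsz, seq_len, hidden_dim).
encoder_hidden_states = torch.ones(bsz, 77, 1024)

# With bsz == 1 this is a one-element 1-D tensor, shape (1,).
random_p = torch.rand(bsz, generator=generator)
prompt_mask = random_p < 2 * conditioning_dropout_prob
prompt_mask = prompt_mask.reshape(bsz, 1, 1)  # shape (1, 1, 1), a single boolean

# The mask broadcasts against the unsqueezed (bsz, 1, 77, 1024) tensors,
# so the whole batch element is either kept or zeroed as one unit.
null_conditioning = torch.zeros_like(encoder_hidden_states)
encoder_hidden_states = torch.where(
    prompt_mask, null_conditioning.unsqueeze(1), encoder_hidden_states.unsqueeze(1))

print(random_p.shape)               # torch.Size([1])
print(prompt_mask.shape)            # torch.Size([1, 1, 1])
print(encoder_hidden_states.shape)  # torch.Size([1, 1, 77, 1024])
```

So with batch size 1 the mask is effectively a single yes/no decision for the entire batch, which is the behavior I'm asking about.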
Hi, I didn't quite understand what you meant. Are you asking why the encoder_hidden_states need to be replaced with zeros?
Can the encoder_hidden_states be replaced with a text embedding for text-to-video tasks? When I ran the script, I found the encoder_hidden_states to be zero.