I used the command sh ./script/train_semantic_VOC.sh to initiate training.
In train_semantic_voc.py, the latents obtained from latents = vae.encode(image.to(device)).latent_dist.sample().detach() have shape [1, 4, 32, 32].
In the function call:
images_here, x_t = ptp_utils.text2image(unet, vae, tokenizer, text_encoder, noise_scheduler, prompts, controller, latent=start_code, num_inference_steps=NUM_DIFFUSION_STEPS, guidance_scale=5, generator=g_cpu, low_resource=LOW_RESOURCE, Train=True)
the argument latent=start_code has shape [1, 4, 32, 32].
Eventually, an error occurs in the function def init_latent(latent, unet, height, width, generator, batch_size) in ptp_utils.py with the following message:
RuntimeError: The expanded size of the tensor (64) must match the existing size (32) at non-singleton dimension 3. Target sizes: [1, 4, 64, 64]. Tensor sizes: [1, 4, 32, 32]
Is it expected that the latents obtained from vae.encode(image.to(device)).latent_dist.sample().detach() should have shape [1, 4, 64, 64]?
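For context on the mismatch: the standard Stable Diffusion VAE downsamples each spatial dimension by a fixed factor of 8, so the latent resolution is determined entirely by the input image resolution. Under that assumption, 32x32 latents come from a 256x256 input image, while the [1, 4, 64, 64] target that init_latent expands to corresponds to a 512x512 input. A minimal sketch (the factor of 8 is the standard SD setting; the actual value in this repo depends on the VAE config):

```python
# Assumption: the VAE downsamples H and W by a factor of 8,
# as in the standard Stable Diffusion checkpoints.
VAE_DOWNSAMPLE = 8

def latent_size(image_size: int, factor: int = VAE_DOWNSAMPLE) -> int:
    """Spatial size of the latent grid for a square input image."""
    return image_size // factor

# A 256x256 input yields 32x32 latents (what I observe):
print(latent_size(256))  # 32
# init_latent's target of 64x64 latents implies a 512x512 input:
print(latent_size(512))  # 64
```

So one likely explanation is that the images are being resized to 256x256 before vae.encode, while the height/width passed to init_latent assume 512x512.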
Sorry to bother you with this.