levihsu / OOTDiffusion

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Preprocessing of the input image for the UNet during training #119

Closed liupengcnu closed 7 months ago

liupengcnu commented 7 months ago

Hi Dr. Xu, below is my data preprocessing and training code for OOTDiffusion. Could you please check whether there is any problem in it?

        # Classifier-free guidance: drop the garment condition 10% of the time
        if random.uniform(0, 1) < 0.1:
            garm_latents = torch.zeros_like(garm_latents)

        # Add noise to the original image latents
        noise = torch.randn_like(image_ori_latents)
        noisy_latents = noise + image_ori_latents

        # Garment UNet runs at timestep 0 and returns spatial attention outputs
        _, spatial_attn_outputs = unet_garm(
            garm_latents,
            0,
            encoder_hidden_states=prompt_embeds,
        )

        # Concatenate the noisy latents with the masked try-on latents
        latent_vton_model_input = torch.cat([noisy_latents, vton_latents], dim=1)
        noise_pred = unet_vton(
            latent_vton_model_input,
            spatial_attn_outputs,
            timesteps,
            encoder_hidden_states=prompt_embeds,
        ).sample

        loss = F.mse_loss(noise_pred.float(), target.float(), reduction="mean")
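One thing I am unsure about is the line `noisy_latents = noise + image_ori_latents`, which adds noise without any per-timestep scaling. In standard latent-diffusion training the forward process is x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps (in diffusers this is `noise_scheduler.add_noise(image_ori_latents, noise, timesteps)`). A minimal NumPy sketch of that step for comparison (the linear beta schedule here is an assumption, not taken from this repo):

```python
import numpy as np

# Illustrative DDPM forward-noising step; schedule values are assumed.
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # linear beta schedule
alphas_cumprod = np.cumprod(1.0 - betas)    # \bar{alpha}_t

def add_noise(x0, noise, t):
    """x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    sqrt_ab = np.sqrt(alphas_cumprod[t])
    sqrt_one_minus_ab = np.sqrt(1.0 - alphas_cumprod[t])
    return sqrt_ab * x0 + sqrt_one_minus_ab * noise

# With zero noise, x_t is just the scaled clean latents.
x0 = np.ones((2, 4))
x_t = add_noise(x0, np.zeros((2, 4)), t=500)
```

Unlike a plain sum, this keeps the variance of x_t bounded, so the noise level seen by the UNet actually matches the sampled timestep.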
liupengcnu commented 7 months ago

@levihsu Could you please help me point out the error in the above code?

liupengcnu commented 7 months ago

@levihsu Hi, Dr Xu, could you give me some guidance for the above code?

liupengcnu commented 7 months ago

@T-Gu @ShineChen1024 Could you give me some guidance for the above code?

levihsu commented 7 months ago

@liupengcnu Hi. We really appreciate your enthusiasm for this project, but we don't have time to answer so many detailed questions one by one. We will release the training code later. Thanks for your understanding : )