Closed · tdlhyj closed this issue 8 months ago
Same issue. You can try replacing "noisy_latents = noisy_latents + latents_pose" with "noisy_latents = torch.cat([noisy_latents, latents_pose], dim=1)" (and updating the related code accordingly), and then it should work.
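A minimal sketch of the suggested change (shapes are illustrative assumptions, not taken from the repo): element-wise addition keeps the latent at 4 channels, while channel-wise concatenation doubles it to 8, which is why the UNet input layer has to change as well.

```python
import torch

# Hypothetical SD-style latent shapes for illustration: (batch, 4, H/8, W/8)
noisy_latents = torch.randn(2, 4, 64, 64)
latents_pose = torch.randn(2, 4, 64, 64)

# Original scheme: element-wise addition, channel count stays at 4
added = noisy_latents + latents_pose                          # (2, 4, 64, 64)

# Suggested scheme: concatenate along the channel dim, channel count becomes 8,
# so the UNet's conv_in must be widened to accept 8 input channels
combined = torch.cat([noisy_latents, latents_pose], dim=1)    # (2, 8, 64, 64)
```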
grid.mp4
Okay, I'll give it a try. How many steps did you train for this?
2000 iterations with batch size 40, lr 5e-6, resuming from a converged checkpoint with the "plus condition".
Great! I tried this scheme before for stage 1 training, but I didn't train stage 2 with it.
noisy_latents = torch.cat([noisy_latents, latents_pose], dim=1)
Was a projection layer added to the UNet? Or did you change the UNet's input channels to 8?
Correct. I appended 4 channels of zeros to "conv_in.weight" and changed the input channels to 8, similar to the implementation of runwayml/stable-diffusion-inpainting.
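A hedged sketch of this modification (using a standalone `nn.Conv2d` with SD's 320-wide conv_in as a stand-in for the real UNet layer): copy the pretrained 4-channel weights into a new 8-channel conv and zero-initialize the extra channels, so the expanded layer initially behaves exactly like the original.

```python
import torch
import torch.nn as nn

# Stand-in for the pretrained UNet conv_in (4 latent channels -> 320 features)
old_conv = nn.Conv2d(4, 320, kernel_size=3, padding=1)

# New conv_in accepting 8 channels: pretrained weights + appended zeros
new_conv = nn.Conv2d(8, 320, kernel_size=3, padding=1)
with torch.no_grad():
    new_conv.weight.zero_()                    # extra pose channels start at zero
    new_conv.weight[:, :4] = old_conv.weight   # keep pretrained weights intact
    new_conv.bias.copy_(old_conv.bias)

# With zeroed extra channels, the output is initially unchanged for any pose input
x = torch.randn(1, 4, 64, 64)
pose = torch.randn(1, 4, 64, 64)
out_old = old_conv(x)
out_new = new_conv(torch.cat([x, pose], dim=1))
```

Zero-initializing the new channels means training starts from the pretrained model's behavior and gradually learns to use the pose input, which is also how the stable-diffusion-inpainting checkpoint extends its conv_in.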
Are both the UNet and the ReferenceNet modified? And why did you change to the plus condition again?
The referencenet works pretty well, so I didn't touch it.
@tgxs002 did you also use 10% dropout during training, with inference-time CFG like cat([zeros_like(pose_latents), pose_latents])?
I didn't train with 10% dropout, and it doesn't seem critical for inference.
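For reference, a minimal sketch of the CFG scheme asked about above (names are illustrative assumptions, not from the repo): the unconditional branch uses zeroed pose latents and is batched with the conditional branch, and the two noise predictions are then combined with a guidance scale.

```python
import torch

# Illustrative pose latent, shape (batch, 4, H/8, W/8)
pose_latents = torch.randn(1, 4, 64, 64)

# Batch the unconditional input (zeroed pose) with the conditional one
cfg_pose = torch.cat([torch.zeros_like(pose_latents), pose_latents], dim=0)

# After the UNet forward pass, the standard CFG combination would be:
#   pred_uncond, pred_cond = unet(noisy, cfg_pose, ...).chunk(2)
#   pred = pred_uncond + guidance_scale * (pred_cond - pred_uncond)
```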
@tgxs002 did you train the "conv_in.weight"?
Could you share your stage 1 weights? Thanks. I used concat(noise, pose_latent) but the texture is not good. @tgxs002
In my experiments, the released code works pretty well for generating texture. You may find it helpful to check your input resolution.
What about your stage 2 results? The UBC results have a bad background in my experiments.
Yes, I also noticed the issue. Overfitting stage 1 helps, although it is not a scalable solution.
Great! With the input resolution set to 512x768 it didn't work at all, but set to 512x512 it works very quickly.
Have you encountered the issue where using the CLIP feature makes the colors in the generated image look weird, like patches on the clothing? After removing the CLIP feature the result becomes normal.
@lbwang2006 Hi Lb, I wanted to discuss this issue but couldn't find a way to contact you. Could you share your email, or email me at the address in my bio?
In the first stage, no matter how many training steps are taken, correct pose control cannot be achieved.