guoqincode / Open-AnimateAnyone

Unofficial Implementation of Animate Anyone
2.89k stars · 233 forks

I want to know if anyone has correctly trained the first stage? #56

Closed tdlhyj closed 8 months ago

tdlhyj commented 8 months ago

In the first stage, no matter how many training steps I take, I cannot achieve correct pose control.

tgxs002 commented 8 months ago

Same issue. You can try replacing `noisy_latents = noisy_latents + latents_pose` with `noisy_latents = torch.cat([noisy_latents, latents_pose], dim=1)` (and updating the related code accordingly), and then it should work.

https://github.com/guoqincode/AnimateAnyone-unofficial/assets/38932123/e2e30546-a923-4b2b-8e46-150997c1ba70
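In code, the suggested change looks roughly like this (a minimal sketch with assumed tensor shapes; the variable names follow the snippet above, but this is not the repo's actual training loop):

```python
import torch

# Latents from the VAE are (batch, 4, H/8, W/8); the pose latents share that shape.
noisy_latents = torch.randn(2, 4, 64, 64)
latents_pose = torch.randn(2, 4, 64, 64)

# Original (additive) conditioning -- the pose signal mixes with the noise:
# noisy_latents = noisy_latents + latents_pose             # stays (2, 4, 64, 64)

# Suggested (concatenative) conditioning -- the pose gets its own channels:
unet_input = torch.cat([noisy_latents, latents_pose], dim=1)  # (2, 8, 64, 64)
```

The UNet's first convolution must then be widened to accept 8 input channels, which comes up further down the thread.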

tdlhyj commented 8 months ago

> Same issue. You can try replacing `noisy_latents = noisy_latents + latents_pose` with `noisy_latents = torch.cat([noisy_latents, latents_pose], dim=1)` (and updating the related code accordingly), and then it should work.
>
> grid.mp4

Okay, I'll give it a try. How many steps did you train for?

tgxs002 commented 8 months ago

2000 iterations at batch size 40 and lr 5e-6, resuming from a converged checkpoint trained with the "plus condition".

guoqincode commented 8 months ago

> Same issue. You can try replacing `noisy_latents = noisy_latents + latents_pose` with `noisy_latents = torch.cat([noisy_latents, latents_pose], dim=1)` (and updating the related code accordingly), and then it should work.
>
> grid.mp4

Great! I tried this scheme with stage 1 training before, but I didn't train stage 2 with it.

lbwang2006 commented 8 months ago

> `noisy_latents = torch.cat([noisy_latents, latents_pose], dim=1)`
>
> Same issue. You can try replacing `noisy_latents = noisy_latents + latents_pose` with `noisy_latents = torch.cat([noisy_latents, latents_pose], dim=1)` (and updating the related code accordingly), and then it should work.
>
> grid.mp4

Did you add a projection layer to the UNet?

lbwang2006 commented 8 months ago

Or did you change the UNet's input channels to 8?

tgxs002 commented 8 months ago

Correct. I appended 4 channels of zeros to `conv_in.weight` and changed the input channels to 8, similar to the implementation of runwayml/stable-diffusion-inpainting.
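A sketch of what that weight surgery might look like, following the stable-diffusion-inpainting recipe (the 4 → 320 channel shape of SD's `conv_in` is assumed here; this is not the exact code used in the repo):

```python
import torch
from torch import nn

# Stand-in for the UNet's first convolution (SD 1.x conv_in: 4 -> 320 channels).
old_conv = nn.Conv2d(4, 320, kernel_size=3, padding=1)

# Widened conv accepting 8 channels (noisy latents + pose latents).
new_conv = nn.Conv2d(8, 320, kernel_size=3, padding=1)

with torch.no_grad():
    new_conv.weight[:, :4] = old_conv.weight  # keep the pretrained weights
    new_conv.weight[:, 4:] = 0.0              # zero-init the 4 new pose channels
    new_conv.bias.copy_(old_conv.bias)
```

Because the new channels are zero-initialized, the widened conv initially behaves exactly like the pretrained one on the first 4 channels, so training can resume from the converged checkpoint without a quality drop.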

lbwang2006 commented 8 months ago

> Correct. I appended 4 channels of zeros to `conv_in.weight` and changed the input channels to 8, similar to the implementation of runwayml/stable-diffusion-inpainting.

Are both the UNet and the ReferenceNet modified?

lbwang2006 commented 8 months ago

> Correct. I appended 4 channels of zeros to `conv_in.weight` and changed the input channels to 8, similar to the implementation of runwayml/stable-diffusion-inpainting.

And why did you change to the "plus condition" again?

tgxs002 commented 8 months ago

> > Correct. I appended 4 channels of zeros to `conv_in.weight` and changed the input channels to 8, similar to the implementation of runwayml/stable-diffusion-inpainting.
>
> Are both the UNet and the ReferenceNet modified?

The ReferenceNet works pretty well, so I didn't touch it.

jaymefosa commented 8 months ago

@tgxs002 Did you also do a 10% dropout during training, with an inference CFG like `cat([zeros_like(pose_latents), pose_latents])`?

tgxs002 commented 8 months ago

> @tgxs002 Did you also do a 10% dropout during training, with an inference CFG like `cat([zeros_like(pose_latents), pose_latents])`?

I didn't train with 10% dropout, and it doesn't seem critical for inference.
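For reference, the dropout-plus-CFG scheme described in the question would look roughly like this (a hypothetical sketch only; as noted above it was not used in this training run, and the function name and shapes are assumptions):

```python
import torch

def drop_pose(latents_pose: torch.Tensor, p: float = 0.1) -> torch.Tensor:
    """Training-time conditioning dropout: with probability p per sample,
    zero the pose latents so the model also learns an unconditional branch."""
    keep = (torch.rand(latents_pose.shape[0], 1, 1, 1) >= p).to(latents_pose.dtype)
    return latents_pose * keep

# Inference-time CFG batch: unconditional (zeroed pose) + conditional branches.
pose_latents = torch.randn(1, 4, 64, 64)
cfg_pose = torch.cat([torch.zeros_like(pose_latents), pose_latents], dim=0)
```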

ireneMsm2020 commented 8 months ago

@tgxs002 Did you train the `conv_in.weight`?

VisionU commented 8 months ago

> > @tgxs002 Did you also do a 10% dropout during training, with an inference CFG like `cat([zeros_like(pose_latents), pose_latents])`?
>
> I didn't train with 10% dropout, and it doesn't seem critical for inference.

Could you share your stage 1 weights? Thanks. I did `concat(noise, pose_latent)`, but the texture is not good. @tgxs002

tgxs002 commented 8 months ago

> Could you share your stage 1 weights? Thanks. I did `concat(noise, pose_latent)`, but the texture is not good.

In my experiments, the released code works pretty well for generating texture. You may find it helpful to check your input resolution.

lbwang2006 commented 8 months ago

> In my experiments, the released code works pretty well for generating texture. You may find it helpful to check your input resolution.

What about your stage 2 results? In my experiment, the UBC results have bad backgrounds.

tgxs002 commented 8 months ago

> What about your stage 2 results? In my experiment, the UBC results have bad backgrounds.

Yes, I also noticed the issue. Overfitting stage 1 helps, although it is not a scalable solution.

VisionU commented 8 months ago

> In my experiments, the released code works pretty well for generating texture. You may find it helpful to check your input resolution.

Great! With the input resolution set to 512x768 it didn't work at all, but at 512x512 it worked very quickly.

image

garychan22 commented 7 months ago

Have you encountered the issue that using the CLIP feature makes the colors in the generated image look weird (like the patch on the clothing)? After removing the CLIP feature, the result becomes normal...

image

newcherryberry commented 2 months ago

> What about your stage 2 results? In my experiment, the UBC results have bad backgrounds.

@lbwang2006 Hi, I wanted to discuss this issue but couldn't find a way to contact you. Could you share your email, or email me at the address in my bio?