guoqincode / Open-AnimateAnyone

Unofficial Implementation of Animate Anyone

Question: Why is CFG not set during training but set during inference? #14

Closed: uk9921 closed 9 months ago

uk9921 commented 9 months ago

Operations such as cfg_random_null_text or cfg_random_null_ref were not used during the training phase, but guidance_scale: 7.5 and do_classifier_free_guidance=True were set during inference. Is this the expected behavior?
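For context, the standard classifier-free-guidance recipe couples the two phases: during training, the conditioning is randomly replaced with a null embedding (which is what cfg_random_null_text / cfg_random_null_ref are for), and during inference the conditional and unconditional predictions are extrapolated with guidance_scale. A minimal sketch of that recipe, with illustrative function names rather than this repo's actual API:

```python
import torch

def maybe_null_condition(cond_emb, null_emb, drop_prob=0.1):
    # Training-time CFG dropout: with probability drop_prob, swap a
    # sample's conditioning for the null embedding.
    mask = torch.rand(cond_emb.shape[0], device=cond_emb.device) < drop_prob
    return torch.where(mask[:, None, None], null_emb, cond_emb)

def cfg_combine(noise_uncond, noise_cond, guidance_scale=7.5):
    # Inference-time CFG: push the prediction away from the
    # unconditional branch by guidance_scale.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```

Without the training-time dropout, the model never learns the unconditional branch, so applying guidance_scale=7.5 at inference asks it to extrapolate from a prediction it was never trained to make.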

guoqincode commented 9 months ago

Feel free to modify the dataset or the training policy; the current version is only a basic implementation.

uk9921 commented 9 months ago

Thanks for the reply. Have you made any progress in your experiments? I can only generate very scary images now. [image]

guoqincode commented 9 months ago

HMMM... Something seems wrong. Can you share some of your training information with me?

uk9921 commented 9 months ago

I used the main code you gave me and ran it on 5k videos for over 50 epochs. I’m trying hard to figure out why it went wrong. I also want to check if your tests worked out well.

guoqincode commented 9 months ago

Did you use cfg_random_null_ref during training? I previously trained with xformers and mixed precision on RTX 3090/4090 GPUs, and there was a problem with parameters not updating. If you are on an RTX-series GPU, I recommend turning off xformers and mixed-precision training.
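A sketch of the plain-fp32 setup being recommended, with a toy module standing in for the UNet; the point is only what is absent (no torch.autocast context, no GradScaler, no call to enable_xformers_memory_efficient_attention()):

```python
import torch
import torch.nn as nn

# Toy stand-in for the denoising network; only the precision setup matters.
model = nn.Linear(16, 16).to(dtype=torch.float32)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

x, target = torch.randn(4, 16), torch.randn(4, 16)

# Plain fp32 step: no autocast, no GradScaler, and no xformers
# memory-efficient attention enabled on the model.
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```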

uk9921 commented 9 months ago

No, actually, I have no idea how to enable cfg_random_null_ref for the image embedding during training, and I also disabled it in the inference pipeline. I will try disabling xformers and fp16, thanks for your advice!

Marking the result :D [image]

guoqincode commented 9 months ago

Now that I can get normal results from the first stage of training, I suggest you test your trained model on a single video to see if the output is normal. In the meantime, I recommend using float32 for both inference and training.
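For the inference side, a sketch of loading a diffusers-style pipeline in float32 (the model id is a placeholder for whatever SD 1.5-style base you are using; in diffusers, do_classifier_free_guidance is derived from the scale, so passing guidance_scale=1.0 effectively disables CFG):

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder base model; substitute your own checkpoint.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32,  # fp32 for debugging, as suggested above
)

# guidance_scale > 1.0 turns CFG on in diffusers-style pipelines;
# pass 1.0 to run without it.
image = pipe("a test prompt", guidance_scale=1.0).images[0]
```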

guoqincode commented 9 months ago

If you are Chinese, you can send me an email at guoqin@stu.pku.edu.cn and we can add each other on WeChat.

uk9921 commented 9 months ago

Why not? Let's WeChat :D

zhenzhiwang commented 9 months ago

@uk9921 Have you solved this problem? I also encountered it when trying to train a model on the UBC dataset in stage 1. Is the reason xformers and mixed-precision training?

uk9921 commented 9 months ago

@zhenzhiwang Hi, my problem was not enough training. You can try turning off xformers and mixed precision and training briefly to verify the difference.

zhenzhiwang commented 9 months ago

Hi @uk9921, thanks for the quick reply. I tested the 20,000-iteration checkpoint on the UBC dataset and got images similar to https://github.com/guoqincode/AnimateAnyone-unofficial/issues/14#issuecomment-1855521920. Is it normal to get such meaningless images and then suddenly get better results at 30,000 iterations (since the Animate Anyone paper reports that they train for only 30k iterations)?

uk9921 commented 9 months ago

Hi @zhenzhiwang, honestly it's hard to say; there are so many setup details, and I have little experience with them. The dataset, learning rate, number of epochs, which modules are trained, and the learning rate of each module will all significantly affect the final result.

Besides training, you can also debug the inference pipeline and make sure the inputs are preprocessed as expected. For example, after resize & crop, are there hard input cases such as a half-body reference paired with full-body poses?
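A sketch of the kind of preprocessing check meant here, assuming a plain resize-and-center-crop (the file path and target size are illustrative): dump the processed reference to disk and compare its framing against the pose frames.

```python
from PIL import Image

def resize_center_crop(img: Image.Image, size=(512, 768)) -> Image.Image:
    # Resize so the image covers the target size, then center-crop.
    tw, th = size
    scale = max(tw / img.width, th / img.height)
    img = img.resize((round(img.width * scale), round(img.height * scale)))
    left = (img.width - tw) // 2
    top = (img.height - th) // 2
    return img.crop((left, top, left + tw, top + th))

ref = resize_center_crop(Image.open("reference.png"))
ref.save("debug_reference.png")  # eyeball the framing against the pose frames
```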