Closed: uk9921 closed this issue 9 months ago
Feel free to modify the dataset or the training policy; the current version is only a basic implementation.
Thanks for the reply. Is there any progress with your experiments? Right now I can only generate very scary-looking images.
HMMM... Something seems wrong. Can you share some of your training information with me?
I used the main code you gave me and ran it on 5k videos for over 50 epochs. I’m trying hard to figure out why it went wrong. I also want to check if your tests worked out well.
Did you use `cfg_random_null_ref` during training? I previously used xformers and mixed precision when training on RTX 3090/4090 GPUs and ran into a problem where parameters were not updating. If you are using an RTX-series GPU, I recommend turning off xformers and mixed-precision training.
No, actually, I have no idea how to enable `cfg_random_null_ref` for the image embedding during training, and I also disabled it in the inference pipeline. I will try disabling xformers and fp16 — thanks for your advice!
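For anyone else landing here: I believe the idea behind `cfg_random_null_ref` is to randomly drop the reference conditioning during training so the model also learns an unconditional branch, which classifier-free guidance needs at inference. A minimal sketch of that dropout (the function name matches the config flag, but the embedding arguments and probability here are illustrative, not the repo's exact API):

```python
import random

def cfg_random_null_ref(ref_embedding, null_embedding, drop_prob=0.1, rng=random):
    """With probability drop_prob, replace the reference-image embedding
    with a 'null' embedding (e.g. zeros or a learned placeholder).

    Training on these null samples is what makes the unconditional branch
    of classifier-free guidance meaningful at inference time.
    """
    if rng.random() < drop_prob:
        return null_embedding
    return ref_embedding
```

If this dropout is never applied during training but `do_classifier_free_guidance=True` is set at inference, the unconditional prediction is essentially untrained, which can produce exactly the kind of broken outputs described above.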
mark the result :D
Now that I can get normal results from the first stage of training, I suggest you test your trained model on a video to see whether the output is normal. In the meantime, I recommend using float32 for both inference and training.
If you are Chinese, you can send me an email (guoqin@stu.pku.edu.cn) and we can add each other on WeChat.
Why not? Let's WeChat :D
@uk9921 Have you solved this problem? I also encountered this problem when I trying to train a model in UBC dataset in stage 1. Is the reason is xformer and mixed precision training?
@zhenzhiwang Hi, my problem was simply not enough training. You can try turning off xformers and mixed precision and training briefly to verify the difference.
Hi @uk9921, thanks for your quick reply. I tested the 20000-iteration checkpoint on the UBC dataset and got images similar to those in https://github.com/guoqincode/AnimateAnyone-unofficial/issues/14#issuecomment-1855521920. Is it normal to get such meaningless images that then suddenly improve around 30000 iterations (given that the Animate Anyone paper reports training for only 30k iterations)?
Hi @zhenzhiwang, honestly it's hard to say — there are many setup details and I have limited experience with them. The dataset, the learning rate, the number of epochs, which modules are trained, and the learning rate of each module will all significantly affect the final result.
Besides training, you can also debug the inference pipeline and make sure the inputs are preprocessed as expected. For example, after resize&crop, are there hard input cases such as a half-body reference paired with full-body poses?
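One cheap sanity check is to print the actual resize/crop geometry for a few samples and eyeball whether, say, a half-body reference survives the crop. A sketch of the usual cover-resize-then-center-crop math (illustrative only — the repo's actual transforms may differ):

```python
def center_crop_box(src_w, src_h, target_w, target_h):
    """Scale the image so it fully covers the target size, then center-crop.

    Returns the resized (w, h) and the (left, top, right, bottom) crop box,
    i.e. the geometry a typical resize&crop preprocessing step applies.
    """
    scale = max(target_w / src_w, target_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    left = (new_w - target_w) // 2
    top = (new_h - target_h) // 2
    return (new_w, new_h), (left, top, left + target_w, top + target_h)
```

For a 1080x1920 portrait frame cropped to 512x768, this keeps the full width but trims the top and bottom — exactly where a head or feet in a full-body pose can get cut off relative to the reference image.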
Operations such as `cfg_random_null_text` or `cfg_random_null_ref` were not used during the training phase, but `guidance_scale: 7.5` and `do_classifier_free_guidance=True` were set during inference. Is this as expected?
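For context on why that mismatch matters: the `do_classifier_free_guidance` path at inference extrapolates from the unconditional prediction toward the conditional one, so if null conditions were never seen during training, the unconditional branch is effectively untrained noise that the guidance scale then amplifies. The standard combination is just (a pure-Python sketch, not the repo's code):

```python
def apply_cfg(noise_uncond, noise_cond, guidance_scale=7.5):
    """Classifier-free guidance: uncond + s * (cond - uncond), element-wise.

    If the model was never trained with null/reference dropout, noise_uncond
    is meaningless, and with s = 7.5 its error dominates the result.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]
```

With `guidance_scale=1.0` this reduces to the conditional prediction alone, which is one way to test whether the broken unconditional branch is the culprit.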