yisol / IDM-VTON

IDM-VTON : Improving Diffusion Models for Authentic Virtual Try-on in the Wild
https://idm-vton.github.io/

Color difference when replicating the training code #88

Open nftblackmagic opened 3 weeks ago

nftblackmagic commented 3 weeks ago

Hi there,

I recently replicated the training code and came across a really interesting problem.

[Image: during_train_51_420876cfa402f05a601c]

This is my overfitting inference test. The final results show a color difference, which looks weird. The three rows use different guidance_scale values: [0.99, 2, 5].

This is the inference result with the official weights: [Image: old_model_0_80daf7e0e4128e3b92a8]

The only difference between the two pictures is the UNet module.

Does anyone have any insight into what might be causing the color difference?

nftblackmagic commented 2 weeks ago

The problem still exists. Anyway, here is my training implementation:

https://github.com/nftblackmagic/IDM-VTON-training

nom commented 1 week ago

@nftblackmagic Your results here look good tho without this color shift issue? https://wandb.ai/anzhangusc/train_controlnet/runs/v34c4bin?nw=nwuseranzhangusc

nftblackmagic commented 1 week ago

I used a different strength, which improved the results, but the problem still exists.

AlexG1105 commented 6 days ago

My feeling is that the IP-Adapter may be causing the hue shift. I saw that you zero-initialize the IP-Adapter weights, but the IP-Adapter paper seems to initialize them from the existing Wk and Wv matrices of the regular cross-attention. Perhaps that's one of the reasons. I am kind of lazy, so I still ended up doing a random init. I'm not sure how long your training takes or what hardware you train on; I only recently started training. I'll let it run for a couple of days and see what happens.
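For reference, a minimal sketch of the "copy Wk/Wv into the new projections" idea, not the actual IDM-VTON or IP-Adapter code. It assumes a flat state dict where the extra image projections sit next to the text ones under hypothetical key names `...to_k_ip.weight` and `...to_v_ip.weight`; the function name `init_ip_from_text_kv` is also made up for illustration. Real checkpoints may name and place these keys differently.

```python
import copy


def init_ip_from_text_kv(state_dict):
    """Return a new dict where each '<prefix>.to_k_ip.weight' and
    '<prefix>.to_v_ip.weight' entry is a copy of the matching
    '<prefix>.to_k.weight' / '<prefix>.to_v.weight' entry.

    Key names are an assumption for this sketch; the input is not modified.
    """
    out = dict(state_dict)  # shallow copy, entries are replaced, not mutated
    for name, weight in state_dict.items():
        for proj in ("to_k", "to_v"):
            if name.endswith(f".{proj}.weight"):
                ip_name = name[: -len(".weight")] + "_ip.weight"
                # Only touch projections that actually exist in the model.
                if ip_name in state_dict:
                    out[ip_name] = copy.deepcopy(weight)
    return out
```

This only makes sense when the image tokens are projected to the same width as the text tokens, so that the copied matrices have the right shape; a shape check would be a sensible addition before loading the result into a model.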

nom commented 4 days ago

Are you randomly dropping out the conditions during training? This is needed for CFG to work. Dropping them about 10% of the time is the usual recommendation. Specifically, you need to zero out the reference features, similar to what's done for the unconditional input in the try-on pipeline.
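CFG mixes a conditional and an unconditional prediction at inference time, so the model has to see the zeroed-out condition during training as well. A rough, framework-free sketch of per-sample condition dropout follows. The helper names `cfg_keep_mask` and `zero_dropped` are hypothetical, and features are plain nested lists here only to keep the example self-contained; in a real training loop the same mask would be applied to the reference feature tensors.

```python
import random


def cfg_keep_mask(batch_size, drop_prob=0.1, rng=None):
    """Return one keep flag per sample: 0.0 means 'drop the condition'."""
    rng = rng or random.Random()
    return [0.0 if rng.random() < drop_prob else 1.0 for _ in range(batch_size)]


def zero_dropped(features, mask):
    """Zero every feature of a dropped sample.

    `features` is a list with one flat list of floats per sample.
    """
    return [[value * keep for value in sample] for sample, keep in zip(features, mask)]


# Usage: sample the mask once per training step and apply it to every
# conditioning input that the pipeline zeroes for the unconditional branch.
mask = cfg_keep_mask(batch_size=4, drop_prob=0.1, rng=random.Random(0))
reference = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
dropped = zero_dropped(reference, mask)
```

The mask is sampled per sample rather than per batch so that each step still contains mostly conditioned examples.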