primepake / wav2lip_288x288

MIT License
574 stars, 150 forks

Percep: 0.0 | Fake: 100.0, Real: 0.0 #29

Closed: aishoot closed this issue 2 years ago

aishoot commented 2 years ago

@primepake Hello, thanks for your nice work. I have recently run into difficulties training on my own dataset (prepared following your data preparation suggestions) with your shared code. When I run python hq_wav2lip_train.py, the training log is:

use_cuda: True
Load 2687 audio feats.
use_cuda: True, MULTI_GPU: True
total trainable params 48520755
total DISC trainable params 18210561
Load checkpoint from: checkpoint_syn/checkpoint_step000171000.pth
Starting Epoch: 0
Saved checkpoint: checkpoint_step000000001.pth
Saved checkpoint: disc_checkpoint_step000000001.pth
L1: 0.2313518226146698, Sync: 0.0, Percep: 0.711134135723114 | Fake: 0.6754781603813171, Real: 0.711134135723114
L1: 0.21765484660863876, Sync: 0.0, Percep: 0.709110289812088 | Fake: 0.677438884973526, Real: 0.709110289812088
L1: 0.2188651313384374, Sync: 0.0, Percep: 0.7070923844973246 | Fake: 0.6794042587280273, Real: 0.7070923844973246
L1: 0.2144552432000637, Sync: 0.0, Percep: 0.7049891352653503 | Fake: 0.6814645230770111, Real: 0.7049891352653503
L1: 0.21138261258602142, Sync: 0.0, Percep: 0.7029658198356629 | Fake: 0.6834565877914429, Real: 0.702965784072876
L1: 0.20817621052265167, Sync: 0.0, Percep: 0.7010621925195059 | Fake: 0.6853393216927847, Real: 0.7010620832443237
L1: 0.20210737415722438, Sync: 0.0, Percep: 0.6996434501239231 | Fake: 0.6867433360644749, Real: 0.6996431180409023
L1: 0.19812600128352642, Sync: 0.0, Percep: 0.6987411752343178 | Fake: 0.6876341179013252, Real: 0.6987397372722626
L1: 0.19437309437327915, Sync: 0.0, Percep: 0.6981025603082445 | Fake: 0.6882637408044603, Real: 0.6980936461024814
...
L1: 0.12470868316135908, Sync: 0.0, Percep: 0.703860961136065 | Fake: 0.7219482924593122, Real: 0.69152611988952
L1: 0.12432418142755826, Sync: 0.0, Percep: 0.7042167019098997 | Fake: 0.7212010175765803, Real: 0.6919726772157446
L1: 0.1240154696033173, Sync: 0.0, Percep: 0.7046547470633516 | Fake: 0.7203877433827243, Real: 0.6924839418497868
L1: 0.12360538116523198, Sync: 0.0, Percep: 0.7050436321569948 | Fake: 0.7196274266711303, Real: 0.6929315188026521
L1: 0.12324049579675751, Sync: 0.0, Percep: 0.7055533739051434 | Fake: 0.7187670722904832, Real: 0.6934557074776173
Evaluating for 300 steps
L1: 0.08894559927284718, Sync: 7.633765455087026, Percep: 0.8102987110614777 | Fake: 0.5884036968151728, Real: 0.7851572235425314
L1: 0.12352411426603795, Sync: 0.0, Percep: 0.7059701490402222 | Fake: 0.7179982627183199, Real: 0.6938216637895676
L1: 0.12316158214713087, Sync: 0.0, Percep: 0.7069740667201505 | Fake: 0.7167386501879975, Real: 0.6947587119988258
L1: 0.12274648358716685, Sync: 0.0, Percep: 0.7086389707584008 | Fake: 0.7149891035960001, Real: 0.6961143705702852
L1: 0.122441600682666, Sync: 0.0, Percep: 0.7101659479650478 | Fake: 0.7133496456499239, Real: 0.6971959297446922
L1: 0.12237166498716061, Sync: 0.0, Percep: 0.7136020838068082 | Fake: 0.7105454275957667, Real: 0.6988249019290939
L1: 0.12226668624650865, Sync: 0.0, Percep: 0.7165913383165995 | Fake: 0.7080283074861481, Real: 0.6995955426850179
...
L1: 0.10978432702055822, Sync: 0.0, Percep: 0.7658024462613563 | Fake: 0.8152413980393286, Real: 0.6154363644432882
L1: 0.10972586760557995, Sync: 0.0, Percep: 0.7692448822380323 | Fake: 0.8124342786812339, Real: 0.6124296340577328
L1: 0.10953026241862897, Sync: 0.0, Percep: 0.7743397405519289 | Fake: 0.8092221762695191, Real: 0.610396875484025
L1: 0.10939421248741639, Sync: 0.0, Percep: 0.7824273446941963 | Fake: 0.8055855450166116, Real: 0.6093728619629893
L1: 0.1091919630309757, Sync: 0.0, Percep: 0.7908333182386574 | Fake: 0.8019456527194126, Real: 0.6076706493660488
L1: 0.10912049508790679, Sync: 0.0, Percep: 0.7942769879668473 | Fake: 0.7993013052296446, Real: 0.6045908347347031
L1: 0.10908047136182737, Sync: 0.0, Percep: 0.7925851992034527 | Fake: 0.8665490227044159, Real: 0.6015381057315442
L1: 0.10930284824053846, Sync: 0.0, Percep: 0.7917964988368816 | Fake: 0.8661491764380034, Real: 0.6038172583345233
Evaluating for 300 steps
L1: 0.5731244529287021, Sync: 9.090377567211787, Percep: 1.654488068819046 | Fake: 0.21219109917680423, Real: 1.4065884272257487
L1: 0.10955245170742273, Sync: 0.0, Percep: 0.9697534098973847 | Fake: 0.8618184333870491, Real: 0.707599309967045
L1: 0.10959959211782437, Sync: 0.0, Percep: 0.9730155654561438 | Fake: 0.8586212793076295, Real: 0.7107085656626347
L1: 0.1096739404936238, Sync: 0.0, Percep: 0.9733813980183366 | Fake: 0.8565126432325659, Real: 0.7121740724053752
L1: 0.10970133425566951, Sync: 0.0, Percep: 0.9722086560893506 | Fake: 0.8555087234860062, Real: 0.7122941125653687
L1: 0.10975395110161866, Sync: 0.0, Percep: 0.9709689952921027 | Fake: 0.8545878388406093, Real: 0.7123311641033997
L1: 0.10966527403854742, Sync: 0.0, Percep: 0.9697017977636012 | Fake: 0.8537138932196765, Real: 0.7123305465982057
...
L1: 0.10242784435932453, Sync: 0.0, Percep: 0.8304814692491738 | Fake: 10.244699917241833, Real: 0.6199986526972722
L1: 0.10239110164847111, Sync: 0.0, Percep: 0.8279339800796978 | Fake: 10.520022923630663, Real: 0.618096816339305
L1: 0.10235720289591985, Sync: 0.0, Percep: 0.8254020718837354 | Fake: 10.793661997258704, Real: 0.6162066120079922
L1: 0.10230096286480747, Sync: 0.0, Percep: 0.8228856021523825 | Fake: 11.065632539949988, Real: 0.6143279333128459
L1: 0.10232478230738712, Sync: 0.0, Percep: 0.8203844301093661 | Fake: 11.335949766272329, Real: 0.6124606751568799
Starting Epoch: 1
L1: 0.11442705243825912, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.09390852972865105, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.09028899172941844, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08729504607617855, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.0875200405716896, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08466161414980888, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.0851562459553991, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
...
L1: 0.08385955898174599, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08394778782830518, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08393304471088492, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.0836903992508139, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
Evaluating for 300 steps
L1: 0.062434629226724304, Sync: 7.282701448599497, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08369095140779523, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.0834334861073229, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08353378695167907, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
L1: 0.08348599619962074, Sync: 0.0, Percep: 0.0 | Fake: 100.0, Real: 0.0
...

Is this training process normal? If not, could you please give me some suggestions? I sincerely need your help.

ghost commented 2 years ago

This is a failed training run; GANs are sometimes hard to train. You need to deep-dive into your dataset.
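For context on why this pattern reads as a collapse: the logged values Percep: 0.0 | Fake: 100.0, Real: 0.0 are exactly what binary cross-entropy produces once the discriminator saturates and outputs 1.0 for every input (the 100.0 looks like a clamp on an otherwise infinite -log(0) term). This is an interpretation sketched with plain BCE arithmetic, not the repo's exact code; the clamp value and epsilon are assumptions:

```python
import math

def bce(pred, target, eps=1e-44, clamp=100.0):
    """Binary cross-entropy for one scalar prediction.

    eps guards the log against zero arguments and clamp caps the loss,
    mimicking the 100.0 ceiling seen in the training log (assumed values).
    """
    loss = -(target * math.log(max(pred, eps))
             + (1 - target) * math.log(max(1 - pred, eps)))
    return min(loss, clamp)

d_out = 1.0  # a collapsed discriminator that scores everything as "real"

percep = bce(d_out, 1.0)  # generator adversarial loss: D(fake) vs label 1
fake = bce(d_out, 0.0)    # discriminator loss on fakes: D(fake) vs label 0
real = bce(d_out, 1.0)    # discriminator loss on reals: D(real) vs label 1

print(f"Percep: {percep} | Fake: {fake}, Real: {real}")
# prints "Percep: 0.0 | Fake: 100.0, Real: 0.0"
```

So once the discriminator stops separating real from fake, its own losses freeze at 0.0/100.0 and the generator's adversarial signal vanishes, which matches the log above starting at epoch 1.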

aishoot commented 2 years ago

Any suggestions? Sincerely hope to get your reply.

ghost commented 2 years ago

Process your dataset carefully; maybe that is the problem.

aishoot commented 2 years ago

Thanks. In fact, there is more than one person in my dataset.

ghost commented 2 years ago

It should be more than 60 minutes per person.

aishoot commented 2 years ago

I modified some of the data preprocessing code. My training dataset looks like this: [image]

NikitaKononov commented 2 years ago

Try to play with learning rate (disc and model optimizers)

aishoot commented 2 years ago

@NikitaKononov What should I do? Could you give some details? Thanks.

NikitaKononov commented 2 years ago

You should change the learning rate :))

aishoot commented 2 years ago

I changed syncnet_lr to 1e-3 with both hq_wav2lip_train.py and wloss_hq_wav2lip_train.py; training still failed.

NikitaKononov commented 2 years ago

Lower

ghost commented 2 years ago

It's not about your learning rate, it's about the dataset. You should have more than 10 persons to train.

NikitaKononov commented 2 years ago

Learning rate matters too. On a large dataset with the wrong LR, the model failed to train in my case.

ghost commented 2 years ago

If your dataset is good, it just takes more time to get a good result; if not, it will get stuck at some point.

NikitaKononov commented 2 years ago

GAN models are sensitive to the learning rate, especially in the WGAN case.

ghost commented 2 years ago

That's why we need a gradient penalty.
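For readers unfamiliar with the term: the WGAN-GP penalty keeps the critic's gradient norm near 1 on points interpolated between real and fake samples. A toy sketch with a linear critic f(x) = w·x, whose input gradient is just w, so the penalty can be computed without autograd; all names and numbers here are illustrative, not from this repo's training scripts:

```python
import random

def gradient_penalty(w, real, fake, lam=10.0):
    """WGAN-GP term for a linear critic f(x) = sum(w_i * x_i).

    For a linear critic the gradient w.r.t. the input is w itself at
    every interpolation point, so the penalty reduces to
    lam * (||w|| - 1)^2 for any real/fake pair.
    """
    eps = random.random()
    # x_hat is where a nonlinear critic would be evaluated and differentiated:
    x_hat = [eps * r + (1 - eps) * f for r, f in zip(real, fake)]
    grad = list(w)  # d f(x_hat) / d x_hat for a linear critic
    grad_norm = sum(g * g for g in grad) ** 0.5
    return lam * (grad_norm - 1.0) ** 2

# A critic with unit gradient norm incurs no penalty:
print(gradient_penalty([1.0, 0.0], real=[1.0, 2.0], fake=[0.0, 0.0]))  # 0.0
# A critic with gradient norm 2 is penalized by lam * (2 - 1)^2:
print(gradient_penalty([2.0, 0.0], real=[1.0, 2.0], fake=[0.0, 0.0]))  # 10.0
```

In a real setup the critic is a network and the gradient comes from autograd, but the penalty term itself has exactly this shape.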

HandsLing commented 2 years ago

I used the LRS2 dataset to train and also got Percep: 0.0 | Fake: 100.00000762939453, Real: 0.0 at the 3rd epoch. What should I do @primepake?

NikitaKononov commented 2 years ago

I don't recommend using LRS2; its resolution is poor, but I don't think that alone would make training fail.

Which script do you use? If wloss_hq_wav2lip_train.py, don't even try; it won't work. If hq_wav2lip_train.py, try different learning rates.

aishoot commented 2 years ago

@NikitaKononov Any recommended learning rates?

NikitaKononov commented 2 years ago

@aishoot It depends on your batch size: the larger the batch, the higher the LR. Try starting with 1e-4. I tried LRs from 1e-3 down to 1e-7 (I used 1e-7 in the last training stage to prevent overfitting).
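To make the "larger batch, higher LR" and 1e-4-down-to-1e-7 advice concrete, here is a minimal sketch. The stage boundaries and the linear batch-size scaling rule are my own illustrative assumptions, not values from this repo:

```python
def staged_lr(step, boundaries=(50_000, 150_000, 300_000),
              lrs=(1e-4, 1e-5, 1e-6, 1e-7)):
    """Piecewise-constant decay from 1e-4 down to 1e-7.

    The boundaries are hypothetical; tune them to your own run length.
    """
    for boundary, lr in zip(boundaries, lrs):
        if step < boundary:
            return lr
    return lrs[-1]  # final stage: lowest LR to avoid overfitting

def scaled_lr(base_lr=1e-4, base_batch=16, batch=64):
    """Linear scaling heuristic: scale LR proportionally with batch size."""
    return base_lr * batch / base_batch

print(staged_lr(10_000))    # early training: 1e-4
print(staged_lr(400_000))   # last stage: 1e-7
print(scaled_lr(batch=64))  # 4x larger batch -> 4x the base LR
```

You would feed the returned value into the generator and discriminator optimizers each step (or stage); the point is only that the LR should move over the ranges Nikita describes rather than stay fixed.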

aishoot commented 2 years ago

@NikitaKononov Thanks. I will have a try.

DogeFlow commented 1 year ago

Had the same problem, did you finally solve it?

huangxin168 commented 8 months ago

[image] My hq_wav2lip_train.py run always ends up with this result: Percep: 0.6985121191 | Fake: 0.6934193585, Real: 0.6940246867. Is this normal?