GitYesm opened this issue 1 year ago
Hi yangdongchao! When I train the Encodec 24k_240 model at 1 kbps, the loss is very high and oscillates heavily during the early stages of training. Is this a normal phenomenon?
The training log is as follows:
<epoch:8, iter:8250, total_loss_g:20.7092, adv_g_loss:2.1068, feat_loss:15.4339, rec_loss:3.1594, commit_loss:0.0000, loss_d:1.2053>, d_weight: 1.0000
<epoch:8, iter:8260, total_loss_g:1448.0029, adv_g_loss:2.0795, feat_loss:1439.2244, rec_loss:6.6940, commit_loss:0.0000, loss_d:0.5836>, d_weight: 1.0000
<epoch:8, iter:8270, total_loss_g:588.6943, adv_g_loss:2.1234, feat_loss:577.0657, rec_loss:9.4847, commit_loss:0.0000, loss_d:0.8170>, d_weight: 1.0000
<epoch:8, iter:8280, total_loss_g:316.6624, adv_g_loss:2.1950, feat_loss:306.5796, rec_loss:7.8813, commit_loss:0.0000, loss_d:0.8256>, d_weight: 1.0000
<epoch:8, iter:8290, total_loss_g:6425.9717, adv_g_loss:2.1269, feat_loss:6398.3364, rec_loss:25.5026, commit_loss:0.0000, loss_d:0.9661>, d_weight: 1.0000
<epoch:8, iter:8300, total_loss_g:2867.6846, adv_g_loss:2.2306, feat_loss:2847.7778, rec_loss:17.6676, commit_loss:0.0000, loss_d:0.1482>, d_weight: 1.0000
<epoch:8, iter:8310, total_loss_g:4510.4780, adv_g_loss:1.9837, feat_loss:4476.9551, rec_loss:31.5352, commit_loss:0.0000, loss_d:1.1329>, d_weight: 1.0000
<epoch:8, iter:8320, total_loss_g:3507.8118, adv_g_loss:1.9984, feat_loss:3480.6077, rec_loss:25.1733, commit_loss:0.0000, loss_d:1.0020>, d_weight: 1.0000
<epoch:8, iter:8330, total_loss_g:17506.3809, adv_g_loss:1.9943, feat_loss:17494.1309, rec_loss:10.2544, commit_loss:0.0000, loss_d:0.8280>, d_weight: 1.0000
<epoch:8, iter:8340, total_loss_g:30781.5254, adv_g_loss:2.1298, feat_loss:30761.4688, rec_loss:17.8869, commit_loss:0.0000, loss_d:0.4086>, d_weight: 1.0000
<epoch:8, iter:8350, total_loss_g:361517.0312, adv_g_loss:2.1185, feat_loss:361338.4688, rec_loss:176.4266, commit_loss:0.0000, loss_d:0.2256>, d_weight: 1.0000
<epoch:8, iter:8360, total_loss_g:32.4452, adv_g_loss:2.1076, feat_loss:28.3426, rec_loss:1.9913, commit_loss:0.0000, loss_d:1.3850>, d_weight: 1.0000
<epoch:8, iter:8370, total_loss_g:304.8588, adv_g_loss:2.2852, feat_loss:299.7329, rec_loss:2.8386, commit_loss:0.0000, loss_d:1.0175>, d_weight: 1.0000
<epoch:8, iter:8380, total_loss_g:34873.7617, adv_g_loss:2.1054, feat_loss:34844.2266, rec_loss:27.4251, commit_loss:0.0000, loss_d:0.3069>, d_weight: 1.0000
<epoch:8, iter:8390, total_loss_g:40341.5039, adv_g_loss:2.2593, feat_loss:40214.2148, rec_loss:125.0235, commit_loss:0.0000, loss_d:0.6393>, d_weight: 1.0000
<epoch:8, iter:8400, total_loss_g:184210.6719, adv_g_loss:2.0305, feat_loss:184145.6875, rec_loss:62.9335, commit_loss:0.0000, loss_d:1.0710>, d_weight: 1.0000
<epoch:8, iter:8410, total_loss_g:1336.8246, adv_g_loss:2.1409, feat_loss:1317.9712, rec_loss:16.7082, commit_loss:0.0000, loss_d:0.9688>, d_weight: 1.0000
<epoch:8, iter:8420, total_loss_g:13977.8945, adv_g_loss:2.2973, feat_loss:13938.0557, rec_loss:37.5274, commit_loss:0.0000, loss_d:0.2749>, d_weight: 1.0000
<epoch:8, iter:8430, total_loss_g:3301.4082, adv_g_loss:2.1330, feat_loss:3262.6450, rec_loss:36.6189, commit_loss:0.0000, loss_d:0.6580>, d_weight: 1.0000
The validation log is as follows:
2023-06-20-12-58: <epoch:0, total_loss_g_valid:155.6049, recon_loss_valid:21.3568, adversarial_loss_valid:1.6380, feature_loss_valid:132.6101, commit_loss_valid:0.0000, valid_loss_d:1.2365, best_epoch:0>
2023-06-20-16-30: <epoch:1, total_loss_g_valid:508.1316, recon_loss_valid:21.7350, adversarial_loss_valid:1.7627, feature_loss_valid:484.6339, commit_loss_valid:0.0000, valid_loss_d:1.0418, best_epoch:0>
2023-06-20-20-02: <epoch:2, total_loss_g_valid:302.2671, recon_loss_valid:20.5088, adversarial_loss_valid:2.1077, feature_loss_valid:279.6506, commit_loss_valid:0.0000, valid_loss_d:1.1599, best_epoch:2>
2023-06-20-23-34: <epoch:3, total_loss_g_valid:1090.3598, recon_loss_valid:20.4632, adversarial_loss_valid:2.0897, feature_loss_valid:1067.8068, commit_loss_valid:0.0000, valid_loss_d:0.9414, best_epoch:3>
2023-06-21-03-07: <epoch:4, total_loss_g_valid:1666.9553, recon_loss_valid:21.7679, adversarial_loss_valid:2.0294, feature_loss_valid:1643.1580, commit_loss_valid:0.0000, valid_loss_d:1.0660, best_epoch:3>
2023-06-21-06-39: <epoch:5, total_loss_g_valid:1438.0695, recon_loss_valid:21.1533, adversarial_loss_valid:2.1540, feature_loss_valid:1414.7622, commit_loss_valid:0.0000, valid_loss_d:1.1304, best_epoch:3>
2023-06-21-10-11: <epoch:6, total_loss_g_valid:918.1003, recon_loss_valid:21.4004, adversarial_loss_valid:2.1242, feature_loss_valid:894.5757, commit_loss_valid:0.0000, valid_loss_d:1.1136, best_epoch:3>
2023-06-21-13-43: <epoch:7, total_loss_g_valid:1691.1200, recon_loss_valid:20.3575, adversarial_loss_valid:2.1024, feature_loss_valid:1668.6601, commit_loss_valid:0.0000, valid_loss_d:0.9036, best_epoch:7>
It seems something is wrong. Looking forward to your reply.
Model training doesn't look very stable, especially for the SoundStream model: it adds more discriminators in an attempt to improve quality, but that also makes training more difficult.
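Until the root cause is found, two generic stabilizers often help with this failure mode: clipping the generator's gradients, and skipping optimizer steps whose loss spikes by orders of magnitude (the pattern visible around iter:8350 above). Below is a minimal PyTorch sketch, not taken from this repo; `SpikeGuard` and all of its thresholds are hypothetical and untuned:

```python
import torch

class SpikeGuard:
    """Hypothetical helper (not part of this repo) that skips generator
    updates whose loss explodes relative to a running mean, and clips
    gradients otherwise. Thresholds are illustrative, not tuned."""

    def __init__(self, spike_factor=100.0, max_grad_norm=1.0, ema=0.99):
        self.spike_factor = spike_factor
        self.max_grad_norm = max_grad_norm
        self.ema = ema
        self.running = None  # running mean of the generator loss

    def step(self, model, optimizer, loss):
        value = loss.item()
        if self.running is not None and value > self.spike_factor * self.running:
            # A batch like iter:8350 above (total_loss_g ~ 3.6e5) is dropped
            # here instead of pushing a huge gradient into the weights.
            optimizer.zero_grad(set_to_none=True)
            return False
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), self.max_grad_norm)
        optimizer.step()
        # Update the running mean only with accepted steps.
        self.running = value if self.running is None else (
            self.ema * self.running + (1.0 - self.ema) * value)
        return True
```

Neither trick fixes the underlying cause, but calling something like `guard.step(generator, optimizer_g, total_loss_g)` in the training loop usually keeps a run alive long enough to diagnose it.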
In addition, I think some of the ideas proposed in RVQ-GAN could improve your model.
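Given that feat_loss is the term that explodes (feat_loss:361338.4688 at iter:8350 while adv_g_loss stays near 2), one concretely relevant idea along these lines is making the feature-matching loss scale-invariant, as in the EnCodec paper's relative feature loss, which divides each layer's L1 term by the mean magnitude of the real feature maps. A minimal sketch follows; the function name and the list-of-lists feature-map layout are assumptions, not this repo's actual API:

```python
import torch

def relative_feature_loss(fmaps_real, fmaps_fake, eps=1e-8):
    """Hypothetical drop-in for an unnormalized L1 feature-matching loss.

    `fmaps_real` / `fmaps_fake` are assumed to be lists (one per
    discriminator) of lists (one per layer) of feature tensors.
    Dividing each layer's L1 term by the mean magnitude of the real
    features keeps every contribution O(1), so no single layer can
    push feat_loss to 1e5 the way the logs above show.
    """
    loss, n_terms = 0.0, 0
    for real_layers, fake_layers in zip(fmaps_real, fmaps_fake):
        for fr, ff in zip(real_layers, fake_layers):
            scale = fr.detach().abs().mean().clamp(min=eps)
            loss = loss + (fr - ff).abs().mean() / scale
            n_terms += 1
    return loss / max(n_terms, 1)
```

If the trainer already normalizes this way, the next things I would check are the discriminator learning rate and whether the constant commit_loss:0.0000 in the logs means the quantizer's commitment term is accidentally disabled.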