yangdongchao / AcademiCodec

AcademiCodec: An Open Source Audio Codec Model for Academic Research

The Encodec 24k_240 training loss is very large! #27

Open · GitYesm opened this issue 1 year ago

GitYesm commented 1 year ago

Hi yangdongchao! When I train the Encodec 24k_240 model at 1 kbps, the loss is very high and oscillates strongly during the early stages of training. Is this a normal phenomenon?

The training log is as follows:

<epoch:8, iter:8250, total_loss_g:20.7092, adv_g_loss:2.1068, feat_loss:15.4339, rec_loss:3.1594, commit_loss:0.0000, loss_d:1.2053>, d_weight: 1.0000
<epoch:8, iter:8260, total_loss_g:1448.0029, adv_g_loss:2.0795, feat_loss:1439.2244, rec_loss:6.6940, commit_loss:0.0000, loss_d:0.5836>, d_weight: 1.0000
<epoch:8, iter:8270, total_loss_g:588.6943, adv_g_loss:2.1234, feat_loss:577.0657, rec_loss:9.4847, commit_loss:0.0000, loss_d:0.8170>, d_weight: 1.0000
<epoch:8, iter:8280, total_loss_g:316.6624, adv_g_loss:2.1950, feat_loss:306.5796, rec_loss:7.8813, commit_loss:0.0000, loss_d:0.8256>, d_weight: 1.0000
<epoch:8, iter:8290, total_loss_g:6425.9717, adv_g_loss:2.1269, feat_loss:6398.3364, rec_loss:25.5026, commit_loss:0.0000, loss_d:0.9661>, d_weight: 1.0000
<epoch:8, iter:8300, total_loss_g:2867.6846, adv_g_loss:2.2306, feat_loss:2847.7778, rec_loss:17.6676, commit_loss:0.0000, loss_d:0.1482>, d_weight: 1.0000
<epoch:8, iter:8310, total_loss_g:4510.4780, adv_g_loss:1.9837, feat_loss:4476.9551, rec_loss:31.5352, commit_loss:0.0000, loss_d:1.1329>, d_weight: 1.0000
<epoch:8, iter:8320, total_loss_g:3507.8118, adv_g_loss:1.9984, feat_loss:3480.6077, rec_loss:25.1733, commit_loss:0.0000, loss_d:1.0020>, d_weight: 1.0000
<epoch:8, iter:8330, total_loss_g:17506.3809, adv_g_loss:1.9943, feat_loss:17494.1309, rec_loss:10.2544, commit_loss:0.0000, loss_d:0.8280>, d_weight: 1.0000
<epoch:8, iter:8340, total_loss_g:30781.5254, adv_g_loss:2.1298, feat_loss:30761.4688, rec_loss:17.8869, commit_loss:0.0000, loss_d:0.4086>, d_weight: 1.0000
<epoch:8, iter:8350, total_loss_g:361517.0312, adv_g_loss:2.1185, feat_loss:361338.4688, rec_loss:176.4266, commit_loss:0.0000, loss_d:0.2256>, d_weight: 1.0000
<epoch:8, iter:8360, total_loss_g:32.4452, adv_g_loss:2.1076, feat_loss:28.3426, rec_loss:1.9913, commit_loss:0.0000, loss_d:1.3850>, d_weight: 1.0000
<epoch:8, iter:8370, total_loss_g:304.8588, adv_g_loss:2.2852, feat_loss:299.7329, rec_loss:2.8386, commit_loss:0.0000, loss_d:1.0175>, d_weight: 1.0000
<epoch:8, iter:8380, total_loss_g:34873.7617, adv_g_loss:2.1054, feat_loss:34844.2266, rec_loss:27.4251, commit_loss:0.0000, loss_d:0.3069>, d_weight: 1.0000
<epoch:8, iter:8390, total_loss_g:40341.5039, adv_g_loss:2.2593, feat_loss:40214.2148, rec_loss:125.0235, commit_loss:0.0000, loss_d:0.6393>, d_weight: 1.0000
<epoch:8, iter:8400, total_loss_g:184210.6719, adv_g_loss:2.0305, feat_loss:184145.6875, rec_loss:62.9335, commit_loss:0.0000, loss_d:1.0710>, d_weight: 1.0000
<epoch:8, iter:8410, total_loss_g:1336.8246, adv_g_loss:2.1409, feat_loss:1317.9712, rec_loss:16.7082, commit_loss:0.0000, loss_d:0.9688>, d_weight: 1.0000
<epoch:8, iter:8420, total_loss_g:13977.8945, adv_g_loss:2.2973, feat_loss:13938.0557, rec_loss:37.5274, commit_loss:0.0000, loss_d:0.2749>, d_weight: 1.0000
<epoch:8, iter:8430, total_loss_g:3301.4082, adv_g_loss:2.1330, feat_loss:3262.6450, rec_loss:36.6189, commit_loss:0.0000, loss_d:0.6580>, d_weight: 1.0000

The validation log is as follows:

2023-06-20-12-58: <epoch:0, total_loss_g_valid:155.6049, recon_loss_valid:21.3568, adversarial_loss_valid:1.6380, feature_loss_valid:132.6101, commit_loss_valid:0.0000, valid_loss_d:1.2365, best_epoch:0>
2023-06-20-16-30: <epoch:1, total_loss_g_valid:508.1316, recon_loss_valid:21.7350, adversarial_loss_valid:1.7627, feature_loss_valid:484.6339, commit_loss_valid:0.0000, valid_loss_d:1.0418, best_epoch:0>
2023-06-20-20-02: <epoch:2, total_loss_g_valid:302.2671, recon_loss_valid:20.5088, adversarial_loss_valid:2.1077, feature_loss_valid:279.6506, commit_loss_valid:0.0000, valid_loss_d:1.1599, best_epoch:2>
2023-06-20-23-34: <epoch:3, total_loss_g_valid:1090.3598, recon_loss_valid:20.4632, adversarial_loss_valid:2.0897, feature_loss_valid:1067.8068, commit_loss_valid:0.0000, valid_loss_d:0.9414, best_epoch:3>
2023-06-21-03-07: <epoch:4, total_loss_g_valid:1666.9553, recon_loss_valid:21.7679, adversarial_loss_valid:2.0294, feature_loss_valid:1643.1580, commit_loss_valid:0.0000, valid_loss_d:1.0660, best_epoch:3>
2023-06-21-06-39: <epoch:5, total_loss_g_valid:1438.0695, recon_loss_valid:21.1533, adversarial_loss_valid:2.1540, feature_loss_valid:1414.7622, commit_loss_valid:0.0000, valid_loss_d:1.1304, best_epoch:3>
2023-06-21-10-11: <epoch:6, total_loss_g_valid:918.1003, recon_loss_valid:21.4004, adversarial_loss_valid:2.1242, feature_loss_valid:894.5757, commit_loss_valid:0.0000, valid_loss_d:1.1136, best_epoch:3>
2023-06-21-13-43: <epoch:7, total_loss_g_valid:1691.1200, recon_loss_valid:20.3575, adversarial_loss_valid:2.1024, feature_loss_valid:1668.6601, commit_loss_valid:0.0000, valid_loss_d:0.9036, best_epoch:7>
GitYesm commented 1 year ago

Looking forward to your reply.

yangdongchao commented 1 year ago

[quoted the question and training/validation logs from the original post above]

It seems something is wrong.

GitYesm commented 1 year ago

Model training doesn't look very stable, especially for the SoundStream model: adding more discriminators in an attempt to improve quality also makes training more difficult.

In addition, I think some of the ideas from RVQ-GAN could improve your model.
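For example, one idea along those lines is to keep the feature-matching term bounded by normalizing it with the scale of the real discriminator features, and to clip the generator gradients when a batch spikes. The sketch below is generic and not the repository's actual code; the function and variable names (`relative_feature_matching_loss`, `lambda_feat`, `generator`, etc.) are placeholders.

```python
import torch

def relative_feature_matching_loss(fmap_real, fmap_fake, eps=1e-8):
    """Feature-matching loss normalized by the magnitude of the real features.

    fmap_real / fmap_fake: lists (one entry per discriminator) of lists of
    intermediate feature maps. Dividing by mean(|real feature|) keeps the loss
    bounded even if the discriminator's activations grow during training.
    """
    loss, count = 0.0, 0
    for feats_real, feats_fake in zip(fmap_real, fmap_fake):
        for fr, ff in zip(feats_real, feats_fake):
            loss = loss + (fr - ff).abs().mean() / (fr.abs().mean() + eps)
            count += 1
    return loss / max(count, 1)

# Hypothetical generator step (names are placeholders, not AcademiCodec's API):
# feat_loss = relative_feature_matching_loss(fmap_real, fmap_fake)
# total_loss_g = rec_loss + adv_g_loss + lambda_feat * feat_loss + commit_loss
# total_loss_g.backward()
# torch.nn.utils.clip_grad_norm_(generator.parameters(), max_norm=1.0)  # tame spikes
# optimizer_g.step()
```

Whether this helps in your setup depends on the rest of the recipe (loss weights, discriminator set, learning rates), so treat it only as a starting point.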
