kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
https://kan-bayashi.github.io/ParallelWaveGAN/
MIT License

The learning rate decreases after a specific epoch. #425

Closed · Cyp9715 closed this issue 6 months ago

Cyp9715 commented 6 months ago

Until approximately step 2000, training runs at a fast rate of about 5 iterations per second on an RTX 3070, and then the speed suddenly drops sharply. I'm training Parallel WaveGAN, and I've modified some of the dataset and conf files to train on another language. Starting at around step 2000, the process reads from the SSD like crazy instead of using GPU resources. I waited for more than 6 hours, but only five steps progressed.

Is this intended? Thank you.

Below is the configuration file that I modified.

FILE : conf/parallel_wavegan.v3.yaml

###########################################################
#                FEATURE EXTRACTION SETTING               #
###########################################################
sampling_rate: 16000     # Sampling rate.
fft_size: 400           # FFT size.
hop_size: 160            # Hop size.
win_length: 400         # Window length.
                         # If set to null, it will be the same as fft_size.
window: "hann"           # Window function.
num_mels: 80             # Number of mel basis.
fmin: 80                 # Minimum freq in mel basis calculation.
fmax: 7600               # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0   # Will be multiplied to the whole waveform.
trim_silence: true       # Whether to trim the start and end of silence.
trim_threshold_in_db: 60 # Need to tune carefully if the recording is not good.
trim_frame_size: 2048    # Frame size in trimming.
trim_hop_size: 512       # Hop size in trimming.
format: "hdf5"           # Feature file format. "npy" or "hdf5" is supported.

###########################################################
#         GENERATOR NETWORK ARCHITECTURE SETTING          #
###########################################################
generator_params:
    in_channels: 1        # Number of input channels.
    out_channels: 1       # Number of output channels.
    kernel_size: 5        # Kernel size of dilated convolution.
    layers: 30            # Number of residual block layers.
    stacks: 3             # Number of stacks i.e., dilation cycles.
    residual_channels: 64 # Number of channels in residual conv.
    gate_channels: 128    # Number of channels in gated conv.
    skip_channels: 64     # Number of channels in skip conv.
    aux_channels: 80      # Number of channels for auxiliary feature conv.
                          # Must be the same as num_mels.
    aux_context_window: 2 # Context window size for auxiliary feature.
                          # If set to 2, previous 2 and future 2 frames will be considered.
    dropout: 0.0          # Dropout rate. 0.0 means no dropout applied.
    use_weight_norm: true # Whether to use weight norm.
                          # If set to true, it will be applied to all of the conv layers.
    upsample_net: "ConvInUpsampleNetwork" # Upsampling network architecture.
    upsample_params:                      # Upsampling network parameters.
        upsample_scales: [4, 4, 5, 2]     # Upsampling scales. Product of these must be the same as hop size.

###########################################################
#       DISCRIMINATOR NETWORK ARCHITECTURE SETTING        #
###########################################################
discriminator_type: "MelGANMultiScaleDiscriminator" # Discriminator type.
discriminator_params:
    in_channels: 1                    # Number of input channels.
    out_channels: 1                   # Number of output channels.
    scales: 3                         # Number of multi-scales.
    downsample_pooling: "AvgPool1d"   # Pooling type for the input downsampling.
    downsample_pooling_params:        # Parameters of the above pooling function.
        kernel_size: 4
        stride: 2
        padding: 1
        count_include_pad: False
    kernel_sizes: [5, 3]              # List of kernel size.
    channels: 16                      # Number of channels of the initial conv layer.
    max_downsample_channels: 1024     # Maximum number of channels of downsampling layers.
    downsample_scales: [4, 4, 4, 4]   # List of downsampling scales.
    nonlinear_activation: "LeakyReLU" # Nonlinear activation function.
    nonlinear_activation_params:      # Parameters of nonlinear activation function.
        negative_slope: 0.2
    use_weight_norm: True             # Whether to use weight norm.

###########################################################
#                   STFT LOSS SETTING                     #
###########################################################
stft_loss_params:
    fft_sizes: [1024, 2048, 512]  # List of FFT size for STFT-based loss.
    hop_sizes: [120, 240, 50]     # List of hop size for STFT-based loss.
    win_lengths: [600, 1200, 240] # List of window length for STFT-based loss.
    window: "hann_window"         # Window function for STFT-based loss.

###########################################################
#               ADVERSARIAL LOSS SETTING                  #
###########################################################
use_feat_match_loss: true # Whether to use feature matching loss.
lambda_feat_match: 25.0   # Loss balancing coefficient for feature matching loss.
lambda_adv: 4.0          # Loss balancing coefficient for adversarial loss.

###########################################################
#                  DATA LOADER SETTING                    #
###########################################################
batch_size: 8             # Batch size.
batch_max_steps: 8192      # Length of each audio in batch. Make sure it is divisible by hop_size.
pin_memory: true           # Whether to pin memory in Pytorch DataLoader.
num_workers: 2             # Number of workers in Pytorch DataLoader.
remove_short_samples: true # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true          # Whether to allow cache in dataset. If true, it requires cpu memory.

###########################################################
#             OPTIMIZER & SCHEDULER SETTING               #
###########################################################
generator_optimizer_params:
    lr: 0.0001             # Generator's learning rate.
    eps: 1.0e-6            # Generator's epsilon.
    weight_decay: 0.0      # Generator's weight decay coefficient.
generator_scheduler_params:
    step_size: 3000000     # Generator's scheduler step size.
    gamma: 0.5             # Generator's scheduler gamma.
                           # At each step size, lr will be multiplied by this parameter.
generator_grad_norm: 10    # Generator's gradient norm.
discriminator_optimizer_params:
    lr: 0.00005            # Discriminator's learning rate.
    eps: 1.0e-6            # Discriminator's epsilon.
    weight_decay: 0.0      # Discriminator's weight decay coefficient.
discriminator_scheduler_params:
    step_size: 3000000     # Discriminator's scheduler step size.
    gamma: 0.5             # Discriminator's scheduler gamma.
                           # At each step size, lr will be multiplied by this parameter.
discriminator_grad_norm: 1 # Discriminator's gradient norm.

###########################################################
#                    INTERVAL SETTING                     #
###########################################################
discriminator_train_start_steps: 100000 # Number of steps to start to train discriminator.
train_max_steps: 3000000                # Number of training steps.
save_interval_steps: 5000               # Interval steps to save checkpoint.
eval_interval_steps: 1000               # Interval steps to evaluate the network.
log_interval_steps: 100                 # Interval steps to record the training log.

###########################################################
#                     OTHER SETTING                       #
###########################################################
num_save_intermediate_results: 4  # Number of results to be saved as intermediate results.
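
The values I changed for 16 kHz data have to stay mutually consistent, as the comments above note. As a quick illustration, a minimal Python sketch of those checks (illustrative only, not part of the repository):

# Illustrative check only: the values modified for 16 kHz data must satisfy
# the constraints noted in the config comments above.
import math

hop_size = 160                  # from the feature extraction setting
num_mels = 80
aux_channels = 80               # from generator_params
upsample_scales = [4, 4, 5, 2]  # from upsample_params

assert math.prod(upsample_scales) == hop_size  # 4 * 4 * 5 * 2 = 160
assert aux_channels == num_mels                # aux feature conv must match mel bins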

FILE : train.log

[train]:   0%|          | 1974/3000000 [08:35<218:49:50,  3.81it/s]
[train]:   0%|          | 1975/3000000 [08:35<217:33:15,  3.83it/s]
[train]:   0%|          | 1976/3000000 [08:35<215:58:02,  3.86it/s]
[train]:   0%|          | 1977/3000000 [08:35<212:03:47,  3.93it/s]
[train]:   0%|          | 1978/3000000 [08:36<206:58:43,  4.02it/s]
[train]:   0%|          | 1979/3000000 [08:36<206:36:10,  4.03it/s]
[train]:   0%|          | 1980/3000000 [08:36<203:47:04,  4.09it/s]
[train]:   0%|          | 1981/3000000 [08:36<202:49:16,  4.11it/s]
[train]:   0%|          | 1982/3000000 [08:37<201:48:36,  4.13it/s]
[train]:   0%|          | 1983/3000000 [08:37<204:30:29,  4.07it/s]
[train]:   0%|          | 1984/3000000 [08:37<214:55:21,  3.87it/s]
[train]:   0%|          | 1985/3000000 [08:37<215:13:26,  3.87it/s]
[train]:   0%|          | 1986/3000000 [08:38<212:16:55,  3.92it/s]
[train]:   0%|          | 1987/3000000 [08:38<213:19:01,  3.90it/s]
[train]:   0%|          | 1988/3000000 [08:38<213:14:45,  3.91it/s]
[train]:   0%|          | 1989/3000000 [08:38<219:15:55,  3.80it/s]
[train]:   0%|          | 1990/3000000 [08:39<222:03:54,  3.75it/s]
[train]:   0%|          | 1991/3000000 [08:39<215:29:35,  3.86it/s]
[train]:   0%|          | 1992/3000000 [08:39<214:56:20,  3.87it/s]
[train]:   0%|          | 1993/3000000 [08:39<217:19:08,  3.83it/s]
[train]:   0%|          | 1994/3000000 [08:40<213:44:15,  3.90it/s]
[train]:   0%|          | 1995/3000000 [08:40<217:45:49,  3.82it/s]
[train]:   0%|          | 1996/3000000 [08:40<211:25:31,  3.94it/s]
[train]:   0%|          | 1997/3000000 [08:41<213:41:09,  3.90it/s]
[train]:   0%|          | 1998/3000000 [08:41<220:40:00,  3.77it/s]
[train]:   0%|          | 1999/3000000 [08:41<213:48:09,  3.90it/s]
[train]:   0%|          | 2000/3000000 [08:41<207:16:31,  4.02it/s]2024-03-01 20:19:54,678 (train:633) INFO: (Steps: 2000) train/spectral_convergence_loss = 0.8977.
2024-03-01 20:19:54,678 (train:633) INFO: (Steps: 2000) train/log_stft_magnitude_loss = 1.1274.
2024-03-01 20:19:54,678 (train:633) INFO: (Steps: 2000) train/generator_loss = 2.0250.
2024-03-01 20:19:54,691 (train:471) INFO: (Steps: 2000) Start evaluation.

[eval]:   0%|          | 0/63 [00:00<?, ?it/s]
[eval]:   2%|▏         | 1/63 [00:01<01:23,  1.35s/it]
[eval]:   3%|▎         | 2/63 [00:01<00:39,  1.56it/s]
[eval]:   6%|▋         | 4/63 [00:01<00:17,  3.39it/s]
[eval]:  10%|▉         | 6/63 [00:01<00:11,  5.04it/s]
[eval]:  11%|█         | 7/63 [00:01<00:09,  5.70it/s]
[eval]:  14%|█▍        | 9/63 [00:02<00:07,  7.04it/s]
[eval]:  17%|█▋        | 11/63 [00:02<00:06,  8.11it/s]
[eval]:  21%|██        | 13/63 [00:02<00:05,  8.76it/s]
[eval]:  24%|██▍       | 15/63 [00:02<00:05,  9.36it/s]
[eval]:  27%|██▋       | 17/63 [00:02<00:04,  9.80it/s]
[eval]:  30%|███       | 19/63 [00:03<00:04, 10.12it/s]
[eval]:  33%|███▎      | 21/63 [00:03<00:04, 10.38it/s]
[eval]:  37%|███▋      | 23/63 [00:03<00:03, 10.38it/s]
[eval]:  40%|███▉      | 25/63 [00:03<00:03, 10.59it/s]
[eval]:  43%|████▎     | 27/63 [00:03<00:03, 10.71it/s]
[eval]:  46%|████▌     | 29/63 [00:03<00:03, 10.85it/s]
[eval]:  49%|████▉     | 31/63 [00:04<00:02, 10.88it/s]
[eval]:  52%|█████▏    | 33/63 [00:04<00:02, 10.85it/s]
[eval]:  56%|█████▌    | 35/63 [00:04<00:02, 10.78it/s]
[eval]:  59%|█████▊    | 37/63 [00:04<00:02, 10.85it/s]
[eval]:  62%|██████▏   | 39/63 [00:04<00:02, 10.92it/s]
[eval]:  65%|██████▌   | 41/63 [00:05<00:02, 10.69it/s]
[eval]:  68%|██████▊   | 43/63 [00:05<00:01, 10.71it/s]
[eval]:  71%|███████▏  | 45/63 [00:05<00:01, 10.82it/s]
[eval]:  75%|███████▍  | 47/63 [00:05<00:01, 10.81it/s]
[eval]:  78%|███████▊  | 49/63 [00:05<00:01, 10.88it/s]
[eval]:  81%|████████  | 51/63 [00:06<00:01, 10.80it/s]
[eval]:  84%|████████▍ | 53/63 [00:06<00:00, 10.81it/s]
[eval]:  87%|████████▋ | 55/63 [00:06<00:00, 10.45it/s]
[eval]:  90%|█████████ | 57/63 [00:06<00:00, 10.39it/s]
[eval]:  94%|█████████▎| 59/63 [00:06<00:00,  9.87it/s]
[eval]:  95%|█████████▌| 60/63 [00:06<00:00,  9.88it/s]
[eval]:  97%|█████████▋| 61/63 [00:07<00:00,  9.80it/s]
[eval]: 100%|██████████| 63/63 [00:07<00:00, 10.21it/s]
[eval]: 100%|██████████| 63/63 [00:07<00:00,  8.59it/s]2024-03-01 20:20:02,036 (train:487) INFO: (Steps: 2000) Finished evaluation (63 steps per epoch).
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/spectral_convergence_loss = 0.8968.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/log_stft_magnitude_loss = 1.0782.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/feature_matching_loss = 0.0003.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/adversarial_loss = 0.9558.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/generator_loss = 5.8318.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/real_loss = 0.9558.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/fake_loss = 0.0016.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/discriminator_loss = 0.9574.

[train]:   0%|          | 2001/3000000 [08:49<2045:52:46,  2.46s/it]
[train]:   0%|          | 2002/3000000 [08:49<1492:52:09,  1.79s/it]
[train]:   0%|          | 2003/3000000 [08:49<1104:55:45,  1.33s/it]
[train]:   0%|          | 2004/3000000 [08:50<841:28:14,  1.01s/it] 
[train]:   0%|          | 2005/3000000 [08:50<659:00:50,  1.26it/s]
[train]:   0%|          | 2006/3000000 [08:50<529:23:03,  1.57it/s]
[train]:   0%|          | 2007/3000000 [08:50<430:39:02,  1.93it/s]
[train]:   0%|          | 2008/3000000 [08:51<360:29:24,  2.31it/s]
[train]:   0%|          | 2009/3000000 [08:51<310:20:24,  2.68it/s]
[train]:   0%|          | 2010/3000000 [08:51<277:53:02,  3.00it/s]
[train]:   0%|          | 2011/3000000 [08:51<257:15:52,  3.24it/s]
[train]:   0%|          | 2012/3000000 [08:52<238:30:17,  3.49it/s]
[train]:   0%|          | 2013/3000000 [08:52<225:34:05,  3.69it/s]
[train]:   0%|          | 2014/3000000 [08:52<217:15:40,  3.83it/s]
[train]:   0%|          | 2015/3000000 [08:52<211:46:30,  3.93it/s]
[train]:   0%|          | 2016/3000000 [08:53<212:33:52,  3.92it/s]
[train]:   0%|          | 2017/3000000 [08:53<212:50:07,  3.91it/s]
[train]:   0%|          | 2018/3000000 [08:53<210:23:27,  3.96it/s]
[train]:   0%|          | 2019/3000000 [08:53<205:40:27,  4.05it/s]
[train]:   0%|          | 2020/3000000 [08:54<211:00:50,  3.95it/s]
[train]:   0%|          | 2021/3000000 [08:54<215:25:32,  3.87it/s]
[train]:   0%|          | 2022/3000000 [08:54<209:48:52,  3.97it/s]
[train]:   0%|          | 2023/3000000 [08:54<216:37:54,  3.84it/s]
[train]:   0%|          | 2024/3000000 [08:55<215:56:18,  3.86it/s]
[train]:   0%|          | 2025/3000000 [08:55<209:49:27,  3.97it/s]
[train]:   0%|          | 2026/3000000 [08:55<209:03:39,  3.98it/s]
[train]:   0%|          | 2027/3000000 [08:55<206:11:32,  4.04it/s]
[train]:   0%|          | 2028/3000000 [08:56<208:34:06,  3.99it/s]
[train]:   0%|          | 2029/3000000 [08:56<205:30:16,  4.05it/s]
[train]:   0%|          | 2030/3000000 [08:56<203:14:31,  4.10it/s]
[train]:   0%|          | 2031/3000000 [08:56<202:01:14,  4.12it/s]
[train]:   0%|          | 2032/3000000 [08:57<199:07:56,  4.18it/s]
[train]:   0%|          | 2033/3000000 [08:57<196:14:00,  4.24it/s]
[train]:   0%|          | 2034/3000000 [08:57<194:46:42,  4.28it/s]
[train]:   0%|          | 2035/3000000 [08:57<195:26:53,  4.26it/s]
[train]:   0%|          | 2036/3000000 [08:57<195:11:52,  4.27it/s]
[train]:   0%|          | 2037/3000000 [08:58<194:58:45,  4.27it/s]
[train]:   0%|          | 2038/3000000 [08:58<199:45:30,  4.17it/s]
[train]:   0%|          | 2039/3000000 [08:58<197:02:17,  4.23it/s]
[train]:   0%|          | 2040/3000000 [08:58<195:42:53,  4.25it/s]
[train]:   0%|          | 2041/3000000 [08:59<194:21:01,  4.28it/s]
[train]:   0%|          | 2042/3000000 [08:59<198:10:37,  4.20it/s]
[train]:   0%|          | 2043/3000000 [08:59<201:18:58,  4.14it/s]
[train]:   0%|          | 2044/3000000 [08:59<201:10:17,  4.14it/s]
[train]:   0%|          | 2045/3000000 [09:00<220:58:42,  3.77it/s]  # Start of the problem section.
[train]:   0%|          | 2046/3000000 [09:02<681:41:20,  1.22it/s]
[train]:   0%|          | 2047/3000000 [09:10<2471:23:38,  2.97s/it]
[train]:   0%|          | 2048/3000000 [10:51<27101:07:28, 32.54s/it]
[train]:   0%|          | 2049/3000000 [13:04<52114:33:59, 62.58s/it]2024-03-01 20:32:22,966 (train:1546) INFO: Successfully saved checkpoint @ 2049steps.
Traceback (most recent call last):
  File "/home/EVDA/Vocoder/ParallelWaveGAN/tools/venv/bin/parallel-wavegan-train", line 33, in <module>
    sys.exit(load_entry_point('parallel-wavegan', 'console_scripts', 'parallel-wavegan-train')())
  File "/home/EVDA/Vocoder/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 1541, in main
    trainer.run()
  File "/home/EVDA/Vocoder/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 105, in run
    self._train_epoch()
  File "/home/EVDA/Vocoder/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 346, in _train_epoch
    self._train_step(batch)
  File "/home/EVDA/Vocoder/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 288, in _train_step
    gen_loss.backward()
  File "/home/EVDA/Vocoder/ParallelWaveGAN/tools/venv/lib/python3.8/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/EVDA/Vocoder/ParallelWaveGAN/tools/venv/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
KeyboardInterrupt

[train]:   0%|          | 2049/3000000 [21:11<516:45:57,  1.61it/s]  

Since then, I have been waiting for more than 6 hours, but only 5 steps have progressed.

kan-bayashi commented 6 months ago

This is not intended. Did you check the CPU memory usage? In the config, allow_cache: true stores all of the data in CPU memory, so if the CPU memory becomes full, the cache can cause problems (e.g., the system uses swap heavily instead of RAM), which may slow down training.
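
For example, something like the following (a rough sketch; it assumes the psutil package is available) shows whether the RAM is exhausted and the machine has started swapping:

# Rough sketch (assumes `psutil` is installed): print RAM and swap usage to
# check whether the feature cache has exhausted CPU memory.
import psutil

vm = psutil.virtual_memory()
sm = psutil.swap_memory()
print(f"RAM : {vm.used / 2**30:.1f} / {vm.total / 2**30:.1f} GiB ({vm.percent:.0f}%)")
print(f"Swap: {sm.used / 2**30:.1f} / {sm.total / 2**30:.1f} GiB ({sm.percent:.0f}%)")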

Cyp9715 commented 6 months ago

I think your suspicion is correct. It is operating normally now; the problem was resolved by significantly increasing the virtual memory swap size.

Thank you for your response.