kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
https://kan-bayashi.github.io/ParallelWaveGAN/
MIT License

The learning rate decreases after a specific epoch. #425

Closed · Cyp9715 closed this issue 6 months ago

Cyp9715 commented 6 months ago

Until approximately step 2000, training runs at a fast rate of about 5 iterations per second on an RTX 3070, and then the speed suddenly drops sharply. I'm training Parallel WaveGAN, and I've modified some of the dataset and conf files to train on another language. Starting at around step 2000, the process reads from the SSD like crazy instead of using GPU resources. I waited for more than 6 hours, but only five steps progressed.

Is this intended? Thank you.

Below is the configuration file that I modified.

FILE : conf/parallel_wavegan.v3.yaml

###########################################################
#                FEATURE EXTRACTION SETTING               #
###########################################################
sampling_rate: 16000     # Sampling rate.
fft_size: 400           # FFT size.
hop_size: 160            # Hop size.
win_length: 400         # Window length.
                         # If set to null, it will be the same as fft_size.
window: "hann"           # Window function.
num_mels: 80             # Number of mel basis.
fmin: 80                 # Minimum freq in mel basis calculation.
fmax: 7600               # Maximum frequency in mel basis calculation.
global_gain_scale: 1.0   # Will be multiplied to the whole waveform.
trim_silence: true       # Whether to trim the start and end of silence.
trim_threshold_in_db: 60 # Need to tune carefully if the recording is not good.
trim_frame_size: 2048    # Frame size in trimming.
trim_hop_size: 512       # Hop size in trimming.
format: "hdf5"           # Feature file format. "npy" or "hdf5" is supported.

###########################################################
#         GENERATOR NETWORK ARCHITECTURE SETTING          #
###########################################################
generator_params:
    in_channels: 1        # Number of input channels.
    out_channels: 1       # Number of output channels.
    kernel_size: 5        # Kernel size of dilated convolution.
    layers: 30            # Number of residual block layers.
    stacks: 3             # Number of stacks i.e., dilation cycles.
    residual_channels: 64 # Number of channels in residual conv.
    gate_channels: 128    # Number of channels in gated conv.
    skip_channels: 64     # Number of channels in skip conv.
    aux_channels: 80      # Number of channels for auxiliary feature conv.
                          # Must be the same as num_mels.
    aux_context_window: 2 # Context window size for auxiliary feature.
                          # If set to 2, previous 2 and future 2 frames will be considered.
    dropout: 0.0          # Dropout rate. 0.0 means no dropout applied.
    use_weight_norm: true # Whether to use weight norm.
                          # If set to true, it will be applied to all of the conv layers.
    upsample_net: "ConvInUpsampleNetwork" # Upsampling network architecture.
    upsample_params:                      # Upsampling network parameters.
        upsample_scales: [4, 4, 5, 2]     # Upsampling scales. Product of these must be the same as hop size.

###########################################################
#       DISCRIMINATOR NETWORK ARCHITECTURE SETTING        #
###########################################################
discriminator_type: "MelGANMultiScaleDiscriminator" # Discriminator type.
discriminator_params:
    in_channels: 1                    # Number of input channels.
    out_channels: 1                   # Number of output channels.
    scales: 3                         # Number of multi-scales.
    downsample_pooling: "AvgPool1d"   # Pooling type for the input downsampling.
    downsample_pooling_params:        # Parameters of the above pooling function.
        kernel_size: 4
        stride: 2
        padding: 1
        count_include_pad: False
    kernel_sizes: [5, 3]              # List of kernel size.
    channels: 16                      # Number of channels of the initial conv layer.
    max_downsample_channels: 1024     # Maximum number of channels of downsampling layers.
    downsample_scales: [4, 4, 4, 4]   # List of downsampling scales.
    nonlinear_activation: "LeakyReLU" # Nonlinear activation function.
    nonlinear_activation_params:      # Parameters of nonlinear activation function.
        negative_slope: 0.2
    use_weight_norm: True             # Whether to use weight norm.

###########################################################
#                   STFT LOSS SETTING                     #
###########################################################
stft_loss_params:
    fft_sizes: [1024, 2048, 512]  # List of FFT size for STFT-based loss.
    hop_sizes: [120, 240, 50]     # List of hop size for STFT-based loss.
    win_lengths: [600, 1200, 240] # List of window length for STFT-based loss.
    window: "hann_window"         # Window function for STFT-based loss.

###########################################################
#               ADVERSARIAL LOSS SETTING                  #
###########################################################
use_feat_match_loss: true # Whether to use feature matching loss.
lambda_feat_match: 25.0   # Loss balancing coefficient for feature matching loss.
lambda_adv: 4.0          # Loss balancing coefficient for adversarial loss.

###########################################################
#                  DATA LOADER SETTING                    #
###########################################################
batch_size: 8             # Batch size.
batch_max_steps: 8192      # Length of each audio in batch. Make sure it is divisible by hop_size.
pin_memory: true           # Whether to pin memory in Pytorch DataLoader.
num_workers: 2             # Number of workers in Pytorch DataLoader.
remove_short_samples: true # Whether to remove samples whose length is less than batch_max_steps.
allow_cache: true          # Whether to allow cache in dataset. If true, it requires cpu memory.

###########################################################
#             OPTIMIZER & SCHEDULER SETTING               #
###########################################################
generator_optimizer_params:
    lr: 0.0001             # Generator's learning rate.
    eps: 1.0e-6            # Generator's epsilon.
    weight_decay: 0.0      # Generator's weight decay coefficient.
generator_scheduler_params:
    step_size: 3000000     # Generator's scheduler step size.
    gamma: 0.5             # Generator's scheduler gamma.
                           # At each step size, lr will be multiplied by this parameter.
generator_grad_norm: 10    # Generator's gradient norm.
discriminator_optimizer_params:
    lr: 0.00005            # Discriminator's learning rate.
    eps: 1.0e-6            # Discriminator's epsilon.
    weight_decay: 0.0      # Discriminator's weight decay coefficient.
discriminator_scheduler_params:
    step_size: 3000000     # Discriminator's scheduler step size.
    gamma: 0.5             # Discriminator's scheduler gamma.
                           # At each step size, lr will be multiplied by this parameter.
discriminator_grad_norm: 1 # Discriminator's gradient norm.

###########################################################
#                    INTERVAL SETTING                     #
###########################################################
discriminator_train_start_steps: 100000 # Number of steps to start to train discriminator.
train_max_steps: 3000000                # Number of training steps.
save_interval_steps: 5000               # Interval steps to save checkpoint.
eval_interval_steps: 1000               # Interval steps to evaluate the network.
log_interval_steps: 100                 # Interval steps to record the training log.

###########################################################
#                     OTHER SETTING                       #
###########################################################
num_save_intermediate_results: 4  # Number of results to be saved as intermediate results.
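
The values I changed for 16 kHz data have to stay mutually consistent, as the comments above note. As a quick illustration, a minimal Python sketch of those checks (illustrative only, not part of the repository):

# Illustrative check only: the values modified for 16 kHz data must satisfy
# the constraints noted in the config comments above.
import math

hop_size = 160                  # from the feature extraction setting
num_mels = 80
aux_channels = 80               # from generator_params
upsample_scales = [4, 4, 5, 2]  # from upsample_params

assert math.prod(upsample_scales) == hop_size  # 4 * 4 * 5 * 2 = 160
assert aux_channels == num_mels                # aux feature conv must match mel bins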

FILE : train.log

[train]:   0%|          | 1974/3000000 [08:35<218:49:50,  3.81it/s]
[train]:   0%|          | 1975/3000000 [08:35<217:33:15,  3.83it/s]
[train]:   0%|          | 1976/3000000 [08:35<215:58:02,  3.86it/s]
[train]:   0%|          | 1977/3000000 [08:35<212:03:47,  3.93it/s]
[train]:   0%|          | 1978/3000000 [08:36<206:58:43,  4.02it/s]
[train]:   0%|          | 1979/3000000 [08:36<206:36:10,  4.03it/s]
[train]:   0%|          | 1980/3000000 [08:36<203:47:04,  4.09it/s]
[train]:   0%|          | 1981/3000000 [08:36<202:49:16,  4.11it/s]
[train]:   0%|          | 1982/3000000 [08:37<201:48:36,  4.13it/s]
[train]:   0%|          | 1983/3000000 [08:37<204:30:29,  4.07it/s]
[train]:   0%|          | 1984/3000000 [08:37<214:55:21,  3.87it/s]
[train]:   0%|          | 1985/3000000 [08:37<215:13:26,  3.87it/s]
[train]:   0%|          | 1986/3000000 [08:38<212:16:55,  3.92it/s]
[train]:   0%|          | 1987/3000000 [08:38<213:19:01,  3.90it/s]
[train]:   0%|          | 1988/3000000 [08:38<213:14:45,  3.91it/s]
[train]:   0%|          | 1989/3000000 [08:38<219:15:55,  3.80it/s]
[train]:   0%|          | 1990/3000000 [08:39<222:03:54,  3.75it/s]
[train]:   0%|          | 1991/3000000 [08:39<215:29:35,  3.86it/s]
[train]:   0%|          | 1992/3000000 [08:39<214:56:20,  3.87it/s]
[train]:   0%|          | 1993/3000000 [08:39<217:19:08,  3.83it/s]
[train]:   0%|          | 1994/3000000 [08:40<213:44:15,  3.90it/s]
[train]:   0%|          | 1995/3000000 [08:40<217:45:49,  3.82it/s]
[train]:   0%|          | 1996/3000000 [08:40<211:25:31,  3.94it/s]
[train]:   0%|          | 1997/3000000 [08:41<213:41:09,  3.90it/s]
[train]:   0%|          | 1998/3000000 [08:41<220:40:00,  3.77it/s]
[train]:   0%|          | 1999/3000000 [08:41<213:48:09,  3.90it/s]
[train]:   0%|          | 2000/3000000 [08:41<207:16:31,  4.02it/s]2024-03-01 20:19:54,678 (train:633) INFO: (Steps: 2000) train/spectral_convergence_loss = 0.8977.
2024-03-01 20:19:54,678 (train:633) INFO: (Steps: 2000) train/log_stft_magnitude_loss = 1.1274.
2024-03-01 20:19:54,678 (train:633) INFO: (Steps: 2000) train/generator_loss = 2.0250.
2024-03-01 20:19:54,691 (train:471) INFO: (Steps: 2000) Start evaluation.

[eval]:   0%|          | 0/63 [00:00<?, ?it/s]
[eval]:   2%|▏         | 1/63 [00:01<01:23,  1.35s/it]
[eval]:   3%|▎         | 2/63 [00:01<00:39,  1.56it/s]
[eval]:   6%|▋         | 4/63 [00:01<00:17,  3.39it/s]
[eval]:  10%|▉         | 6/63 [00:01<00:11,  5.04it/s]
[eval]:  11%|█         | 7/63 [00:01<00:09,  5.70it/s]
[eval]:  14%|█▍        | 9/63 [00:02<00:07,  7.04it/s]
[eval]:  17%|█▋        | 11/63 [00:02<00:06,  8.11it/s]
[eval]:  21%|██        | 13/63 [00:02<00:05,  8.76it/s]
[eval]:  24%|██▍       | 15/63 [00:02<00:05,  9.36it/s]
[eval]:  27%|██▋       | 17/63 [00:02<00:04,  9.80it/s]
[eval]:  30%|███       | 19/63 [00:03<00:04, 10.12it/s]
[eval]:  33%|███▎      | 21/63 [00:03<00:04, 10.38it/s]
[eval]:  37%|███▋      | 23/63 [00:03<00:03, 10.38it/s]
[eval]:  40%|███▉      | 25/63 [00:03<00:03, 10.59it/s]
[eval]:  43%|████▎     | 27/63 [00:03<00:03, 10.71it/s]
[eval]:  46%|████▌     | 29/63 [00:03<00:03, 10.85it/s]
[eval]:  49%|████▉     | 31/63 [00:04<00:02, 10.88it/s]
[eval]:  52%|█████▏    | 33/63 [00:04<00:02, 10.85it/s]
[eval]:  56%|█████▌    | 35/63 [00:04<00:02, 10.78it/s]
[eval]:  59%|█████▊    | 37/63 [00:04<00:02, 10.85it/s]
[eval]:  62%|██████▏   | 39/63 [00:04<00:02, 10.92it/s]
[eval]:  65%|██████▌   | 41/63 [00:05<00:02, 10.69it/s]
[eval]:  68%|██████▊   | 43/63 [00:05<00:01, 10.71it/s]
[eval]:  71%|███████▏  | 45/63 [00:05<00:01, 10.82it/s]
[eval]:  75%|███████▍  | 47/63 [00:05<00:01, 10.81it/s]
[eval]:  78%|███████▊  | 49/63 [00:05<00:01, 10.88it/s]
[eval]:  81%|████████  | 51/63 [00:06<00:01, 10.80it/s]
[eval]:  84%|████████▍ | 53/63 [00:06<00:00, 10.81it/s]
[eval]:  87%|████████▋ | 55/63 [00:06<00:00, 10.45it/s]
[eval]:  90%|█████████ | 57/63 [00:06<00:00, 10.39it/s]
[eval]:  94%|█████████▎| 59/63 [00:06<00:00,  9.87it/s]
[eval]:  95%|█████████▌| 60/63 [00:06<00:00,  9.88it/s]
[eval]:  97%|█████████▋| 61/63 [00:07<00:00,  9.80it/s]
[eval]: 100%|██████████| 63/63 [00:07<00:00, 10.21it/s]
[eval]: 100%|██████████| 63/63 [00:07<00:00,  8.59it/s]2024-03-01 20:20:02,036 (train:487) INFO: (Steps: 2000) Finished evaluation (63 steps per epoch).
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/spectral_convergence_loss = 0.8968.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/log_stft_magnitude_loss = 1.0782.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/feature_matching_loss = 0.0003.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/adversarial_loss = 0.9558.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/generator_loss = 5.8318.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/real_loss = 0.9558.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/fake_loss = 0.0016.
2024-03-01 20:20:02,036 (train:495) INFO: (Steps: 2000) eval/discriminator_loss = 0.9574.

[train]:   0%|          | 2001/3000000 [08:49<2045:52:46,  2.46s/it]
[train]:   0%|          | 2002/3000000 [08:49<1492:52:09,  1.79s/it]
[train]:   0%|          | 2003/3000000 [08:49<1104:55:45,  1.33s/it]
[train]:   0%|          | 2004/3000000 [08:50<841:28:14,  1.01s/it] 
[train]:   0%|          | 2005/3000000 [08:50<659:00:50,  1.26it/s]
[train]:   0%|          | 2006/3000000 [08:50<529:23:03,  1.57it/s]
[train]:   0%|          | 2007/3000000 [08:50<430:39:02,  1.93it/s]
[train]:   0%|          | 2008/3000000 [08:51<360:29:24,  2.31it/s]
[train]:   0%|          | 2009/3000000 [08:51<310:20:24,  2.68it/s]
[train]:   0%|          | 2010/3000000 [08:51<277:53:02,  3.00it/s]
[train]:   0%|          | 2011/3000000 [08:51<257:15:52,  3.24it/s]
[train]:   0%|          | 2012/3000000 [08:52<238:30:17,  3.49it/s]
[train]:   0%|          | 2013/3000000 [08:52<225:34:05,  3.69it/s]
[train]:   0%|          | 2014/3000000 [08:52<217:15:40,  3.83it/s]
[train]:   0%|          | 2015/3000000 [08:52<211:46:30,  3.93it/s]
[train]:   0%|          | 2016/3000000 [08:53<212:33:52,  3.92it/s]
[train]:   0%|          | 2017/3000000 [08:53<212:50:07,  3.91it/s]
[train]:   0%|          | 2018/3000000 [08:53<210:23:27,  3.96it/s]
[train]:   0%|          | 2019/3000000 [08:53<205:40:27,  4.05it/s]
[train]:   0%|          | 2020/3000000 [08:54<211:00:50,  3.95it/s]
[train]:   0%|          | 2021/3000000 [08:54<215:25:32,  3.87it/s]
[train]:   0%|          | 2022/3000000 [08:54<209:48:52,  3.97it/s]
[train]:   0%|          | 2023/3000000 [08:54<216:37:54,  3.84it/s]
[train]:   0%|          | 2024/3000000 [08:55<215:56:18,  3.86it/s]
[train]:   0%|          | 2025/3000000 [08:55<209:49:27,  3.97it/s]
[train]:   0%|          | 2026/3000000 [08:55<209:03:39,  3.98it/s]
[train]:   0%|          | 2027/3000000 [08:55<206:11:32,  4.04it/s]
[train]:   0%|          | 2028/3000000 [08:56<208:34:06,  3.99it/s]
[train]:   0%|          | 2029/3000000 [08:56<205:30:16,  4.05it/s]
[train]:   0%|          | 2030/3000000 [08:56<203:14:31,  4.10it/s]
[train]:   0%|          | 2031/3000000 [08:56<202:01:14,  4.12it/s]
[train]:   0%|          | 2032/3000000 [08:57<199:07:56,  4.18it/s]
[train]:   0%|          | 2033/3000000 [08:57<196:14:00,  4.24it/s]
[train]:   0%|          | 2034/3000000 [08:57<194:46:42,  4.28it/s]
[train]:   0%|          | 2035/3000000 [08:57<195:26:53,  4.26it/s]
[train]:   0%|          | 2036/3000000 [08:57<195:11:52,  4.27it/s]
[train]:   0%|          | 2037/3000000 [08:58<194:58:45,  4.27it/s]
[train]:   0%|          | 2038/3000000 [08:58<199:45:30,  4.17it/s]
[train]:   0%|          | 2039/3000000 [08:58<197:02:17,  4.23it/s]
[train]:   0%|          | 2040/3000000 [08:58<195:42:53,  4.25it/s]
[train]:   0%|          | 2041/3000000 [08:59<194:21:01,  4.28it/s]
[train]:   0%|          | 2042/3000000 [08:59<198:10:37,  4.20it/s]
[train]:   0%|          | 2043/3000000 [08:59<201:18:58,  4.14it/s]
[train]:   0%|          | 2044/3000000 [08:59<201:10:17,  4.14it/s]
[train]:   0%|          | 2045/3000000 [09:00<220:58:42,  3.77it/s]  # Start of the problem section.
[train]:   0%|          | 2046/3000000 [09:02<681:41:20,  1.22it/s]
[train]:   0%|          | 2047/3000000 [09:10<2471:23:38,  2.97s/it]
[train]:   0%|          | 2048/3000000 [10:51<27101:07:28, 32.54s/it]
[train]:   0%|          | 2049/3000000 [13:04<52114:33:59, 62.58s/it]2024-03-01 20:32:22,966 (train:1546) INFO: Successfully saved checkpoint @ 2049steps.
Traceback (most recent call last):
  File "/home/EVDA/Vocoder/ParallelWaveGAN/tools/venv/bin/parallel-wavegan-train", line 33, in <module>
    sys.exit(load_entry_point('parallel-wavegan', 'console_scripts', 'parallel-wavegan-train')())
  File "/home/EVDA/Vocoder/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 1541, in main
    trainer.run()
  File "/home/EVDA/Vocoder/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 105, in run
    self._train_epoch()
  File "/home/EVDA/Vocoder/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 346, in _train_epoch
    self._train_step(batch)
  File "/home/EVDA/Vocoder/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 288, in _train_step
    gen_loss.backward()
  File "/home/EVDA/Vocoder/ParallelWaveGAN/tools/venv/lib/python3.8/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/EVDA/Vocoder/ParallelWaveGAN/tools/venv/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
KeyboardInterrupt

[train]:   0%|          | 2049/3000000 [21:11<516:45:57,  1.61it/s]  

Since then, I have been waiting for more than 6 hours, but only 5 steps have progressed.

kan-bayashi commented 6 months ago

This is not intended. Did you check the CPU memory usage? In the config, allow_cache: true stores all of the data in CPU memory, so if the CPU memory becomes full, the cache can cause problems (e.g., the system uses swap heavily instead of RAM), which may slow down training.
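
For example, something like the following (a rough sketch; it assumes the psutil package is available) shows whether the RAM is exhausted and the machine has started swapping:

# Rough sketch (assumes `psutil` is installed): print RAM and swap usage to
# check whether the feature cache has exhausted CPU memory.
import psutil

vm = psutil.virtual_memory()
sm = psutil.swap_memory()
print(f"RAM : {vm.used / 2**30:.1f} / {vm.total / 2**30:.1f} GiB ({vm.percent:.0f}%)")
print(f"Swap: {sm.used / 2**30:.1f} / {sm.total / 2**30:.1f} GiB ({sm.percent:.0f}%)")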

Cyp9715 commented 6 months ago

I think your suspicion is correct. It is operating normally now; the problem was resolved by significantly increasing the virtual memory swap size.

Thank you for your response.