Closed nellorebhanuteja closed 2 years ago
May I know how much time it took for you?
You can check the log. It shows remaining time.
And please carefully check this part.
In the case of distributed training, the batch size is automatically multiplied by the number of GPUs, so please be careful.
So if you run the config without modification on multiple GPUs, it does not accelerate training, since we use iteration-based training. If you want to accelerate it, you need to decrease the batch size to (original batch size / #gpus). Then, 4 V100s should finish within 1 week.
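The arithmetic behind this advice, as a small sketch (the function names are mine; 16 and 4 are the values discussed in this thread):

```python
# Sketch of the batch-size rule described above: in distributed training the
# per-GPU batch size is multiplied by the number of GPUs.
def effective_batch_size(batch_size_per_gpu: int, n_gpus: int) -> int:
    return batch_size_per_gpu * n_gpus

def per_gpu_batch_size(original_batch_size: int, n_gpus: int) -> int:
    # To keep the original effective batch size, divide by the GPU count.
    return original_batch_size // n_gpus

# Unmodified config (batch_size 16) on 4 GPUs: 4x the data per iteration,
# but the same number of iterations, so no wall-clock speedup.
print(effective_batch_size(16, 4))  # 64

# Reduced per-GPU batch size keeps the effective batch size at 16.
print(per_gpu_batch_size(16, 4))    # 4
```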
Thanks for the reply @kan-bayashi.
I have followed your advice: I use 4 GPUs and reduced batch_size from 16 to 4.
However, the training time doesn't seem to have been reduced.
The following is my config file for reference:
allow_cache: true
batch_max_steps: 8192
batch_size: 4
config: conf/hifigan.v1.yaml
dev_dumpdir: dump/dev/norm
dev_feats_scp: null
dev_segments: null
dev_wav_scp: null
discriminator_adv_loss_params:
    average_by_discriminators: false
discriminator_grad_norm: -1
discriminator_optimizer_params:
    betas:
        - 0.5
        - 0.9
    lr: 0.0002
    weight_decay: 0.0
discriminator_optimizer_type: Adam
discriminator_params:
    follow_official_norm: true
    period_discriminator_params:
        bias: true
        channels: 32
        downsample_scales:
            - 3
            - 3
            - 3
            - 3
            - 1
        in_channels: 1
        kernel_sizes:
            - 5
            - 3
        max_downsample_channels: 1024
        nonlinear_activation: LeakyReLU
        nonlinear_activation_params:
            negative_slope: 0.1
        out_channels: 1
        use_spectral_norm: false
        use_weight_norm: true
    periods:
        - 2
        - 3
        - 5
        - 7
        - 11
    scale_discriminator_params:
        bias: true
        channels: 128
        downsample_scales:
            - 4
            - 4
            - 4
            - 4
            - 1
        in_channels: 1
        kernel_sizes:
            - 15
            - 41
            - 5
            - 3
        max_downsample_channels: 1024
        max_groups: 16
        nonlinear_activation: LeakyReLU
        nonlinear_activation_params:
            negative_slope: 0.1
        out_channels: 1
    scale_downsample_pooling: AvgPool1d
    scale_downsample_pooling_params:
        kernel_size: 4
        padding: 2
        stride: 2
    scales: 3
discriminator_scheduler_params:
    gamma: 0.5
    milestones:
        - 200000
        - 400000
        - 600000
        - 800000
discriminator_scheduler_type: MultiStepLR
discriminator_train_start_steps: 0
discriminator_type: HiFiGANMultiScaleMultiPeriodDiscriminator
distributed: true
eval_interval_steps: 1000
feat_match_loss_params:
    average_by_discriminators: false
    average_by_layers: false
    include_final_outputs: false
fft_size: 1024
fmax: 7600
fmin: 80
format: hdf5
generator_adv_loss_params:
    average_by_discriminators: false
generator_grad_norm: -1
generator_optimizer_params:
    betas:
        - 0.5
        - 0.9
    lr: 0.0002
    weight_decay: 0.0
generator_optimizer_type: Adam
generator_params:
    bias: true
    channels: 512
    in_channels: 80
    kernel_size: 7
    nonlinear_activation: LeakyReLU
    nonlinear_activation_params:
        negative_slope: 0.1
    out_channels: 1
    resblock_dilations:
        - - 1
          - 3
          - 5
        - - 1
          - 3
          - 5
        - - 1
          - 3
          - 5
    resblock_kernel_sizes:
        - 3
        - 7
        - 11
    upsample_kernel_sizes:
        - 16
        - 16
        - 4
        - 4
    upsample_scales:
        - 8
        - 8
        - 2
        - 2
    use_additional_convs: true
    use_weight_norm: true
generator_scheduler_params:
    gamma: 0.5
    milestones:
        - 200000
        - 400000
        - 600000
        - 800000
generator_scheduler_type: MultiStepLR
generator_train_start_steps: 1
generator_type: HiFiGANGenerator
global_gain_scale: 1.0
hop_size: 256
lambda_adv: 1.0
lambda_aux: 45.0
lambda_feat_match: 2.0
log_interval_steps: 100
mel_loss_params:
    fft_size: 1024
    fmax: 11025
    fmin: 0
    fs: 22050
    hop_size: 256
    log_base: null
    num_mels: 80
    win_length: null
    window: hann
num_mels: 80
num_save_intermediate_results: 4
num_workers: 2
outdir: exp/train_nodev_ljspeech_hifigan.v1
pin_memory: true
pretrain: ''
rank: 3
remove_short_samples: false
resume: ''
sampling_rate: 22050
save_interval_steps: 10000
train_dumpdir: dump/train_nodev/norm
train_feats_scp: null
train_max_steps: 2500000
train_segments: null
train_wav_scp: null
trim_frame_size: 1024
trim_hop_size: 256
trim_silence: false
trim_threshold_in_db: 20
use_feat_match_loss: true
use_mel_loss: true
use_stft_loss: false
verbose: 1
version: 0.5.5
win_length: null
window: hann
world_size: 4
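A quick sanity check on a few numbers in this config (the values below are copied from the config into a plain dict; the batch-size multiplication rule is the one kan-bayashi described earlier in this thread):

```python
# Key values copied from the config above.
cfg = {
    "batch_size": 4,
    "world_size": 4,
    "batch_max_steps": 8192,
    "hop_size": 256,
    "train_max_steps": 2_500_000,
}

# Effective batch size under distributed training (batch_size x #gpus):
# with batch_size reduced to 4 on 4 GPUs, it matches the original 16.
print(cfg["batch_size"] * cfg["world_size"])      # 16

# Waveform samples per training clip, expressed in mel frames
# (batch_max_steps / hop_size).
print(cfg["batch_max_steps"] // cfg["hop_size"])  # 32
```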
@kan-bayashi can you please answer this question?
Could you give me the logs of both cases you compared?
Logs, as in, should I attach the train.log of both cases?
Yes, please.
I have attached the log file that was generated after reducing the batch size. Unfortunately, I have not preserved the train.log from the other case.
We need to compare with the original setting to check the speed.
[train]: 0%| | 1390/2500000 [13:20<393:16:14, 1.76it/s]
You can launch the config with batch size 16 and ngpu 1 and check the 1.76it/s part.
OK, I ran the original config on 1 GPU. This is the log:
[train]: 0%| | 26/2500000 [00:28<650:33:48, 1.07it/s]
It seems the training speed has increased (about 1.6x). What is your problem?
In both cases, training is estimated to take about 4 weeks.
I used the joint model in espnet-tts and was expecting vocoder training to take about 1 week.
[train]: 0%| | 14/2500000 [00:12<406:46:23, 1.71it/s]
Your first log shows the estimated remaining time is 2+ weeks.
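The estimates in the tqdm progress bars follow directly from the total step count and the reported throughput; a quick check of both logs above:

```python
# Estimated wall-clock time from the tqdm lines above:
# total_steps / (iterations per second) -> seconds -> days.
def eta_days(total_steps: int, it_per_s: float) -> float:
    return total_steps / it_per_s / 86_400  # 86400 seconds per day

# First log: 4 GPUs, batch_size 4, 1.76 it/s.
print(round(eta_days(2_500_000, 1.76), 1))  # 16.4 days, i.e. a bit over 2 weeks

# Second log: 1 GPU, batch_size 16, 1.07 it/s.
print(round(eta_days(2_500_000, 1.07), 1))  # 27.0 days, i.e. roughly 4 weeks
```

This matches the ~1.6x speedup noted above: 27.0 / 16.4 ≈ 1.6.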
> I used the joint model in espnet-tts and was expecting vocoder training to take about 1 week.
The number of iterations is totally different.
For example, https://github.com/espnet/espnet/blob/f274ebed88e3b4820b23ea71fb7f9f6d56706be9/egs2/ljspeech/tts1/conf/tuning/train_joint_conformer_fastspeech2_hifigan.yaml#L196-L197 This joint model training is 1000 x 1000 = 1M steps, while the original HiFiGAN training is 2.5M steps.
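The comparison in numbers (reading the linked config's 1000 x 1000 as epochs times iterations per epoch, which is my interpretation, not stated in the thread):

```python
# Rough step-count comparison from the discussion above.
joint_steps = 1000 * 1000    # ~1M steps for the espnet joint model (linked config)
hifigan_steps = 2_500_000    # train_max_steps in the HiFiGAN config above

# All else being equal, the HiFiGAN recipe runs 2.5x as many iterations,
# so similar per-iteration speed implies roughly 2.5x the training time.
print(hifigan_steps / joint_steps)  # 2.5
```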
Right, so you are saying vocoder training requires more steps than joint model training?
I stopped at 1M iterations since the generated voice is of sufficient quality, and 2.5M takes too long with my limited GPU resources. If you have enough GPU resources, it is worthwhile to try longer training.
OK, thanks a lot for patiently answering!
Hi,
I am training the HiFiGAN vocoder on LJSpeech, using the recipe provided. It has been running for more than a week.
I am using 4 Tesla GPUs with 32 GB memory.
May I know how much time it took for you?
@kan-bayashi