kan-bayashi / ParallelWaveGAN

Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
https://kan-bayashi.github.io/ParallelWaveGAN/
MIT License

train on my own dataset error #300

Closed · anzhi998 closed this issue 3 years ago

anzhi998 commented 3 years ago

Stage 0: Data preparation
Successfully split data directory.
Successfully split data directory.
Successfully prepared data.
Stage 1: Feature extraction
Feature extraction start. See the progress via dump/dev/raw/preprocessing..log.
Feature extraction start. See the progress via dump/eval/raw/preprocessing..log.
Feature extraction start. See the progress via dump/train_nodev/raw/preprocessing..log.
Successfully make subsets.
Successfully make subsets.
Successfully make subsets.
run.pl: 4 / 4 failed, log is in dump/dev/raw/preprocessing..log
run.pl: 4 / 4 failed, log is in dump/eval/raw/preprocessing..log
run.pl: 4 / 4 failed, log is in dump/train_nodev/raw/preprocessing..log
./run.sh: 3 background jobs are failed.

How should I deal with this error?

kan-bayashi commented 3 years ago

See dump/dev/raw/preprocessing.1.log.
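
For reference, the per-job log named in the failure message can be viewed directly from the recipe directory, e.g.:

cat dump/dev/raw/preprocessing.1.log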

anzhi998 commented 3 years ago

Traceback (most recent call last):
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/bin/parallel-wavegan-preprocess", line 11, in <module>
    load_entry_point('parallel-wavegan', 'console_scripts', 'parallel-wavegan-preprocess')()
  File "/ssd/XJK/ParallelWaveGAN/parallel_wavegan/bin/preprocess.py", line 178, in main
    for utt_id, (audio, fs) in tqdm(dataset):
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/ssd/XJK/ParallelWaveGAN/parallel_wavegan/datasets/scp_dataset.py", line 244, in __getitem__
    fs, audio = self.audio_loader[utt_id]
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/utils.py", line 480, in __getitem__
    return self._loader(ark_name)
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/matio.py", line 241, in load_mat
    return _load_mat(fd, offset, slices, endian=endian)
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/matio.py", line 331, in _load_mat
    array = read_kaldi(fd, endian)
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/matio.py", line 441, in read_kaldi
    array = read_ascii_mat(fd)
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/matio.py", line 605, in read_ascii_mat
    assert len(string) != 0

kan-bayashi commented 3 years ago

Please attach the full log.

anzhi998 commented 3 years ago
# parallel-wavegan-preprocess --config conf/parallel_wavegan.v1.yaml --scp dump/dev/raw/wav.1.scp --dumpdir dump/dev/raw/dump.1 --verbose 1 
# Started at Thu Aug 26 15:35:31 CST 2021
#

  0%|          | 0/25 [00:00<?, ?it/s]/bin/sh: 1: sox: not found
/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/utils.py:482: UserWarning: An error happens at loading "cat /ssd/XJK/ParallelWaveGAN/cutted/f009118_cuted.wav | sox -t wav - -c 1 -b 16 -t wav - rate 22050 |"
  warnings.warn('An error happens at loading "{}"'.format(ark_name))

  0%|          | 0/25 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/bin/parallel-wavegan-preprocess", line 11, in <module>
    load_entry_point('parallel-wavegan', 'console_scripts', 'parallel-wavegan-preprocess')()
  File "/ssd/XJK/ParallelWaveGAN/parallel_wavegan/bin/preprocess.py", line 178, in main
    for utt_id, (audio, fs) in tqdm(dataset):
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/ssd/XJK/ParallelWaveGAN/parallel_wavegan/datasets/scp_dataset.py", line 244, in __getitem__
    fs, audio = self.audio_loader[utt_id]
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/utils.py", line 480, in __getitem__
    return self._loader(ark_name)
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/matio.py", line 241, in load_mat
    return _load_mat(fd, offset, slices, endian=endian)
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/matio.py", line 331, in _load_mat
    array = read_kaldi(fd, endian)
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/matio.py", line 441, in read_kaldi
    array = read_ascii_mat(fd)
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/lib/python3.6/site-packages/kaldiio/matio.py", line 605, in read_ascii_mat
    assert len(string) != 0
AssertionError
# Accounting: time=2 threads=1
# Ended (code 1) at Thu Aug 26 15:35:33 CST 2021, elapsed time 2 seconds

THANK YOU VERY MUCH!

kan-bayashi commented 3 years ago

0%| | 0/25 [00:00<?, ?it/s]/bin/sh: 1: sox: not found

sox is not installed. Please install it.
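
For reference, a minimal way to install sox and confirm it is on PATH (the wav.scp entries pipe audio through sox, as the warning above shows), assuming a Debian/Ubuntu machine; a conda environment can use the conda-forge package instead:

sudo apt-get install sox
# or, in a conda environment:
conda install -c conda-forge sox
# confirm the binary is found:
sox --version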

anzhi998 commented 3 years ago

Thank you very much for your patience!

anzhi998 commented 3 years ago

Sorry to bother you again; I have another question in stage 2. Here is the log file:

# parallel-wavegan-train --config conf/parallel_wavegan.v1.yaml --train-dumpdir dump/train_nodev/norm --dev-dumpdir dump/dev/norm --outdir exp/train_nodev_parallel_wavegan.v1 --resume "" --pretrain "" --verbose 1 
# Started at Thu Aug 26 17:16:09 CST 2021
#
2021-08-26 17:16:10,461 (train:792) INFO: sampling_rate = 2000
2021-08-26 17:16:10,461 (train:792) INFO: fft_size = 512
2021-08-26 17:16:10,461 (train:792) INFO: hop_size = 156
2021-08-26 17:16:10,461 (train:792) INFO: win_length = None
2021-08-26 17:16:10,461 (train:792) INFO: window = hann
2021-08-26 17:16:10,461 (train:792) INFO: num_mels = 80
2021-08-26 17:16:10,461 (train:792) INFO: fmin = 0
2021-08-26 17:16:10,461 (train:792) INFO: fmax = 1000
2021-08-26 17:16:10,461 (train:792) INFO: global_gain_scale = 1.0
2021-08-26 17:16:10,461 (train:792) INFO: trim_silence = False
2021-08-26 17:16:10,461 (train:792) INFO: trim_threshold_in_db = 60
2021-08-26 17:16:10,461 (train:792) INFO: trim_frame_size = 2048
2021-08-26 17:16:10,461 (train:792) INFO: trim_hop_size = 512
2021-08-26 17:16:10,461 (train:792) INFO: format = hdf5
2021-08-26 17:16:10,461 (train:792) INFO: generator_params = {'in_channels': 1, 'out_channels': 1, 'kernel_size': 3, 'layers': 30, 'stacks': 3, 'residual_channels': 64, 'gate_channels': 128, 'skip_channels': 64, 'aux_channels': 80, 'aux_context_window': 2, 'dropout': 0.0, 'use_weight_norm': True, 'upsample_net': 'ConvInUpsampleNetwork', 'upsample_params': {'upsample_scales': [4, 4, 4, 4]}}
2021-08-26 17:16:10,461 (train:792) INFO: discriminator_params = {'in_channels': 1, 'out_channels': 1, 'kernel_size': 3, 'layers': 10, 'conv_channels': 64, 'bias': True, 'use_weight_norm': True, 'nonlinear_activation': 'LeakyReLU', 'nonlinear_activation_params': {'negative_slope': 0.2}}
2021-08-26 17:16:10,461 (train:792) INFO: stft_loss_params = {'fft_sizes': [1024, 2048, 512], 'hop_sizes': [120, 240, 50], 'win_lengths': [600, 1200, 240], 'window': 'hann_window'}
2021-08-26 17:16:10,461 (train:792) INFO: lambda_adv = 4.0
2021-08-26 17:16:10,461 (train:792) INFO: batch_size = 6
2021-08-26 17:16:10,461 (train:792) INFO: batch_max_steps = 25600
2021-08-26 17:16:10,461 (train:792) INFO: pin_memory = True
2021-08-26 17:16:10,461 (train:792) INFO: num_workers = 2
2021-08-26 17:16:10,462 (train:792) INFO: remove_short_samples = True
2021-08-26 17:16:10,462 (train:792) INFO: allow_cache = True
2021-08-26 17:16:10,462 (train:792) INFO: generator_optimizer_params = {'lr': 0.0001, 'eps': 1e-06, 'weight_decay': 0.0}
2021-08-26 17:16:10,462 (train:792) INFO: generator_scheduler_params = {'step_size': 200000, 'gamma': 0.5}
2021-08-26 17:16:10,462 (train:792) INFO: generator_grad_norm = 10
2021-08-26 17:16:10,462 (train:792) INFO: discriminator_optimizer_params = {'lr': 5e-05, 'eps': 1e-06, 'weight_decay': 0.0}
2021-08-26 17:16:10,462 (train:792) INFO: discriminator_scheduler_params = {'step_size': 200000, 'gamma': 0.5}
2021-08-26 17:16:10,462 (train:792) INFO: discriminator_grad_norm = 1
2021-08-26 17:16:10,462 (train:792) INFO: discriminator_train_start_steps = 100000
2021-08-26 17:16:10,462 (train:792) INFO: train_max_steps = 400000
2021-08-26 17:16:10,462 (train:792) INFO: save_interval_steps = 5000
2021-08-26 17:16:10,462 (train:792) INFO: eval_interval_steps = 1000
2021-08-26 17:16:10,462 (train:792) INFO: log_interval_steps = 100
2021-08-26 17:16:10,462 (train:792) INFO: num_save_intermediate_results = 4
2021-08-26 17:16:10,462 (train:792) INFO: train_wav_scp = None
2021-08-26 17:16:10,462 (train:792) INFO: train_feats_scp = None
2021-08-26 17:16:10,462 (train:792) INFO: train_segments = None
2021-08-26 17:16:10,462 (train:792) INFO: train_dumpdir = dump/train_nodev/norm
2021-08-26 17:16:10,462 (train:792) INFO: dev_wav_scp = None
2021-08-26 17:16:10,462 (train:792) INFO: dev_feats_scp = None
2021-08-26 17:16:10,462 (train:792) INFO: dev_segments = None
2021-08-26 17:16:10,462 (train:792) INFO: dev_dumpdir = dump/dev/norm
2021-08-26 17:16:10,462 (train:792) INFO: outdir = exp/train_nodev_parallel_wavegan.v1
2021-08-26 17:16:10,462 (train:792) INFO: config = conf/parallel_wavegan.v1.yaml
2021-08-26 17:16:10,462 (train:792) INFO: pretrain = 
2021-08-26 17:16:10,462 (train:792) INFO: resume = 
2021-08-26 17:16:10,462 (train:792) INFO: verbose = 1
2021-08-26 17:16:10,462 (train:792) INFO: rank = 0
2021-08-26 17:16:10,462 (train:792) INFO: distributed = False
2021-08-26 17:16:10,462 (train:792) INFO: version = 0.5.3
2021-08-26 17:16:14,425 (audio_mel_dataset:78) WARNING: Some files are filtered by mel length threshold (9705 -> 0).
Traceback (most recent call last):
  File "/ssd/XJK/ParallelWaveGAN/tools/venv/bin/parallel-wavegan-train", line 11, in <module>
    load_entry_point('parallel-wavegan', 'console_scripts', 'parallel-wavegan-train')()
  File "/ssd/XJK/ParallelWaveGAN/parallel_wavegan/bin/train.py", line 820, in main
    allow_cache=config.get("allow_cache", False),  # keep compatibility
  File "/ssd/XJK/ParallelWaveGAN/parallel_wavegan/datasets/audio_mel_dataset.py", line 85, in __init__
    assert len(audio_files) != 0, f"Not found any audio files in ${root_dir}."
AssertionError: Not found any audio files in $dump/train_nodev/norm.
# Accounting: time=5 threads=1
# Ended (code 1) at Thu Aug 26 17:16:14 CST 2021, elapsed time 5 seconds

kan-bayashi commented 3 years ago

2021-08-26 17:16:14,425 (audio_mel_dataset:78) WARNING: Some files are filtered by mel length threshold (9705 -> 0).

Your audio seems too short?

kan-bayashi commented 3 years ago

2021-08-26 17:16:10,461 (train:792) INFO: sampling_rate = 2000

I've never seen a sampling rate of 2000 Hz.

kan-bayashi commented 3 years ago

If the sampling rate is correct, please reduce batch_max_steps. During training, we do not use audio clips shorter than batch_max_steps samples.
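
For reference, a rough check against the settings in the log above: at sampling_rate = 2000, the 2.5 s clips mentioned below contain only 2000 * 2.5 = 5000 samples, which is less than batch_max_steps = 25600, so every one of the 9705 files is dropped (hence the "9705 -> 0" warning). A minimal sketch of the corresponding edit in conf/parallel_wavegan.v1.yaml, assuming batch_max_steps should stay a multiple of hop_size (156 here) and below the clip length:

batch_max_steps: 4680   # hypothetical value: 156 * 30, shorter than a 5000-sample (2.5 s) clip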

anzhi998 commented 3 years ago

I am studying the blood flow sounds of patients with vascular stenosis rather than speech. Usually, the highest frequency component of the blood flow sound does not exceed 1 kHz. Because there were too few patients, I tried to use a generative network to expand the data, and to augment my dataset I cut the recordings into 2.5 s pieces, so maybe they are too short. Anyway, thank you very much! Your answers are very useful!