tongjiyiming closed this issue 2 years ago
To perform teacher-forcing decoding with FastSpeech, we need to provide the ground-truth durations.
Therefore, you need to add the --teacher_dumpdir option, set to the same value as in training.
Thank you! After a few tests, I got another error that I cannot solve:
FileNotFoundError: [Errno 2] No such file or directory: 'dump/raw/tr_no_dev/durations'
I tried to rerun my training stages:
!./run.sh --stage 0 --stop-stage 5 \
--teacher_dumpdir "dump/raw" \
--vocoder_file "/ParallelWaveGAN/egs/chineinfocus_single/voc1/exp/train_nodev_parallel_wavegan.v1/checkpoint-400000steps.pkl" \
--download_model "kan-bayashi/ljspeech_conformer_fastspeech2"
I wonder why I got the same error. It looks like the durations file is not created during the data preparation stage.
[15c6733b68b7] 2021-12-22 03:59:21,992 (abs_task:1157) INFO: Namespace(accum_grad=1, allow_variable_data_keys=False, batch_bins=5120000, batch_size=20, batch_type='numel', best_model_criterion=[['valid', 'loss', 'min'], ['train', 'loss', 'min']], bpemodel=None, chunk_length=500, chunk_shift_ratio=0.5, cleaner='tacotron', collect_stats=True, config='conf/train.yaml', cudnn_benchmark=False, cudnn_deterministic=True, cudnn_enabled=True, detect_anomaly=False, dist_backend='nccl', dist_init_method='env://', dist_launcher=None, dist_master_addr=None, dist_master_port=None, dist_rank=None, dist_world_size=None, distributed=False, dry_run=False, early_stopping_criterion=('valid', 'loss', 'min'), energy_extract=None, energy_extract_conf={'fs': 22050, 'n_fft': 1024, 'hop_length': 256, 'win_length': None}, energy_normalize=None, energy_normalize_conf={}, feats_extract='fbank', feats_extract_conf={'n_fft': 1024, 'hop_length': 256, 'win_length': None, 'fs': 22050, 'fmin': 80, 'fmax': 7600, 'n_mels': 80}, fold_length=[], freeze_param=[], g2p='g2p_en_no_space', grad_clip=1.0, grad_clip_type=2.0, grad_noise=False, ignore_init_mismatch=False, init_param=[], iterator_type='sequence', keep_nbest_models=5, local_rank=None, log_interval=None, log_level='INFO', max_cache_fd=32, max_cache_size=0.0, max_epoch=200, model_conf={}, multiple_iterator=False, multiprocessing_distributed=False, ngpu=0, no_forward_run=False, non_linguistic_symbols=None, normalize=None, normalize_conf={}, num_att_plot=3, num_cache_chunks=1024, num_iters_per_epoch=500, num_workers=1, odim=None, optim='adam', optim_conf={'lr': 0.001, 'eps': 1e-06, 'weight_decay': 0.0}, output_dir='exp/tts_stats_raw_phn_tacotron_g2p_en_no_space/logdir/stats.1', patience=None, pitch_extract=None, pitch_extract_conf={'fs': 22050, 'n_fft': 1024, 'hop_length': 256, 'f0max': 400, 'f0min': 80}, pitch_normalize=None, pitch_normalize_conf={}, pretrain_path=None, print_config=False, required=['output_dir', 'token_list'], resume=False, 
scheduler=None, scheduler_conf={}, seed=0, sharded_ddp=False, sort_batch='descending', sort_in_batch='descending', token_list=['<blank>', '<unk>', 'AH0', 'N', 'T', 'D', 'S', 'R', 'L', 'DH', 'K', 'Z', 'IH1', 'IH0', 'M', 'EH1', 'W', 'P', 'AE1', 'AH1', 'V', 'ER0', 'F', ',', 'AA1', 'B', 'HH', 'IY1', 'UW1', 'IY0', 'AO1', 'EY1', 'AY1', '.', 'OW1', 'SH', 'NG', 'G', 'ER1', 'CH', 'JH', 'Y', 'AW1', 'TH', 'UH1', 'EH2', 'OW0', 'EY2', 'AO0', 'IH2', 'AE2', 'AY2', 'AA2', 'UW0', 'EH0', 'OY1', 'EY0', 'AO2', 'ZH', 'OW2', 'AE0', 'UW2', 'AH2', 'AY0', 'IY2', 'AW2', 'AA0', "'", 'ER2', 'UH2', '?', 'OY2', '!', 'AW0', 'UH0', 'OY0', '..', '<sos/eos>'], token_type='phn', train_data_path_and_name_and_type=[('dump/raw/tr_no_dev/text', 'text', 'text'), ('dump/raw/tr_no_dev/wav.scp', 'speech', 'sound'), ('dump/raw/tr_no_dev/durations', 'durations', 'text_int')], train_dtype='float32', train_shape_file=['exp/tts_stats_raw_phn_tacotron_g2p_en_no_space/logdir/train.1.scp'], tts='tacotron2', tts_conf={'embed_dim': 512, 'elayers': 1, 'eunits': 512, 'econv_layers': 3, 'econv_chans': 512, 'econv_filts': 5, 'atype': 'location', 'adim': 512, 'aconv_chans': 32, 'aconv_filts': 15, 'cumulate_att_w': True, 'dlayers': 2, 'dunits': 1024, 'prenet_layers': 2, 'prenet_units': 256, 'postnet_layers': 5, 'postnet_chans': 512, 'postnet_filts': 5, 'output_activation': None, 'use_batch_norm': True, 'use_concate': True, 'use_residual': False, 'dropout_rate': 0.5, 'zoneout_rate': 0.1, 'reduction_factor': 1, 'spk_embed_dim': None, 'use_masking': True, 'bce_pos_weight': 5.0, 'use_guided_attn_loss': True, 'guided_attn_loss_sigma': 0.4, 'guided_attn_loss_lambda': 1.0}, unused_parameters=False, use_amp=False, use_preprocessor=True, use_tensorboard=True, use_wandb=False, val_scheduler_criterion=('valid', 'loss'), valid_batch_bins=None, valid_batch_size=None, valid_batch_type=None, valid_data_path_and_name_and_type=[('dump/raw/dev/text', 'text', 'text'), ('dump/raw/dev/wav.scp', 'speech', 'sound'), ('dump/raw/dev/durations', 
'durations', 'text_int')], valid_max_cache_size=None, valid_shape_file=['exp/tts_stats_raw_phn_tacotron_g2p_en_no_space/logdir/valid.1.scp'], version='0.10.4a1', wandb_entity=None, wandb_id=None, wandb_model_log_interval=-1, wandb_name=None, wandb_project=None, write_collected_feats=False)
Traceback (most recent call last):
File "/opt/miniconda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/opt/miniconda/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/miniconda/lib/python3.7/site-packages/espnet2/bin/tts_train.py", line 22, in <module>
main()
File "/opt/miniconda/lib/python3.7/site-packages/espnet2/bin/tts_train.py", line 18, in main
TTSTask.main(cmd=cmd)
File "/opt/miniconda/lib/python3.7/site-packages/espnet2/tasks/abs_task.py", line 994, in main
cls.main_worker(args)
File "/opt/miniconda/lib/python3.7/site-packages/espnet2/tasks/abs_task.py", line 1198, in main_worker
write_collected_feats=args.write_collected_feats,
File "/opt/miniconda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/opt/miniconda/lib/python3.7/site-packages/espnet2/main_funcs/collect_stats.py", line 55, in collect_stats
for iiter, (keys, batch) in enumerate(itr, 1):
File "/opt/miniconda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/opt/miniconda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/opt/miniconda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/opt/miniconda/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/opt/miniconda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/opt/miniconda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 28, in fetch
data.append(next(self.dataset_iter))
File "/opt/miniconda/lib/python3.7/site-packages/espnet2/train/iterable_dataset.py", line 155, in __iter__
files = [open(lis[0], encoding="utf-8") for lis in self.path_name_type_list]
File "/opt/miniconda/lib/python3.7/site-packages/espnet2/train/iterable_dataset.py", line 155, in <listcomp>
files = [open(lis[0], encoding="utf-8") for lis in self.path_name_type_list]
FileNotFoundError: [Errno 2] No such file or directory: 'dump/raw/tr_no_dev/durations'
The durations file is created by the teacher model, e.g., Tacotron 2 or Transformer.
Please check how to train FastSpeech / FastSpeech 2:
https://github.com/espnet/espnet/tree/master/egs2/TEMPLATE/tts1#fastspeech-training
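Before rerunning the recipe, it may help to confirm that the directory passed to --teacher_dumpdir actually contains the durations files the trainer tries to open. The following is only a minimal sketch: the helper name missing_duration_files is hypothetical, and the set names tr_no_dev and dev are taken from the paths in the log above, so adjust them if your recipe uses different ones.

```python
from pathlib import Path


def missing_duration_files(teacher_dumpdir, sets=("tr_no_dev", "dev")):
    """Return the expected `durations` files that do not exist yet.

    The layout <teacher_dumpdir>/<set>/durations mirrors the paths shown
    in the log (e.g. dump/raw/tr_no_dev/durations); it is an assumption
    about this particular run, not a general rule.
    """
    root = Path(teacher_dumpdir)
    expected = [root / name / "durations" for name in sets]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    # "dump/raw" is the value passed to --teacher_dumpdir in the failing run.
    for path in missing_duration_files("dump/raw"):
        print(f"missing: {path}")
```

If this prints any path, the directory is probably not the output of a teacher model's decoding, and the linked FastSpeech training section describes how to produce it.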
Thank you! I think I do not need to retrain the teacher model, right? Can I just run the preprocessing stage?
I am trying the fine-tuning part of this single-speaker TTS training: https://github.com/kan-bayashi/ParallelWaveGAN/blob/master/egs/README.md
I am a first-time user. Could you help me a little with this issue?
I ran into an error when extracting the following:
The log file shows: