prophesier / diff-svc

Singing Voice Conversion via diffusion model
GNU Affero General Public License v3.0

running python run.py --config training/config_nsf.yaml --exp_name [model name] --reset returns 'list index out of range' error. #76

Open xanderb730 opened 1 year ago

xanderb730 commented 1 year ago

exactly as stated in the title. have I installed something incorrectly?

xanderb730 commented 1 year ago

it does the same when trying to run pre-processing.

xanderb730 commented 1 year ago

`| Hparams chains: ['training/config_nsf.yaml'] | Hparams: K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 128, audio_sample_rate: 44100, binarization_args: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False}, binarizer_cls: preprocessing.SVCpre.SVCBinarizer, binary_data_dir: data/binary/engi, check_val_every_n_epoch: 10, choose_test_manually: False, clip_grad_norm: 1, config_path: training/config.yaml, content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2, cwt_loss: l1, cwt_std_scale: 0.8, datasets: ['opencpop'], debug: False, dec_ffn_kernel_size: 9, dec_layers: 4, decay_steps: 30000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet, diff_loss_type: l2, dilation_cycle_length: 4, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'], dur_loss: mse, dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4, encoder_K: 8, encoder_type: fft, endless_ds: False, f0_bin: 256, f0_max: 1100.0, f0_min: 40.0, ffn_act: gelu, ffn_padding: SAME, fft_size: 2048, fmax: 16000, fmin: 40, fs2_ckpt: , gaussian_start: True, gen_dir_name: , gen_tgt_spk_id: -1, hidden_size: 256, hop_size: 512, hubert_gpu: True, hubert_path: checkpoints/hubert/hubert_soft.pt, infer: False, keep_bins: 128, lambda_commit: 0.25, lambda_energy: 0.0, lambda_f0: 1.0, lambda_ph_dur: 0.3, lambda_sent_dur: 1.0, lambda_uv: 1.0, lambda_word_dur: 1.0, load_ckpt: , log_interval: 100, loud_norm: False, lr: 0.0008, max_beta: 0.02, max_epochs: 3000, max_eval_sentences: 1, max_eval_tokens: 60000, max_frames: 42000, max_input_tokens: 60000, max_sentences: 88, max_tokens: 128000, max_updates: 1000000, mel_loss: ssim:0.5|l1:0.5, mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120, norm_type: gn, num_ckpt_keep: 10, num_heads: 2, num_sanity_val_steps: 1, num_spk: 1, num_test_samples: 0, num_valid_plots: 10, optimizer_adam_beta1: 0.9, 
optimizer_adam_beta2: 0.98, out_wav_norm: False, pe_ckpt: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt, pe_enable: False, perform_enhance: True, pitch_ar: False, pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l2, pitch_norm: log, pitch_type: frame, pndm_speedup: 10, pre_align_args: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, predictor_dropout: 0.5, predictor_grad: 0.1, predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256, pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, raw_data_dir: data/raw/engi, ref_norm_layer: bn, rel_pos: True, reset_phone_dict: True, residual_channels: 384, residual_layers: 20, save_best: False, save_ckpt: True, save_codes: ['configs', 'modules', 'src', 'utils'], save_f0: True, save_gt: False, schedule_type: linear, seed: 1234, sort_by_len: True, speaker_id: engi, spec_max: [0.0], spec_min: [-5.0], spk_cond_steps: [], stop_token_weight: 5.0, task_cls: training.task.SVC_task.SVCTask, test_ids: [], test_input_dir: , test_num: 0, test_prefixes: ['test'], test_set_name: test, timesteps: 1000, train_set_name: train, use_crepe: True, use_denoise: False, use_energy_embed: False, use_gt_dur: False, use_gt_f0: False, use_midi: False, use_nsf: True, use_pitch_embed: True, use_pos_embed: True, use_spk_embed: False, use_spk_id: False, use_split_spk_id: False, use_uv: False, use_var_enc: False, use_vec: False, val_check_interval: 2000, valid_num: 0, valid_set_name: valid, validate: False, vocoder: network.vocoders.nsf_hifigan.NsfHifiGAN, vocoder_ckpt: checkpoints/nsf_hifigan/g_00105000, warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 2048, work_dir: checkpoints/engi,

Traceback (most recent call last):
  File "run.py", line 15, in <module>
    run_task()
  File "run.py", line 11, in run_task
    task_cls.start()
  File "G:\diff-svc\training\task\base_task.py", line 197, in start
    task = cls()
  File "G:\diff-svc\training\task\SVC_task.py", line 34, in __init__
    super(SVCTask, self).__init__()
  File "G:\diff-svc\training\task\fs2.py", line 31, in __init__
    super(FastSpeech2Task, self).__init__()
  File "G:\diff-svc\training\task\tts.py", line 31, in __init__
    self.phone_encoder = Hubertencoder(hparams['hubert_path'])
  File "G:\diff-svc\preprocessing\hubertinfer.py", line 22, in __init__
    pt_path = list(Path(pt_path).parent.rglob('*.pt'))[0]
IndexError: list index out of range `

Here's the whole log that I get.
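
For anyone reading this traceback: the failing line is `list(Path(pt_path).parent.rglob('*.pt'))[0]`, which raises `IndexError` when no `.pt` file exists anywhere under the parent folder of `hubert_path` (`checkpoints/hubert` in the log). A minimal sketch for checking that before launching a run. The helper name `find_hubert_checkpoints` is made up for illustration and is not part of diff-svc:

```python
from pathlib import Path
from typing import List


def find_hubert_checkpoints(hubert_path: str) -> List[Path]:
    """Return every .pt file under the parent directory of hubert_path.

    Mirrors the lookup shown in the traceback (parent.rglob('*.pt')),
    but returns the whole list instead of indexing element 0.
    Hypothetical helper, not part of diff-svc.
    """
    return sorted(Path(hubert_path).parent.rglob("*.pt"))


if __name__ == "__main__":
    # Path taken from the `hubert_path` value printed in the Hparams dump.
    found = find_hubert_checkpoints("checkpoints/hubert/hubert_soft.pt")
    if found:
        print("HuBERT checkpoint candidates:", [str(p) for p in found])
    else:
        print("No .pt file found under checkpoints/hubert; "
              "the lookup in hubertinfer.py would raise IndexError.")
```

Run it from the repository root so the relative path resolves the same way it does for `run.py`. An empty result would be consistent with the checkpoint being missing or saved in a different folder, though other causes are possible.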

xanderb730 commented 1 year ago

okay i figured out my issue there, but now i get the same error as before, the log looks like this: | Hparams chains: ['training/config.yaml'] | Hparams: K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 80, audio_sample_rate: 24000, binarization_args: {'shuffle': False, 'with_align': True, 'with_f0': True, 'with_hubert': True, 'with_spk_embed': False, 'with_wav': False}, binarizer_cls: preprocessing.SVCpre.SVCBinarizer, binary_data_dir: data/binary/engi, check_val_every_n_epoch: 10, choose_test_manually: False, clip_grad_norm: 1, config_path: training/config.yaml, content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2, cwt_loss: l1, cwt_std_scale: 0.8, datasets: ['opencpop'], debug: False, dec_ffn_kernel_size: 9, dec_layers: 4, decay_steps: 40000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet, diff_loss_type: l2, dilation_cycle_length: 4, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'], dur_loss: mse, dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4, encoder_K: 8, encoder_type: fft, endless_ds: False, f0_bin: 256, f0_max: 1100.0, f0_min: 50.0, ffn_act: gelu, ffn_padding: SAME, fft_size: 512, fmax: 12000, fmin: 30, fs2_ckpt: , gaussian_start: True, gen_dir_name: , gen_tgt_spk_id: -1, hidden_size: 256, hop_size: 128, hubert_gpu: True, hubert_path: checkpoints/hubert/hubert_soft.pt, infer: False, keep_bins: 80, lambda_commit: 0.25, lambda_energy: 0.0, lambda_f0: 1.0, lambda_ph_dur: 0.3, lambda_sent_dur: 1.0, lambda_uv: 1.0, lambda_word_dur: 1.0, load_ckpt: , log_interval: 100, loud_norm: False, lr: 0.0004, max_beta: 0.02, max_epochs: 3000, max_eval_sentences: 1, max_eval_tokens: 60000, max_frames: 42000, max_input_tokens: 60000, max_sentences: 88, max_tokens: 128000, max_updates: 1000000, mel_loss: ssim:0.5|l1:0.5, mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120, norm_type: gn, num_ckpt_keep: 10, num_heads: 2, num_sanity_val_steps: 1, 
num_spk: 1, num_test_samples: 0, num_valid_plots: 10, optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98, out_wav_norm: False, pe_ckpt: checkpoints/0102_xiaoma_pe/model_ckpt_steps_60000.ckpt, pe_enable: False, perform_enhance: True, pitch_ar: False, pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l2, pitch_norm: log, pitch_type: frame, pndm_speedup: 10, pre_align_args: {'allow_no_txt': False, 'denoise': False, 'forced_align': 'mfa', 'txt_processor': 'zh_g2pM', 'use_sox': True, 'use_tone': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, predictor_dropout: 0.5, predictor_grad: 0.1, predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256, pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, raw_data_dir: data/raw/engi, ref_norm_layer: bn, rel_pos: True, reset_phone_dict: True, residual_channels: 256, residual_layers: 20, save_best: False, save_ckpt: True, save_codes: ['configs', 'modules', 'src', 'utils'], save_f0: True, save_gt: False, schedule_type: linear, seed: 1234, sort_by_len: True, speaker_id: engi, spec_max: [-1.5451143980026245, -1.5177826881408691, -1.1807013750076294, -0.6732071042060852, -0.47006210684776306, -0.271837055683136, -0.27174147963523865, -0.3395537734031677, -0.2529868483543396, -0.22453370690345764, -0.24767500162124634, -0.22861438989639282, -0.28668588399887085, -0.335957795381546, -0.3118636906147003, -0.34530898928642273, -0.35274678468704224, -0.4730182886123657, -0.45395755767822266, -0.4338522255420685, -0.41395917534828186, -0.29468369483947754, -0.16852207481861115, -0.3900046646595001, -0.6241626739501953, -0.5899035930633545, -0.6534764170646667, -0.6667397022247314, -0.6992383003234863, -0.7867978811264038, -0.8457596302032471, -0.43350857496261597, -0.629216730594635, -0.9135912656784058, -0.9230040311813354, -0.6756577491760254, -0.8399246335029602, -0.8495144248008728, 
-0.781493067741394, -1.0347247123718262, -1.0051935911178589, -1.1246198415756226, -1.021154522895813, -0.851677417755127, -0.8443652987480164, -0.9016147255897522, -0.7618780732154846, -1.0490750074386597, -1.2046996355056763, -1.2022035121917725, -0.9753153324127197, -1.2503044605255127, -1.0664823055267334, -1.1236635446548462, -1.2223032712936401, -1.0116488933563232, -1.2263423204421997, -1.2552075386047363, -1.3846945762634277, -1.2681812047958374, -1.3416036367416382, -1.264938235282898, -1.2763726711273193, -1.4651004076004028, -1.4880361557006836, -1.5735552310943604, -1.4097294807434082, -1.468385100364685, -1.3768259286880493, -1.3312186002731323, -1.3547866344451904, -1.4387739896774292, -1.1861546039581299, -1.1709729433059692, -1.1812609434127808, -1.1489264965057373, -1.5605546236038208, -2.2702553272247314, -4.064557075500488, -5.809507846832275], spec_min: [-5.740882873535156, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -5.959110260009766, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -5.999546527862549, -6.0, -5.995517730712891, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -5.960205078125, -5.93423318862915, -6.0, -5.933608531951904, -6.0, -6.0, -6.0, -5.953958511352539, -5.908934593200684, -5.911312580108643, -5.882552623748779, -5.932425498962402, -5.91495943069458, -5.826524257659912, -5.777952671051025, -5.775007724761963, -5.849961280822754, -5.7793660163879395, -5.781087875366211, -5.818603992462158, -5.765895366668701, -5.834509372711182, -5.817623615264893, -5.855445384979248, -5.844409465789795, -5.760529518127441, -5.713063716888428, -5.74588680267334, -5.855954647064209, -5.874588489532471, -5.81571626663208, -5.849369049072266, -5.963766574859619, -5.8541646003723145, -5.922942161560059, -6.0, -6.0, -6.0, -6.0], spk_cond_steps: [], stop_token_weight: 5.0, task_cls: training.task.SVC_task.SVCTask, test_ids: [], test_input_dir: , test_num: 0, test_prefixes: 
['test'], test_set_name: test, timesteps: 1000, train_set_name: train, use_crepe: True, use_denoise: False, use_energy_embed: False, use_gt_dur: False, use_gt_f0: False, use_midi: False, use_nsf: True, use_pitch_embed: True, use_pos_embed: True, use_spk_embed: False, use_spk_id: False, use_split_spk_id: False, use_uv: False, use_var_enc: False, use_vec: False, val_check_interval: 2000, valid_num: 0, valid_set_name: valid, validate: False, vocoder: network.vocoders.hifigan.HifiGAN, vocoder_ckpt: checkpoints/0109_hifigan_bigpopcs_hop128, warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 512, work_dir: ,

| Binarizer: <class 'preprocessing.SVCpre.SVCBinarizer'>
Traceback (most recent call last):
  File "preprocessing/binarize.py", line 20, in <module>
    binarize()
  File "preprocessing/binarize.py", line 15, in binarize
    binarizer_cls().process()
  File "G:\diff-svc\preprocessing\SVCpre.py", line 29, in __init__
    super().__init__(item_attributes)
  File "G:\diff-svc\preprocessing\base_binarizer.py", line 52, in __init__
    assert all([attr in self.item_attributes for attr in list(self.items.values())[0].keys()])
IndexError: list index out of range
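
In this second traceback the failing expression is `list(self.items.values())[0]`, so `self.items` was empty when the binarizer started. One plausible cause (an assumption, not confirmed anywhere in this thread) is that no audio files were found under `raw_data_dir`, which is `data/raw/engi` in the log. A rough sketch for checking that; the function name and the set of extensions are illustrative guesses, not diff-svc's actual loading logic:

```python
from pathlib import Path
from typing import List

# Assumption: which extensions count as training audio. Adjust as needed.
AUDIO_SUFFIXES = {".wav", ".ogg"}


def list_audio_files(raw_data_dir: str) -> List[Path]:
    """Recursively list audio files under raw_data_dir (hypothetical helper)."""
    root = Path(raw_data_dir)
    return sorted(p for p in root.rglob("*") if p.suffix.lower() in AUDIO_SUFFIXES)


if __name__ == "__main__":
    # Path taken from the `raw_data_dir` value printed in the Hparams dump.
    files = list_audio_files("data/raw/engi")
    print(f"{len(files)} audio file(s) found under data/raw/engi")
```

If this reports zero files, the empty `self.items` would be explained; if it reports files, the cause lies elsewhere.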

chinorasta23 commented 1 year ago

Getting the same error here. [image]

roberto924 commented 1 year ago

how do you resolve this issue?

kin0303 commented 1 year ago

> how do you resolve this issue?

https://github.com/prophesier/diff-svc/issues/29#issuecomment-1330075184