openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0

About Training Epochs error, valid, save model #105

Closed: NLPV2011 closed this issue 11 months ago

NLPV2011 commented 1 year ago

How do I fix this error? With my old dataset the wav files were named 1.wav, 2.wav, and so on, but it didn't save any checkpoints to my computer. For this new dataset I named the wav files media_1.wav, media_2.wav, ..., set the test prefix to 'media', and trained on Colab, but I ran into this bug: when I binarize the dataset it only processes the phonemes distribution, and the train, valid and test stages only print "0 it/s [0/0it]" (screenshot attached). This is my config:

```yaml
base_config:

task_cls: src.naive_task.NaiveTask
datasets: [ 'HaKhanhPhuong' ]
num_spk: 1
test_prefixes: [ 'media', '1', '2', '3' ]
test_num: 0
valid_num: 0

vocoder: NsfHifiGAN
vocoder_ckpt: checkpoints/nsf_hifigan/model
use_nsf: true
audio_sample_rate: 44100
audio_num_mel_bins: 128
hop_size: 512  # Hop size.
fft_size: 2048  # FFT size.
win_size: 2048  # FFT size.
fmin: 40
fmax: 16000
min_level_db: -120

binarization_args:
  with_wav: false
  with_spk_embed: false
  with_align: true
raw_data_dir: 'data/raw/HaKhanhPhuong/segments'
processed_data_dir: ''
binary_data_dir: '/content/drive/MyDrive/binary/HaKhanhPhuong'
binarizer_cls: data_gen.acoustic.AcousticBinarizer
g2p_dictionary: dictionaries/vi-strict.txt
pitch_extractor: 'parselmouth'
pitch_type: frame
content_cond_steps: [ ]  # [ 0, 10000 ]
spk_cond_steps: [ ]  # [ 0, 10000 ]
spec_min: [-5]
spec_max: [0]
keep_bins: 128
mel_loss: "ssim:0.5|l1:0.5"
mel_vmin: -6.  # -6.
mel_vmax: 1.5
wav2spec_eps: 1e-6
save_f0: true

pe_enable: true
pe_ckpt: 'checkpoints/0102_xiaoma_pe'

max_frames: 8000
use_uv: false
use_midi: false
use_spk_embed: false
use_spk_id: false
use_gt_f0: false  # for midi exp
use_gt_dur: false  # for further midi exp
f0_embed_type: discrete
lambda_f0: 0.0
lambda_uv: 0.0
lambda_energy: 0.0
lambda_ph_dur: 0.0
lambda_sent_dur: 0.0
lambda_word_dur: 0.0
predictor_grad: 0.0

K_step: 1000
timesteps: 1000
max_beta: 0.02
predictor_layers: 5
rel_pos: true
gaussian_start: true
pndm_speedup: 10
hidden_size: 256
residual_layers: 20
residual_channels: 384
dilation_cycle_length: 4  # *
diff_decoder_type: 'wavenet'
diff_loss_type: l2
schedule_type: 'linear'

gen_tgt_spk_id: -1
num_sanity_val_steps: 1
lr: 0.0004
decay_steps: 50000
max_tokens: 80000
max_sentences: 9
val_check_interval: 2000
num_valid_plots: 10
max_updates: 320000
permanent_ckpt_start: 120000
permanent_ckpt_interval: 40000
```
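For context, the train/valid/test split during binarization is driven by `test_prefixes`: items whose names start with one of the listed prefixes are held out of the training set. The following is only a sketch of that kind of check, not the actual binarizer code; it assumes item names are simply the wav filenames without extension and reuses the paths from the config above:

```python
import os

# Values taken from the config in this thread.
raw_data_dir = "data/raw/HaKhanhPhuong/segments"
test_prefixes = ["media", "1", "2", "3"]

# Assumption: item names are the wav filenames without extension.
items = sorted(
    os.path.splitext(f)[0]
    for f in os.listdir(raw_data_dir)
    if f.endswith(".wav")
)

# Items matching any prefix are held out for validation/testing;
# everything else stays in the training set.
held_out = [name for name in items if any(name.startswith(p) for p in test_prefixes)]
training = [name for name in items if name not in held_out]

print(f"total items: {len(items)}")
print(f"held out for valid/test: {len(held_out)}")
print(f"left for training: {len(training)}")
```

If the last number prints 0, the binarizer has nothing to write into the training set, which would explain the "0 it/s [0/0it]" output.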

yqzhishen commented 1 year ago

How many pieces of data do you have? How many are in the training set, and how many in the validation set? Have you got a transcription.txt file in your dataset directory?

NLPV2011 commented 1 year ago

> How many pieces of data do you have? How many are in the training set, and how many in the validation set? Have you got a transcription.txt file in your dataset directory?

I have about 35 wav files. I don't know how they are split between the training set and the validation set. I have one transcription.txt file.

yqzhishen commented 1 year ago

35 wav files are not enough for training. But have you looked at the command-line output of the binarizer? Was your data successfully split into a training set and a validation set? I suspect that your training set is empty.
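If the training set is indeed empty, the `test_prefixes` entry in the config above is a likely culprit: every file is named `media_*.wav` and `'media'` is listed as a prefix, so every item would match and be held out for validation/testing, leaving nothing to train on. One hedged fix is to hold out only a few concrete items instead of a catch-all prefix; the item names below are illustrative placeholders, and since matching is by prefix, a short name like `'media_1'` would also catch `media_10`, `media_11`, and so on:

```yaml
# Hold out only a couple of items for validation/testing so the rest of
# the ~35 files stay in the training set. 'media_34' and 'media_35' are
# illustrative placeholders, not names taken from the actual dataset.
test_prefixes: [ 'media_34', 'media_35' ]
```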