r9y9 / wavenet_vocoder

WaveNet vocoder
https://r9y9.github.io/wavenet_vocoder/

When using the MOL recipe, run.sh throws ValueError: min() arg is an empty sequence #187

Closed. m-k-S closed this issue 4 years ago.

m-k-S commented 4 years ago

Hi, when I try to train a new model using CUDA_VISIBLE_DEVICES="0,1" ./run.sh --stage 1 --stop-stage 2, I get the error:

Sampling frequency: 22050
/home/verma/.local/lib/python3.6/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.preprocessing.data module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.preprocessing. Anything that cannot be imported from sklearn.preprocessing is now part of the private API.
  warnings.warn(message, FutureWarning)
0it [00:00, ?it/s]
Wrote 0 utterances, 0 time steps (0.00 hours)
Traceback (most recent call last):
  File "/home/verma/work/max/wavenet_vocoder/egs/mol/../..//preprocess.py", line 71, in <module>
    preprocess(mod, in_dir, out_dir, num_workers)
  File "/home/verma/work/max/wavenet_vocoder/egs/mol/../..//preprocess.py", line 25, in preprocess
    write_metadata(metadata, out_dir)
  File "/home/verma/work/max/wavenet_vocoder/egs/mol/../..//preprocess.py", line 36, in write_metadata
    print('Min frame length: %d' % min(m[2] for m in metadata))
ValueError: min() arg is an empty sequence

Does anyone know how I might fix this?
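
For context, the error comes from write_metadata in preprocess.py taking min() over the per-utterance frame lengths; if preprocessing collects no utterances at all, that list is empty and min() raises. A minimal sketch of what is going on (my simplification, not the repo's exact code):

# Simplified view of the failing path (hypothetical; the real metadata tuples
# come from the recipe's preprocessing step, and m[2] is the frame length).
metadata = []  # "Wrote 0 utterances" means nothing was collected

if not metadata:
    raise RuntimeError("No utterances were preprocessed; check that the input directory contains audio")

print('Min frame length: %d' % min(m[2] for m in metadata))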

My preset JSON file is:

{
  "name": "wavenet_vocoder",
  "input_type": "raw",
  "quantize_channels": 65536,
  "preprocess": "preemphasis",
  "postprocess": "inv_preemphasis",
  "global_gain_scale": 0.55,
  "sample_rate": 22050,
  "silence_threshold": 2,
  "num_mels": 80,
  "fmin": 80,
  "fmax": 7600,
  "fft_size": 1024,
  "hop_size": 256,
  "frame_shift_ms": null,
  "win_length": 1024,
  "win_length_ms": -1.0,
  "window": "hann",
  "highpass_cutoff": 70.0,
  "output_distribution": "Logistic",
  "log_scale_min": -32.23619130191664,
  "out_channels": 30,
  "layers": 24,
  "stacks": 4,
  "residual_channels": 128,
  "gate_channels": 256,
  "skip_out_channels": 128,
  "dropout": 0.0,
  "kernel_size": 3,
  "cin_channels": 80,
  "cin_pad": 2,
  "upsample_conditional_features": true,
  "upsample_net": "ConvInUpsampleNetwork",
  "upsample_params": {
    "upsample_scales": [4, 4, 4, 4]
  },
  "gin_channels": -1,
  "n_speakers": 7,
  "pin_memory": true,
  "num_workers": 2,
  "batch_size": 8,
  "optimizer": "Adam",
  "optimizer_params": {
    "lr": 0.001,
    "eps": 1e-08,
    "weight_decay": 0.0
  },
  "lr_schedule": "step_learning_rate_decay",
  "lr_schedule_kwargs": {
    "anneal_rate": 0.5,
    "anneal_interval": 200000
  },
  "max_train_steps": 1500000,
  "nepochs": 2000,
  "clip_thresh": -1,
  "max_time_sec": null,
  "max_time_steps": 10240,
  "exponential_moving_average": true,
  "ema_decay": 0.9999,
  "checkpoint_interval": 100000,
  "train_eval_interval": 100000,
  "test_eval_epoch_interval": 50,
  "save_optimizer_state": true
}
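
One consistency check on the preset itself (my own sanity check, based on my understanding that ConvInUpsampleNetwork upsamples the conditioning features by the product of upsample_scales): the scales should multiply out to hop_size, which they do here.

# Check that upsample_scales multiply out to hop_size (4 * 4 * 4 * 4 == 256)
from functools import reduce
from operator import mul

upsample_scales = [4, 4, 4, 4]
hop_size = 256
assert reduce(mul, upsample_scales) == hop_size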

For reference, I am trying to train on the FMA dataset: https://github.com/mdeff/fma
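
One thing worth checking on my side is whether the preprocessing stage finds any audio at all; FMA ships mp3 files, so if the recipe only globs for .wav the file list would come back empty. A quick check (the path is a placeholder for my setup, not the recipe's exact layout):

# Hypothetical sanity check: list audio files under the directory given to preprocess.py
from pathlib import Path

data_root = Path("/path/to/fma/audio")  # placeholder path
wavs = sorted(data_root.rglob("*.wav"))
mp3s = sorted(data_root.rglob("*.mp3"))
print("wav files:", len(wavs), "mp3 files:", len(mp3s))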

r9y9 commented 4 years ago

Wrote 0 utterances, 0 time steps (0.00 hours)

The log quoted above already says what went wrong: zero utterances were processed, so the metadata list passed to write_metadata is empty and min() over it fails.