openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0

Generated config does not match the one used in training #102

Closed laiyoi closed 1 year ago

laiyoi commented 1 year ago

On the refactor-v2 branch, I enabled fixed_pitch_shifting and random_time_stretching for my dataset folder under the data folder, but when I run python data_gen/binarize.py --config data/expname/config.yaml the console shows:

| Hparams chains:  ['configs/base.yaml', 'configs/acoustic.yaml', 'data/liuchan_23.06.25/config.yaml']
| Hparams:
K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 128, audio_sample_rate: 44100, augmentation_args: {'random_pitch_shifting': {'enabled': False, 'range': [-5.0, 5.0], 'scale': 1.0}, 'fixed_pitch_shifting': {'enabled': False, 'targets': [-5.0, 5.0], 'scale': 0.75}, 'random_time_stretching': {'enabled': False, 'range': [0.65, 2.0], 'domain': 'log', 'scale': 2.0}},
base_config: ['configs/acoustic.yaml'], binarization_args: {'shuffle': True, 'num_workers': 0}, binarizer_cls: preprocessing.acoustic_binarizer.AcousticBinarizer, binary_data_dir: data/liuchan_23.06.25/binary, breathiness_smooth_width: 0.12,
clip_grad_norm: 1, dataloader_prefetch_factor: 2, ddp_backend: nccl, dictionary: dictionaries/opencpop-extension.txt, diff_accelerator: ddim,

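Here, both 'fixed_pitch_shifting': {'enabled': False and 'random_time_stretching': {'enabled': False contradict what I selected earlier in acoustic_preparation.ipynb. The same thing happens during training. I suspect it may be related to base_config: ['configs/acoustic.yaml']. In addition, of the training set entries I filled in in acoustic_preparation.ipynb, only one shows up in TensorBoard.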
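A quick way to confirm which augmentations are effectively on is to inspect the merged hyperparameters instead of the dataset's own config.yaml. The sketch below is only an illustration: `enabled_augmentations` is a hypothetical helper, not part of DiffSinger, and it assumes that a missing `enabled` key means the augmentation is off. The dictionary is copied from the Hparams dump above.

```python
def enabled_augmentations(hparams: dict) -> list:
    """Return the names of augmentations whose 'enabled' flag is truthy.

    Hypothetical helper for illustration; a missing 'enabled' key is
    treated as disabled (an assumption, not confirmed by the source).
    """
    augmentation_args = hparams.get('augmentation_args', {})
    return [
        name
        for name, args in augmentation_args.items()
        if isinstance(args, dict) and args.get('enabled', False)
    ]


# augmentation_args exactly as printed in the console output above.
logged_hparams = {
    'augmentation_args': {
        'random_pitch_shifting': {'enabled': False, 'range': [-5.0, 5.0], 'scale': 1.0},
        'fixed_pitch_shifting': {'enabled': False, 'targets': [-5.0, 5.0], 'scale': 0.75},
        'random_time_stretching': {'enabled': False, 'range': [0.65, 2.0],
                                   'domain': 'log', 'scale': 2.0},
    }
}

active = enabled_augmentations(logged_hparams)
print(active)  # an empty list: nothing is enabled in the merged result
```

An empty result here matches the symptom described: the selections made in the notebook never reached the merged configuration.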

laiyoi commented 1 year ago

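After digging through the source code for a long time, I solved it. In the code block in section 4.2 of preparation/acoustic_preparation.ipynb, change the part that sets augmentation_args to: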

augmentation_args = {}
if random_pitch_shifting['enabled']:
    augmentation_args['random_pitch_shifting'] = {
        'enabled': True,
        'range': random_pitch_shifting['range'],
        'scale': random_pitch_shifting['scale']
    }
    configs['use_key_shift_embed'] = True
if fixed_pitch_shifting['enabled']:
    augmentation_args['fixed_pitch_shifting'] = {
        'enabled': True,
        'targets': fixed_pitch_shifting['targets'],
        'scale': fixed_pitch_shifting['scale']
    }
    configs['use_spk_id'] = True
    configs['num_spk'] = 1 + len(fixed_pitch_shifting['targets'])
if random_time_stretching['enabled']:
    augmentation_args['random_time_stretching'] = {
        'enabled': True,
        'range': random_time_stretching['range'],
        'domain': random_time_stretching['domain'],
        'scale': random_time_stretching['scale']
    }
    configs['use_speed_embed'] = True
configs['augmentation_args'] = augmentation_args

In other words, add 'enabled': True to each entry. Alternatively, you can add enabled: true to each entry in config.yaml.
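One plausible explanation for why the explicit flag matters, given the "Hparams chains" line in the log, is that nested dictionaries from the base configs are merged key by key with the dataset config. Under that assumption, an entry that omits `enabled` inherits `enabled: False` from the base file. The sketch below illustrates this with a hypothetical `merge_hparams` helper; it is not DiffSinger's actual loader, and the "before the fix" override is an assumed shape of what the notebook wrote.

```python
import copy


def merge_hparams(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`.

    Hypothetical helper: an assumed model of how a config chain could be
    combined, not the project's real implementation.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested dicts are merged key by key, so missing keys keep base values.
            merged[key] = merge_hparams(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Default taken from the Hparams dump earlier in the thread.
base = {
    'augmentation_args': {
        'fixed_pitch_shifting': {'enabled': False, 'targets': [-5.0, 5.0], 'scale': 0.75},
    }
}

# Assumed override before the fix: no 'enabled' key at all.
without_flag = {
    'augmentation_args': {
        'fixed_pitch_shifting': {'targets': [-5.0, 5.0], 'scale': 0.75},
    }
}

# Override after the fix: 'enabled': True is written explicitly.
with_flag = {
    'augmentation_args': {
        'fixed_pitch_shifting': {'enabled': True, 'targets': [-5.0, 5.0], 'scale': 0.75},
    }
}

merged_without = merge_hparams(base, without_flag)
merged_with = merge_hparams(base, with_flag)

print(merged_without['augmentation_args']['fixed_pitch_shifting']['enabled'])  # False
print(merged_with['augmentation_args']['fixed_pitch_shifting']['enabled'])     # True
```

If the real loader behaves this way, both fixes above are equivalent: writing the flag from the notebook or writing it by hand in config.yaml overrides the same inherited default.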

yqzhishen commented 1 year ago

The v2 branch has not been officially released yet. If you want to use it, it is best to read this issue carefully and subscribe to it: https://github.com/openvpi/DiffSinger/issues/74