How to prepare training data with pitch accents ?

Yablon commented 5 years ago

Hello, I am studying how to synthesize Chinese audio, which is similar to Japanese in its use of pitch accents. What kind of data should I prepare to train the model with pitch accents? Thank you!

TanUkkii007 commented 4 years ago

@Yablon You can specify encoder=EncoderV1WithAccentType as a hparam. https://github.com/nii-yamagishilab/self-attention-tacotron/blob/master/hparams.py#L62 EncoderV1WithAccentType is defined here: https://github.com/nii-yamagishilab/self-attention-tacotron/blob/master/modules/module.py#L230 Note that EncoderV1WithAccentType receives two inputs, a phoneme sequence and an accentual type sequence. The accentual type sequence must be aligned with the phoneme sequence, that is, it must have one accent label per phoneme. I think you can use the same scheme for Chinese accents.
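To illustrate what "aligned" means here, this is a minimal sketch and not code from this repository. It assumes each Chinese syllable comes with a tone number and copies that tone onto every phoneme of the syllable, so both sequences end up the same length. The phoneme inventory `PHONEME_TO_ID`, the function `encode_utterance`, and the choice of tone numbers as accent IDs are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical phoneme inventory; a real corpus would define its own table.
PHONEME_TO_ID: Dict[str, int] = {"<pad>": 0, "n": 1, "i": 2, "h": 3, "ao": 4}

# One syllable = (list of phonemes, tone number). Using the tone number
# directly as the accent ID is an assumption made for this sketch.
Syllable = Tuple[List[str], int]


def encode_utterance(syllables: List[Syllable]) -> Tuple[List[int], List[int]]:
    """Return (phoneme_ids, accent_ids) with exactly one accent ID per phoneme."""
    phoneme_ids: List[int] = []
    accent_ids: List[int] = []
    for phonemes, tone in syllables:
        for phoneme in phonemes:
            phoneme_ids.append(PHONEME_TO_ID[phoneme])
            # Repeat the syllable's tone for each of its phonemes.
            accent_ids.append(tone)
    # The two sequences must stay aligned element by element.
    assert len(phoneme_ids) == len(accent_ids)
    return phoneme_ids, accent_ids


# "ni hao", written here with tone 3 on both syllables.
phoneme_ids, accent_ids = encode_utterance([(["n", "i"], 3), (["h", "ao"], 3)])
```

Other labeling schemes are possible, for example marking only the vowel of each syllable. The only requirement stated above is that the two sequences line up.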

hparams.py has a use_accent_type=True option that enables the accent embedding. https://github.com/nii-yamagishilab/self-attention-tacotron/blob/master/hparams.py#L53

You can see how a model uses the accentual type here. It obtains the accent via features.accent_type, which comes from the dataset. This means you have to write dataset code for your Chinese corpus. https://github.com/nii-yamagishilab/self-attention-tacotron/blob/master/models/models.py#L309-L310

I think you can base your dataset on the VCTK example. It has additional attributes such as speaker_id, and you can add a Chinese accent label in the same way. The accent label is an ID sequence, so its implementation will be similar to that of the source attribute. https://github.com/nii-yamagishilab/self-attention-tacotron/blob/master/datasets/vctk/dataset.py#L28-L37
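As a rough sketch of that idea in plain Python (it does not reproduce the repository's dataset classes or serialization format): a record carries an accent_type ID sequence next to source, and both are padded with the same length so they stay aligned in a batch. The `SourceRecord` type and the `pad_batch` helper are hypothetical names. Only the field names source, accent_type, and speaker_id come from the discussion above.

```python
from typing import List, NamedTuple


class SourceRecord(NamedTuple):
    """Hypothetical per-utterance record; not the repository's actual class."""
    source: List[int]       # phoneme ID sequence
    accent_type: List[int]  # accent ID sequence, one entry per phoneme
    speaker_id: int


def pad_batch(records: List[SourceRecord], pad_id: int = 0) -> List[SourceRecord]:
    """Pad source and accent_type to the longest source length in the batch."""
    for record in records:
        # Reject misaligned records before any padding is applied.
        if len(record.source) != len(record.accent_type):
            raise ValueError("accent_type must have one entry per phoneme")
    max_len = max(len(record.source) for record in records)
    padded = []
    for record in records:
        padding = [pad_id] * (max_len - len(record.source))
        padded.append(SourceRecord(
            source=record.source + padding,
            accent_type=record.accent_type + padding,
            speaker_id=record.speaker_id,
        ))
    return padded


batch = pad_batch([
    SourceRecord(source=[1, 2, 3, 4], accent_type=[3, 3, 3, 3], speaker_id=0),
    SourceRecord(source=[1, 2], accent_type=[3, 3], speaker_id=1),
])
```

Reserving ID 0 for padding in both sequences is an assumption of this sketch. Whatever value is used, it should not collide with a real phoneme or accent ID.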

TanUkkii007 commented 4 years ago

I found a problem with ExtendedTacotronV1Model regarding accent. I will fix it in a moment. https://github.com/nii-yamagishilab/self-attention-tacotron/issues/25

Yablon commented 4 years ago

@TanUkkii007 Thank you for your reply! I will read your guide and try your encoder.

nii-yamagishilab / self-attention-tacotron

How to prepare training data with pitch accents ? #24