Yablon closed this issue 4 years ago
@Yablon You can specify encoder=EncoderV1WithAccentType as an hparam.
https://github.com/nii-yamagishilab/self-attention-tacotron/blob/master/hparams.py#L62
You can find EncoderV1WithAccentType here.
https://github.com/nii-yamagishilab/self-attention-tacotron/blob/master/modules/module.py#L230
Note that EncoderV1WithAccentType receives two inputs: a phoneme sequence and an accentual type sequence. The accentual type sequence must be aligned with the phoneme sequence, that is, it needs one accent label per phoneme. I think you can use the same scheme for Chinese accent.
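To make the alignment requirement concrete, here is a minimal sketch in plain Python. It is not code from this repository. It assumes, purely for illustration, that the Chinese accent label is the Mandarin tone number of each syllable and that the label is repeated for every phoneme in that syllable. The function name `align_accent_to_phonemes` and the input layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical input layout: each syllable is (phonemes, tone number).
# Using the tone number as the accent label is an assumption, not
# something this repository prescribes.
Syllable = Tuple[List[str], int]


def align_accent_to_phonemes(syllables: List[Syllable]) -> Tuple[List[str], List[int]]:
    """Flatten syllables into two sequences of equal length.

    Every phoneme gets the accent label of the syllable it belongs to,
    so the accent sequence is aligned with the phoneme sequence.
    """
    phonemes: List[str] = []
    accent_types: List[int] = []
    for syllable_phonemes, tone in syllables:
        for phoneme in syllable_phonemes:
            phonemes.append(phoneme)
            accent_types.append(tone)
    # The encoder input requires one accent label per phoneme.
    assert len(phonemes) == len(accent_types)
    return phonemes, accent_types


# "ni3 hao3" split into initials and finals (illustrative segmentation).
syllables = [(["n", "i"], 3), (["h", "ao"], 3)]
phonemes, accent_types = align_accent_to_phonemes(syllables)
```

Whatever labeling scheme you choose, the important property is that both sequences have the same length and the same ordering.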
hparams.py has a use_accent_type=True option to enable accent embedding.
https://github.com/nii-yamagishilab/self-attention-tacotron/blob/master/hparams.py#L53
You can see how a model uses the accentual type here. It obtains the accent via features.accent_type, which comes from the dataset. This means you have to write dataset code for your Chinese corpus.
https://github.com/nii-yamagishilab/self-attention-tacotron/blob/master/models/models.py#L309-L310
I think you can develop your dataset based on the VCTK example. It has additional attributes like speaker_id. You can add a Chinese accent label as well. The accent label is an ID sequence, so its implementation will be similar to the source attribute.
https://github.com/nii-yamagishilab/self-attention-tacotron/blob/master/datasets/vctk/dataset.py#L28-L37
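As a rough illustration of what "similar to the source attribute" could mean, the sketch below stores the accent label as an integer ID sequence next to the phoneme ID sequence. This is plain Python, not the dataset code in this repository. The class `SourceRecord`, the function `build_record`, and the small vocabulary are all hypothetical names made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceRecord:
    """Hypothetical per-utterance record; field names are illustrative."""
    key: str
    source: List[int]       # phoneme IDs
    accent_type: List[int]  # accent label IDs, one per phoneme

    def __post_init__(self) -> None:
        # Reject records whose accent labels are not aligned with the phonemes.
        if len(self.source) != len(self.accent_type):
            raise ValueError(
                f"{self.key}: {len(self.source)} phonemes but "
                f"{len(self.accent_type)} accent labels"
            )


def build_record(key: str,
                 phonemes: List[str],
                 accent_types: List[int],
                 phoneme_to_id: Dict[str, int]) -> SourceRecord:
    """Convert phoneme symbols to IDs and attach the accent ID sequence."""
    source = [phoneme_to_id[p] for p in phonemes]
    return SourceRecord(key=key, source=source, accent_type=list(accent_types))


# Toy vocabulary (an assumption for the example only).
phoneme_to_id = {"n": 1, "i": 2, "h": 3, "ao": 4}
record = build_record("utt0001", ["n", "i", "h", "ao"], [3, 3, 3, 3], phoneme_to_id)
```

Checking the lengths when the record is built catches misaligned labels during preprocessing rather than at training time.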
I found a problem with ExtendedTacotronV1Model regarding accent. I will fix it in a moment. https://github.com/nii-yamagishilab/self-attention-tacotron/issues/25
@TanUkkii007 Thank you for your reply! I will read your guide and try your encoder.
Hello, I am studying how to synthesize Chinese audio. Chinese is similar to Japanese in that it has pitch accents. What kind of data should I prepare to train the model with pitch accents? Thank you!