r9y9 / nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.
https://r9y9.github.io/nnmnkwii/latest/

Seq2seq #9

Closed r9y9 closed 6 years ago

r9y9 commented 7 years ago

So far I have only tested frame-wise training and sequence-wise training with exactly time-aligned linguistic/acoustic frame features, but seq2seq models can be used with roughly aligned ones. Sequence-wise training on a non-aligned dataset, which I haven't tested yet, might reveal weaknesses in the library design, so I should try it as soon as possible.

r9y9 commented 7 years ago

As far as I could tell from trying Tacotron (https://github.com/r9y9/tacotron_pytorch), I don't think there's anything we need to add to support it. The dataset utility was enough for me: https://github.com/r9y9/tacotron_pytorch/blob/44922c082e10f0789b81adc33f0fed67bce3f590/train.py#L69-L108

A utility to create padded batches might be useful to have, though: https://github.com/r9y9/tacotron_pytorch/blob/44922c082e10f0789b81adc33f0fed67bce3f590/train.py#L124-L147
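
For illustration, a padded-batch utility of this kind could look roughly like the sketch below. This is not the code at the linked lines and not an existing nnmnkwii API; the names `pad_1d`, `pad_2d`, and `collate_padded` are hypothetical, and it assumes each sample is a pair of a 1-D input sequence (e.g. character ids) and a `(T, D)` target feature matrix, padded with zeros on the right.

```python
import numpy as np


def pad_1d(seq, max_len, constant_values=0):
    """Right-pad a 1-D sequence to max_len (hypothetical helper)."""
    return np.pad(seq, (0, max_len - len(seq)),
                  mode="constant", constant_values=constant_values)


def pad_2d(x, max_len, constant_values=0):
    """Right-pad a (T, D) array along the time axis to max_len (hypothetical helper)."""
    return np.pad(x, [(0, max_len - len(x)), (0, 0)],
                  mode="constant", constant_values=constant_values)


def collate_padded(batch):
    """Turn a list of (inputs, targets) pairs of varying length into padded arrays.

    Assumes inputs are 1-D and targets are (T, D) with the same D for all samples.
    Returns the padded arrays together with the original lengths, so that a model
    or loss function can mask out the padded positions.
    """
    inputs = [np.asarray(x) for x, _ in batch]
    targets = [np.asarray(y) for _, y in batch]

    input_lengths = np.array([len(x) for x in inputs], dtype=np.int64)
    target_lengths = np.array([len(y) for y in targets], dtype=np.int64)

    max_input_len = int(input_lengths.max())
    max_target_len = int(target_lengths.max())

    x_batch = np.stack([pad_1d(x, max_input_len) for x in inputs])
    y_batch = np.stack([pad_2d(y, max_target_len) for y in targets])
    return x_batch, input_lengths, y_batch, target_lengths
```

With PyTorch, a function like this could be passed as `collate_fn` to a `DataLoader` after converting the returned arrays to tensors. Returning the lengths alongside the padded arrays keeps the non-aligned input and target sequences independent of each other.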

r9y9 commented 6 years ago

I think I can close this.