Linguistic features for p280 speaker

ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper

MIT License


Linguistic features for p280 speaker #92

Open bajibabu opened 7 years ago

bajibabu commented 7 years ago

Hi,

I generated the linguistic features described in the WaveNet paper for the p280 speaker. If anyone is interested in using them for conditioning in WaveNet, please download them from https://users.aalto.fi/~bollepb1/binary_labels_p280.zip. Each frame (row) corresponds to 5 ms of speech.
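For anyone lining these rows up with audio, here is a minimal sketch of the frame-to-sample arithmetic. It assumes 16 kHz audio (the sample rate is not stated in this thread) and plain repetition of each row; `samples_per_frame` and `upsample_rows` are hypothetical helper names, not part of tensorflow-wavenet.

```python
FRAME_SHIFT_MS = 5  # stated in the thread: each row covers 5 ms of speech


def samples_per_frame(sample_rate, frame_shift_ms=FRAME_SHIFT_MS):
    """Number of audio samples covered by one label row."""
    total = sample_rate * frame_shift_ms
    if total % 1000 != 0:
        raise ValueError("frame shift is not a whole number of samples")
    return total // 1000


def upsample_rows(rows, sample_rate):
    """Repeat each frame-level row so there is one row per audio sample."""
    hop = samples_per_frame(sample_rate)
    return [row for row in rows for _ in range(hop)]


rows = [[0.0, 1.0], [1.0, 0.0]]          # two toy frames, two features each
per_sample = upsample_rows(rows, 16000)  # 16 kHz is an assumed sample rate
print(samples_per_frame(16000), len(per_sample))
```

Repetition is only the simplest way to reach the sample rate; other upsampling schemes are possible and may work better.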

ibab commented 7 years ago

That's very cool, thanks! I haven't looked into generating linguistic features yet. Can you explain what you've done to generate these?

bajibabu commented 7 years ago

1) I used the HTS toolkit (http://hts.sp.nitech.ac.jp/) to get the full-context labels from the text files.
2) State-level durations were obtained by HMM-based forced alignment using the same HTS toolkit.
3) The full-context label features were transformed into binary and numerical features using the Merlin toolkit (https://github.com/CSTR-Edinburgh/merlin).
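As a rough illustration of how aligned durations relate to the 5 ms frames, the sketch below counts the frames spanned by one aligned label line. It relies on HTK/HTS label times being expressed in units of 100 ns and assumes a `start end label` line layout; `frames_in_segment` is a hypothetical helper and the label string is a toy value, not taken from the p280 data.

```python
HTK_UNITS_PER_MS = 10_000  # HTK/HTS label times are in units of 100 ns


def frames_in_segment(label_line, frame_shift_ms=5):
    """Return how many whole frames a 'start end label' line spans."""
    start, end = label_line.split()[:2]
    frame_units = frame_shift_ms * HTK_UNITS_PER_MS
    return (int(end) - int(start)) // frame_units


line = "0 250000 x^x-sil+hh=ax[2]"  # toy label, not from the p280 data
print(frames_in_segment(line))
```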

mortont commented 7 years ago

Very cool! To my understanding, we should be able to feed these vectors directly into the training and generation, after a sort of preprocessing step where they are generated based on an input string, correct? It might be worth wrapping this up into a function to make that process easier.

@bajibabu is one of these features F0 or will that need to be generated separately?

bajibabu commented 7 years ago

@mortont Oops! I forgot to include the F0 values. I will append them tomorrow morning.

bajibabu commented 7 years ago

I updated the label files with F0 values.

mortont commented 7 years ago

Thanks @bajibabu! I've never used HTS or Merlin. Could you walk through the steps you used to create these in more detail?

bajibabu commented 7 years ago

You can find more details in this post: http://www.speech.zone/exercises/build-your-own-dnn-voice/prepare-the-input-labels/

rockyrmit commented 7 years ago

@bajibabu I'm trying to use the linguistic features you generated for speaker p280 as input to the WaveNet model, to generate meaningful speech such as "Hello, WaveNet!". Have you done that, and if so, could you share the detailed steps to reproduce it? Thanks!

bajibabu commented 7 years ago

I didn't do that.

liangmin0020 commented 7 years ago

My subject is also TTS. Do the p280 features include both the full-context label part and the F0 values? Have all of these values been converted to binary? Could you please give a detailed description of the features?

DabiaoMa commented 7 years ago

@bajibabu Hi, bajibabu. I am new to this field and very interested in local conditioning. I tried the link you provided to download the linguistic features, but it is no longer available. Would you please send me a copy? Thank you.

rafaelvalle commented 6 years ago

@bajibabu Can you update the link to the linguistic features you computed?

toannhu commented 6 years ago

I couldn't reach @bajibabu. I hope someone who still has the file on their computer will share it with everyone.

rafaelvalle commented 6 years ago

@rockyrmit do you have the zip file with linguistic features?

AzamRabiee commented 6 years ago

I need the zip file as well. Can anyone share it?