nii-yamagishilab / ZMM-TTS

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
BSD 3-Clause "New" or "Revised" License

Facing issues in converting duration into milliseconds/seconds... #7

Status: Open. pavanhitloop opened this issue 1 month ago.

pavanhitloop commented 1 month ago

Hi all,

I am trying to get the duration of a given text (the text input passed to txt2vec) so that I can set the d_control parameter. I am reading log_duration_prediction from ZMM-TTS.txt2vec.model.modules.VarianceAdaptor, but I don't understand what unit the values are in (samples, frames, or something else). Ultimately, I want to convert this duration to milliseconds/seconds.

If anyone has worked with these parameters, please point me to how to convert these values to milliseconds/seconds. If that is not possible, how can I get the duration of the given text from this model in milliseconds/seconds?
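A minimal sketch of the conversion, assuming txt2vec follows the common FastSpeech2-style convention where `log_duration_prediction` holds log(frames + 1) for each input token, and that one frame spans `hop_length / sample_rate` seconds. The values 320 and 16000 (a 20 ms frame shift) are hypothetical placeholders; the actual frame rate of the target features should be read from the txt2vec config. The helper names are also hypothetical, not part of the ZMM-TTS API.

```python
import math
from typing import List

# ASSUMPTION: log_duration_prediction = log(frames + 1), FastSpeech2-style.
# ASSUMPTION: hypothetical frame parameters; read the real ones from the config.
HOP_LENGTH = 320      # samples per frame (hypothetical)
SAMPLE_RATE = 16000   # samples per second (hypothetical)


def log_duration_to_frames(log_durations: List[float], d_control: float = 1.0) -> List[int]:
    """Invert log(d + 1), scale by d_control, and round to non-negative frame counts."""
    return [max(0, round((math.exp(x) - 1.0) * d_control)) for x in log_durations]


def frames_to_ms(frames: int, hop_length: int = HOP_LENGTH, sample_rate: int = SAMPLE_RATE) -> float:
    """Each frame covers hop_length / sample_rate seconds; return milliseconds."""
    return frames * hop_length / sample_rate * 1000.0


# Example: two tokens whose predicted log durations correspond to about 4 and 10 frames.
per_token = log_duration_to_frames([math.log(5.0), math.log(11.0)])
total_ms = frames_to_ms(sum(per_token))
print(per_token, total_ms)
```

Summing the per-token frame counts before converting gives the total utterance length; the per-token values can be converted the same way to get each token's duration.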

Also, is the d_control range between 0 and 1?
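Assuming d_control behaves as in FastSpeech2-style implementations, that is, as a multiplier on the predicted frame counts (1.0 leaves durations unchanged, values below 1.0 shorten them, values above 1.0 lengthen them), it would not be limited to the 0 to 1 range. Under that assumption, the factor needed to reach a target length can be sketched as follows; `d_control_for_target` and the 20 ms frame shift are hypothetical.

```python
import math
from typing import List


def d_control_for_target(log_durations: List[float], target_seconds: float,
                         frame_shift_seconds: float = 0.02) -> float:
    """Hypothetical helper: the multiplier that stretches the predicted total to target_seconds.

    ASSUMPTIONS: log_durations hold log(frames + 1), d_control multiplies frame
    counts, and the 20 ms frame shift is a placeholder for the model's real value.
    """
    predicted_frames = sum(max(0.0, math.exp(x) - 1.0) for x in log_durations)
    if predicted_frames == 0.0:
        raise ValueError("predicted duration is zero; cannot rescale")
    target_frames = target_seconds / frame_shift_seconds
    return target_frames / predicted_frames


# A prediction of 50 frames (1.0 s at 20 ms per frame) stretched to 2.0 s.
print(d_control_for_target([math.log(51.0)], target_seconds=2.0))
```

Because the model rounds per-token frame counts, the synthesized length would only approximately match the target.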

Thanks in Advance.