jerivl / Deepcut

A robot that raps
Apache License 2.0

Is the forced aligner tight enough to the phonemes? #12

Open jerivl opened 3 years ago

jerivl commented 3 years ago

Potential Issue: The forced aligner's timings may not be accurate enough for the TTS audio. If they are off, syllables will be quantized to the wrong positions in the rhythm.
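To illustrate why small timing errors matter, here is a minimal sketch of snapping a syllable onset to a rhythmic grid. The function name `grid_slot`, the tempo, the grid resolution, and the onset times are all hypothetical; the project's actual quantization step may work differently.

```python
def grid_slot(onset_s: float, bpm: float, slots_per_beat: int = 4) -> int:
    """Return the index of the grid slot nearest to an onset time in seconds."""
    slot_s = 60.0 / bpm / slots_per_beat  # duration of one grid slot
    return round(onset_s / slot_s)


# Hypothetical numbers: 120 BPM on a sixteenth-note grid (0.125 s per slot).
true_slot = grid_slot(1.05, bpm=120)            # onset as perceived
shifted_slot = grid_slot(1.05 + 0.06, bpm=120)  # same onset with a 60 ms aligner error
print(true_slot, shifted_slot)
```

With these assumed values the perceived onset lands in slot 8, while the onset with a 60 ms error lands in slot 9, so the syllable would be placed one sixteenth note late.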

Solution: The Montreal Forced Aligner (MFA) is trainable, so we may be able to improve its accuracy by training it on TTS audio.

Desired Action: Check whether the MFA timings are tight enough to the phonemes for syllable isolation. In each isolated segment, is there much time before or after the perceived syllable? Can we get an estimate of the error?
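One possible way to estimate the error, sketched under assumptions: hand-label the syllable boundaries for a few TTS clips and compare them with the aligner's boundaries. The function `boundary_errors` and the interval values below are hypothetical, and the sketch assumes both lists hold the same syllables in the same order as `(start, end)` pairs in seconds. It does not read MFA output files.

```python
from statistics import mean


def boundary_errors(predicted, reference):
    """Compare aligner intervals with hand-labelled intervals.

    Errors are predicted minus reference, in seconds. A negative onset
    error means the segment starts before the labelled syllable; a
    positive offset error means it ends after the labelled syllable.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty lists of equal length")
    onset = [p[0] - r[0] for p, r in zip(predicted, reference)]
    offset = [p[1] - r[1] for p, r in zip(predicted, reference)]
    return {
        "mean_onset_error": mean(onset),
        "mean_offset_error": mean(offset),
        "mean_abs_onset_error": mean([abs(e) for e in onset]),
        "mean_abs_offset_error": mean([abs(e) for e in offset]),
        "max_abs_error": max(abs(e) for e in onset + offset),
    }


# Made-up example values, not measurements from the project.
aligner = [(0.00, 0.20), (0.20, 0.45)]
labelled = [(0.02, 0.18), (0.21, 0.40)]
for name, value in boundary_errors(aligner, labelled).items():
    print(f"{name}: {value * 1000:.1f} ms")
```

The signed means show whether segments tend to carry extra time before or after the syllable, and the absolute means give an overall error size that can be compared against the duration of one grid slot.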

Notes: This is a critical piece of the project and a high-priority item.