Potential Issue: The forced aligner's timings may not be accurate for the TTS audio. If they are off, syllables will not be quantized correctly to the rhythm.
Solution: The Montreal Forced Aligner (MFA) is trainable, so we may be able to improve its performance by training on audio from the TTS.
Desired Action: Check whether the MFA timings are tight to the phonemes for syllable isolation. In each isolated segment, is there much time before or after the syllable you perceive? Can we get an estimate of the error?
Notes: This is a critical piece of the project and a high-priority item.
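Error Estimate Sketch: One possible way to put a number on the alignment error is to hand-label a small set of TTS utterances and compare those boundaries with the aligner's. The Python below is a minimal sketch under that assumption, not an established procedure for this project. The function names (boundary_errors, summarize), the (label, start, end) tuple layout, and the 20 ms default tolerance are all hypothetical choices made for illustration. It assumes both lists hold the same syllables in the same order, with times in seconds, however the intervals are loaded (for example from the aligner's TextGrid output).

```python
from statistics import mean, median


def boundary_errors(predicted, reference):
    """Return per-segment (label, onset_error, offset_error) in seconds.

    A positive onset error means the aligner starts the segment late;
    a positive offset error means the aligner ends the segment late.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    errors = []
    for (p_label, p_start, p_end), (r_label, r_start, r_end) in zip(predicted, reference):
        if p_label != r_label:
            raise ValueError(f"label mismatch: {p_label!r} vs {r_label!r}")
        errors.append((p_label, p_start - r_start, p_end - r_end))
    return errors


def summarize(errors, tolerance=0.020):
    """Summarize absolute boundary errors (onsets and offsets pooled).

    The 20 ms default tolerance is an assumed placeholder, not a project value.
    """
    abs_errs = [abs(e) for _, onset, offset in errors for e in (onset, offset)]
    return {
        "mean_abs_error": mean(abs_errs),
        "median_abs_error": median(abs_errs),
        "max_abs_error": max(abs_errs),
        "fraction_within_tolerance": sum(e <= tolerance for e in abs_errs) / len(abs_errs),
    }


if __name__ == "__main__":
    # Made-up timings for illustration only; real values would come from
    # the aligner output and from hand-labeled reference boundaries.
    predicted = [("hel", 0.10, 0.30), ("lo", 0.30, 0.62)]
    reference = [("hel", 0.12, 0.29), ("lo", 0.29, 0.60)]
    for name, value in summarize(boundary_errors(predicted, reference)).items():
        print(f"{name}: {value:.3f}")
```

Keeping the sign of the onset and offset errors separate from the pooled absolute summary may help distinguish a consistent bias (segments that always start early, say) from random jitter, which bears on whether retraining or a fixed offset is the better fix.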