I know the real-time factor (RTF) metric from STT (or ASR) systems. An RTF of 0.5 would mean that 1 sec of audio is recognized in 0.5 sec.
I would expect the same logic for TTS systems. But the numbers reported in Larynx's debug output as "Real-time factor" seem to be 1/RTF. This is confusing, isn't it?
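
To make the two conventions concrete, here is a minimal Python sketch. The function names and the timing values are made up for illustration and are not taken from Larynx's code; I am only assuming the usual ASR definition, RTF = processing time / audio duration.

```python
def rtf(processing_sec: float, audio_sec: float) -> float:
    """RTF as commonly defined for ASR: processing time / audio duration."""
    if audio_sec <= 0:
        raise ValueError("audio_sec must be positive")
    return processing_sec / audio_sec


def inverse_rtf(processing_sec: float, audio_sec: float) -> float:
    """Inverse convention (1/RTF): seconds of audio per second of processing."""
    if processing_sec <= 0:
        raise ValueError("processing_sec must be positive")
    return audio_sec / processing_sec


# Hypothetical numbers: 1 sec of audio handled in 0.5 sec.
print(rtf(0.5, 1.0))          # 0.5
print(inverse_rtf(0.5, 1.0))  # 2.0
```

With the first convention, a value below 1 means faster than real time; with the inverse one, a value above 1 means the same thing, which is why mixing the two under one name is easy to misread.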