Closed Funktionar closed 2 years ago
Outputting to default speaker should be as fast as the demo on the trial page. If you are outputting to an audio file, it's slow.
I'm outputting to speakers and it's slower.
should I switch to stream mode?
How slow is it? I didn't experience significantly larger delays compared with the demo.
It's about three times slower.
I just did a profile:
```
python -m cProfile -m aspeak -t
         2860752 function calls (2854737 primitive calls) in 34.520 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   34.021   34.021 __main__.py:1(<module>)
        2    0.000    0.000    0.399    0.199 auth.py:1(<module>)
        1    0.000    0.000    0.879    0.879 auth.py:10(_get_auth_token)
        1    0.000    0.000   32.147   32.147 functional.py:11(pure_text_to_speech)
        1    0.000    0.000   33.983   33.983 main.py:122(main)
        1    0.000    0.000    0.948    0.948 main.py:18(read_file)
        1    0.000    0.000    0.000    0.000 main.py:25(preprocess_text)
        1    0.000    0.000   32.147   32.147 main.py:46(speech_function_selector)
        1    0.000    0.000   33.095   33.095 main.py:69(main_text)
        1    0.290    0.290   32.146   32.146 provider.py:36(text_to_speech)
      2/1    0.000    0.000    0.498    0.498 runpy.py:103(_get_module_details)
        1    0.000    0.000   34.520   34.520 runpy.py:199(run_module)
        1    0.000    0.000   34.022   34.022 runpy.py:63(_run_code)
        1    0.000    0.000   31.820   31.820 speech.py:1565(speak_text)
        1    0.000    0.000   29.846   29.846 speech_py_impl.py:6148(speak_text)
```
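For anyone reproducing this, the same kind of report can be produced programmatically with `cProfile`/`pstats` and sorted by cumulative time so the dominant calls appear first. This is a generic sketch: the profiled `slow_part` function below is a stand-in, not aspeak's `speak_text`.

```python
import cProfile
import io
import pstats


def slow_part():
    # Stand-in for the dominant call (e.g. speak_text in the dump above)
    sum(range(100_000))


def main():
    slow_part()


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and print only the top entries,
# instead of reading the full alphabetical dump
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
report = out.getvalue()
print(report)
```

Equivalently, `python -m cProfile -s cumtime -m aspeak -t` sorts the dump on the command line.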
So the room for optimization is 33.893 - 31.820 - 0.948 - 0.879 ≈ 0.246 s. There is actually almost nothing to optimize, except:
We could cache the synthesizer here if you are always using the same parameters for `text_to_speech`.

I can provide an API with a cached `SpeechSynthesizer` in the next version, but I'm very busy recently, so don't expect that to arrive very soon.

You could do it yourself by building your own version of `SpeechServiceProvider`, if you are always calling `text_to_speech`/`pure_text_to_speech` with the same set of parameters: cache the `SpeechSynthesizer` and recreate it using the same config in case of token expiration. Change the `text_to_speech` and `ssml_to_speech` methods on your `SpeechServiceProvider` to use the cached `SpeechSynthesizer` and remove the config parameters from the methods. Then call `SpeechServiceProvider.text_to_speech(text)` or `SpeechServiceProvider.ssml_to_speech(ssml)` to do speech synthesis. (You can create SSML using the `create_ssml` function in `aspeak.ssml`.) However, frankly speaking, I don't know by how much the performance will improve.
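The caching idea above can be sketched without the Azure SDK. This is a minimal, self-contained illustration of the pattern only: `ExpensiveSynthesizer` and `TokenExpiredError` are hypothetical stand-ins for the real `SpeechSynthesizer` and the SDK's token-expiration failure, not actual aspeak or Azure APIs.

```python
class TokenExpiredError(Exception):
    """Raised when the cached synthesizer's auth token has expired."""


class ExpensiveSynthesizer:
    """Stand-in for SpeechSynthesizer: costly to build, cheap to reuse."""

    instances_created = 0  # track construction cost for the demo

    def __init__(self, config):
        type(self).instances_created += 1
        self.config = config
        self.expired = False

    def speak(self, text):
        if self.expired:
            raise TokenExpiredError
        return f"audio({text})"


class CachingProvider:
    """Builds the synthesizer once; recreates it only on token expiry."""

    def __init__(self, config):
        self._config = config
        self._synth = None

    def text_to_speech(self, text):
        if self._synth is None:
            self._synth = ExpensiveSynthesizer(self._config)
        try:
            return self._synth.speak(text)
        except TokenExpiredError:
            # Recreate with the same config, then retry once
            self._synth = ExpensiveSynthesizer(self._config)
            return self._synth.speak(text)


provider = CachingProvider(config={"voice": "en-US"})
provider.text_to_speech("hello")
provider.text_to_speech("world")
assert ExpensiveSynthesizer.instances_created == 1  # reused, not rebuilt

provider._synth.expired = True  # simulate token expiration
provider.text_to_speech("again")
assert ExpensiveSynthesizer.instances_created == 2  # rebuilt exactly once
```

The point is that the expensive constructor runs once per token lifetime instead of once per call; whether that recovers a meaningful fraction of the ~0.29 s spent in `provider.py:text_to_speech` is exactly the open question above.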
Actually, I don't think the 200 ms delay is realistic. I opened https://eastus.tts.speech.microsoft.com in a browser and got a 268 ms delay.
Thanks
Is it possible to use this in real-time communication? Compared with using Azure directly it's slower, and I have the DeepL API to talk with foreigners. I'd like to get the audio within 200 ms and output it to a sound device, if that's feasible.