Closed: andweber closed this issue 7 years ago
Hi. At first Kalliope used PyAudio to handle the sound output. Due to a lot of problems with settings, rates, latency and other things, we finally decided to use mplayer.
I've tested your branch on my Ubuntu. Not working: the sound is cut off at each sentence, like stammering.
It seems to be a problem with pulse. I have the same problem if I play anything with a player: not every time, but sometimes it hangs and I have to interrupt it and start it again. Tested also on an RPi3. With alsa, playing anything works every time without hanging, so I suppose that using pulse as the backend is the problem on the RPi3.
I have a working script (only tested on Ubuntu 16.04).
The problem now is that PyAudio is only compatible with WAV files, and in most cases TTS engines return an MP3 file. I found some libs to convert, but I'm not sure whether in the end the process won't be heavier than running mplayer as usual...
I have added a method to convert the file from MP3 to WAV. The method needs to be called from the TTS module.
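The thread doesn't say which conversion lib was picked, but the shape of such a helper could look like this; shelling out to ffmpeg is only one option (pydub or sox would work the same way), and the function names here are mine, not Kalliope's:

```python
import subprocess

def ffmpeg_cmd(mp3_path, wav_path):
    # -y overwrites the target, so a stale cached WAV gets replaced
    return ["ffmpeg", "-y", "-i", mp3_path, wav_path]

def convert_mp3_to_wav(mp3_path, wav_path):
    """Convert an MP3 returned by a TTS engine into a WAV PyAudio can play."""
    subprocess.check_call(ffmpeg_cmd(mp3_path, wav_path))
    return wav_path
```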
So, after a couple of tests: PyAudio runs faster than mplayer.
# existing file in cache with Pico2wav
# pyaudio: 1.42409491539 seconds
# mplayer: 1.70076394081 seconds
# non existing file in cache with Pico2wav
# pyaudio: 1.37540102005 seconds
# mplayer: 1.59730410576 seconds
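For reference, numbers like the ones above can be produced with a tiny harness; `play_fn` stands for any blocking player call (an mplayer subprocess, a PyAudio loop), and the names are illustrative, not the actual benchmark script:

```python
import time

def time_playback(play_fn, audio_file, runs=1):
    """Average wall-clock seconds play_fn needs to play audio_file.

    play_fn is any callable that blocks until playback is done
    (a subprocess call to mplayer, a PyAudio stream loop, ...).
    """
    timings = []
    for _ in range(runs):
        start = time.time()
        play_fn(audio_file)
        timings.append(time.time() - start)
    return sum(timings) / len(timings)
```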
The weird thing is that the script runs faster with a non-generated file. This points to a problem with our cache system, which is supposed to save us some time. Anyway, that's another subject.
The problem with PyAudio is that I don't get the same audio output on each run. I ran the same script a couple of times with the same WAV file: sometimes everything works well, and sometimes only part of the audio sample is played through my speaker. I tried a lot of configurations, but no good enough results so far.
I think I'll give this lib a try.
It works with the sounddevice lib, but unfortunately the result is not nearly as good as with mplayer. Also, the file conversion from MP3 to WAV is heavy... And one last important point: TTS engines return a file that can have a 44000 or a 48000 sample rate. It would cost extra processing time to check what kind of file we have. A workaround for this would be to hard-code the rate in each TTS module.
Some helpful links
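Instead of hard-coding the rate in each TTS module, one cheap alternative (a sketch of mine, not what the branch does) is to read the rate from the WAV header with the stdlib `wave` module; this is a header read, not a decode, so it should cost close to nothing:

```python
import wave

def wav_parameters(path):
    """Return (rate, channels, sample_width) read from a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return (wav_file.getframerate(),
                wav_file.getnchannels(),
                wav_file.getsampwidth())
```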
Nice to see that you are already working on this problem. Yesterday I received my RPi3 and noticed Kalliope's long response time. First I thought it was because I had cloned the last dev branch, so I installed master, but the response was just as slow. My next thought was that it could be the latest Raspbian, so I installed raspbian-2017-01-10 lite. Right now only the last dev branch of Kalliope is running on the Pi. I used my starter kit, disabled all synapses except 5, and did not install any of the extra Python modules required for non-core neurons. Then I ask for the time, and Kalliope needs about 30 seconds to complete:
2017-05-07 12:34:26,640 :: INFO :: Keyword 1 detected at time: 2017-05-07 12:34:26
2017-05-07 12:34:26,640 :: DEBUG :: Trigger callback called, switching to the next state
2017-05-07 12:34:26,676 :: DEBUG :: Entering state: start_order_listener
2017-05-07 12:34:26,677 :: DEBUG :: Pausing snowboy process
2017-05-07 12:34:26,689 :: DEBUG :: Entering state: playing_wake_up_answer
2017-05-07 12:34:26,690 :: DEBUG :: Selected sound: trigger/dong.wav
2017-05-07 12:34:26,692 :: DEBUG :: Try to load file from 1: /home/pi/jarvis/trigger/dong.wav
2017-05-07 12:34:26,693 :: DEBUG :: File found in /home/pi/jarvis/trigger/dong.wav
2017-05-07 12:34:26,694 :: DEBUG :: Mplayer cmd: ['/usr/bin/mplayer', '-slave', '-quiet', '/home/pi/jarvis/trigger/dong.wav']
2017-05-07 12:34:27,613 :: DEBUG :: Entering state: waiting_for_order_listener_callback
Say something!
2017-05-07 12:34:29,318 :: INFO :: Say something!
Google Speech Recognition thinks you said wie spät
2017-05-07 12:34:42,131 :: INFO :: Google Speech Recognition thinks you said wie spät
2017-05-07 12:34:42,146 :: DEBUG :: order listener callback called. Order to process: wie spät
2017-05-07 12:34:42,247 :: DEBUG :: order in analysing_order_thread wie spät
2017-05-07 12:34:42,247 :: DEBUG :: kill the speech recognition process
2017-05-07 12:34:42,248 :: DEBUG :: [OrderAnalyser] Received order: wie spät
2017-05-07 12:34:42,750 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: default-synapse, user sentence: wie spät
2017-05-07 12:34:42,750 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: hello, user sentence: wie spät
2017-05-07 12:34:42,751 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: was bist du, user sentence: wie spät
2017-05-07 12:34:42,752 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: was kannst du, user sentence: wie spät
2017-05-07 12:34:42,753 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: wie spät, user sentence: wie spät
2017-05-07 12:34:42,754 :: DEBUG :: Order found! Run synapse name: say-local-date
Order matched in the brain. Running synapse "say-local-date"
2017-05-07 12:34:42,754 :: INFO :: Order matched in the brain. Running synapse "say-local-date"
2017-05-07 12:34:42,755 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: uhrzeit, user sentence: wie spät
2017-05-07 12:34:42,755 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: wieviel uhr?, user sentence: wie spät
2017-05-07 12:34:42,756 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: welches datum, user sentence: wie spät
2017-05-07 12:34:42,757 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: was für ein, user sentence: wie spät
2017-05-07 12:34:42,758 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: welcher tag, user sentence: wie spät
2017-05-07 12:34:42,759 :: DEBUG :: [spelt_order_match_brain_order_via_table] order to analyse: ist es zeit für tee, user sentence: wie spät
2017-05-07 12:34:42,760 :: DEBUG :: [LIFOBuffer] Add a new synapse list to process to the LIFO
2017-05-07 12:34:42,760 :: DEBUG :: [LIFOBuffer] number of synapse list to process: 1
2017-05-07 12:34:42,761 :: DEBUG :: [LIFOBuffer] number of neuron to process: 1
2017-05-07 12:34:43,682 :: DEBUG :: [LIFOBuffer] process_neuron_list: is_api_call: False
2017-05-07 12:34:43,973 :: DEBUG :: [NeuronLauncher] replacing brackets from {'is_api_call': False, 'say_template': ['Es ist {{hours}} Uhr {{minute
2017-05-07 12:34:44,290 :: DEBUG :: [NeuronLauncher] replacing brackets from False, using {}
2017-05-07 12:34:44,297 :: DEBUG :: Run neuron: "{'name': 'systemdate', 'parameters': {'is_api_call': False, 'say_template': ['Es ist {{hours}} Uhr
2017-05-07 12:34:45,021 :: DEBUG :: NeuroneModule: TTS args: {'name': 'acapela', 'parameters': {'voice': 'Klaus', 'cache': True, 'language': 'sonid
2017-05-07 12:34:45,023 :: DEBUG :: NeuronModule Say() called with message: {'day_month': '07', 'month': '05', 'hours': '12', 'weekday': '0', 'year
2017-05-07 12:34:46,247 :: DEBUG :: message is dict
2017-05-07 12:34:46,327 :: DEBUG :: tts_message to say: Es ist 12 Uhr 34
Es ist 12 Uhr 34
2017-05-07 12:34:46,622 :: INFO :: Es ist 12 Uhr 34
2017-05-07 12:34:46,667 :: DEBUG :: Class TTSModule called from module Acapela, cache: True, language: sonid16, voice: Klaus
2017-05-07 12:34:46,720 :: DEBUG :: get_path_to_store_audio return: /tmp/kalliope_tts_cache/Acapela/sonid16/Klaus/7184148b7ba1e2dfc6fd60634727f8e5.
2017-05-07 12:34:47,179 :: DEBUG :: TTSModule, File not yet in cache: /tmp/kalliope_tts_cache/Acapela/sonid16/Klaus/7184148b7ba1e2dfc6fd60634727f8e
2017-05-07 12:35:00,582 :: DEBUG :: Acapela : Trying to get url: https://vaasbox.acapela-box.com/MESSAGES/013099097112101108097066111120095086050/A /mpeg
2017-05-07 12:35:00,679 :: DEBUG :: Mplayer cmd: ['/usr/bin/mplayer', '-slave', '-quiet', '/tmp/kalliope_tts_cache/Acapela/sonid16/Klaus/7184148b7b
2017-05-07 12:35:03,669 :: DEBUG :: [LIFOBuffer] complete mode
2017-05-07 12:35:03,672 :: DEBUG :: Entering state: unpausing_trigger
2017-05-07 12:35:03,674 :: DEBUG :: Unpausing snowboy process
Waiting for trigger detection
2017-05-07 12:35:03,850 :: INFO :: Waiting for trigger detection
2017-05-07 12:35:03,851 :: DEBUG :: Entering state: playing_ready_sound
2017-05-07 12:35:03,852 :: DEBUG :: Entering state: waiting_for_trigger_callback
I hope you figure out how Kalliope can run as smoothly on the RPi3 as it did for the past weeks on my 10-year-old laptop :)
This latency comes from the speech recognition module and not from the TTS, I think. But yes, it's a problem.
This behaviour happens on my RPi3 too, when I use @bacardi55's repeat neuron and call the REST API directly. So it happens even if you don't use speech recognition at all.
OK. And by the way, this neuron is not needed anymore in the last dev branch. See the new input value system.
On my RPi3 the sounddevice commit is not working well. The sound is crackling and sometimes not played at all. I get the best results when using alsa directly. I will try to implement something based on https://pypi.python.org/pypi/pyalsaaudio/0.8.4 and open a PR against your pyaudio branch.
OK, thanks. I'll test it when it's ready then. Keep in mind that the code must work on Ubuntu/Debian and Raspbian.
Tested with pyalsaaudio. It's not bad at all! I'll push a version if you guys want to test it.
Some benchmarks (on my Ubuntu 16.04 with a Core i7):
When the file is already in WAV format, both players are close:
Pico2wav |Already generated file | time with mplayer: 1.65873289108 seconds
Pico2wav |Already generated file | time with PyAlsaAudioPlayer: 1.5570628643 seconds
Acapela |Already generated file | time with mplayer: 1.54038405418 seconds
Acapela |Already generated file | time with PyAlsaAudioPlayer: 1.55780386925 seconds
When the file needs to be generated first, it's almost the same with pico2wave, as this TTS generates a WAV file directly:
Pico2wav |File absent from cache | time with mplayer: 2.21644210815 seconds
Pico2wav |File absent from cache | time with PyAlsaAudioPlayer: 2.3806219101 seconds
The problem is when the file needs to be generated and also converted from MP3 to WAV. Here it's with Acapela, but it would be the same for any other cloud-based TTS:
Acapela |File absent from cache | time with mplayer: 3.88518714905 seconds
Acapela |File absent from cache | time with PyAlsaAudioPlayer: 4.30601286888 seconds
The conclusion I have so far is that writing a full Python player for Kalliope will not help us win much time.
What do you think?
The point for me is not speed; the most important point is stability and sound quality on the RPi. So I would say: the alsa player is not slower, but it is more reliable on the RPi, and thus the better choice.
VLC is similar to mplayer for me. As for playsound, straight from its GitHub page:
On Linux, uses ossaudiodev. I don't have a machine with Linux, so this hasn't been tested at all. Theoretically, it plays WAVE files.
What about a configuration option, i.e. letting the user choose the system? mplayer or PyAudio is maybe fine for Ubuntu; some alsa lib is small and fine for the RPi.
I have improved your implementation a little with PR #271. Basically I configured the hardware device and introduced a fixed chunk size. The default (pulse) is not working at all with alsaaudio on my RPi. I set the chunk size as fixed in a first attempt to debug the problem; I'm not sure what's better.
The chosen "sysdefault:CARD=ALSA" runs perfectly: clear audio quality, no hangs. It will likely break on Ubuntu, because that card doesn't exist there.
Usually the device should be "default" for full compatibility with all hardware configurations, and maybe made configurable in settings.
On the RPi the "sysdefault" card choice should be fine for HDMI and audio jack output. USB audio may not work out of the box; I think it never defaults to card 0.
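For what it's worth, the fixed chunk size idea boils down to feeding the PCM device fixed-length frame blocks. A stdlib-only sketch of that read loop (the chunk value and names are illustrative, not the exact PR #271 code):

```python
import wave

CHUNK_FRAMES = 1024  # fixed period size; the right value is hardware-dependent

def iter_wav_chunks(path, chunk_frames=CHUNK_FRAMES):
    """Yield fixed-size frame chunks, as a player loop would hand them to ALSA."""
    with wave.open(path, "rb") as wav_file:
        data = wav_file.readframes(chunk_frames)
        while data:
            yield data
            data = wav_file.readframes(chunk_frames)
```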
I'll test on Ubuntu. The final code needs to work on all supported platforms. I also need to give pygame a try; it's pre-installed with Raspbian and can handle all audio file types.
Tested on Ubuntu. I had to switch the device to "default" to get it working. The output is the same as the branch before your PR.
The problem remains that TTS engines which produce an MP3 need the file converted into WAV. That processing is already heavy on my Core i7; I can't imagine it on an RPi with ARM...
Yes, the PR was tailored for the RPi; it was not intended as a final version. The result may be the same on Ubuntu, but on my RPi3 it only works with this configuration/lib, because of the long-discussed problem with pulseaudio.
What about making the sound output configurable in settings, e.g. letting the user choose which player and which device to use?
We just need to create an audio engine class from which all players inherit, similar to resources like TTS or STT. Each player can expose the formats it supports. Depending on the hardware and platform, the user can then choose the best player for his setup, in combination with his preferred TTS. What do you think? Personally I don't have a problem with using pico2wave on the RPi; a cloud-based TTS might sound better, but it's good enough.
I started this issue because mplayer was not working on my RPi3; this doesn't mean that mplayer is a bad choice for Ubuntu.
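Sketching that idea (all names are hypothetical; this is not Kalliope's actual resource API, just the shape described above):

```python
from abc import ABC, abstractmethod

class Player(ABC):
    """Base class every audio engine would inherit from, mirroring
    how TTS and STT resources are plugged in."""

    supported_formats = []  # e.g. ["wav"] or ["wav", "mp3"]

    @abstractmethod
    def play(self, file_path):
        """Block until file_path has been played."""

class AlsaWavPlayer(Player):
    supported_formats = ["wav"]

    def play(self, file_path):
        # a real implementation would stream chunks to alsaaudio here
        print("playing %s via ALSA" % file_path)

def get_player(settings):
    """Instantiate the player named in a settings.yml-style dict."""
    available = {"alsa": AlsaWavPlayer}
    return available[settings.get("player", "alsa")]()
```

A neuron would then ask the chosen player whether it supports the format the TTS produced, and only trigger an MP3-to-WAV conversion when it doesn't.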
It's really weird that mplayer doesn't work on your RPi. I use it all the time, even for other projects, and it has never failed or hung. Actually I maintain another project based on mplayer, and I don't use PulseAudio at all. I'm not sure we have to keep it. Did you try removing it?
Anyway, I would like to keep this part simple for users. We are not the first project in the world that wants to play a sound. I need to spend more time searching for the right way to do this cleanly. The fact that the conversion costs a lot of resources is a real problem. I don't want to ship a player that is only capable of playing sounds generated by pico2wave.
We discussed it already in #231: without pulseaudio, only one process can take hold of the mic and speaker, so snowboy has to be stopped each time. An ALSA loop device is possible, but it needs manual configuration (not out of the box), and since it does no rate conversion it failed with snowboy (or at least I failed to configure it?).
Additionally, pulseaudio gives more freedom in devices. As far as I understood, Bluetooth devices only work with pulseaudio.
Converting MP3 to WAV is no solution for an RPi. The clean ways I know so far are pyaudio or alsaaudio. If that's not sophisticated enough (e.g. MP3 is needed), you end up with mplayer, VLC or something similar, or with a dedicated MP3 player like mpg123.
Using mplayer seems to be the simplest way for users. But users should have a config option to choose another player and another device, in case mplayer is not working, or to tweak speed when MP3 playback is not required (because pico2wave is used, for example).
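That config option could be as small as a command table; the mplayer flags below match the ones shown in the logs earlier in this thread, while the mpg123 entry and the settings shape are my assumption:

```python
# player name -> base command; could be extended via settings rather than hard-coded
PLAYER_COMMANDS = {
    "mplayer": ["/usr/bin/mplayer", "-slave", "-quiet"],  # flags as seen in the logs
    "mpg123": ["mpg123", "-q"],  # MP3-only, but very light on an RPi
}

def build_play_command(player, file_path):
    """Return the argv list for the player chosen in settings."""
    if player not in PLAYER_COMMANDS:
        raise ValueError("unknown player: %s" % player)
    return PLAYER_COMMANDS[player] + [file_path]
```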
Kalliope hangs on my RPi3 during playback with mplayer. There are no error messages; the mplayer process seems to hang and Kalliope waits for the process to return.
I can reproduce this with mplayer (using pulse and alsa) and with aplay (using pulse) on their own during audio playback. It doesn't happen every time, but I can reproduce it by triggering playback several times. Is anybody else having this issue?
I therefore implemented a pyaudio test for Kalliope, replacing mplayer, which can be found at https://github.com/andweber/kalliope/tree/audio.
This solves the hanging issue for me, and by the way I have the feeling (not measured) that Kalliope reacts quicker.
If it's of interest, I will implement a full solution using pyaudio and maybe even make it configurable.