playing tts/audio on VTO

luzik commented 2 years ago

It would be awesome to be able to send TTS or audio via the VTO speaker.

My personal use case is to connect face recognition with voice messages, something like "Hello MyName".

If there is no direct command for that, my VTO has a place where I can store mp3 audio for various events. Maybe rroller/dahua could generate an mp3, upload it to the VTO, and trigger an action for it?

Saiyajin53 commented 2 years ago

You can replace the original voice with your own mp3, but there is a limit of only 20 kB :/

itkfilelor commented 2 years ago

I may have found the API command for the Amcrest AD110 doorbell; in theory it would be the same for the Dahua ones. Doing some tests and will report back.

UPDATE: OK, so apparently "we" have already known about the endpoint for some time. From what I have found, it is really sketchy with files: they need to be rather short and lower quality, or else the device gets overwhelmed. I plan to work on some premade TTS recordings and see where it leads.

MORE: I found this. So I took a Google TTS file I made in HA and converted it like they showed in the thread:

sox -v 0.8 audio_test.mp3 -r 8k -c 1 audio_test.al

Then I sent:

sleep 45 && curl -vvv \
  --limit-rate 8K \
  -F "file=@audio_test.al;type=Audio/G.711A" \
  -H "Content-Type: Audio/G.711A" \
  http://admin:password@<ip>/cgi-bin/audio.cgi\?action\=postAudio\&httptype\=singlepart\&channel\=1

I set a timer on my phone, ran my fat arse upstairs and waited. I heard the TTS on my doorbell within 1.5 s of the timer expiring. There was a little garbage at the beginning and end, but the voice came over clear. When I have a chance I will see about making it a media_player entity.
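One possible reason the rate limit helps: G.711 A-law at 8 kHz mono is one byte per sample, so the audio plays back at 8000 bytes per second, and curl's --limit-rate 8K (8192 bytes per second, since curl's K suffix is 1024-based) feeds the device at roughly real-time speed. The sketch below only does that arithmetic. The function names are made up for illustration, and whether a given device actually needs real-time pacing is an assumption that may depend on model and firmware.

```python
# G.711 A-law as produced above: 8000 samples per second, 1 byte per sample, mono.
ALAW_BYTES_PER_SECOND = 8000

# curl's "K" suffix for --limit-rate means 1024 bytes.
CURL_LIMIT_RATE_8K = 8 * 1024


def alaw_duration_seconds(num_bytes: int) -> float:
    """Playback length of a raw 8 kHz mono A-law file of the given size."""
    return num_bytes / ALAW_BYTES_PER_SECOND


def upload_seconds(num_bytes: int, bytes_per_second: int = CURL_LIMIT_RATE_8K) -> float:
    """Time needed to upload the file at a fixed rate limit."""
    return num_bytes / bytes_per_second


if __name__ == "__main__":
    size = 16000  # hypothetical clip size in bytes
    print(alaw_duration_seconds(size))     # 2.0 seconds of audio
    print(round(upload_seconds(size), 2))  # 1.95 seconds to upload at 8K
```

So at 8K the upload finishes only slightly faster than the clip plays, which would keep the device from being flooded if it buffers very little.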

luzik commented 2 years ago

My VTO

curl -vvv --user "admin:pass" --limit-rate 8K -F "file=@audio_test.al;type=Audio/G.711A" -H "Content-Type: Audio/G.711A" "http://192.168.1.30/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1"
*   Trying 192.168.1.30:80...
* Connected to 192.168.124.30 (192.168.124.30) port 80 (#0)
* Server auth using Basic with user 'admin'
> POST /cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1 HTTP/1.1
> Host: 192.168.124.30
> Authorization: Basic XXXXX
> User-Agent: curl/7.74.0
> Accept: */*
> Content-Length: 11138
> Content-Type: Audio/G.711A; boundary=------------------------a90a8721f68274a4
>
* We are completely uploaded and fine

...and then it hangs.

luzik commented 2 years ago

But it actually plays nicely on my VTO!!

It just never sends a response to end the session.

luzik commented 2 years ago

With the VTO2211G I need neither --limit-rate nor auth?!? To get the connection to close, I added --speed-limit 1 --speed-time 1, which makes curl abort the transfer when the speed stays below 1 byte/sec for 1 second.

Can dahua be exposed as an HA MediaPlayer class device, or is that the wrong idea? It would be awesome to include automatic audio conversion and a play function in https://github.com/rroller/dahua

itkfilelor commented 2 years ago

Yeah, I had the hang as well. I've never messed with any form of media streaming in Python, so I don't know how to handle that with the requests module that we are using here. In fact, most of my HTTP GET/POST experience in Python was with simple endpoints that closed automatically. This endpoint appears to be the one the app uses to open the stream, but the docs don't show how it ends. I'll have to dive into the requests module and see how it closes persistent connections.

luzik commented 2 years ago

Maybe this ?

r = requests.get('https://github.com', timeout=(3.05, 5))

https://docs.python-requests.org/en/latest/user/advanced/#timeouts

Here 3.05 is the connect timeout and 5 is the read timeout, both in seconds.
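A minimal sketch of the same idea applied to this endpoint, assuming the device plays the audio but never answers (as reported above). It uses the standard library's http.client instead of requests so that it has no dependencies; http.client takes a single timeout covering connect and read rather than a (connect, read) pair. The helper names are hypothetical, and sending the raw A-law bytes as the request body (instead of curl's -F multipart form) is an assumption that has not been tested against a VTO.

```python
import base64
import http.client
import socket
from typing import Dict, Optional

# Path and query string taken from the curl command in this thread.
AUDIO_PATH = "/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1"


def build_headers(user: Optional[str] = None, password: str = "") -> Dict[str, str]:
    """Headers for the upload; Basic auth is added only when a user is given."""
    headers = {"Content-Type": "Audio/G.711A"}
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    return headers


def post_audio(host: str, audio: bytes, timeout: float = 5.0,
               user: Optional[str] = None, password: str = "") -> bool:
    """Upload the audio and stop waiting for a reply after `timeout` seconds.

    Returns True if the device answered, False if the wait timed out.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("POST", AUDIO_PATH, body=audio,
                     headers=build_headers(user, password))
        try:
            conn.getresponse().read()
            return True
        except socket.timeout:
            # Assumed harmless: the upload was already written to the socket.
            return False
    finally:
        # Always close, so no half-open session is left on the device.
        conn.close()
```

Closing the connection in the finally block is the point of the exercise: the timeout only bounds the wait, while the explicit close is what ends the session.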

itkfilelor commented 2 years ago

😂 😅 Never looked into it before, this is likely the way. When I have a mo to work on I'll submit a new PR. Can you confirm with the dahua device the endpoint is the same as my amcrest bell?

luzik commented 2 years ago

Yes, it is. Please also consider using FFmpeg instead of sox. The default Home Assistant Docker image contains only ffmpeg.

ffmpeg -i audio_test.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 audio_test.al

is working for me. Later on I will test it with AAC (it should be supported in hardware, use less space and be faster).
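The same conversion could be driven from Python, for example inside a custom integration. This is only a sketch: the function names are hypothetical, and -y (overwrite an existing output file) is added here and is not part of the command above; the other flags are copied from it.

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_alaw_args(src: str, dst: str) -> List[str]:
    """Argument list for converting src to 8 kHz mono A-law, as in the command above."""
    return [
        "ffmpeg", "-y",        # -y: overwrite dst if it exists (added assumption)
        "-i", src,
        "-c:a", "pcm_alaw",    # G.711 A-law codec
        "-ac", "1",            # mono
        "-ar", "8000",         # 8 kHz sample rate
        "-sample_fmt", "s16",
        dst,
    ]


def convert_to_alaw(src: str) -> Path:
    """Run ffmpeg and return the path of the .al file written next to src."""
    dst = Path(src).with_suffix(".al")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(ffmpeg_alaw_args(src, str(dst)), check=True, capture_output=True)
    return dst
```

Passing the arguments as a list avoids shell quoting issues with file names.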

itkfilelor commented 2 years ago

Got it, have some free time coming up, I'll look into it.

luzik commented 2 years ago

I failed to play AAC audio on my VTO. pcm_alaw is the way to go.

calisro commented 2 years ago

I've been playing around with this. The issue I am having, though, is that after sending a few streams of audio (which work very well with pcm_alaw, btw) it then refuses any more. It's almost like it needs an 'end conversation' to be sent to close the existing connections. I am at a loss, tbh.

What I have noticed, though, is that it sends perfectly the first time and then fails the second. I believe the 'mic' needs to be turned off somehow. In the Amcrest app, you turn the mic on, speak, then turn it off.

If I test the first time, then go into the app and toggle the mic, it works again. I need to figure out how to 'turn off the mic' after sending. Any ideas?

EDIT: Timeouts/keepalive fixed it. https://github.com/rroller/dahua/issues/181#issuecomment-1148628524

NickM-27 commented 2 years ago

Would love to see this as a media player!

luzik commented 1 year ago

A media player would be a great feature, but it will probably take some time to implement. In the meantime, did someone figure out how to automate/script this in HA?

calisro commented 1 year ago

Well, for any camera that supports ONVIF Profile T, you can now do two-way audio with go2rtc. I'm using it with an AD410 perfectly.

luzik commented 1 year ago

Yeah, I am very thrilled to be running go2rtc in 2-way mode, just struggling with SSL via Traefik under "network_mode: host".

Meanwhile, I wrote an automation for playing TTS over the VTO. This is the main part:

shell_command:
   play_tts_on_vto: >-
     /bin/bash -c "name={{states('input_text.person_at_door')}} ; x=`/usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"messa  ge\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url |jq -r .url` && /usr/bin/curl $x -o /tmp/audio_vto.mp3 && /usr/bin/ffmpeg -i   /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 /tmp/audio_vto.al && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Aud  io/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3"

GaryOkie commented 1 year ago

For what it's worth - the techniques described here also work on the Amcrest AD110/AD410 doorbells to send custom sounds, including sirens.

morpheus8888 commented 1 year ago

Would love to see this as a media player!

Any news? I'm very interested in this.

Can you explain the procedure for your shell_command automation better for a newbie like me? Thank you @luzik

Pveska commented 1 year ago

So, any progress with that issue?

Pveska commented 1 year ago

It would be nice if you could explain that shell_command code for us.

baudneo commented 6 months ago

Would be nice if you explain that code for us

The shell_command launches bash and sets two shell variables:

VAR 1: 'name' = {{states('input_text.person_at_door')}} (a Jinja template that HASS renders before the command runs)
- input_text.person_at_door is out of scope here, but I am assuming that an external automation runs face detection and recognition and sets input_text.person_at_door to a name like "George", or possibly "Unknown" for faces that aren't recognized.
VAR 2: 'x' = output of /usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"message\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url | jq -r .url
- The curl command asks the HASS TTS endpoint to create a sound file for the text "Hi $name". jq extracts the url field from the JSON response, so 'x' ends up holding a URL that you can HTTP GET to obtain the TTS audio file (in .mp3 format, I assume).
- Replace TOKEN with a Home Assistant access token.

Then it chains the remaining steps:

&& /usr/bin/curl $x -o /tmp/audio_vto.mp3 - if the command that sets 'x' succeeded (&& does not run the next command when the previous one fails), it runs curl and saves the HASS-generated TTS file to a temporary .mp3 file at /tmp/audio_vto.mp3
&& /usr/bin/ffmpeg -i /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16 /tmp/audio_vto.al - converts the .mp3 to 8 kHz mono pcm_alaw and saves it to /tmp/audio_vto.al
&& /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Audio/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3 - sends the pcm_alaw file to the VTO device for playback, and then deletes the 2 temp audio files (mp3 and alaw). Because rm follows a ; rather than &&, the cleanup runs even if an earlier step failed.
- Change VTO_IP to the actual IP of your VTO device.

The original command contains stray double spaces, for example inside \"message\" and in the last command's -H \"Content-Type: Audio/G.711A\" header.

Here is a reformatted command with that whitespace removed:

shell_command:
   play_tts_on_vto: >-
     /bin/bash -c "name={{states('input_text.person_at_door')}} ; x=`/usr/bin/curl -X POST -H \"Authorization: Bearer TOKEN\" -H \"Content-Type: application/json\" -d '{\"message\": \"Hi '$name'\", \"platform\": \"google_translate\"}' http://localhost:8123/api/tts_get_url | jq -r .url` && /usr/bin/curl $x -o /tmp/audio_vto.mp3 && /usr/bin/ffmpeg -i /tmp/audio_vto.mp3 -c:a pcm_alaw -ac 1 -ar 8000 -sample_fmt s16  /tmp/audio_vto.al && /usr/bin/curl -vvv -F \"file=@/tmp/audio_vto.al;type=Audio/G.711A\" -H \"Content-Type: Audio/G.711A\" \"http://VTO_IP/cgi-bin/audio.cgi?action=postAudio&httptype=singlepart&channel=1\" --speed-limit 1 --speed-time 1; rm /tmp/audio_vto.al /tmp/audio_vto.mp3"
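For readers who find the one-liner hard to follow, the first two steps (ask Home Assistant for a TTS URL, then download the mp3) could be written in Python roughly as below. This is a sketch, not a tested replacement: the function names, the 10 second timeouts and the default destination argument are illustrative assumptions, while the endpoint, payload, TOKEN placeholder and /tmp path come from the shell command.

```python
import json
import urllib.request

HA_URL = "http://localhost:8123"  # same base URL as in the shell command


def tts_request(name: str, token: str, base_url: str = HA_URL) -> urllib.request.Request:
    """Build the POST to /api/tts_get_url that asks Home Assistant for a TTS file URL."""
    payload = {"message": f"Hi {name}", "platform": "google_translate"}
    return urllib.request.Request(
        f"{base_url}/api/tts_get_url",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_tts_mp3(name: str, token: str, dest: str = "/tmp/audio_vto.mp3") -> str:
    """Get the TTS URL from the JSON reply, then download the mp3 to dest."""
    with urllib.request.urlopen(tts_request(name, token), timeout=10) as resp:
        url = json.load(resp)["url"]  # same field that `jq -r .url` extracts
    with urllib.request.urlopen(url, timeout=10) as audio, open(dest, "wb") as out:
        out.write(audio.read())
    return dest
```

Building the JSON with json.dumps avoids the quoting problems the shell version would hit if the name contained a quote character.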

rroller / dahua

playing tts/audio on VTO #177