sophiedankel opened this issue 6 years ago
Hi @sophiedankel thanks for opening this issue. What kind of Arlo cameras do you have that have this feature?
In order to get this working, we need to figure out what HTTP requests your browser makes when you use this feature from the web UI. To do that, you can open the Chrome devtools (Chrome is best for this task). Click on the Network tab, check "Preserve log", and log into the Arlo website.
Once you've done that, exercise the push-to-talk feature and capture the HTTP requests being made. Paste them here and we can go from there. (If you're not sure how to do this, I have a Slack channel we can jump on and I can walk you through it via a screen share if you'd like.)
FYI - I have Arlo Pro 2 cameras that support this. Need to check if it supports it via the web tho…
I have Arlo Pro 2 cameras and Chrome...
Hit the microphone button in the Live streaming view: Request URL: https://arlo.netgear.com/hmsweb/users/devices/XXX-XXXXXXX_CCCCCCCCCC/pushtotalk Request Method: GET
XXX-XXXXXXX is the user id, CCCCCCCCCC is the camera id...
Then a new picture of a microphone pops up just below the live stream and you need to press / hold it down to talk:
I think at this point a Google Analytics GET event is performed (not familiar with this, so not sure what you need me to paste in here, if anything...)
When you release the button, another Google Analytics GET is performed...
HTH...
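For anyone trying to replicate that pushtotalk request outside the browser, here is a minimal sketch in Python, assuming an already-authenticated requests session carrying the usual Authorization token; the ids and token below are placeholders:

```python
import requests

# Hypothetical placeholders -- substitute your own values.
USER_ID = "XXX-XXXXXXX"
CAMERA_ID = "CCCCCCCCCC"
AUTH_TOKEN = "token-returned-by-the-login-call"

session = requests.Session()
session.headers.update({"Authorization": AUTH_TOKEN, "Content-Type": "application/json"})

# Mirror the request the web UI makes when the microphone button is hit.
url = f"https://arlo.netgear.com/hmsweb/users/devices/{USER_ID}_{CAMERA_ID}/pushtotalk"
print(session.get(url).json())
```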
Google Analytics clicks are only for tracking / analysis of what you're doing.
Chrome somehow needs to transfer the audio to arlo.netgear.com or some other netgear server. Can you see anything like that?
@jvigilan what happens when you press and hold the talk button? As @shoeper said, you can ignore the Google Analytics stuff. Do you see any new HTTP requests? What about events in the EventStream?
When you log in, you should see a call to /subscribe, like this. If you click on that request, you will see an "EventStream" tab in the right-hand pane (see the screenshot). If you click on that tab, you will see all of the events your browser has received from the Arlo servers. When you click the microphone button, what events do you see?
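For anyone following along, here is a rough sketch of watching that same EventStream from Python with the sseclient package, assuming an already-authenticated session; the exact subscribe URL and token handling are assumptions, so check them against your own capture:

```python
import json
import requests
from sseclient import SSEClient  # pip install sseclient

# Hypothetical placeholders -- reuse the cookies/headers from an authenticated login.
session = requests.Session()
token = "token-returned-by-the-login-call"

# The web UI opens this SSE stream after login; the pushToTalk answers show up here.
stream = SSEClient(f"https://arlo.netgear.com/hmsweb/client/subscribe?token={token}",
                   session=session)

for event in stream:
    if event.data:
        print(json.loads(event.data))
```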
If you're interested, I've got a Slack channel. We can jump on there and reverse engineer it over a screen share real quick (when we both have half an hour or so).
@jeffreydwalter I watched the EventStream as suggested and there are 3 'pushToTalk' actions every time I hit the microphone. No new messages while I hold down the microphone. A new message arrives when I turn off the microphone (Close X). I attached a text file with the messages from the Chrome DevTools. If you need more, I can certainly jump on a Slack channel session. Arlo_Push_to_Talk.txt
@jvigilan I should have some time this week to jump on Slack if you still want to. Let me know.
@jeffreydwalter yes, I can meet this week... Tuesday is not good, nor is Thursday afternoon / evening... when is a good day / time for you?
How about Wednesday afternoon, say 1 or 2 pm CST?
Hi Jeffrey – 2:00 CST on Wednesday works for me…. I assume you will send the Slack channel invite…
Awesome, that works fine for me. Here's the link. https://join.slack.com/t/arlo-dev/shared_invite/enQtMzYwNTczMzQ4NTgyLTdlZjgzZjc5NTdhOWZkYzg3MWQ5YzhkNTI4ODgzMmYyMmI3NjBjNjExY2U3MzM4YzljMGMzZDAxZjI0OWQ3Mjg
Hi Guys,
Did you find any workaround for this? @jeffreydwalter @jvigilan
@sherifmka2004 we looked into it a little bit, but I haven't had time to follow up.
@jeffreydwalter I noticed these three messages appear just when you click the speak button, but then there is a UDP stream afterwards.
If there’s something I can help with, let me know.
@sherifmka2004 I still haven't had time to dig into this, but it appears that they are using RTP (Real-time Transport Protocol), together with SDP (Session Description Protocol), STUN (Session Traversal Utilities for NAT), and ICE (Interactive Connectivity Establishment) for establishing the connection.
The conversation goes roughly like this:
1. POST /users/devices/{unique_id}/pushtotalk
{"data":{"uSessionId":"XXXXXXXXXXXXX!2856F0D8!1525893890884","data":[{"url":"stun:relay01-z2-prod.vz.netgear.com:19302"},{"credential":"XXXXXXXXXXXXXXXXXXXXXXX/XXXXX=","url":"turn:relay01-z2-prod.vz.netgear.com:443?transport=tcp","username":"1525893901:XXX-XXXXXXX"},{"credential":"XXXXXXXXXXXXXXXXXXXXXXX/XXXXX=","url":"turn:relay01-z2-prod.vz.netgear.com:443?transport=udp","username":"1525893901:XXX-XXXXXXX"}],"type":"iceServers"},"success":true}
2. POST /notify
{"action":"pushToTalk","from":"XXX-XXXXXXX","publishResponse":true,"resource":"cameras/XXXXXXXXXXXXX","responseUrl":"","to":"XXXXXXXXXXXXX","transId":"web!98b0c88b!1429756137177","properties":{"uSessionId":"XXXXXXXXXXXXX!2856F0D8!1525893890884","type":"offerSdp","data":"v=0\r\no=- 2808742620419521074 2 IN IP4 127.0.0.1\r\ns=-\r\nt=0 0\r\na=group:BUNDLE audio\r\na=msid-semantic: WMS PFHb9mMwEp0ThE5Ruhsk1rFRtZTyAAGcPJsQ\r\nm=audio 9 UDP/TLS/RTP/SAVPF 111 103 104 9 0 8 106 105 13 110 112 113 126\r\nc=IN IP4 0.0.0.0\r\na=rtcp:9 IN IP4 0.0.0.0\r\na=ice-ufrag:QbPr\r\na=ice-pwd:4GjCKEJNq0N/fruvfRxBZ34V\r\na=ice-options:trickle\r\na=fingerprint:sha-256 EA:F1:38:6C:62:FD:AA:DD:E6:CA:1E:9D:0C:13:2F:5E:9C:3E:F0:2D:C9:93:AE:2E:D5:96:39:D5:93:1D:75:52\r\na=setup:actpass\r\na=mid:audio\r\na=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level\r\na=sendrecv\r\na=rtcp-mux\r\na=rtpmap:111 opus/48000/2\r\na=rtcp-fb:111 transport-cc\r\na=fmtp:111 minptime=10;useinbandfec=1\r\na=rtpmap:103 ISAC/16000\r\na=rtpmap:104 ISAC/32000\r\na=rtpmap:9 G722/8000\r\na=rtpmap:0 PCMU/8000\r\na=rtpmap:8 PCMA/8000\r\na=rtpmap:106 CN/32000\r\na=rtpmap:105 CN/16000\r\na=rtpmap:13 CN/8000\r\na=rtpmap:110 telephone-event/48000\r\na=rtpmap:112 telephone-event/32000\r\na=rtpmap:113 telephone-event/16000\r\na=rtpmap:126 telephone-event/8000\r\na=ssrc:1253177959 cname:o5ar6SxfzgjLyG1v\r\na=ssrc:1253177959 msid:PFHb9mMwEp0ThE5Ruhsk1rFRtZTyAAGcPJsQ b9d7d868-419f-4aef-9eee-dfdafb792147\r\na=ssrc:1253177959 mslabel:PFHb9mMwEp0ThE5Ruhsk1rFRtZTyAAGcPJsQ\r\na=ssrc:1253177959 label:b9d7d868-419f-4aef-9eee-dfdafb792147\r\n"}}
RESPONSE:
{"from":"XXXXXXXXXXXXX","action":"pushToTalk","resource":"cameras/XXXXXXXXXXXXX","properties":{"uSessionId":"XXXXXXXXXXXXX!B0ED31D6!1525894559869","type":"answerSdp","data":"v=0\r\no=NTGRMEDIA 19304 0 IN IP4 0.0.0.0\r\ns=-\r\nt=0 0\r\na=ice-ufrag:Y22lWpF5OlJfpj74\r\na=ice-pwd:13gZRE0mHTxAZYztTWbzdAI366vDtJpi\r\na=fingerprint:sha-256 C9:4D:E1:45:97:EE:C7:03:43:30:EA:C6:8B:3F:95:E1:FE:72:3B:14:99:60:30:3F:40:D3:04:26:6B:CD:1A:52\r\nm=audio 9 RTP/SAVPF 0 8 97 9 126\r\nc=IN IP4 0.0.0.0\r\na=charset:UTF-8\r\na=rtpmap:9 G722/8000\r\na=rtpmap:0 PCMU/8000\r\na=rtpmap:8 PCMA/8000\r\na=rtpmap:97 opus/48000/2\r\na=rtpmap:126 telephone-event/8000\r\na=recvonly\r\na=setup:active\r\na=rtcp-mux\r\na=candidate:1 1 udp 2130706687 192.168.0.12 37784 typ host\r\na=candidate:2 1 udp 2130706687 172.14.1.1 44437 typ host\r\n"},"transId":"XXXXXXXXXXXXX!a2718d25!1525894560323"}
3. POST /notify
{"action":"pushToTalk","from":"XXX-XXXXXXX","publishResponse":false,"resource":"cameras/XXXXXXXXXXXXX","responseUrl":"","to":"XXXXXXXXXXXXX","transId":"web!98b0c88b!1429756137177","properties":{"uSessionId":"XXXXXXXXXXXXX!2856F0D8!1525893890884","type":"offerCandidate","data":"candidate:4172108666 1 udp 2122255103 2001::9d38:6ab8:18a4:197d:3f57:ffec 56848 typ host generation 0 ufrag QbPr network-id 4 network-cost 50"}}
RESPONSE:
{"from":"XXXXXXXXXXXXX","action":"pushToTalk","resource":"cameras/XXXXXXXXXXXXX","properties":{"uSessionId":"XXXXXXXXXXXXX!B0ED31D6!1525894559869","type":"answerCandidate","data":"a=candidate:4 1 udp 16777471 172.29.5.160 56243 typ relay raddr 98.206.61.240 rport 45318\r\n"},"transId":"XXXXXXXXXXXXX!faf118d6!1525894560659"}
4. POST /notify
{"action":"pushToTalk","from":"XXX-XXXXXXX","publishResponse":false,"resource":"cameras/XXXXXXXXXXXXX","responseUrl":"","to":"XXXXXXXXXXXXX","transId":"web!98b0c88b!1429756137177","properties":{"uSessionId":"XXXXXXXXXXXXX!2856F0D8!1525893890884","type":"offerCandidate","data":"candidate:1645401754 1 udp 2122197247 2601:248:c100:e8d0:94e5:ba0a:548f:4679 56849 typ host generation 0 ufrag QbPr network-id 2"}}
5. POST /notify
{"action":"pushToTalk","from":"XXX-XXXXXXX","publishResponse":false,"resource":"cameras/XXXXXXXXXXXXX","responseUrl":"","to":"XXXXXXXXXXXXX","transId":"web!98b0c88b!1429756137177","properties":{"uSessionId":"XXXXXXXXXXXXX!2856F0D8!1525893890884","type":"offerCandidate","data":"candidate:3413779551 1 udp 2122131711 2601:248:c100:e8d0:5cc:7c02:9ed5:dbec 56850 typ host generation 0 ufrag QbPr network-id 3"}}
RESPONSE:
{"from":"XXXXXXXXXXXXX","action":"pushToTalk","resource":"cameras/XXXXXXXXXXXXX","properties":{"uSessionId":"XXXXXXXXXXXXX!B0ED31D6!1525894559869","type":"answerCandidate","data":"a=candidate:3 1 udp 1694499071 98.206.61.240 37784 typ srflx raddr 192.168.0.12 rport 37784\r\n"},"transId":"XXXXXXXXXXXXX!2e80caaa!1525894560352"}
@jeffreydwalter I'm interested in this feature too, so I gave it a try last night. Here is my experience, based on Wireshark:
When you open the push-to-talk icon, following the numbered HTTP requests in the thread above:
After request 1, the client (laptop) knows where to go for a STUN server. The STUN protocol is then used to find a network path between the client (laptop) and the server (camera) through NAT, via a series of binding requests and responses in UDP packets.
Then requests 2, 3, 4 and 5 use the SDP offer/answer model to agree on the media codec for SRTP transmission and to exchange fingerprints.
After all of that, DTLS v1.0 over UDP is used to exchange the keys for the SRTP session.
Then the SRTP stream, in UDP packets, starts transmitting from the client to the server.
When you close the push-to-talk icon, the stream stops.
From my viewpoint, the hard part of an implementation would be the DTLS handshake.
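Building on that, one way to sidestep implementing DTLS and SRTP by hand would be a Python WebRTC stack such as aiortc, which handles ICE, DTLS and SRTP itself. Here is a rough, untested sketch; push_to_talk, send_offer_sdp and wait_for_answer_sdp are hypothetical wrappers around the /pushtotalk, /notify and /subscribe calls discussed above:

```python
import asyncio
from aiortc import (RTCConfiguration, RTCIceServer, RTCPeerConnection,
                    RTCSessionDescription)
from aiortc.contrib.media import MediaPlayer

async def push_to_talk(ice_servers, send_offer_sdp, wait_for_answer_sdp):
    # ice_servers: the list from the /pushtotalk response.
    # send_offer_sdp / wait_for_answer_sdp: hypothetical async wrappers around
    # the /notify POST and the /subscribe EventStream.
    config = RTCConfiguration(iceServers=[
        RTCIceServer(urls=s["url"],
                     username=s.get("username"),
                     credential=s.get("credential"))
        for s in ice_servers
    ])
    pc = RTCPeerConnection(configuration=config)

    # Capture microphone audio; the device/format are platform specific (Linux/Pulse here).
    player = MediaPlayer("default", format="pulse")
    pc.addTrack(player.audio)

    # Create the local offer. aiortc gathers ICE candidates into the SDP itself,
    # so the separate offerCandidate messages may not even be needed.
    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)
    await send_offer_sdp(pc.localDescription.sdp)

    # Apply the camera's answerSdp once it arrives on the EventStream;
    # aiortc then completes ICE, the DTLS handshake and the SRTP stream.
    answer_sdp = await wait_for_answer_sdp()
    await pc.setRemoteDescription(RTCSessionDescription(sdp=answer_sdp, type="answer"))

    await asyncio.sleep(10)   # talk for ten seconds, then hang up
    await pc.close()
```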
@jeffreydwalter I'm new to your Arlo API and am trying to get this feature to work with my Arlo Pro 2 cameras. I see that this method has been included in the documentation; is it functional? If so, how would I go about implementing it so that I can talk through a camera from my code?
@kt9302 thanks for the info. I'm unfortunately too busy to contribute to the library right now. I'd be happy to advise and more than happy to accept pull requests. It looks like there are several Python DTLS libraries available, so it might be trivial to connect to the DTLS stream.
@gshappell1 push-to-talk is not supported currently.
Several models of Arlo cameras support push-to-talk: https://kb.arlo.com/1004319/What-is-the-push-to-talk-feature-on-my-Arlo-camera-and-how-does-it-work
I would like to extend this python library to be able to send audio to my camera and have it play from the camera's speakers.