goruck / alexa-ip-cam

Use Alexa's Smart Home Skill API with standalone IP cameras without needing cloud service.
MIT License

video plays for about a second or two, then buffers indefinitely #10

Closed justinmiller61 closed 5 years ago

justinmiller61 commented 5 years ago

Sorry to inundate with tickets, but thought maybe you've seen this one. Per my other ticket, I have gotten the full setup working, at least as far as all the functional pieces are concerned. I can get video to display on my Fire, but only for a second or two before it starts buffering indefinitely.

Here are the stunnel logs for what happens when I stop the video.

What I can't quite figure out how to read is the 'Connection closed' line. Is it saying that stunnel received ~21MB from the camera, but only sent less than 1K to the tablet?

2018.12.01 09:52:19 LOG6[2]: TLSv1.2 ciphersuite: ECDHE-RSA-AES256-GCM-SHA384 (256-bit encryption)
2018.12.01 09:52:19 LOG7[2]: Compression: null, expansion: null
2018.12.01 09:52:19 LOG6[2]: s_connect: connecting 127.0.0.1:554
2018.12.01 09:52:19 LOG7[2]: s_connect: s_poll_wait 127.0.0.1:554: waiting 10 seconds
2018.12.01 09:52:19 LOG5[2]: s_connect: connected 127.0.0.1:554
2018.12.01 09:52:19 LOG6[2]: persistence: 127.0.0.1:554 cached
2018.12.01 09:52:19 LOG5[2]: Service [rtsp] connected remote server from 127.0.0.1:54790
2018.12.01 09:52:19 LOG7[2]: Setting remote socket options (FD=11)
2018.12.01 09:52:19 LOG7[2]: Option TCP_NODELAY set on remote socket
2018.12.01 09:52:19 LOG7[2]: Remote descriptor (FD=11) initialized
2018.12.01 09:54:21 LOG6[2]: TLS socket closed (SSL_read)
2018.12.01 09:54:21 LOG7[2]: Sent socket write shutdown
2018.12.01 09:54:21 LOG5[2]: Connection closed: 21099067 byte(s) sent to TLS, 984 byte(s) sent to socket
2018.12.01 09:54:21 LOG7[2]: Remote descriptor (FD=11) closed
2018.12.01 09:54:21 LOG7[2]: Local descriptor (FD=3) closed
2018.12.01 09:54:21 LOG7[2]: Service [rtsp] finished (0 left)
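
One way to read the 'Connection closed' line, offered as a sketch rather than an authoritative answer: as far as stunnel's counters are understood here, "sent to TLS" is what stunnel wrote to the encrypted side (the tablet in this setup) and "sent to socket" is what it wrote to the plain side (the RTSP server at 127.0.0.1:554). On that reading, about 21 MB went to the tablet and under 1 KB of RTSP requests went toward the camera. The Python below pulls both counters out of the line and estimates the average bit rate from the two timestamps; the helper names are made up for illustration and are not part of stunnel.

```python
import re
from datetime import datetime

# Matches the summary line stunnel logs when a connection ends.
CLOSED_RE = re.compile(
    r"Connection closed: (\d+) byte\(s\) sent to TLS, (\d+) byte\(s\) sent to socket"
)
STAMP_FORMAT = "%Y.%m.%d %H:%M:%S"


def parse_stamp(line):
    """Parse the leading 'YYYY.MM.DD HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], STAMP_FORMAT)


def summarize(connected_line, closed_line):
    """Return (bytes_to_tls, bytes_to_socket, seconds, avg_mbit_per_s_to_tls)."""
    match = CLOSED_RE.search(closed_line)
    if match is None:
        raise ValueError("not a 'Connection closed' line")
    to_tls, to_socket = int(match.group(1)), int(match.group(2))
    seconds = (parse_stamp(closed_line) - parse_stamp(connected_line)).total_seconds()
    if seconds <= 0:
        raise ValueError("closed line must be later than connected line")
    mbit_per_s = to_tls * 8 / seconds / 1e6
    return to_tls, to_socket, seconds, mbit_per_s


if __name__ == "__main__":
    connected = "2018.12.01 09:52:19 LOG5[2]: s_connect: connected 127.0.0.1:554"
    closed = (
        "2018.12.01 09:54:21 LOG5[2]: Connection closed: "
        "21099067 byte(s) sent to TLS, 984 byte(s) sent to socket"
    )
    to_tls, to_socket, seconds, rate = summarize(connected, closed)
    # 122 seconds of streaming at roughly 1.38 Mbit/s toward the TLS side
    print(to_tls, to_socket, seconds, round(rate, 2))
```

The asymmetry is what a one-way video stream would look like: a few small control requests in one direction and the media in the other.
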
justinmiller61 commented 5 years ago

This is identical to the issue described here: https://forums.developer.amazon.com/questions/103350/video-camera-shows-only-1-second.html

justinmiller61 commented 5 years ago

I’ll also add that sometimes it works. Sometimes it doesn’t. I haven’t yet done any analysis to see with what frequency it succeeds — not that I think it would be informative.

goruck commented 5 years ago

No problem with the tickets - you are helping to make the project better and I appreciate it.

I've run across a problem that looks similar to this. I have not root caused it but I've found that buffering frequency goes down considerably for frame rates < 10 FPS. At 5 FPS I almost never see it. (My cameras are 1080p.)

Can you try dialing back your FPS and see if your problem gets less severe?

justinmiller61 commented 5 years ago

Just set it to 5 and still seeing the issue.

goruck commented 5 years ago

Ok, I'll do some debugging from my side. Like I said I never root caused the buffering issue, looks like I'll need to since FPS reduction didn't help your case (and only mitigates the issue for me). I'm sure others are seeing this as well.

Btw, I see the same statistics on the stunnel Connection closed lines for both good runs and buffered runs. For example, here's a good run:

2018.12.02 19:41:05 LOG5[21]: s_connect: connected 127.0.0.1:554
2018.12.02 19:41:05 LOG5[21]: Service [rtsp] connected remote server from 127.0.0.1:38058
2018.12.02 19:43:02 LOG5[21]: Connection closed: 40402505 byte(s) sent to TLS, 848 byte(s) sent to socket
justinmiller61 commented 5 years ago

Gotcha. Let me know if you need any help (I’m a developer too, though not a lot of video experience) and I’ll do what I can.

goruck commented 5 years ago

Sorry for the late reply.

I've run some tests on my network with Wireshark and gathered data showing that when the round-trip time (RTT) of packets between the machine running stunnel and the Alexa device exceeds about 25 ms, the video starts to buffer. For whatever reason, the round-trip delays with a Fire HD tablet are far greater than with an Echo Show. This explains why I am seeing decent performance (some buffering) with my Echo Shows but almost complete failure with my Fire HD tablets.

Here's a Wireshark plot that shows the RTT when viewing a camera with an Echo Show. The .4 IP is the machine running stunnel. The Show is at .120. When the RTT exceeded about 24 ms I saw brief buffering on the screen, but other than that the video was fine.

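[image: screenshot from 2018-12-09 17-07-51] https://user-images.githubusercontent.com/12125472/49705684-0416a980-fbd5-11e8-9156-a590baf09b22.png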

Here's another Wireshark plot from when I try to view the same camera with a Fire HD 10 tablet (at IP .103). As the title of this issue says, the video plays for about a second but then goes into continuous buffering.

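[image: screenshot from 2018-12-09 17-14-36] https://user-images.githubusercontent.com/12125472/49705799-f44b9500-fbd5-11e8-8525-f689da6335e7.png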

The Fire tablets have really low-end hardware, so that may be part of why the tablet is so slow to ack packets from the stunnel machine. I can also see a lot of retransmissions of the packets going to the tablet in the Wireshark traces. The receive buffer probably isn't large enough to cope with the delay.

I played around a bit with the camera settings that control resolution, quality and frame rate. It does not seem to matter wrt the buffering / network latency.
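
To put a number on how often the RTT crosses the roughly 25 ms mark described above, a small script can summarize RTT samples exported from a capture. This is only a sketch: it assumes the samples were exported one per line in seconds (for example from Wireshark's `tcp.analysis.ack_rtt` field), the sample values are invented, and the 25 ms default is the observation from this thread rather than a documented limit.

```python
def parse_rtts_ms(text):
    """Parse one RTT per line (in seconds), skip blank lines, return milliseconds."""
    return [float(line) * 1000.0 for line in text.splitlines() if line.strip()]


def rtt_report(rtts_ms, threshold_ms=25.0):
    """Summarize how many RTT samples exceed the threshold."""
    over = [rtt for rtt in rtts_ms if rtt > threshold_ms]
    return {
        "samples": len(rtts_ms),
        "over_threshold": len(over),
        "fraction_over": len(over) / len(rtts_ms) if rtts_ms else 0.0,
        "max_ms": max(rtts_ms) if rtts_ms else None,
    }


if __name__ == "__main__":
    # Hypothetical export, e.g. produced with something like:
    #   tshark -r capture.pcap -T fields -e tcp.analysis.ack_rtt
    exported = "0.004\n0.012\n\n0.031\n0.110\n"
    print(rtt_report(parse_rtts_ms(exported)))
```

Running it separately on the capture for each device gives a like-for-like comparison between the Echo Show and the Fire HD tablet.
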

I'm not quite sure where to go from here yet but here are some things I'm going to try next.

  1. Enable DSCP on my network for video traffic which may help reduce latency.

  2. Contact Amazon to understand the maximum rtt that the Alexa Smart Home Camera API can tolerate. I'll also ask about the issue I'm seeing with their tablets.

  3. Ensure that my server isn't creating a bottleneck somewhere that's contributing to the network latency I'm seeing. There are several TCP tunings that one can do in the kernel but I'm not sure that I could do much better than the defaults.
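
For item 1, DSCP marking can be requested per socket by setting the IP TOS byte; the DSCP value occupies its upper six bits. The sketch below is illustrative only: in this setup the marking would have to be applied by whatever process actually sends the video (stunnel) or by a firewall rule, the network gear must be configured to honor it, and the choice of AF41 is an assumption, not something tested in this thread.

```python
import socket

# AF41 (decimal 34) is a DSCP value often used for interactive video.
# Treat it as a placeholder and pick whatever matches your own QoS policy.
AF41 = 34


def dscp_to_tos(dscp):
    """Convert a 6-bit DSCP value to the 8-bit TOS byte (DSCP sits in the top 6 bits)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def mark_socket(sock, dscp=AF41):
    """Ask the OS to mark outgoing IPv4 packets on this socket with the given DSCP."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        mark_socket(sock)
        print(hex(dscp_to_tos(AF41)))  # 0x88
```
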

If you have any other suggestions please let me know!

justinmiller61 commented 5 years ago

Great information! So agreed on the low-end hardware, but oddly I can use my Fire HD to stream from my Ring Doorbell (when it works — but that has its own issues) and from a Wyze camera, via Alexa.

Also, I can stream from my cameras to a native app just fine, though that’s not using RTSP nor with the overhead of stunnel but still...

What’s the data flow between the cameras, the tablet and Amazon? Once the tablet initiates the connection, is the skill involved at all? I.e. do the video packets go back to AWS at all?

goruck commented 5 years ago

Thanks.

Good to know about the Ring and Wyze cams working with the Fire HD. That seems to indicate the problem is due to packet latency elsewhere, perhaps on the machine running stunnel. Will focus there for now.

To your question: the lambda function responds back to the Alexa service with the URI of the camera endpoint which is sent to the Alexa device. After that no cloud service is involved. The video player in the Alexa device uses the stream pointed to by the URI which in this case is the local machine running stunnel and the rtsp proxy.
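
The handoff described above can be sketched as the response a Lambda function returns for a camera stream request. This is a Python illustration, not the project's actual code; the field names follow the Alexa.CameraStreamController interface as best recalled and should be checked against Amazon's documentation. The URI, codec choices, and default resolution are placeholders.

```python
import uuid


def build_camera_stream_response(directive, stream_uri, width=1280, height=720):
    """Build a CameraStreamController response pointing the device at stream_uri.

    After this response is delivered, the device connects to stream_uri directly;
    the skill is not in the video path.
    """
    header = directive["header"]
    endpoint_id = directive["endpoint"]["endpointId"]
    return {
        "event": {
            "header": {
                "namespace": "Alexa.CameraStreamController",
                "name": "Response",
                "payloadVersion": "3",
                "messageId": str(uuid.uuid4()),
                "correlationToken": header["correlationToken"],
            },
            "endpoint": {"endpointId": endpoint_id},
            "payload": {
                "cameraStreams": [
                    {
                        "uri": stream_uri,  # placeholder: the stunnel front end
                        "protocol": "RTSP",
                        "resolution": {"width": width, "height": height},
                        "authorizationType": "NONE",
                        "videoCodec": "H264",
                        "audioCodec": "AAC",  # assumption; depends on the camera
                    }
                ]
            },
        }
    }


if __name__ == "__main__":
    # Minimal hypothetical directive with only the fields used above.
    directive = {
        "header": {"correlationToken": "example-token"},
        "endpoint": {"endpointId": "camera-1"},
    }
    uri = "rtsp://camera.example.com:443/proxyStream-1"  # hypothetical host and path
    print(build_camera_stream_response(directive, uri)["event"]["payload"])
```
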

goruck commented 5 years ago

Quick question: what resolution and frame rate do the Ring and Wyze cameras stream to your Fire HD tablet via Alexa?

goruck commented 5 years ago

After sleeping on this issue it occurred to me that the problem may be related to streaming video over TCP. This is usually ill-advised for live video, since waiting for a retransmission is generally worse than just dropping the packet. Recall that RTP normally runs over UDP, but since stunnel uses SSL, everything is sent over TCP.

This may be the root cause.

However I'm not aware of any way to use UDP with stunnel.

I'll think of ways to validate this theory and potential workarounds.
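
The TCP concern can be illustrated with a toy model of in-order delivery. It is a deliberately simplified sketch, not a measurement: network delay is taken as zero, a lost packet is assumed to arrive exactly one fixed retransmission timeout late, and all names and numbers are invented for illustration.

```python
def delivery_times(send_times_ms, lost, rto_ms, reliable):
    """Return when each packet reaches the decoder, or None if it is dropped.

    reliable=True models TCP-like behavior: lost packets are retransmitted and
    later packets are held back until the gap is filled (in-order delivery).
    reliable=False models UDP-like behavior: lost packets are simply missing.
    """
    delivered = []
    blocked_until = 0
    for index, sent in enumerate(send_times_ms):
        if index in lost:
            if not reliable:
                delivered.append(None)
                continue
            arrival = sent + rto_ms  # retransmitted copy arrives one RTO later
        else:
            arrival = sent
        if reliable:
            # A packet cannot be handed to the application before its predecessors.
            arrival = max(arrival, blocked_until)
            blocked_until = arrival
        delivered.append(arrival)
    return delivered


if __name__ == "__main__":
    sends = [0, 33, 66, 100]  # roughly one frame every 33 ms
    print(delivery_times(sends, lost={1}, rto_ms=200, reliable=True))
    # [0, 233, 233, 233]
    print(delivery_times(sends, lost={1}, rto_ms=200, reliable=False))
    # [0, None, 66, 100]
```

In the reliable case a single loss stalls every later frame until the retransmission lands, which would show up on screen as buffering; in the unreliable case only the one frame is missing.
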

justinmiller61 commented 5 years ago

Ah good point about video over TCP. I can look around as well later on when I get home.

To answer your question about Ring and Wyze resolution and frame rate, I'm not sure. I'll have to look that stuff up later.

goruck commented 5 years ago

I found a bug in the live555 proxy that causes it to ignore the ONVIF profile, which explains why changing the video settings has no effect. Basically the proxy always uses 1080p at 30 FPS at a very high quality setting (which makes for a very high bit rate). I'm going to work around this to force it to use more feasible settings and see if things get better.

goruck commented 5 years ago

Ok, made some progress. I found two things contributing to this issue. The first is that the proxy was using the same port (554) as the camera streams, which resulted in the Echo devices connecting to a default camera stream and not the specific streaming configuration I wanted. I moved the proxy to port 8554, which allowed me to control the streaming parameters. Second, I found a set of streaming parameters that seemed to work well for me across a variety of Amazon Echo devices. The performance varies because each device has a different level of video decoding performance. Here's a table showing the parameters that worked for me.

| Parameter | Value | Units |
| --- | --- | --- |
| Resolution | 1280x720 | pixels |
| Encoder Type | H.264 | NA |
| Encoder Compression | 30 | NA |
| Encoder Max Frame Rate | Unlimited | NA |
| Encoder GOP | 62 | frames |
| Encoder Profile | Baseline | NA |
| Encoder Bit Rate Control | Variable | NA |

And here's a table showing the results across different Amazon devices. Overall it works pretty well.

| Device | Buffering Frequency | Note |
| --- | --- | --- |
| Fire TV Cube | Never | Expected since the device is optimized for video. |
| Fire TV Stick 4K | Never | Expected since the device is optimized for video. |
| Echo Show Gen 1 | Rarely | |
| Echo Show Gen 2 | Rarely | |
| Fire HD 10 Tablet | Occasionally | Expected given its hardware capabilities. |

Again, I think the source of the buffering is the video decode time in the device, which varies across device types due to hardware capabilities. Since the video is delivered to the device from the camera via TCP (stunnel uses SSL over TCP), the network will not let the device discard packets when it gets behind in its decoding. I don't know a way to use stunnel with UDP, which would obviate this issue.

I hope this helps in your situation. You may have to further adjust the camera encoder settings to suit your specific situation. Let me know if you get a chance to try it. I'll keep this issue open for now to see how well it works for you.

I've updated the README with this information as well.
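
For anyone checking the port change described above, a quick way to confirm which service answers on which port is a plain TCP connect test. This is a generic sketch and not part of the project; the host and the two port numbers in the usage example are taken from this thread, everything else is illustrative.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 554 is the camera's own RTSP port in this thread; 8554 is where the proxy was moved.
    for port in (554, 8554):
        print(port, port_is_open("127.0.0.1", port))
```

A successful connect only shows that something is listening; it does not confirm which stream configuration is being served.
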

justinmiller61 commented 5 years ago

Cool. I’ll try later on. I think for me, it was always using the right parameters because I differentiate by using a different URL to connect to the camera’s substream. But I’ll update my code and try your recommended video parameters.

I was thinking this morning about the TCP/UDP problem, and in general it seems like TLS over UDP, while technically possible, isn’t particularly likely due to the need for reliability in the initial TLS negotiation.

I was reading a little bit though about QUIC. That sounds promising. There is a little reference implementation over at Google’s page but I wonder if it would even be possible to get that working with RTSP.

justinmiller61 commented 5 years ago

As an update, I just got the 2018 Fire 8 and I have yet to see it buffer. Also, my Echo Spot doesn’t buffer as much as the 2017 Fire.

So I guess the beefed up hardware of the 2018 models helps performance.

goruck commented 5 years ago

Thanks for the update.

I agree with your assessment regarding TCP/UDP. QUIC does look interesting, but I'm not sure when or if the Alexa Echo devices will ever support it.

I always found it odd that Amazon would use RTSP over TLS to transport video. Obviously it's for security reasons, and assuming that the network conditions are good (little packet loss) and the clients have sufficient decode performance, it's probably a good solution. But otherwise the client will buffer due to TCP.

I'm still puzzled why your Wyze and Ring devices work on your older Fire HD, assuming that they are also using video over TLS. I have a Wyze camera, so I'll look at the traffic with Wireshark one of these days and see what I can find out.

Amazon recently added WebRTC support to the Alexa Smart Home Camera API and they are now recommending using it over RTSP. See the Alexa.RTCSessionController Interface documentation. I've not used WebRTC before but a brief read indicates UDP can be used to transport video. This is probably the right interface to use going forward but a new proxy will have to be developed for this project. I'll plan on that.

I'm going to close this issue since it seems like at least we understand the problem and have mitigated it to the extent possible.