home-assistant / architecture

Repo to discuss Home Assistant architecture

Define an interface for integrations to configure stream to support expiring URLs #482

Closed allenporter closed 3 years ago

allenporter commented 3 years ago

Context

TL;DR: (1) Define an interface for integrations to configure stream and (2) revisit concatenation across streams.

  1. The nest SDM integration has URLs that expire and must be refreshed every 5 minutes as a privacy/security feature. See Device Access: Access a live stream for details on the API, and see how Home Assistant schedules an alarm to make this happen.
  2. The current interaction between camera and stream is centered on the assumption that stream_source is a stable URL, which is a very reasonable assumption for most cameras that just expose a single URL.
    1. More context: the camera also passes in a dict of stream_options, including the keepalive preference managed by the camera itself.
  3. The nest implementation can't conform to that assumption; the impact is captured in Nest SDM Camera stops streaming, and it is unable to provide a good user experience. Namely:
    1. When viewing a stream in the UI, it will break when the URL expires. The user must manually refresh.
    2. The Preload Stream option does not work (and may not be recoverable?), so the only option is to load a stream every time, which may add latency when joining a stream. (You'd think preload would help, but thumbnails generated from the stream for device triggers are quite old.)
    3. Worker threads continue to process streams and write loud error messages to the logs. In the case of Preload Stream, that may be one thread for every different URL the stream was invoked with!

Talking with @hunterjm on discord, we thought an architecture issue would be a good place to discuss next steps, and some of this proposal came from that talk (see discussion notes for the raw notes). @uvjustin and @elupus -- would love to hear more thoughts, things to watch out for, or suggestions for PR order. We can also find a time to chat on the Home Assistant discord if that would be better.

There are some opportunities to align the solution with other existing problems.

Proposal

  1. Define an interface for integrations to configure stream. Integrations need a way to push changes to stream configuration that impact the worker -- namely the stream URL.
  2. Revisit how concatenation across individual streams for display is handled.
  3. Revisit testing strategy for the stream component

We'd likely want to make some improvements on #3 while doing any larger changes. #1 would come first to make preload work; then #2, handling worker restarts better, would be ideal and can follow.

Integration Stream configuration

Introduce the notion of a stream id or cache_key, e.g. associated with a camera, to uniquely identify a combined stream independent of the url. Something like the camera's entity_id or unique_id could be a fit. We'd still want to support lookup by stream url as a performance win so that async_handle_record_service and request_stream can continue to share a single worker. Technically, some of the stream_options could factor in here, but these are unlikely to change in practice. It may also make sense to factor this out into some internal wrapper within stream.
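To make the idea concrete, here is a minimal sketch of a registry keyed by a stable cache_key with a URL index for the fast path. All names here (StreamRegistry, FakeStream, lookup_by_url) are hypothetical, not the stream component's real internals:

```python
from __future__ import annotations


class FakeStream:
    """Stand-in for the real Stream object."""

    def __init__(self, source: str) -> None:
        self.source = source


class StreamRegistry:
    """Illustrative registry: streams keyed by a stable cache_key,
    with a secondary url -> cache_key index as a performance win so
    async_handle_record_service and request_stream can share a worker."""

    def __init__(self) -> None:
        self._by_key: dict[str, FakeStream] = {}
        self._by_url: dict[str, str] = {}  # url -> cache_key

    def get_or_create(self, cache_key: str, source: str) -> FakeStream:
        stream = self._by_key.get(cache_key)
        if stream is None:
            stream = FakeStream(source)
            self._by_key[cache_key] = stream
        self._by_url[source] = cache_key
        return stream

    def lookup_by_url(self, source: str) -> FakeStream | None:
        # Fast path: find the shared stream by its current URL
        key = self._by_url.get(source)
        return self._by_key.get(key) if key else None
```

With this shape, a URL refresh re-indexes the same stream object under the new URL rather than spawning a second worker.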

Add new top-level methods in stream to communicate with the stream worker, and push in changes to the stream source by stream id or cache_key. The integration will (through the new top level methods) communicate with Stream. Stream will continue to manage start/stop of the worker when the url/options change. Additionally, we can pull out the keepalive logic from worker so that Stream manages that lifecycle. That is, when a worker exits, Stream will be responsible for restarting it. Currently the worker takes in a reference to a stream, which will be changed instead to take a source and options only.

Concatenation of Streams

The worker is currently responsible for:

  1. invoking pyav on a stream source
  2. demuxing audio/video packets
  3. peeking into the start of the stream to validate it, and ensuring packets are ordered properly
  4. writing packets to output buffers
  5. grouping packets into segments on keyframes and writing them to stream outputs, etc. The worker has 5 internal methods, hinting that we may be outgrowing the existing structure.

It may instead be a good time to separate the work into a pipeline or series of stream processors (e.g. like StreamBuffer or StreamOutput or something else) for ensuring DTS order, vs buffering, vs grouping packets into segments and tracking their duration, since these seem to be separable operations that don't rely on each other. Failures downstream still result in stopping the worker, and it is ok if a bad stream is noticed outside of the main decode loop.

This will also want to incorporate logic for preserving output sequence/pts/dts across streams that either restart from 0 or keep up an existing sequence.
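One way to handle both cases (restart from 0 vs. continuing an existing clock) is a small re-basing helper; this is a toy sketch with invented names, not the eventual implementation:

```python
class TimestampAdjuster:
    """Re-base dts/pts so output timestamps stay monotonic across streams.

    If the new stream restarts near 0, shift its timestamps past the last
    output timestamp; if it keeps an increasing clock, pass it through.
    """

    def __init__(self) -> None:
        self._offset = 0
        self._last_out = 0

    def start_new_stream(self, first_dts: int) -> None:
        if first_dts < self._last_out:
            # Restarted from (near) zero: continue after the old stream
            self._offset = self._last_out - first_dts + 1
        else:
            # Clock kept increasing: no adjustment needed
            self._offset = 0

    def adjust(self, dts: int) -> int:
        self._last_out = dts + self._offset
        return self._last_out
```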

Testing Strategy

  1. An approach in https://github.com/home-assistant/core/pull/44161 mocks out pyav, creates fake containers, simulates packets, and exercises how they turn into segments.
    1. Significantly increases coverage to >92%, though the fixture is fairly non-trivial
    2. This really is more about exercising the code than emulating real world cameras.
  2. Revisit the existing tests in test_hls and see if they can be re-enabled.
    1. Uses static files checked into the repository and exercises decoding. Very simple since they are black box.
    2. Need to understand why they were flaky. Presumably just CPU intensive?
    3. Likely do not exercise the corner cases of real cameras. Does not really exercise how streams are decoded, just that they don't throw exceptions.
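The mock-pyav approach can be sketched with stdlib mocks alone; the helper names below are invented, and `count_segments` is a toy stand-in for the worker's segmenting logic, in the spirit of the fixture in home-assistant/core#44161:

```python
from unittest.mock import MagicMock


def make_fake_packet(dts: int, keyframe: bool = False) -> MagicMock:
    """Build a fake pyav-like packet; attributes loosely mirror av.Packet."""
    packet = MagicMock()
    packet.dts = dts
    packet.pts = dts
    packet.is_keyframe = keyframe
    return packet


def make_fake_container(packets) -> MagicMock:
    """Fake container standing in for the object av.open would return,
    with demux yielding a scripted sequence of packets."""
    container = MagicMock()
    container.demux.return_value = iter(packets)
    return container


def count_segments(container) -> int:
    """Toy stand-in for the worker: a new segment starts on each keyframe."""
    segments = 0
    for packet in container.demux():
        if packet.is_keyframe:
            segments += 1
    return segments
```

A test then patches `av.open` to return the fake container and asserts on the segments produced, which keeps the tests fast and lets us script corner cases (missing dts, out-of-order packets) that static files can't easily cover.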

I think it may be interesting to work on both approaches in parallel and establish a high baseline of test coverage as refactoring starts. Given the complexity of stream, I think more test coverage is worth the downsides. Since most of the mocking is at the pyav level, this approach can survive the refactoring proposed above, if we want to change the scope of what is tested with the fixture.

Consequences & Alternatives

balloob commented 3 years ago

I think that moving StreamWorker into a class is a good approach. It will give integrations that use it the necessary power to make changes on the fly, like updating the url.

Since the camera integration already initiates the stream integration, it makes sense that they own the lifecycle of the stream. We could add a new create_stream_worker method to the CameraEntity. Nest could override it and change settings as necessary.

uvjustin commented 3 years ago

@allenporter Very thorough summary. Seems like you and @hunterjm had a good discussion.

Some quick thoughts:

Happy to support with whatever you move ahead with.

allenporter commented 3 years ago

Thanks, all very helpful feedback. I have a couple more questions.

  1. I would like to fix the existing tests that are excluded from testing due to flakiness. When running locally it seems like the main races are that the worker thread finishes and clears the output buffer before the test can consume everything. Does that sound familiar or are there other issues?
  2. Integrations have a rule that tests go through the integration APIs rather than testing the classes directly. Is that true for internal components too, or is it ok to test libraries directly?
MartinHjelmare commented 3 years ago

It's ok to add unit tests for core and helper parts, if they are isolated and used by other parts of the integration or other integrations. We'll make that call during review.

allenporter commented 3 years ago

After mulling this over for the week, I would like to propose that the first set of PRs focus on increasing test coverage, starting with repairing the existing tests.

allenporter commented 3 years ago

1) Thinking about the lifecycle and object relations, the current state looks something like this:

[diagram: Home Assistant - Stream Recon]

Stream has tight coupling between the worker and the outputs (as in both Stream and worker know about each other, and Stream and outputs know about each other). An example is that Stream starts the stream worker, then on stream end the worker invokes put with a None that causes the StreamOutputs to remove themselves from the Stream. All three classes know about keepalive. My hunch is that we'll want to decouple idle timeout checks, keepalive, and stream end a bit, though I am not sure in which order to send PRs to decouple these. An example is that Stream can notice when the worker exits and remove all stream providers (StreamOutput), rather than having the worker tell the providers to remove themselves.
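That last decoupling could look roughly like this, as a toy sketch (names and the threading shape are illustrative): Stream observes the worker exiting and tears down outputs itself, instead of the worker pushing a sentinel None through the queue.

```python
import threading


class StreamOutput:
    """Toy provider; in the decoupled design it no longer removes itself."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class Stream:
    """Illustrative: Stream watches the worker and tears down outputs."""

    def __init__(self) -> None:
        self.outputs: list[StreamOutput] = []
        self._worker: threading.Thread | None = None

    def start(self, work) -> None:
        def run() -> None:
            work()
            # Worker exit is observed here by Stream, instead of the
            # worker telling each output to remove itself.
            self._on_worker_exit()

        self._worker = threading.Thread(target=run)
        self._worker.start()

    def _on_worker_exit(self) -> None:
        for output in self.outputs:
            output.close()
        self.outputs.clear()
```

This is also where a keepalive decision would naturally live: `_on_worker_exit` either restarts the worker or cleans up.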

2) When thinking about having camera manage the lifecycle of stream: Stream may be what we want to expose rather than a StreamWorker. We'll likely just want the worker to decode packets, whereas Stream manages starting/stopping the worker and can handle stream source updates from camera.

I think we'll want to slim down the interface of Stream otherwise it may expose too many details. The decoupling mentioned above should help a lot, since a few of the public methods on Stream are called by StreamOutput.

allenporter commented 3 years ago

I believe I may have an approach that can handle updates to URL streams.

The HLS RFC has a tag called EXT-X-DISCONTINUITY. See https://tools.ietf.org/html/rfc8216#section-4.3.2.3 for more details. This tag marks a discontinuity so that the player can handle any synchronization issues caused by the stream restart. This works pretty well in my local testing!

The approach is something like:

It looks like there needs to be some additional handling of EXT-X-DISCONTINUITY-SEQUENCE when segments are garbage collected as well, if I understand the RFC correctly.
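For reference, a discontinuity in a live playlist looks roughly like this (a hand-written sketch per RFC 8216, not output from the stream component). EXT-X-DISCONTINUITY precedes the first segment of the restarted stream, and EXT-X-DISCONTINUITY-SEQUENCE must be incremented when a discontinuity tag itself is garbage-collected out of the playlist:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:10
#EXT-X-DISCONTINUITY-SEQUENCE:0
#EXTINF:1.5,
segment10.ts
#EXTINF:1.5,
segment11.ts
#EXT-X-DISCONTINUITY
#EXTINF:1.5,
segment12.ts
```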

I think this implies the rest of the change is fairly straightforward: we don't need to keep track of pts/dts/etc., and perhaps no more large structural changes are needed.

uvjustin commented 3 years ago

I think it makes sense to use that tag. The only concern I have is that this might affect the buffering strategies of the various players - they might want the discontinuity segment earlier than they want it now. The Apple documentation seems to recommend 6 segments of 6 seconds each (see sections 7 and 8 here ), while we are using 3 segments of 1.5 (with 1 keyframe per second this actually rounds up to 2) seconds each. I don't see these values specified in the RFC you linked, although in 6.2.2 the fourth paragraph says we should keep the segment around "for a period of time equal to the duration of the segment plus the duration of the longest Playlist file distributed by the server containing that segment". It's unclear whether that means from the time we first add the segment to the playlist or from when we remove the segment from the playlist, but if it's the latter we should probably double the size of the StreamOutput._segments deque (the only downside is using a little more memory). Of course this change is unrelated to the architecture topic but it's probably worth considering.
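The retention suggestion above amounts to roughly doubling the segment buffer; a sketch using the numbers quoted in the comment (3 playlist segments of ~1.5s), with the deque's maxlen doing the garbage collection:

```python
from collections import deque

# Values quoted in the discussion above; illustrative, not authoritative.
NUM_PLAYLIST_SEGMENTS = 3    # segments advertised in the playlist
MIN_SEGMENT_DURATION = 1.5   # seconds (often rounds up to 2 with 1 kf/s)

# Per RFC 8216 6.2.2, segments should remain fetchable after leaving the
# playlist for segment duration + longest playlist duration; keeping
# twice as many segments approximates that at the cost of a little memory.
MAX_SEGMENTS = NUM_PLAYLIST_SEGMENTS * 2
segments: deque = deque(maxlen=MAX_SEGMENTS)

for sequence in range(10):
    segments.append(sequence)
# Older segments age out automatically once maxlen is reached.
```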

hunterjm commented 3 years ago

> I think it makes sense to use that tag. The only concern I have is that this might affect the buffering strategies of the various players - they might want the discontinuity segment earlier than they want it now. The Apple documentation seems to recommend 6 segments of 6 seconds each (see sections 7 and 8 here), while we are using 3 segments of 1.5 (with 1 keyframe per second this actually rounds up to 2) seconds each. I don't see these values specified in the RFC you linked, although in 6.2.2 the fourth paragraph says we should keep the segment around "for a period of time equal to the duration of the segment plus the duration of the longest Playlist file distributed by the server containing that segment". It's unclear whether that means from the time we first add the segment to the playlist or from when we remove the segment from the playlist, but if it's the latter we should probably double the size of the StreamOutput._segments deque (the only downside is using a little more memory). Of course this change is unrelated to the architecture topic but it's probably worth considering.

The Apple docs recommendations don't need to be followed 100%. Also, we are technically using whatever the I-frame interval for the video clip is as our segment length, since we are not transcoding video. I know you added the 1.5 logic in there, but some of my feeds are actually 2.5 based on where the keyframes are. Overall, @allenporter I think that's a good find, and sounds much easier than trying to keep track and sync everything ourselves!

edit: If we use this feature, I'd also be for removing the arbitrary abstraction I added to the implementation at the beginning when I thought we could potentially support other protocols besides HLS to simplify the logic in the stream component further. It made it relatively easy to add record at a later date, but there is tight coupling between the recorder and HLS currently anyway.

uvjustin commented 3 years ago

> The Apple docs recommendations don't need to be followed 100%. Also, we are technically using whatever the I-frame interval for the video clip is as our segment length, since we are not transcoding video. I know you added the 1.5 logic in there, but some of my feeds are actually 2.5 based on where the keyframes are. Overall, @allenporter I think that's a good find, and sounds much easier than trying to keep track and sync everything ourselves!

I'm just always worried that some of the random issues we encounter are due to straying from the Apple recommendations. But you're right, looking at the RFC none of that stuff is actually in there - there doesn't seem to be a mention of segment duration, and the example playlists themselves have only 3 segments. There is, however, that bit about keeping the segments around after removing them from the playlist. I made a quick branch to add that, but I'm not sure it's worth a PR if there are currently no issues. Yes, the 1.5 was just a minimum so we wouldn't go much below that; previously I was getting 1 second segments with my 1 keyframe per second interval (I think that's probably the most common setting for cams). I'm guessing you are getting 2.5 because your keyframes are coming every 1.25 seconds? I don't think many cameras are configured to use keyframe intervals longer than a few seconds unless they are using the non-standard H264+ or H265+ codecs.

> edit: If we use this feature, I'd also be for removing the arbitrary abstraction I added to the implementation at the beginning when I thought we could potentially support other protocols besides HLS to simplify the logic in the stream component further. It made it relatively easy to add record at a later date, but there is tight coupling between the recorder and HLS currently anyway.

This means removing all the fmts?

allenporter commented 3 years ago

I had a similar thought when looking at the different ways that StreamOutput is used: some methods are only used by HLS, and recorder has custom overrides. The common part, collecting output segments from the stream worker, still seems worth sharing, so I wasn't in a hurry to break it apart. I can see that tracking view-specific logic may push it over the line.

This could be a matter of moving some logic up into the HLS output class, then making the view handlers know which output type they are using.

uvjustin commented 3 years ago

> I think this implies the rest of the change is fairly straightforward: we don't need to keep track of pts/dts/etc., and perhaps no more large structural changes are needed.

@allenporter We might have to figure out a way to deal with the timestamp discontinuities in recorder, but that shouldn't be too hard to do.

allenporter commented 3 years ago

With @uvjustin adding recorder discontinuity support, I think we can call this done! Thanks to everyone for their help in support of nest.

There may still be some additional cleanup in stream that can happen, and maybe more ideas from #46610 that are still useful. If anyone runs across anything else that can be simplified here, I'm happy to chase it down.