moq-wg / moq-transport

draft-ietf-moq-transport

Need expiry time / time to live #440

Open fluffy opened 2 months ago

fluffy commented 2 months ago

At some point we removed the time to live that controls how long objects exist in the cache before they are removed.

It is becoming a significant implementation blocker to not have some version of this. I would like to add some version of this back soon, even if we later change the exact details of how times are calculated and represented.

afrind commented 2 months ago

+1 I think this is something we should address before the interim.

At some point we removed the time to live of how long objects exist in the cache before they are removed

Looking back to -00, we've never had TTLs in the draft, perhaps there were pre-adoption versions that had it?

We do have issue #249, perhaps this can be closed as a duplicate?

ianswett commented 2 months ago

I think this is a dupe of #249, but most comments in #249 are before the current forwarding preference Object model, so we should close one of the two issues, but I don't have a strong opinion on which.

@fluffy Can you clarify why this is a significant implementation blocker? That might motivate us to ensure we create something that's at least good enough for now.

Some questions

Possible Use cases:

fluffy commented 2 months ago

Great questions. I agree this is a dup of #249, but there is so much in that thread that I think it might be easier to collect the requirements on a clean issue.

I'll vaguely use TTL, but read that as just some sort of time; it could be absolute or a delta. More on that later.

For interactive use cases like Webex, the TTL tends to be very low: think numbers in the range of a few hundred to a few thousand milliseconds. We send the P frames with a lower TTL than the I frames, as that can significantly reduce cache sizes without much impact on user experience. We send video with group per track and audio with datagrams. Clearly many other cases would have much higher TTLs, but I'm just talking about this case.

Requirements: Can be set per object. It would perhaps be nice to have some way to optimize for when the TTL is the same for all objects in the stream.

Can represent the TTL at a resolution of at least 10 ms.

An immutable property of the object, set by the end publisher / origin.

The relays SHOULD cache it for at least the time in the TTL. We would expect to pay a CDN for storage based on this time as well as bandwidth. Clearly there are failure cases and other operational cases where it would be dropped, but we would expect the CDN to provide some sort of reasonable SLA on not dropping before that time. As far as the spec goes, I think it should say that relays that cache SHOULD keep the object for at least the time in the TTL.

The relay SHOULD NOT forward it after the TTL has expired. It is no longer useful at this point, and we don't want the bandwidth used by data that will be thrown away by the client because it is too old.

On the topic of whether this is a TTL-style delta or an absolute expiry time: it would be nice to have a solution where the client did not need NTP-synchronized time, but we could easily live with requiring synchronized time if that was the best direction. The end publisher would put in a time when it created the object.

If we go with the delta-style solution, the publisher might put in 500 ms. Each relay that received the object would look at the local time it was received and expire it 500 ms after that. People talk about decrementing the delta when it gets sent to other relays, but that gets complicated and does not add much value. If we do deltas, I think the delta should just be relative to the time the relay received the object, and the delta is not changed when the object is sent downstream.

If we go with absolute time, I imagine clients would get the current time not from the operating system but from the relay. In that case, I think the client would add an absolute timestamp of when to expire, and we would assume the clients and all the relays are NTP-synced.

I prefer the time-delta approach because it will be smaller on the wire and does not require synchronized time.
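To make the delta-style option concrete, here is a minimal sketch of relay-side expiry, assuming the delta semantics described above (each relay stamps its own local receive time; the delta is forwarded downstream unchanged). The class and field names are illustrative, not from the draft.

```python
import time

class CachedObject:
    """One cached MoQ object with a publisher-supplied TTL delta."""
    def __init__(self, payload: bytes, ttl_ms: int):
        self.payload = payload
        self.ttl_ms = ttl_ms                 # delta set by the end publisher
        self.received_at = time.monotonic()  # relay-local clock, no NTP needed

    def expired(self) -> bool:
        return (time.monotonic() - self.received_at) * 1000 >= self.ttl_ms

class RelayCache:
    """Drops expired objects on lookup; never forwards them downstream."""
    def __init__(self):
        self._objects = {}

    def put(self, key, payload: bytes, ttl_ms: int):
        self._objects[key] = CachedObject(payload, ttl_ms)

    def get(self, key):
        obj = self._objects.get(key)
        if obj is None or obj.expired():  # SHOULD NOT forward after expiry
            self._objects.pop(key, None)
            return None
        return obj
```

Note that nothing here requires synchronized clocks: each relay only compares its own monotonic clock against its own receive timestamp.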

This is starting to become an implementation blocker because we want to move our existing pre-MoQ stuff, which we built with quicr, over to match the MoQ spec. But there is no way we can transition without some way to send a TTL to the relays.

wilaw commented 2 months ago

Live video streaming, where I'd expect it to be similar to the client buffer depth or more.

Actually, with live video streaming (such as sports), the TTL would need to equal the DVR window duration, i.e. the time window over which users are able to access any part of the stream. For a 4-hour American football game, for example, you may want a 5-hour TTL window to include the pre-game show and the whole game, all while the player has only a 12 s forward buffer.

As far as the spec goes, I think it should say that relays that cache SHOULD keep the object for at least the time in the TTL.

I agree with this interpretation. Relay caching is a performance optimization enacted by a relay operator. It should be decoupled from the core pub/sub behaviors. A relay is still a valid relay if it never caches and instead retrieves everything from upstream.

The relay SHOULD NOT forward it after the TTL has expired.

This is a departure from HTTP semantics, and while I'm not against it, we need to be careful with inferred behavior. In HTTP land, an object is always available until the origin returns a 404, irrespective of the TTL signaled in any cache header. For moqt, we would be overloading TTL to convey both desired cache duration and object availability.

Is there ever a use-case where we would want to differentiate the two? I can think of one. A sports provider has a 4-hour sports game. Most users will be at the live edge, but during the live broadcast users can skip back and watch prior highlights. The distributor has to pay the relay CDN for cache space. They want to cache the live edge and the highlights for performance reasons, but not the entirety of the 4-hr event. So they may set a 5min cache TTL (figuring that they will receive repeated requests for live edge and highlight content within a 5 minute window) but a 4 hour availability window. If we use TTL to signal both, we can't do that.
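One way to picture the distinction being drawn here is as two separate per-object durations, both measured from production time: a short cache TTL and a longer availability window. This sketch is purely illustrative; the field names are assumptions, not anything in the draft.

```python
import time

class ObjectPolicy:
    """Separates 'how long to cache' from 'how long the object may be served'."""
    def __init__(self, cache_ttl_s: float, availability_s: float):
        self.cache_ttl_s = cache_ttl_s        # e.g. 5 minutes of cache space
        self.availability_s = availability_s  # e.g. a 4-hour availability window
        self.produced_at = time.time()

    def may_cache(self, now: float = None) -> bool:
        now = time.time() if now is None else now
        return now - self.produced_at < self.cache_ttl_s

    def may_serve(self, now: float = None) -> bool:
        now = time.time() if now is None else now
        return now - self.produced_at < self.availability_s
```

With a single TTL field, `may_cache` and `may_serve` would be forced to agree; splitting them allows the 5-minute cache / 4-hour availability policy in the sports example above.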

ianswett commented 2 months ago

Thanks for the comments.

I believe the reason to require that a cache no longer serve an Object is policy/legal/etc. oriented, since the content of Objects doesn't change? It feels like there could be a better mechanism to remove content than a TTL, so maybe TTL is better if it's restricted to being a suggested amount of time to cache the content?

ianswett commented 2 months ago

This doesn't feel like an Object property, because if you played a live stream live you might get one TTL, and if you played it the next day, I would expect the TTL to be different or not even present?

As such, would it be OK to add a TTL field to SUBSCRIBE_OK?

afrind commented 2 months ago

Individual Comment:

If the TTL is relative to when the relay receives it, then in Will's example, the cache may expire some content, but on a later request it can go upstream again, and the object will either exist or not at that time. If it exists, it gets refreshed; if not, its non-existence is cacheable.
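That refresh-on-expiry behavior can be sketched as follows. This is a hypothetical illustration of the described flow, not a proposal for the draft; the upstream fetch callback is an assumed stand-in, and the negative-entry lifetime is an arbitrary assumption.

```python
import time

NEGATIVE_TTL_S = 30  # assumed lifetime for a cached "does not exist" result

class RefreshingCache:
    """On an expired or missing entry, goes upstream again.
    A hit refreshes the cache; a miss is cached as a negative entry."""
    def __init__(self, fetch_upstream):
        self.fetch_upstream = fetch_upstream  # callable: key -> payload or None
        self._entries = {}  # key -> (payload_or_None, expires_at)

    def get(self, key, ttl_s: float):
        entry = self._entries.get(key)
        if entry and time.monotonic() < entry[1]:
            return entry[0]  # fresh hit (possibly a cached non-existence)
        payload = self.fetch_upstream(key)
        ttl = ttl_s if payload is not None else NEGATIVE_TTL_S
        self._entries[key] = (payload, time.monotonic() + ttl)
        return payload
```

This mirrors HTTP-style cache refresh, which is exactly why aligning with the HTTP caching model (revalidation included) is worth a close look.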

Do we want/need an optimization that will allow the relay to transition object status from "normal" to "expired/permanently gone" without another trip upstream?

Just typing this makes me think we should take a long look at the HTTP caching functions (e.g. revalidation) and pick the ones we think make sense for moq. If moq is successful, CDNs will want to adapt HTTP caches to serve these objects, or possibly serve the same objects over both moq and HTTP, so we should align where it makes sense.

fluffy commented 1 month ago

Updating from a conversation with Will today.

We have the use case where, on a 2-hour sports event, the DVR window is 30 minutes and users cannot scrub back more than 30 minutes from the live edge.

This leads to wanting to be able to say "relays don't send this object more than 30 minutes after it was produced"

We also have, for real-time cases, "don't send this object more than 500 ms after it was received by the relay".

vasilvv commented 1 month ago

I wonder if it would make sense to express the TTL in groups instead of wall time?

E.g. TTL=1 means "stop sending the previous group as soon as the beginning of the new group arrives", TTL=2 means "keep the current group and the one right before it", and so on. This has the advantage of working with variable-size groups, and it also compresses much better.

ianswett commented 1 month ago

That idea makes sense with Stream per Group, but I'm not sure how it applies to the other two Object encodings.

vasilvv commented 1 month ago

I guess the general observation is that the delivery deadline mechanism should probably be mapping-specific? If you're doing stream-per-group/object, you'd want to be resetting streams, whereas with stream-per-track you don't really have that option.

suhasHere commented 1 month ago

I feel that framing cache duration as a timeline-based delivery deadline is somewhat confusing. One way of thinking about it: do I, as the publisher, need a given object to be stored in my relay network beyond a certain point in time in order to meet my application requirements? This makes it independent of the transport mapping or mechanisms.

fluffy commented 1 month ago

I guess the general observation is that the delivery deadline mechanism should probably be mapping-specific? If you're doing stream-per-group/object, you'd want to be resetting streams, whereas with stream-per-track you don't really have that option.

I think once we get the basic mechanism down, we need to look at if / how it interacts with stream reset.