Closed by @jyavenard 7 months ago.
If there is no objection, @jyavenard and I would like to submit a Pull Request to the spec outlining the details of how it might work to solicit further feedback from working group members. We are also happy to provide a test suite in the form of Web Platform Tests.
We have a reference implementation in WebKit if folks want to try it out.
Given positive reactions to @jyavenard's post above, I think a PR would be welcome. I'd like to arrange to talk through the proposal in an upcoming Media WG meeting, if you'd be happy to?
We're also going to need someone in the WG as a new co-editor.
Thanks for working on this! The buffering events look great. I do have some minor concerns, but no blocking objections:
- `quality` and `onqualitychanged` are likely to face privacy concerns.
- You could imagine something like `ManagedMediaSource.fetch()` if we need the UA to have perfect knowledge and the ability to schedule optimally for radio.
Closely related, but a little off topic: is this API currently deployed in WebKit, or is it just an experiment for now? The MSE API has been disabled on the iPhone for a few years, so I was wondering if the arrival of this new proposal, as well as the arrival of Managed Source Extension on iOS 17 (according to the Safari 17 Beta release notes), would result in the API being re-enabled on the iPhone.
I can't wait to see the outcome of this new proposal. I think what's being proposed here is really what MSE is lacking most today, so I'm all for it.
That's great to hear!
Managed Media Source is enabled on Safari 17 beta on macOS and iPadOS and behind an experimental flag on iOS for now.
There will be a WWDC talk on developer.apple.com on Thursday, June 8th, "Explore media formats for the web", where it is presented.
Anticipating that there may be a period when there are browsers or platforms that only support `MediaSource` and browsers or platforms that only support `ManagedMediaSource`, would it be possible to clarify the story for developers who want to use MSE across devices?
For instance, is the pull request that proposes to add support for `ManagedMediaSource` in HLS.js (https://github.com/video-dev/hls.js/pull/5542) representative of the amount of code needed to adjust applications to leverage `ManagedMediaSource` given an existing `MediaSource` pipeline, or would it be likely that such applications need to maintain separate `MediaSource` and `ManagedMediaSource` pipelines, with different adaptive logic?
I'm also wondering about the overall message down the road. Is it "This strikes a better balance than the previous version of MSE; use `ManagedMediaSource` whenever possible, fall back to `MediaSource` only when it is not supported"? Or more "`ManagedMediaSource` and `MediaSource` have different usage scenarios"? If the latter, I think it would be useful to provide guidance in the spec on how to choose between the options.
The message we want to convey is to use Managed Media Source first if available, and only fall back to MSE if that's the only option available. The events are hints; you can follow them or not. On iPhone and iPad, if you follow the guidance, you will have access to 5G connectivity. So you could use Managed Media Source just like MSE. If you don't follow the guidance and a particular user agent decides to enforce it by throwing when you attempt to append data, the remedial code would be applicable to MSE too.
Any logic to decide which resolution variant is suitable to use would be common between the two as far as bandwidth management is concerned. As mentioned by @dalecurtis, the `quality` attribute may not fly due to fingerprinting concerns.
Improved exposure of and flexibility for memory constraints would be great.
Regarding browser optimization of network request timing, isn't this a general concern rather than one that is specific to streaming? The key property of streaming network requests is that they are (often) not urgent, because we have lots of buffered data, and so we're happy to trade some latency for improved overall throughput or some other benefit such as battery life. The proposed start/stop streaming events don't prevent the site from downloading, only from appending. And it could presumably happen that a network request issued during a "streaming allowed" period doesn't complete within that period, but we should still be allowed to append it.
An alternative, admittedly more radical, approach would be a way to tag network requests as "non-urgent" and so eligible for delay within the browser until a time where they can be made more efficiently. Of course the browser should only delay such requests if there is some pay-off that the site could later observe.
Regarding the quality hints, I think that, as with the network request timing, there needs to be some measurable benefit to the site to start listening to these, and I am not sure what that is.
Does it need to be a new class? Or could these just be discoverable extensions to the existing MediaSource?
(Implementor hat on)
@mwatson2 said:
The proposed start/stop streaming events don't prevent the site from downloading, only from appending.
Correct, they don't prevent the site from downloading. The current language allows a UA to prevent appending, but in our implementation experience, that wasn't actually necessary. We left the ability to block appends in the proposal should a UA decide it was necessary or desirable to implement, but that could be pulled out into a separate proposal and removed from this one.
An alternative, admittedly more radical, approach would be a way to tag network requests as "non-urgent" and so eligible for delay within the browser until a time where they can be made more efficiently.
Seems like a good idea to do whether or not we do ManagedMediaSource, and also something very outside the Media WG's bailiwick. Doing this without wreaking havoc on sites' bandwidth estimation would also be difficult. The current proposal allows UAs to incorporate buffer water levels in their decision to fire `startstreaming`/`endstreaming`, and that would be much, much more difficult to do with a free-floating `fetch` request marked as "low-priority".
Does it need to be a new class? Or could these just be discoverable extensions to the existing MediaSource?
We discussed this with @wolenetz, and the alternative would be to pass a dictionary containing configuration modes into the `MediaSource` constructor. What we all discovered is that this kind of mode switch is not easily feature detectable; a separate class that extends `MediaSource` is very feature detectable and enables the same capabilities as a configuration mode.
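A feature-detection sketch of that pattern, following the fallback order recommended earlier in this thread (the `pickMediaSourceCtor` helper and its explicit global parameter are invented for illustration; the class names come from the proposal):

```javascript
// Prefer ManagedMediaSource when the class exists, otherwise fall back
// to plain MediaSource. Taking the global object as a parameter keeps
// the policy testable outside a browser.
function pickMediaSourceCtor(global = globalThis) {
  if (typeof global.ManagedMediaSource === 'function') {
    return global.ManagedMediaSource;
  }
  if (typeof global.MediaSource === 'function') {
    return global.MediaSource;
  }
  return null; // MSE unavailable entirely
}

// Usage in a browser: const source = new (pickMediaSourceCtor())();
```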
Improved exposure of and flexibility for memory constraints would be great.
You mean in addition to this proposal?
The difference over MSE is that the coded frame eviction algorithm can be run at any time, and not just during the Prepare Append step. When it has run, a `bufferedchange` event will be fired, along with the TimeRanges that were evicted.
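As a rough illustration of how a player might consume such an event (the `removedRanges` field name and its TimeRanges-like shape are assumptions based on the description above; a player could use the result to re-queue evicted segments for re-download closer to playback time):

```javascript
// Flatten the evicted TimeRanges into [start, end] pairs, using the
// standard TimeRanges accessor shape (length, start(i), end(i)).
function collectEvicted(event) {
  const evicted = [];
  const ranges = event.removedRanges;
  for (let i = 0; i < ranges.length; i++) {
    evicted.push([ranges.start(i), ranges.end(i)]);
  }
  return evicted;
}

// Usage sketch:
// source.addEventListener('bufferedchange', (e) => {
//   rescheduleDownloads(collectEvicted(e));
// });
```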
I would say that it is up to the user agent to evict content in such a way that it doesn't prevent the current media from continuing to play. This could be suggested in the final spec. I'd propose two different iterations of coded frame eviction: one that would only evict content no longer necessary for playback to continue as-is, such as past data, or future data discontinuous from the currently playing TimeRange. And one that could make playback stall under extreme memory pressure.
The proposed start/stop streaming events don't prevent the site from downloading, only from appending. And it could presumably happen that a network request issued during a "streaming allowed" period doesn't complete within that period, but we should still be allowed to append it.
I will add to Jer's answers:
The WebKit implementation doesn't block appends. What it does, however, when the `streaming` attribute is true (that is, between the `startstreaming` and `endstreaming` events), is tag all downloads started by a page where a ManagedMediaSource is opened as "media", so that they can go over the 5G network (this also depends on the user's system settings). Outside this period, the cellular modem may go into low-power mode, disabling 5G.
An alternative, admittedly more radical, approach would be a way to tag network requests as "non-urgent" and so eligible for delay within the browser until a time where they can be made more efficiently.
When I read the comment from @mwatson2 above I vaguely remembered that such a mechanism already exists. There is a `priority` property that can be set when using `fetch()`. It seems to be only available in Chrome for now: https://web.dev/fetch-priority/#lower-the-priority-for-non-critical-data-fetches
Sorry if that was obvious to you all.
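For reference, a sketch of what that looks like in practice. The `priority` member is the Fetch Priority API linked above (Chromium-only at the time of writing, and a relative hint among competing requests rather than a scheduling deadline); the `prefetchSegment` name and the injectable `fetchImpl` parameter are illustrative:

```javascript
// Mark a prefetch of a far-future media segment as low priority so it
// yields bandwidth to more urgent requests. fetchImpl defaults to the
// global fetch but can be injected for testing.
async function prefetchSegment(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url, { priority: 'low' });
  return response.arrayBuffer();
}
```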
@jernoble As a site implementor I would be very concerned about the user agent making assumptions about "buffer level". The buffer state consists of both media that is appended and media that has been downloaded and not appended. When considering non-trivial playback scenarios (anything where the download is more complex than a straightforward linear sequence of media blocks) the site is managing what is downloaded and what is appended. For example, sometimes we append media "just-in-time" for playback in which case the UA has no useful information about the true buffer level.
Looked at another way, if the UA starts treating media differently based on the UA's perception of buffer level, sites are just going to optimize when they append to get to the site's idea of optimum performance.
A better concept is the "urgency" of requests i.e. some information about the actual earliest time the response might be needed.
@jyavenard About the memory constraints, I meant this proposal. However, ideally, if there are memory constraints, it would be nice for the site to be able to know about them in advance so we can consider them in our adaptive streaming choices. On very constrained devices we sometimes stream at a lower bitrate even when throughput is high, so as to be able to store enough media to cover adaptations in future. Of course, a site can heuristically work out what the constraint is by observing the UA's behavior when it comes to removals.
From your description, it sounds like what the `startstreaming` and `endstreaming` events are really doing is advertising the availability of a more efficient network connection which is only intermittently available. Perhaps those events should not be tied to streaming and should just be global events that do exactly that (e.g. a `networkavailability` event with values like `standard`, `lowpower`, `highspeed` - only better names 'cos I just made those up). And then sites would take advantage of those by deferring non-urgent requests into the more efficient time periods.
@chrisguttandin Yes, it certainly seems like a UA could defer "low" `fetchpriority` requests to the time period within which the 5G connection is available.
@mwatson2 said:
Yes, it certainly seems like a UA could defer "low" fetchpriority requests to the time period within which the 5G connection is available.
No, the fetch spec does not currently allow that. The priority is only used to prioritize fetch requests relative to each other; it doesn't appear to allow UAs to delay fetches indefinitely if conditions are not "ripe" for a low-priority request. The notes from Chrome explicitly state that `fetchpriority` is only really useful in situations where limited bandwidth is being contended for by multiple requests. IOW, if the UA started arbitrarily delaying fetches marked as "low", that would likely make a lot of sites upset. This would need a new "super-low" or "super-duper-low" fetch priority to be specified.
As a site implementor I would be very concerned about the user agent making assumptions about "buffer level".
The UA has to make those assumptions in order to implement things like `readyState`. There's no avoiding it. I would counter that:
For example, sometimes we append media "just-in-time" for playback in which case the UA has no useful information about the true buffer level.
Seems that by not appending data that has been downloaded and that the site intends to play, the resulting problem is one of the site's own making. One that is easily avoided by just appending that downloaded data rather than saving it for a "just in time" append when the buffered level becomes critically low.
Looked at another way, if the UA starts treating media differently based on the UA's perception of buffer level, sites are just going to optimize when they append to get to the site's idea of optimum performance.
Yes, that is literally the point. :)
A site that can "lie" to the UA by fetching a ton of data up front, and appending that previously downloaded data whenever they receive `startstreaming` and until they receive `stopstreaming`, will have fantastic power characteristics, as they will leave the radios quiet for the longest period of time.
In the end, the `startstreaming` and `stopstreaming` events are hints, and sites which fetch and append the way the UA expects between those events may see benefits, including lower power cost and greater download speed.
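A minimal sketch of that cooperative fetch-and-append pattern. The event names follow the WebKit description upthread (`startstreaming`/`endstreaming`; some comments here say `stopstreaming`); the `StreamingGate` class and its methods are invented for illustration:

```javascript
// Queue downloaded chunks, and only drain appends while the UA signals
// that streaming is favourable.
class StreamingGate {
  constructor(appendFn) {
    this.appendFn = appendFn; // e.g. (chunk) => sourceBuffer.appendBuffer(chunk)
    this.queue = [];
    this.streaming = false;
  }
  attach(managedMediaSource) {
    managedMediaSource.addEventListener('startstreaming', () => {
      this.streaming = true;
      this.drain();
    });
    managedMediaSource.addEventListener('endstreaming', () => {
      this.streaming = false;
    });
  }
  enqueue(chunk) {
    this.queue.push(chunk);
    if (this.streaming) this.drain();
  }
  drain() {
    while (this.queue.length > 0) this.appendFn(this.queue.shift());
  }
}
```

A live-streaming player would simply skip `attach()` and append immediately, as the comments below discuss.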
@jernoble wrote:
Seems that by not appending data that has been downloaded and that the site intends to play, the resulting problem is one of the site's own making.
Yep, but I don't think anyone is going to hold back on appending data that they 100% intend to play. The use-case for "just-in-time" appending is when you are not sure until that time what media is to be played. I appreciate that an alternative is to append anyway and then replace if you change your mind, but this has its own complexities. The point is that MSE provides sites with the flexibility to compose media streams in whatever way they choose in a manner that is decoupled from the download strategy. This is very useful. UA assumptions about network requests made based on what has been appended are likely to be incorrect.
I'm assuming that a possible UA algorithm would be to turn on the expensive radio when buffer levels get low and to switch to a longer on/off duty cycle the higher the buffer level. A site that wanted to game this would hold back appends to gain access to the expensive fast radio more often, optimizing for their own goal (throughput) but defeating the UA's objective to save battery life. A more enlightened site developer might share your desire to preserve battery, but in that case wouldn't it be better to give control of the duty cycle to the site, which knows more about its data needs?
I appreciate that an alternative is to append anyway and then replace if you change your mind, but this has its own complexities.
Yes, that is the preferred pattern of use. It doesn't seem reasonable to design and implement a complicated network API because of "complexities" in how overlapping appends work. We should just address those complexities directly!
wouldn't it be better to give control of the duty cycle to the site, which knows more about its data needs?
No, because the site isn't the only application on the system driving the radio, nor is the ability for a website to control the duty cycle of an expensive modem a desirable thing (IMO) for the web platform.
@dalecurtis
You could imagine something like ManagedMediaSource.fetch() if we need the UA to have perfect knowledge and the ability to schedule optimally for radio.
We considered this, but the risk here is `ManagedMediaSource.fetch()` becoming a "magic incantation" in the web platform for "download this faster". (Much in the same way as `transform: rotateZ(0deg)` became a "magic incantation" for "make this div layer-backed".)
`ManagedMediaSource.prototype.fetch()` may allow the UA to know with relative certainty that a given network request was a media request, but it's not a guarantee. And it doesn't actually solve the problem of indicating when that fetch request should be issued. If `ManagedMediaSource.prototype.fetch()` allowed the UA to delay issuing the request until "the best time for networking" or when "buffer levels cross a low-water threshold", then it's functionally the same as the `startstreaming` event, at the cost of a much more complicated specification and implementation.
We considered this, but the risk here is `ManagedMediaSource.fetch()` becoming a "magic incantation" in the web platform for "download this faster". (Much in the same way as `transform: rotateZ(0deg)` became a "magic incantation" for "make this div layer-backed".)
I agree with this. However the risks here are equivalent to the streaming event approach if we require some percentage of fetched bytes to be appended. I.e., with either solution a page could use canned data to simulate the buffering levels required to get 5G if they really wanted to.
`ManagedMediaSource.prototype.fetch()` may allow the UA to know with relative certainty that a given network request was a media request, but it's not a guarantee.
The risks here also seem the same.
If `ManagedMediaSource.prototype.fetch()` allowed the UA to delay issuing the request until "the best time for networking" or when "buffer levels cross a low-water threshold", then it's functionally the same as the `startstreaming` event, at the cost of a much more complicated specification and implementation.
Yes, I was expecting the fetch version to delay in the same way, but with the added benefit of knowing the download rate so that fetches can be scheduled with transfer time in mind. I don't quite follow how you expect to be able to deliver `startstreaming` reliably without that knowledge -- which is critical if we expect the UA to be a trusted advisor in this context and recommend developers prefer `ManagedMediaSource` over `MediaSource`. Can you elaborate on how that's expected to work?
@jernoble My point is that - at least - if you embed assumptions into the design - like the assumption that the site is downloading a single simple linear media sequence and will append media as soon as it is downloaded - you'd better be explicit about that assumption. So then sites that do not conform to that can avoid using the new API.
But I'd prefer a solution that did not embed such an assumption because that is clearly just one specific use case - albeit a common one.
The fundamental problem here is one where you want to schedule downloads to take advantage of a resource that is slow and/or expensive to enable, use and disable. As a result, we get the best results when the resource is intermittently available and fully utilized when it is available (i.e. we want to aggregate the idle times, compared to current download scheduling). This problem has very little to do with streaming media, except that media is one example where the application is (sometimes) robust to downloads being scheduled this way.
Ideally, the site would simply be able to provide each download with a wallclock deadline. This would give the UA perfect knowledge of when each request was required and it could schedule in the most efficient way.
What's proposed is that during the `streaming` period the UA will assume a deadline for all network requests that is based on the state of the `MediaSourceBuffers`. It seems like a big assumption that could have unintended consequences.
But I'd prefer a solution that did not embed such an assumption because that is clearly just one specific use case - albeit a common one.
To be fair, this use case is the overwhelmingly most common one. Linear playback by appending chunks as soon as they are received is far and away the most common mode of operation. The solution proposed is incredibly simple, easy to specify, implement, and use for the most common use case of the API.
We can debate whether a more complicated networking coalescing API (defined outside of this specification and by a completely separate working group) would help solve the remaining (and much, much less common) use cases, but I don't believe that should prevent this proposal from moving forward.
Meanwhile, those use cases that don't fit neatly into the "fetch, append, throw away" mode above can... just continue on exactly as they have been with `MediaSource`. And if the Fetch API is modified to allow coalescing low-priority fetch requests with specific deadlines for each, those requests will work with this API as well.
@dalecurtis said:
However the risks here are equivalent to the streaming event approach if we require some percentage of fetched bytes to be appended. I.e., with either solution a page could use canned data to simulate the buffering levels required to get 5G if they really wanted to.
True. The UA has a great deal of latitude about both when to fire the `startstreaming` and `stopstreaming` events, and what it does between them. If that kind of "abuse" of the API was detected (and it does seem like it would be detectable), that latitude would allow a UA to act in a way to protect the user.
Yes, I was expecting the fetch version to delay in the same way, but with the added benefit of knowing the download rate so that fetches can be scheduled with transfer time in mind. I don't quite follow how you expect to be able to deliver `startstreaming` reliably without that knowledge -- which is critical if we expect the UA to be a trusted advisor in this context and recommend developers prefer `ManagedMediaSource` over `MediaSource`. Can you elaborate on how that's expected to work?
In our implementation, the times at which the `startstreaming` event is fired are generous enough that there's no risk of responses coming so late that they lead to a buffer underrun. And because standard `fetch()` semantics apply, sites can use their existing download rate detection to determine things like variant selection. The only use case and behavior which may be negatively affected is live streaming, where sites will want to stay as close to the live edge as possible. In that case, they'll just ignore the `*streaming` events entirely, and those "negative" effects may simply mean the radio will switch to a slower-but-more-power-efficient mode. In our implementation, that "negative" scenario is exactly the same as the current status quo.
Other UAs may make different (and more advanced!) decisions about when to schedule those events. A hypothetical browser may make note of the speed at which fetches made between the streaming events take place, and allow the buffer levels to more completely empty before triggering `startstreaming`. Or conversely, notice that those same loads are occurring so slowly that they don't benefit from the higher-cost-but-faster radio, and revert to a less expensive network.
However, a non-goal of our implementation is to facilitate pages staying as close to the edge of the buffered range as possible, delivering data from the network "just in time" to avoid underruns. So the requirement to closely monitor and predict download speeds simply isn't present.
@mwatson2 said:
So then sites that do not conform to that can avoid using the new API.
I'm curious about this phrasing. Do you mean "avoid using ManagedMediaSource" entirely? Or just avoid listening to the `startstreaming` and `stopstreaming` events?
The design of this `ManagedMediaSource` proposal is such that, if you ignore the streaming events entirely, the behavior will be essentially the same as `MediaSource`. Clients are free to ignore the streaming events, and for some cases like live video, they must.
I'm not saying the proposal shouldn't move forward. I was trying to see if there was any scope for something more flexible. I do think that the assumptions should be made explicit.
Do you mean "avoid using ManagedMediaSource" entirely? Or just avoid listening to the startstreaming and stopstreaming events?
Unless I'm mis-understanding, if you use ManagedMediaSource and then ignore the startstreaming and stopstreaming events, then throughput measurements are going to be a bit messed up, because the site's requests will randomly fall into the 5G streaming windows or not. Or is the radio on/off behavior essentially the same with MediaSource, and the difference is just whether you tell the site about it or not?
In the former case, it would be good if the `BufferChanged` event (specifically) could just be added to the existing `MediaSourceBuffer` - I think that would be backwards-compatible, no?
@jernoble said:
However, a non-goal of our implementation is to facilitate pages staying as close to the edge of the buffered range as possible, delivering data from the network "just in time" to avoid underruns. So the requirement to closely monitor and predict download speeds simply isn't present.
Thanks that explains a lot. I can see how this system works well enough for VOD playbacks.
As you note, a live stream might have to ignore the streaming events to maintain buffering. That seems in conflict with the language around the UA being able to block appends. How do you see that being reconciled? Is stopstreaming never fired since the forward buffering level remains too low?
The text around how the streaming events are to be used will need some care to ensure developers are aware that the streaming events can't function as the sole buffering mechanism during live streaming. It'll be a bit surprising to first time authors I expect; but after ten years, there aren't many non-library based players so maybe that's no big deal.
@jernoble said:
However, a non-goal of our implementation is to facilitate pages staying as close to the edge of the buffered range as possible, delivering data from the network "just in time" to avoid underruns. So the requirement to closely monitor and predict download speeds simply isn't present.
Thanks that explains a lot. I can see how this system works well enough for VOD playbacks.
As you note, a live stream might have to ignore the streaming events to maintain buffering. That seems in conflict with the language around the UA being able to block appends. How do you see that being reconciled?
As I mentioned upthread, we found during implementation that blocking appends was unnecessary, and I suggested that we remove that language from the proposal and track it in another issue.
Something @jyavenard and I thought about earlier was an explicit signal to the UA that the client would be doing live streaming; something like `ManagedMediaSource.prototype.streaming = true`. If this flag was set, the UA would assume the client would always try to stay as close to the live edge as possible and would pick the best network for frequent small requests. But we found in practice that this wasn't really necessary; detecting live-streaming behavior was sufficient.
I'm not saying the proposal shouldn't move forward. I was trying to see if there was any scope for something more flexible. I do think that the assumptions should be made explicit.
Ah, I understand now, thanks.
Do you mean "avoid using ManagedMediaSource" entirely? Or just avoid listening to the startstreaming and stopstreaming events?
Unless I'm mis-understanding, if you use ManagedMediaSource and then ignore the startstreaming and stopstreaming events, then throughput measurements are going to be a bit messed up, because the site's requests will randomly fall into the 5G streaming windows or not. Or is the radio on/off behavior essentially the same with MediaSource, and the difference is just whether you tell the site about it or not?
This will vary from platform to platform, but on platforms with multiple radios, it's certainly possible for network speeds to vary wildly as data is routed on one or the other with different capabilities. And of course on mobile devices, users can travel in and out of coverage with varying levels of signal quality and capabilities. In my own neighborhood, I noticed that available bandwidth dropped off a cliff sometimes even when not moving, presumably as other people around me all tried to use the network simultaneously.
So yes, it could cause bandwidth measurements to change as radios were activated and deactivated. But I don't believe this is a new problem, nor one that sites are unprepared to deal with.
In the former case, it would be good if the `BufferChanged` event (specifically) could just be added to the existing `MediaSourceBuffer` - I think that would be backwards-compatible, no?
The proposed `bufferchanged` event is fired when the UA purges data from SourceBuffers, either from an explicit request by the client or when a low-memory event forces the UA to evict appended data. Maybe I'm mis-understanding in turn, but how would that tie in with bandwidth estimation?
@jernoble What I was asking was whether the site's choice to use ManagedMediaSource or plain MediaSource could affect the platform's decisions about radio usage. I think you are saying that even if it did, the site would not notice any difference because throughput varies so widely anyway for other reasons. i.e. any ManagedMediaSource-driven changes in platform behavior won't disadvantage a site that chooses ManagedMediaSource and ignores the events, compared to a site that chooses MediaSource.
If that is the case then my point about BufferChanged is moot.
If, on the other hand, the site would be better off choosing MediaSource then it would be nice if that didn't also mean giving up BufferChanged events. BufferChanged and the streaming events seem like independent capabilities and the former can be added to MediaSource in a backwards-compatible way.
And then, if it is the case that the site is not disadvantaged by ignoring the streaming events (compared to using MediaSource), why can't the events just be added to MediaSource and fired always? You mentioned the need to add configuration parameters to the constructor, but if it doesn't matter whether the site wants the events or not, why the need for configuration?
[And, of course, I do understand you would say that the best option for the site would be to choose ManagedMediaSource and not ignore the streaming events, but I'm trying to understand all the options.]
Seems that by not appending data that has been downloaded and the site intends to play, the resulting problem is one of the sites' own making.
From an implementer's point of view, UA restrictions are what cause the "just-in-time" approach.
`SourceBuffer` limits are stricter than general memory limits in most browsers. They are completely unpredictable too: some seem to be based on playback duration, others on data size; device capabilities and current state probably affect them too. The specification is intentionally vague on the subject.
The effect is that you can never know where the limit is; you just waste resources downloading and bang your head into `append()` to see if you get a `QuotaExceededError` or not, over and over again.
Whereas, by keeping your own buffer, you can maintain it at a comfortable level and forget about this exception-based madness.
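The probe-and-retry loop being described could be sketched like this (the comment's `append()` corresponds to `SourceBuffer.appendBuffer()` in MSE; `tryAppend` and the app-side `retryQueue` are illustrative names):

```javascript
// Attempt the append; on QuotaExceededError keep the chunk in an
// app-side queue to retry after space is freed (by the UA's eviction,
// or by the app calling remove()). The quota itself is opaque, so the
// exception is the only signal available.
function tryAppend(sourceBuffer, chunk, retryQueue) {
  try {
    sourceBuffer.appendBuffer(chunk);
    return true;
  } catch (err) {
    if (err.name === 'QuotaExceededError') {
      retryQueue.push(chunk); // probe again later
      return false;
    }
    throw err; // anything else is a real error
  }
}
```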
`SourceBuffer` is also a black box (or rather a black hole). Once data goes in, you have no idea what's inside. After switching qualities and overlaying some segments atop others, it's really hard to keep track of what's in the buffer at any given position and what's going to be played after seeking.
Maintaining your own buffer again helps you keep track of different qualities and know precisely what is buffered at what time, and even keep several tracks of different qualities simultaneously.
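The app-side bookkeeping being described might look like the following sketch: record which quality level was appended over which time range, since `SourceBuffer` itself won't report this. Times are in seconds; `QualityMap` and its methods are invented names, and the overlap handling is deliberately simplified compared to MSE's real coded frame replacement rules:

```javascript
// Track which quality was appended where; newer appends override older
// overlapping ranges, loosely mirroring coded frame replacement.
class QualityMap {
  constructor() {
    this.ranges = []; // { start, end, quality }, sorted by start
  }
  markAppended(start, end, quality) {
    this.ranges = this.ranges
      .flatMap((r) => {
        if (end <= r.start || start >= r.end) return [r]; // no overlap
        const pieces = [];
        if (r.start < start) pieces.push({ ...r, end: start }); // left remainder
        if (r.end > end) pieces.push({ ...r, start: end }); // right remainder
        return pieces;
      })
      .concat([{ start, end, quality }])
      .sort((a, b) => a.start - b.start);
  }
  qualityAt(time) {
    const r = this.ranges.find((r) => r.start <= time && time < r.end);
    return r ? r.quality : null;
  }
}
```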
There's also a case of balancing smoothness and playback start delay. By reading streamed data I may be able to quickly append the first chunk that gets downloaded and draw a frame much faster than a full segment download would finish. However, I'm risking a stall if the next chunk gets throttled, so instead I may prefer to keep my own intermediate buffer between fetch's `ReadableStream` and the `SourceBuffer`. This is already tricky, and introducing network limiting based on `SourceBuffer` state may lead to weird behaviour.
For instance, what's going to happen if I download half a segment, feed it to the `SourceBuffer`, and then the UA decides the buffer is large enough, shuts down the powerful modem, and fires `onendstreaming`? Both aborting the request and continuing its download seem counter-productive.
Finally, `SourceBuffer` operates in seconds, while the network operates in bytes and media containers operate in integer timescale units. Converting between them for precise buffer management is practically impossible.
All in all, I think the playback buffer should ideally be completely decoupled from the data buffer. Perhaps there could be a general-purpose abstract buffer that website developers maintain as they please, along with ample APIs from UAs that guide how to maintain it (memory level, network conditions, battery state... stuff we'll probably never get because of privacy concerns) and clear restrictions. After all, it's not only video that needs large amounts of data, and perhaps those signals would be better off in `Navigator.connection`.
The part of this proposal that incites UAs to throttle or speed up the network based on the video buffer goes in the opposite direction.
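The application-maintained buffer this comment advocates can be sketched as a small queue between the network and the `SourceBuffer`. All names here are illustrative (`IntermediateBuffer`, `targetBytes`, `fill`), and the comfort level is an app choice rather than a UA limit:

```javascript
// An app-level intermediate buffer between a fetch ReadableStream and a
// SourceBuffer: the app decides when it is "comfortable" instead of probing
// the UA's opaque limits with append().
class IntermediateBuffer {
  constructor(targetBytes) {
    this.targetBytes = targetBytes; // app-chosen comfort level in bytes
    this.chunks = [];
    this.bytes = 0;
  }
  push(chunk) {
    this.chunks.push(chunk);
    this.bytes += chunk.byteLength;
  }
  // True while the app wants to keep downloading.
  wantsMore() {
    return this.bytes < this.targetBytes;
  }
  // FIFO drain, e.g. toward SourceBuffer.appendBuffer().
  shift() {
    const chunk = this.chunks.shift();
    if (chunk) this.bytes -= chunk.byteLength;
    return chunk;
  }
}

// Read from a ReadableStreamDefaultReader until the buffer is comfortable.
async function fill(buffer, reader) {
  while (buffer.wantsMore()) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer.push(value);
  }
}
```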
Question: will the Managed Media Source API work well with newer network technologies like WebRTC data channels (SCTP/DTLS/UDP) or WebTransport (HTTP/3 over QUIC)? Is there something those technologies need to do in order to work better (aside from worker support, which is there)?
Filed a standards position request with our friends at Mozilla: https://github.com/mozilla/standards-positions/issues/845
Sorry for the delay, and thanks for the (unfortunately numerous) pings here and elsewhere. Here's Mozilla's (largely positive) answer, broken down by topic like in @jya's initial post.

- The `"bufferedchange"` event: not much to say, this is clearly very useful in all scenarios, even on powerful machines, e.g. on dormant browser tabs. I'm a bit concerned about web compat if this can be implemented very loosely (e.g. if the removal policies can be arbitrary), but not terribly concerned. As mentioned previously, what's the rationale for only adding this to the new interface? While it's certainly possible to compute the range changes manually, maybe it's possible to decouple this from `ManagedMediaSource`, since this is a new event.
- `QuotaExceededError`: why not; has this been useful in practice? I haven't seen it discussed in the thread.
- The `"onstartstreaming"`/`"onendstreaming"` events: the names make me intuitively think that they follow a state change about streaming status, not that they encourage the application to start fetching. If we're talking about taking actions, "start" is better paired with "stop" than with "end", "end" being final. However, we're happy with the underlying concept.
- `MediaCapabilities`: is this in use in practice? Should we add more events to the platform, e.g. a media-capabilities-change event (because the battery is now under a certain threshold and the device is throttling, or because some kind of "stamina mode" has been enabled, to mention something concrete my phone offers), or events about the fact that the device is dropping a lot of frames?
- `Navigator.connection` is not universally implemented and has problems (some analysis). We just want to know whether the connection is metered here, I guess; the network speed is discoverable by other means already, and so is the playback quality.

I also see that `isTypeSupported(...)` is present in the proposed interface, but not mentioned or discussed. I assume this was not meant to be included, or is there something being considered?

All in all, we'll happily follow future developments, be it more discussion or directly reviewing a PR to the standard. I'll be updating our standards-positions repo so this is clear.
Chiming in after being dormant since leaving Chrome earlier this year:

- `isTypeSupported()` on the proposed IDL for `ManagedMediaSource` seems to be unnecessary, since it already exists in the supertype, `MediaSource`. However, see below...
- My understanding is that the motivation for a distinct `ManagedMediaSource` was due to:
  - Giving the UA the flexibility to evict from `SourceBuffer`s at any time is not backwards-compatible: existing sites using `MediaSource` are not expecting such eviction, except deterministically as part of the initial steps the UA takes to service an `appendBuffer()` call. Therefore, having a distinct way of both discovering and instantiating a `ManagedMediaSource` would prevent surprises to those existing `MediaSource` API users.
  - Some UAs may allow only the `ManagedMediaSource` API (or alternatively, only the `MediaSource` API). Others may allow both or neither.
- For a UA that only offers `ManagedMediaSource` use to apps, it makes sense that `MediaSource` not be constructable or visible to those apps in that UA, since it is very common for current apps to detect MSE by the presence of `MediaSource`.
  - Perhaps this is why `ManagedMediaSource` offers a redundant `isTypeSupported()`?
  - I'm unclear how a UA would hide `MediaSource` and `SourceBuffer` when it only wants to expose `ManagedMediaSource` and `ManagedSourceBuffer`. Is this possible in Web IDL?

Finally, while I may be slow in responding, I'll still be around for queries about MSE. I support the WG adding new MSE co-editors and am quite happy to see work on MSE continue.
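The detection concern above (apps testing for MSE by the presence of `MediaSource`) suggests feature detection along these lines; a sketch, with `createMediaSource` as an illustrative name, taking the global object as a parameter:

```javascript
// Prefer ManagedMediaSource when the UA exposes it; fall back to plain
// MediaSource otherwise; return null when no flavor of MSE is available.
function createMediaSource(global) {
  if (typeof global.ManagedMediaSource === 'function') {
    return new global.ManagedMediaSource();
  }
  if (typeof global.MediaSource === 'function') {
    return new global.MediaSource();
  }
  return null; // no MSE of any kind
}
```

In a page this would be called as `createMediaSource(window)`; a UA that hides `MediaSource` entirely would simply take the first branch.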
Discussion from the 27 July MEIG meeting: https://www.w3.org/2023/07/25-me-minutes.html. The key points raised in that meeting are captured in the minutes.
As discussed during the WG monthly call, remove the ability to throw when calling `appendBuffer` if the `streaming` attribute is false (and I just noticed that the `streaming` attribute wasn't in the IDL definition).
The `quality` attribute will be moved into its own issue so that it can be discussed separately.
What happens in the case of a user-initiated seek beyond the existing buffered ranges, when the action occurs outside of the `onstartstreaming`/`onendstreaming` event window? A seek is precisely the time at which optimized downloading would be desired.
> What happens in the case of a user-initiated seek beyond the existing buffered ranges, when the action occurs outside of the `onstartstreaming`/`onendstreaming` event window? A seek is precisely the time at which optimized downloading would be desired.
`startstreaming` will be fired if the user agent has determined that not enough data is buffered for playback to continue uninterrupted. This includes when you seek.
So in this case, `startstreaming` is fired if you seek into a non-buffered section, and `endstreaming` will be fired again once the UA has determined that enough data has been buffered.
> Managed Media Source is enabled on Safari 17 beta on macOS and iPadOS and behind an experimental flag on iOS for now.
There is also a flag that says 'Managed Media Source requires AirPlay Source', which is checked. May I ask the purpose of that dependency? I could not find any mention of AirPlay in this proposal, but if I overlooked it, please guide me.
Besides striking a jarring note with our developers, it makes it harder to explain to our users.
> Managed Media Source is enabled on Safari 17 beta on macOS and iPadOS and behind an experimental flag on iOS for now.
> There is also a flag that says 'Managed Media Source requires AirPlay Source', which is checked. May I ask the purpose of that dependency? I could not find any mention of AirPlay in this proposal, but if I overlooked it, please guide me.
> Besides striking a jarring note with our developers, it makes it harder to explain to our users.
It is not part of this proposal. I explained the reason behind it in https://developer.apple.com/videos/play/wwdc2023/10122
I don't see how this has any negative effects on your users that requires an explanation; quite the opposite. In order to avoid a usability regression (where previously all videos had the ability to be used with AirPlay), when using ManagedMediaSource on iPhone you need to provide an alternative video source that is compatible with remote media playback.
I tried to use the ManagedMediaSource API on iOS 17 in the simulator, with the feature flag for the ManagedMediaSource API enabled. When I call the addSourceBuffer() method with any MIME type, it just says it is not supported and cannot create a source buffer. And when I try to use the isTypeSupported() method, it returns false every time. I tried different combinations of MIME types, with and without codecs, and it always returns false. The only MIME type that returns true is "video/webm" without any codecs. So my question is: which codecs and MIME types does iOS support, and do others have the same issue of not being able to use Managed Media Source on iOS? Or maybe I am doing something wrong when I use the addSourceBuffer() and isTypeSupported() methods.
You need to use iOS 17.1 beta 2; but this isn't the place to ask those questions. Please use bugs.webkit.org. Thank you.
jyavenard, thank you for the prompt reply.
> It is not part of this proposal. I explained the reason behind it in https://developer.apple.com/videos/play/wwdc2023/10122
Thanks, I viewed that video twice. There is no mention of the flag 'Managed Media Source requires AirPlay Source'. Perhaps you could convey the explanation here?
> Managed Media Source is enabled on Safari 17 beta on macOS and iPadOS and behind an experimental flag on iOS for now.
This proposal opened with a presentation of the new feature, an invitation to test a reference implementation, and a pointer to the feature flag. The 'Managed Media Source requires AirPlay Source' flag was not mentioned, but it is absolutely required to test the reference implementation. It would probably be good to fill in that part too.
> I don't see how this has any negative effects on your users that requires an explanation
Respectfully, it does. As soon as we mention AirPlay, users have a dozen questions which we cannot answer: "Is this turning off AirPlay?" "Do I need to use AirPlay?" "Is your product using AirPlay?" This is not hypothetical; we have encountered it.
I believe that AirPlay is a branded Apple service, is it not? We wish to show our users that we are delivering something that uses standards and is independent of any proprietary vendor product/service.
> In order to avoid usability regression (where all videos would have the ability to be used with AirPlay)
I am suddenly feeling concerned. If we ask a user to test Managed Media Source using the reference implementation introduced here, are we changing the behavior of their iPhone vis-a-vis AirPlay?
Thank you,
This is not particularly relevant, but we are streaming audio, not video.
> Thanks, I viewed that video twice. There is no mention of the flag 'Managed Media Source requires AirPlay Source'. Perhaps you could convey the explanation here?
There definitely is; you should watch again at the 20-minute mark, or read the transcript:
> When designing Managed MSE, we wanted to make sure that nothing was left out by accident and that users continue to get the same level of features as they did in the past. So to activate Managed MSE on Mac, iPad, and iPhone, your player must provide an AirPlay source alternative. You can still have access to Managed MSE without it, but you must explicitly disable AirPlay by calling disableRemotePlayback on your media element from the Remote Playback API.
Currently, on iPhone, only plain MP4 or HLS is supported; those inherently work with AirPlay (Apple's implementation of the spec's remote playback). And AirPlay is a very popular (and used) feature; most A/V receivers support it these days. MSE, however, can't work with AirPlay: it has no reference source of content.
We didn't want existing functionality to break overnight once ManagedMediaSource became available. So you need to provide an alternative playback source, or explicitly disable remote playback.
We hope that the solution adopted will be the former (using a source that likely already exists, since earlier iOS versions need to be supported anyway), so that users can continue to listen or view on their preferred A/V equipment.
> I am suddenly feeling concerned. If we ask a user to test Managed Media Source using the reference implementation introduced here, are we changing the behavior of their iPhone vis-a-vis AirPlay?
So in all honesty, I believe your concerns are unwarranted. In the worst case, you need to set a single attribute on your audio element for things to work as you expect.
And again, this has nothing to do with this proposal, so this will be my last answer on this topic here.
jyavenard, thank you for the patient, informative response. Sorry for hijacking this channel for a moment. We will get back to the lab and finish building it into our software.
We are delighted to have this proposal, BTW. More than you can imagine.
I've written a first draft here: https://jyavenard.github.io/media-source/media-source-respec.html (source: https://github.com/jyavenard/media-source/tree/managed_mse)
@jyavenard - nice work. It might be useful to also extend the Examples section of https://github.com/jyavenard/media-source/tree/managed_mse to include an example showing the use of ManagedMediaSource and the onstartstreaming and onendstreaming events.
> @jyavenard - nice work. It might be useful to also extend the Examples section of https://github.com/jyavenard/media-source/tree/managed_mse to include an example showing the use of ManagedMediaSource and the onstartstreaming and onendstreaming events.
Done
Amended the proposal to change the BufferedChangeEvent so that its two TimeRanges are no longer optional.
Hey folks, the PR is now up: https://github.com/w3c/media-source/pull/329
Thanks again to everyone that's provided feedback and helped shape the overall design.
Would love to get a second (or third!) implementer commenting on the PR before landing it - as well as potential developers. We intend to prepare some tests in parallel, so we'd love some feedback on the overall design in the meantime.
Definitions
A “managed” MediaSource is one where more control over the MediaSource and its associated objects has been given over to the User Agent.
Introduction
The explicit goal of the Media Source Extensions specification is to transfer more control over the streaming of media data from the User Agent to the application running in the page. This transfer of control can add, and has added, points of inefficiency where the page does not have the same level of capabilities, knowledge, or even goals as the User Agent. Examples of these inefficiencies include the management of buffer levels, the timing and amount of network access, and media variant selection. These inefficiencies have largely been immaterial on relatively powerful devices like modern general-purpose computers. However, on devices with narrower capabilities, it can be difficult to achieve the same quality of playback with the MediaSource API as is possible with native playback paths provided by the User Agent.
The goal of the ManagedMediaSource API is to transfer some control back to the User Agent from the application for the purpose of increasing playback efficiency and performance, while retaining the ability for pages to control streaming of media data.
Goals
Non-Goals
Scenario
Low-memory availability
A user loads a MSE-based website on a device with a limited amount of physical memory, and no ability to swap. The user plays some content on that website, and pauses that content. The system subsequently faces memory pressure, and requires applications (including the User Agent) to purge unused memory. If not enough memory is made available, those applications (including the User Agent) may be killed by the system in order to free up enough memory to perform the operation triggering this memory pressure.
The User Agent runs a version of the “Coded Frame Eviction” algorithm, removing ranges of buffered data in order to free memory for use by the system. At the end of this algorithm, the User Agent fires a “bufferedchange” event at every SourceBuffer affected by this algorithm, allowing the web application to be notified that it may need to re-request purged media data from the server before beginning playback.
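A sketch of how an application might react to this scenario. The event is assumed here to carry the purged ranges as a TimeRanges-like `removedRanges` attribute (one of the BufferedChangeEvent's two TimeRanges); the helper names are illustrative:

```javascript
// Convert a TimeRanges-like object into a plain array of [start, end] pairs.
function timeRangesToArray(ranges) {
  const out = [];
  for (let i = 0; i < ranges.length; i++) {
    out.push([ranges.start(i), ranges.end(i)]);
  }
  return out;
}

// Remember what the UA purged so the app can re-request that media data
// from the server before playback reaches those positions again.
function trackEvictions(sourceBuffer, purged) {
  sourceBuffer.addEventListener('bufferedchange', (event) => {
    for (const range of timeRangesToArray(event.removedRanges)) {
      purged.push(range);
    }
  });
}
```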
Memory availability notification
When a call to appendBuffer is rejected with a QuotaExceededError exception, it can indicate the amount of excess bytes or time that caused the error.
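One way an application could use that information, sketched below. The `quota` and `requested` fields are hypothetical names for the detail the error could carry; the proposal does not fix those names here, so the code falls back to a fixed eviction window when they are absent:

```javascript
// Decide how much buffered media to evict after a QuotaExceededError.
// `error.quota` / `error.requested` are HYPOTHETICAL field names standing in
// for the "excess bytes or time" the proposal says the error can indicate.
function planEviction(error, fallbackSeconds) {
  if (error.name !== 'QuotaExceededError') return null;
  if (typeof error.quota === 'number' && typeof error.requested === 'number') {
    // Evict at least the overshoot the UA reported.
    return { amount: error.requested - error.quota, exact: true };
  }
  // No detail available: fall back to a fixed app-chosen eviction window.
  return { amount: fallbackSeconds, exact: false };
}
```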
Network Streaming changes
Currently, a web application is allowed to append media data into a Source Buffer at any time, up until that Source Buffer's "buffer full flag" is set, indicating no additional data is allowed to be appended. However, a constrained device may want to coalesce network use into small windows, allowing the network interface to power down for battery and bandwidth reasons.
Alternatively, a device may have access to a high-speed network with high power use while the relevant communications interface is active (as can happen on 5G cellular). Using such a network may be beneficial in some circumstances:
To get these benefits without excessive battery drain, it's necessary to buffer more at once, and to limit streaming activity to specific windows so that the device's radio can be cycled on and off.
The User Agent would fire a "startstreaming" event at the MediaSource, indicating that the web application should begin streaming new media data. It would be up to the User Agent to determine when streaming should start, and it could take into account current buffer levels, current time, network conditions, and other networking activity on the system.
When the User Agent determines that no further media streaming should take place, it would fire an "endstreaming" event at the MediaSource, indicating to the web application that enough media data has been buffered to allow playback to continue successfully.
Usage example
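A sketch of how an application might put the pieces above together. The function and helper names (`streamMedia`, `fetchNextSegment`, `videoElement`) are assumptions, not part of the proposal; the event names come from the proposal and MSE. A real application might also consult the proposed `streaming` attribute rather than assume streaming is initially disallowed:

```javascript
// Stream segments into a ManagedMediaSource, fetching only while the UA has
// opened a streaming window; falls back to plain MediaSource where
// ManagedMediaSource is unavailable.
async function streamMedia(videoElement, mimeType, fetchNextSegment) {
  const Ctor = globalThis.ManagedMediaSource || globalThis.MediaSource;
  const mediaSource = new Ctor();
  const managed = !!globalThis.ManagedMediaSource;
  let streaming = !managed; // plain MediaSource may stream at any time
  if (managed) {
    mediaSource.addEventListener('startstreaming', () => { streaming = true; });
    mediaSource.addEventListener('endstreaming', () => { streaming = false; });
  }

  // Attach to the media element and wait for the source to open.
  await new Promise((resolve) => {
    mediaSource.addEventListener('sourceopen', resolve, { once: true });
    videoElement.src = URL.createObjectURL(mediaSource);
  });

  const sourceBuffer = mediaSource.addSourceBuffer(mimeType);
  for (;;) {
    if (!streaming) {
      // Wait for the UA to open the next streaming window.
      await new Promise((r) =>
        mediaSource.addEventListener('startstreaming', r, { once: true }));
      continue;
    }
    const segment = await fetchNextSegment(); // app-defined fetcher
    if (!segment) break; // no more media
    sourceBuffer.appendBuffer(segment);
    await new Promise((r) =>
      sourceBuffer.addEventListener('updateend', r, { once: true }));
  }
  return mediaSource;
}
```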
Privacy considerations
TODO: discuss potential privacy protections if multiple origins try to poke at this at the same time. A concern is providing visibility into preferred quality if it is based on networking conditions such as cellular or Wi-Fi.
Other
To consider from MSE v2: