Is exposing https://w3c.github.io/webcodecs/#enumdef-hardwareacceleration a good idea

youennf commented 3 years ago

I am wondering what HardwareAcceleration is supposed to be used for.

One potential use would be to always prefer power efficiency. But power efficiency does not mandate hardware acceleration. Depending on the device, the codec, the resolution, software based codecs might be better suited. It is unclear how a web developer will be able to select hardwareAcceleration for that case except to let the UA decide with 'allow'.

Another possibility is to maximise compatibility and use 'deny'. In that case though, it means that web developer looses power efficiency in a lot of cases. A careful web developer will then probably want to enter the business of identifying which SW and HW codecs are in use on a device. This does not seem great and somehow contradicts the desire to not increase fingerprinting.

It seems UA is in general the best entity to decide what to use at any given point. Instead of hard requirements, a web application could look at providing hints, though it is true hints tend to be difficult to define and implement consistently.

It also seems HardwareAcceleration is a potential fingerprinting vector though it is not marked as so in the spec.

chcunningham commented 3 years ago

I am wondering what HardwareAcceleration is supposed to be used for.

This is a feature request from several sophisticated apps that bring their own encoders/decoders implemented in WASM (with customized features and tuning). Such apps are interested in WebCodecs only when we can offer hardware acceleration. If we can only do software encoding/decoding, they prefer their WASM codec

It is unclear how a web developer will be able to select hardwareAcceleration for that case except to let the UA decide with 'allow'.

What part is unclear? allow works as you've described, and is the default value.

This does not seem great and somehow contradicts the desire to not increase fingerprinting.

Fingerprinting and codec features are often add odds. In such cases, including this one, we strive to offer the features folks are demanding without offering any extra information (least power principle). This is why we have reduced the signal to "hardware accelerated", as opposed to exposing the name of the hardware or similar.

It also seems HardwareAcceleration is a potential fingerprinting vector though it is not marked as so in the spec.

Agree, I think that's a good thing for us to call out.

chcunningham commented 3 years ago

Triage note: marking 'breaking', as removal of this attribute would clearly break behavior for folks that have come to rely on it. This is purely a triage note; I am opposed to actually implementing this break, as the feature is directly requested by users with reasons cited above.

Also marking 'editorial' for the requested additions to privacy considerations.

chcunningham commented 3 years ago

@youennf, pls see above. If nothing more to discuss I'd like to close.

youennf commented 3 years ago

Thanks for pinging me. In general, when an API increases fingerprinting, we need a really clear usecase, the bar should be really high. I'd like to understand why media capabilities is not good enough and if this API meets this high bar.

Such apps are interested in WebCodecs only when we can offer hardware acceleration. If we can only do software encoding/decoding, they prefer their WASM codec

These applications can probably use whether encoding/decoding is powerEfficient through media capabilities for their configuration as a good enough approximation. This makes me thing this field is more of a v2 optional feature than a must-have feature.

Also, this usecase is about requiring hardware, while the API is allowing to require software. Is there a usecase for that other part?

As a side note, this potentially forbids some OS strategies like switching between software/hardware depending on various factors (other apps using HW, battery status...). Or it could force OSes to lie to web applications.

If we really want such an API, I would go with MediaCapabilities to let the application decide whether it wants OS codecs or its own codec. If the web application wants OS codecs, a hint API instead of a must-use API seems more appropriate since it would not cause fingerprinting.

sandersdan commented 3 years ago

These applications can probably use whether encoding/decoding is powerEfficient through media capabilities for their configuration as a good enough approximation.

There are several things that may be assumed about a hardware codec (none of which are guaranteed by WebCodecs):

Power efficient operation
Reduced CPU usage
Output frames are in GPU memory
More likely to be strict, less likely to recover from errors
May be restrictive, bounded, or even in some cases unreliable

My understanding of the use case that Chris outlined above (a media player application) is that the goal is to take the efficiency (first three points above) when it's available, but to fully control the fallback software path for consistent behavior. powerEfficent may be a useful-enough signal for this case.

Also, this usecase is about requiring hardware, while the API is allowing to require software. Is there a usecase for that other part?

Yes, it's also common for applications to use hardware optimistically, but to prefer software if it is determined that hardware does not meed the application's requirements (last two points above). This has been historically difficult for UAs to determine, so much so that there have been proposals to allow WebRTC servers to request disabling of hardware acceleration on a per-connection basis.

This makes me thing this field is more of a v2 optional feature than a must-have feature.

This was one of the first requests ever made by a partner for WebCodecs. It's probably not the most important WebCodecs feature, but I don't consider it trivial either.

As a side note, this potentially forbids some OS strategies like switching between software/hardware depending on various factors (other apps using HW, battery status...). Or it could force OSes to lie to web applications.

This is true. Applications should not be setting a value for this property if they don't want to restrict the implementation.

youennf commented 3 years ago

Yes, it's also common for applications to use hardware optimistically, but to prefer software if it is determined that hardware does not meed the application's requirements (last two points above)

But then, how is the web page knowing that a hardware encoder is good enough/not good enough in terms of compat? It seems we would need to leak the actual HW chipset ID and the actual SW library ID and version for the application to do a good job there, which is something we do not want to do for fingerprinting reasons.

sandersdan commented 3 years ago

But then, how is the web page knowing that a hardware encoder is good enough/not good enough in terms of compat?

Typically by trying hardware first, measuring performance, and monitoring for any errors with their particular content and requirements.

The key thing is to be able to act on that information once they have gathered it.

youennf commented 3 years ago

Typically by trying hardware first, measuring performance, and monitoring for any errors with their particular content and requirements.

This strategy can be done without using the hardware acceleration field, just try what the OS provides.

It seems this is only useful in the case the application wants to do the following:

Try hardware, if good use it. If not good, go to next step.
Try software, if good use it. If not good, go to next step.
Use JS library implementation (needed anyway since some platforms only support HW codecs). This flow seems like an edge case and I do not see the necessity to support it in v1. A hint would work equally well, without the fingerprinting concerns.

sandersdan commented 3 years ago

This strategy can be done without using the hardware acceleration field, just try what the OS provides.

I don't follow, without the field there isn't a way to forcefully fall back. Most applications won't have a WASM fallback.

(needed anyway since some platforms only support HW codecs)

I think many WebRTC-style applications would choose to switch to a reduced quality/feature mode if the WebCodecs codecs were deemed inadequate and there was no alternative available.

youennf commented 3 years ago

To summarise, the main usecase of this property is for applications to force SW code path. For applications wanting to use HW, powerEfficient might be good enough, at least in the short term.

Can you clarify the use case of such applications, in particular those applications that would do monitoring but would not have a fallback? My understanding was that these applications wanted to protect themselves from codec UA/OS bugs. But such apps need to have a fallback in that case, so it is probably something else.

Again, given this is a potential new fingerprinting vector, the bar should be high. It should allow to unlock new features to apps, without known viable alternatives. Do you know what was PING WG assessment of this particular field?

dalecurtis commented 3 years ago

I'll leave the rest of the argument to Dan, but I doubt this is a new fingerprinting vector. You can already force hardware detection and variant analysis in all browsers through a canvas.drawImage(video) using a very high resolution video.

sandersdan commented 3 years ago

To summarise, the main usecase of this property is for applications to force SW code path.

I push back lightly on this characterization, while powerEfficient might be a substitute it doesn't eliminate the use case.

Can you clarify the use case of such applications, in particular those applications that would do monitoring but would not have a fallback?

Sure. Some things applications may be monitoring include:

Throughput
Latency and jitter
Rate control accuracy (encode only)
Picture quality (probably encode only)
Actual failures

These may be things that are inherent to the platform codecs or they may be things that vary depending on system load. WebRTC-style applications are likely to use resolution, bitrate, codec, and profile settings as a first line of defense. In cases where that is inadequate (eg. because jitter is just too high at any setting), forcing software codecs can be a reliable workaround.

In the case of actual failures, the cause may be UA/OS bugs, or it may be non-conformant streams. In either case it is likely that a software codec will be more reliable.

Do you know what was PING WG assessment of this particular field?

I will defer to @chcunningham for this question.

youennf commented 3 years ago

I'll leave the rest of the argument to Dan, but I doubt this is a new fingerprinting vector.

I think we all agree we want to go to a world where we mitigate-then-remove those issues. We certainly do not want to make fingerprinting easier and more robust.

You can already force hardware detection and variant analysis in all browsers through a canvas.drawImage(video) using a very high resolution video.

How do you force a video element to use either the SW or the HW decoder at a given fixed resolution? What about encoders?

youennf commented 3 years ago

Some things applications may be monitoring include:

Throughput

Latency and jitter

Rate control accuracy (encode only)

Picture quality (probably encode only)

For those things, I fail to understand the relationship with the HW acceleration field. Media capabilities give you already that information using MediaDecodingType/MediaEncodingType. Maybe what you want is to pass a MediaDecodingType/MediaEncodingType when creating the encoder/decoder. Then the OS will select its most suitable codec alternative according that value. This answers the problem you are describing in a more straightforward manner without the fingerprinting hurdles.

FWIW, I know OSes that have more than one SW encoder of a given codec. A single boolean is not sufficient to enumerate them all.

sandersdan commented 3 years ago

Media capabilities give you already that information using MediaDecodingType/MediaEncodingType.

It doesn't, nor could it reliably do so. It can guess at a subset, but even for those they vary by system load, configuration, and content.

Then the OS will select its most suitable codec alternative according that value.

This is potentially possible but is delving into trying to guess what applications want. For example Chrome already avoids hardware decode for WebRTC on Windows 7 due to high latency, but we can't really know every application's detailed preferences well enough to implement a generic selection algorithm.

WebCodecs also operates at a low-enough level that things like dynamic codec switching are unlikely to 100% reliable, so the application will need to be involved in the algorithm in some direct way.

FWIW, I know OSes that have more than one SW encoder of a given codec. A single boolean is not sufficient to enumerate them all.

This is true. We didn't see much advantage with full enumeration, and the fingerprinting concerns are much larger with an API like that.

dalecurtis commented 3 years ago

I think we all agree we want to go to a world where we mitigate-then-remove those issues. We certainly do not want to make fingerprinting easier and more robust.

Agreed, but AFAIK, the only mitigation possible is restricting when a hardware codec is used. E.g., requiring N frames before a hardware codec kicks in and/or limiting hardware codec usage to high-trust modes. You could apply both such restrictions to the proposed property. E.g., always return false unless trust requirements are satisfied.

Keep in mind that today all browsers expose the hardware decoding value through MediaCapabilities' powerEfficient value. Here's Safari's for VP9: https://trac.webkit.org/browser/webkit/trunk/Source/WebCore/platform/graphics/cocoa/VP9UtilitiesCocoa.mm#L256

You can already force hardware detection and variant analysis in all browsers through a canvas.drawImage(video) using a very high resolution video.

How do you force a video element to use either the SW or the HW decoder at a given fixed resolution?

AFAIK most browsers use a simple resolution filter (see above), so it's a matter of finding the cut-offs used by each browser.

What about encoders?

MediaRecorder's total encode time will expose a hardware encoder versus a software encoder entirely on the client. A more sophisticated client can use a WebRTC loopback or server setup to figure this out similarly.

youennf commented 3 years ago

Keep in mind that today all browsers expose the hardware decoding value

Not really, Media capabilities is exposing whether it is ok for battery life to use those settings. UAs can implement heuristics in various ways. Hardware acceleration is an (important) implementation detail that is used for that 'battery-life-friendly' feature.

Exposing features is ok, exposing implementation strategies does not look appealing.

AFAIK most browsers use a simple resolution filter (see above), so it's a matter of finding the cut-offs used by each browser.

Web pages can try to detect at which resolution OSes might switch from SW to HW. Web pages cannot currently force a HW decoder at a given fixed resolution.

MediaRecorder's total encode time will expose a hardware encoder versus a software encoder entirely on the client.

It really depends whether the UA is fine exposing this information or not. MediaRecorder spec allows to delay events if needed. Ditto for WebRTC spec.

In general, the hardware acceleration field is exposing implementation strategies/details, while it is preferable to expose capabilities. As an example, a SW codec might have different efficiency whether ARM-based or x86-based but I do not think we want to expose whether the device is ARM or x86.

The hardware acceleration field is exposing new information that I do not think is available:

whether HW codec is available at a given (small) resolution (plus how they behave at these resolutions)
whether SW codec is available at a given (high) resolution (plus how they behave at these resolutions)
how many HW codec slots might be available. Plus the possibility for pages running on the same device to try locking HW slots as a side channel information.

chcunningham commented 3 years ago

Do you know what was PING WG assessment of this particular field?

I will defer to @chcunningham for this question.

This field was included in the spec during PINGs review. To my memory, no particular concerns were raised.

dalecurtis commented 3 years ago

Not really, Media capabilities is exposing whether it is ok for battery life to use those settings. UAs can implement heuristics in various ways. Hardware acceleration is an (important) implementation detail that is used for that 'battery-life-friendly' feature.

I agree semantically, but to be clear no UA implemented the heuristic in a way that avoids fingerprinting. I want to highlight that here since despite all UAs caring about fingerprinting, a better solution was not found -- which suggests that we're all following the least-power principal as best we can.

Exposing features is ok, exposing implementation strategies does not look appealing.

There's nothing preventing a UA from rejecting whatever configurations it wants via the WebCodecs interfaces. If Safari or another UA chooses to reject all hardwareAcceleration = require or hardwareAcceleration = deny requests, that's absolutely allowed by the spec. Pages will already have to have a backup codec mechanism (likely WASM) to set such preferences.

Web pages can try to detect at which resolution OSes might switch from SW to HW. Web pages cannot currently force a HW decoder at a given fixed resolution.

I feel this is another semantic argument that isn't practical. Sure a page can't force a UA to use hardware decoding for an 8K video, but the consequences of a UA not doing so disadvantage the user to the point that no UA is going to do that.

It really depends whether the UA is fine exposing this information or not. MediaRecorder spec allows to delay events if needed. Ditto for WebRTC spec.

Encode time is only one avenue, the encoded pixels will also vary with implementation details. In addition to varying delay, the UA would also have to sprinkle noise into the source before encoding, which will hurt encoding performance and quality.

In general, the hardware acceleration field is exposing implementation strategies/details, while it is preferable to expose capabilities.

Whether something is an implementation strategy or capability is context dependent. At the level of a codecs API, there's precedent in nearly every API for exposing hardware acceleration as a capability:

Do you have any alternative suggestions on how we can solve the use cases @sandersdan mentions? We're definitely open to alternative mechanisms for solving the problems of 'preferring efficiency' and 'avoid broken/slow hardware/platform codecs'.

The hardware acceleration field is exposing new information that I do not think is available:

whether HW codec is available at a given (small) resolution (plus how they behave at these resolutions)

whether SW codec is available at a given (high) resolution (plus how they behave at these resolutions)

how many HW codec slots might be available. Plus the possibility for pages running on the same device to try locking HW slots as a side channel information.

I don't agree this isn't available, I do agree it would be easier to pin this information down with our proposed API.

jernoble commented 3 years ago

@dalecurtis said:

I'll leave the rest of the argument to Dan, but I doubt this is a new fingerprinting vector. You can already force hardware detection and variant analysis in all browsers through a canvas.drawImage(video) using a very high resolution video.

Per PING (IIRC, via @hober), that fingerprinting can occur in a similar way through another API is not itself justification for ignoring the fingerprinting concerns of new API, as it just adds to fingerprinting technical debt.

jernoble commented 3 years ago

@dalecurtis said:

Keep in mind that today all browsers expose the hardware decoding value through MediaCapabilities' powerEfficient value. Here's Safari's for VP9: https://trac.webkit.org/browser/webkit/trunk/Source/WebCore/platform/graphics/cocoa/VP9UtilitiesCocoa.mm#L256

Chair hat off; implementer hat on

Note that this merely reveals whether the system has a hardware decoder. It can't be used as a side channel to detect, for example, that another tab is already using one of the limited set of hardware decoder slots, nor can it be used to determine how many slots the current system has.

jernoble commented 3 years ago

There's nothing preventing a UA from rejecting whatever configurations it wants via the WebCodecs interfaces. If Safari or another UA chooses to reject all hardwareAcceleration = require or hardwareAcceleration = deny requests, that's absolutely allowed by the spec.

Forgive my ignorance here, but are UAs free to reject hardwareAcceleration = require in all cases where MediaCapabilities would say powerEfficient: false, and reject hardwareAcceleration = deny in all cases where MediaCapabilities would say powerEfficient: true [edit: and resolve otherwise]? In other words, is there an available fingerprinting mitigation strategy available where UAs would just not expose per-decode information, and instead use coarse-grained system-level capabilities?

youennf commented 3 years ago

I don't agree this isn't available, I do agree it would be easier to pin this information down with our proposed API.

Can you describe how this is available? If you look at powerEfficient in Safari, it is per codec type, not based on resolution for instance. I do not know how a web app could force Safari to try H264 HW decoding at a low resolution/SW decoding at high resolution, and see whether that fails/how it performs.

Pages will already have to have a backup codec mechanism (likely WASM) to set such preferences.

That is contradicting a previous statement in this thread:

This strategy can be done without using the hardware acceleration field, just try what the OS provides.

I don't follow, without the field there isn't a way to forcefully fall back. Most applications won't have a WASM fallback.

If pages will have backup codec, a reasonable approach for a web app is to:

Provide as much information as possible to the UA for the UA to pick a codec and configure it as well as possible for the app usecase
Monitor the codec result
If not good enough, fallback to the backup codec

As part of step 1, WebCodec API could provide more knobs/hints to better setup codec: prefer low-latency, prefer battery efficiency, prefer throughput... I think the low-latency knob for instance is something that might get consensus (see https://github.com/w3c/webcodecs/issues/241#issuecomment-852737807)

dalecurtis commented 3 years ago

There's nothing preventing a UA from rejecting whatever configurations it wants via the WebCodecs interfaces. If Safari or another UA chooses to reject all hardwareAcceleration = require or hardwareAcceleration = deny requests, that's absolutely allowed by the spec.

Forgive my ignorance here, but are UAs free to reject hardwareAcceleration = require in all cases where MediaCapabilities would say powerEfficient: false, and reject hardwareAcceleration = deny in all cases where MediaCapabilities would say powerEfficient: true [edit: and resolve otherwise]? In other words, is there an available fingerprinting mitigation strategy available where UAs would just not expose per-decode information, and instead use coarse-grained system-level capabilities?

Yes. The UA has a lot of agency in how it replies here. The best way to think about isConfigSupported() is that it's a strong hint. E.g., practically speaking, isConfigSupported('hw=require') may not be satisfiable by the time configure() is called. As such any mitigations UAs apply to MediaCapabilities are available here as well.

dalecurtis commented 3 years ago

Can you describe how this is available? If you look at powerEfficient in Safari, it is per codec type, not based on resolution for instance. I do not know how a web app could force Safari to try H264 HW decoding at a low resolution/SW decoding at high resolution, and see whether that fails/how it performs.

Safari is likely the hardest to force to reveal useful fingerprinting bits here since macOS/iOS are more homogenous platforms than other UAs typically run on. HW decoding at a low resolution may be achievable through a set of crafted container and header lies - possibly not even lies depending on the codec feature set. SW encoding/decoding at a high resolution could be achieved by exhausting the kernel slots for hardware codecs.

Pages will already have to have a backup codec mechanism (likely WASM) to set such preferences.

That is contradicting a previous statement in this thread:

This strategy can be done without using the hardware acceleration field, just try what the OS provides.

I don't think these are in contradiction, but sorry it's unclear. My statement was specifically about clients which set 'require'. Pages that use 'deny' or 'allow' are unlikely to have a WASM fallback for non-technical reasons.

As part of step 1, WebCodec API could provide more knobs/hints to better setup codec: prefer low-latency, prefer battery efficiency, prefer throughput... I think the low-latency knob for instance is something that might get consensus (see #241 (comment))

We're all for more knobs, please keep the suggestions coming! Something like requirePowerEfficiency would indeed mostly solve the 'hardwareAcceleration=require` case, but we haven't found a good knob to indicate 'avoid broken/slow hardware/platform codecs' to solve the 'hardwareAcceleration=deny' case. Can you think of one?

youennf commented 3 years ago

I would go with a hint like a codecSelectionPreference enum with 'powerEfficiency' and 'maxCompatibility' as possible values.

Implementations would select either the OS codec or their own copy of a SW codec if they have one based on that field. For mobile UAs, any HW codec used by WebRTC is probably good enough to qualify for maxCompatibility, but it would really be up to UAs.

Pages that use 'deny' or 'allow' are unlikely to have a WASM fallback for non-technical reasons.

I can understand for 'allow'. For 'deny', some OSes might not allow to use a SW H264 encoder at some resolutions (or even provide a SW H264 encoder at any given resolution). It seems applications would need a fallback in that case.

jernoble commented 3 years ago

@dalecurtis said:

Yes. The UA has a lot of agency in how it replies here. The best way to think about isConfigSupported() is that it's a strong hint. E.g., practically speaking, isConfigSupported('hw=require') may not be satisfiable by the time configure() is called. As such any mitigations UAs apply to MediaCapabilities are available here as well.

Ok, then at a minimum it would be useful to point out that mitigation in the privacy considerations section of the spec.

Best possible practice would be to normatively declare that hardwareAcceleration=require/requirePowerEfficiency SHOULD reject where MediaCapabilities would return powerEfficient:false, and perhaps hardwareAcceleration=deny/requirePowerInefficiency SHOULD reject where where MediaCapabilities would return powerEfficient:false, as that limits the amount of information exposed to no more than is already available through MediaCapabilities. But both the above mitigation and rate limiting would probably meet Best Practice 7: Enable graceful degradation for privacy-conscious users or implementers.

jernoble commented 3 years ago

@chcunningham said:

This is a feature request from several sophisticated apps that bring their own encoders/decoders implemented in WASM (with customized features and tuning). Such apps are interested in WebCodecs only when we can offer hardware acceleration. If we can only do software encoding/decoding, they prefer their WASM codec

Is this requirement not satisfied by MediaCapabilities? And in parallel, what's the use case for hardwareAcceleration=deny?

chcunningham commented 3 years ago

Is this requirement not satisfied by MediaCapabilities?

No. For decoding, MediaCapabilities is specified to serve type="file" or type="media-source" use cases. Neither of these maps to WebCodecs. There is some overlap, but they are not normatively coupled. Integration with MC was considered in #25. We've since solved with the WC isConfigSupported() APIs. Having said that, as mentioned above, UAs may still choose to only suport hw-accel allow/deny in a way that matches their MC replies.

Best possible practice would be to normatively declare that hardwareAcceleration=require/requirePowerEfficiency SHOULD reject where MediaCapabilities would return powerEfficient:false,...

While this may be the right call for privacy sensitive implementers, I do not consider it a best practice generally. Chrome's MC codec/resolution cutoffs were made with the file/meida-source use cases in mind. With WebCodecs, it is intentionally a goal that users be able to make different choices from <video> to suit their use cases.

Ok, then at a minimum it would be useful to point out that mitigation in the privacy considerations section of the spec.

That's fine with me.

And in parallel, what's the use case for hardwareAcceleration=deny?

See spec note: https://w3c.github.io/webcodecs/#hardware-acceleration (typo, note calls it "disallow", but its really "deny") See @youennf's opening comment: "Another possibility is to maximise compatibility and use 'deny'." See @dalecurtis comment above: "but we haven't found a good knob to indicate 'avoid broken/slow hardware/platform codecs' to solve the 'hardwareAcceleration=deny' case"

dalecurtis commented 3 years ago

I would go with a hint like a codecSelectionPreference enum with 'powerEfficiency' and 'maxCompatibility' as possible values.

We're open to hints like this, either as an enum or a set of booleans (optimizeForCompatibility). However, from a standards perspective, defining 'compatibility' seems difficult in ways that hardwareAcceleration is not. "Most likely to work consistently on all platforms supported by the UA" seems about right, but implies some negativity when unset or set to false. Additionally, from an authors perspective hardwareAcceleration is much more specific and easier to understand without having to delve into the details of what it means to each UA. See https://github.com/w3c/webcodecs/issues/239#issuecomment-852537500 for extensive precedent.

I can understand for 'allow'. For 'deny', some OSes might not allow to use a SW H264 encoder at some resolutions (or even provide a SW H264 encoder at any given resolution). It seems applications would need a fallback in that case.

Some may indeed choose to do so, but unlike 'require' it's not a sure thing. Historically (from the perspective of Chrome relative to the platform), 'deny' is more likely to be used by a developer as an escape hatch for users with problematic hardware. I.e., they may offer a user setting to disable hardware acceleration to workaround a given driver or platform issue. Obviously some of this falls on the platform to fix, but as we all know this can be a long tail problem with a lot of cliffs for developers to fall into.

Fallback may also mean fallback to a different codec or different profile. It's not necessarily a binary choice.

[Removed section that @chcunningham already addresses above]

jernoble commented 3 years ago

@chcunningham said:

See spec note: https://w3c.github.io/webcodecs/#hardware-acceleration (typo, note calls it "disallow", but its really "deny") See @youennf's opening comment: "Another possibility is to maximise compatibility and use 'deny'." See @dalecurtis comment above: "but we haven't found a good knob to indicate 'avoid broken/slow hardware/platform codecs' to solve the 'hardwareAcceleration=deny' case"

These use cases beg the question: that hardware codecs will have high startup latency or broken behavior. This may be the case for specific hardware/software/UA combinations, but I'm concerned that a given site will disable hardware acceleration everywhere because they see issues only on certain configurations. So to be truly effective, sites will have to do UA sniffing or some amount of fingerprinting shenanigans to figure out they're running on problematic hardware. And if they're doing that, they already know the answer without the hardwareAcceleration=deny requirement.

dalecurtis commented 3 years ago

if they're doing that, they already know the answer without the hardwareAcceleration=deny requirement.

Without hardwareAcceleration=deny they can't do anything with that information but stop using WebCodecs though. Assuming they detect things don't work, they would then be have to ship a WASM solution (worse, non-technical issues), switch to a non-codec based experience, or point folks at a native application.

youennf commented 3 years ago

If we summarise things, there seems to be 3 different points:

UAs may have several codec implementations at hand. Getting input from the web app to drive UA selection might prove useful. A hint API is a natural fit here.
Web applications may want to use WebCodecs for power efficiency reasons. An approach is to use WebCodec based on MediaCapabilities powerEfficient (for webrtc or media-source/recording types).
Web applications may want to avoid specific codec behaviours that may more often happen on some hardware codecs. One approach would be something like: use powerEfficient hint and check whether this is good or not. If not, use compatibility hint. If still not good enough, switch to another codec.

However, from a standards perspective, defining 'compatibility' seems difficult in ways that hardwareAcceleration is not.

I agree a very precise definition is not easy. I think there is existing text in the spec that could be used as a basis. Hardware acceleration decoders may be less robust than software-based implementations. Hardware acceleration encoders may be less flexible in how they can be set up.

youennf commented 3 years ago

There is also the 'high startup cost' of HW codecs. This seems orthogonal to compatibility constraints, another hint value could be used for that. This begs the question of how much startup cost is an issue in practice compared to say camera setup time, network RTTs, VC call setup time...

AFAIK, this is not an issue that was brought up in WebRTC context. Do you know where it would potentially be an issue?

sandersdan commented 3 years ago

Do you know where it would potentially be an issue?

My understanding is high startup cost is typically only an issue for streams that are also low-resolution (eg. decoding seek thumbnails), so in most cases UAs wouldn't pick a hardware decoder anyway.

That said, any one- or few-shot decoding cases may prefer to trade startup latency for CPU usage. That would include single-frame video streams being used in image-like situations (VP9 being used to store HDR images comes to mind).

There may also be uses where startup latency is important even for a longer stream. I'm not currently aware of any that wouldn't also work around the issue by using lower-resolution initial data or a still placehoder (eg. poster image).

chcunningham commented 3 years ago

If we summarise things, there seems to be 3 different points:

@youennf many of these points hinge on using MediaCapabilities. I don't think that's a good plan. See my earlier comment.

chcunningham commented 3 years ago

I'm also concerned that moving this from strict "require' to a more relaxed "hint" breaks the ability for users to fallback to their own WASM software codecs when they really only wanted to use WC for hardware acceleration. I believe this was important to both Zoom @fideltian and VLC @jbkempf - would y'all mind chiming in?

zhlwang commented 3 years ago

We still need to use WebCodecs for hardware acceleration as Zoom web client, because now the software decoder of WC still can not output a frame based on a single packet. Our app needs to output a frame based on a single packet, and the hardware decoder of WC can do this.

jan-ivar commented 3 years ago

I think it would be helpful to ask the PING for clarification here about the hardware fingerprinting risks, since they've shot down other APIs in the past over those concerns https://github.com/w3c/webrtc-stats/issues/550. cc @pes10k

chcunningham commented 3 years ago

Our privacy review was conducted by @jonathanKingston, discussed with PING here https://www.w3.org/Privacy/IG/summaries/PING-minutes-20210506

The topic of hardware acceleration was discussed extensively, but focused on the larger concern that different encoder hardware will produce different encoded bytes for the same input. This is a much more interesting fingerprint than the hardwareAcceleration flag we're discussing here. Two machines that appear identical from POV of the hardwareAcceleration flag (e.g. both supporting accelerated h264) may produce different encoded outputs due to differing implementation/manufacturing choices. We discussed how this is not a new problem: already true for MediaRecorder. We discussed mitigations, including disabling accelerated encoding.

In my view, the hardwareAcceleration flag we're proposing adheres to PING guidance

Best Practice 1: Avoid unnecessary or severe increases to fingerprinting surface, especially for passive fingerprinting.

The flag is necessary to satisfy the use cases outlined above. The flag does not represent a severe increase to fingerprinting: the flag alone is not uniquely identifying and there is large overlap (but not perfect equivalence) with existing APIs.

Best Practice 5: Design APIs to access only the entropy necessary.

The flag exposes only what is necessary to enable the use case. It is a simple boolean that considers all hardware acceleration. It does not expose specific details about the hardware vendor, make, model, etc.

Best Practice 7: Enable graceful degradation for privacy-conscious users or implementers.

As discussed on the call, implementers may degrade gracefully to reporting support for only a common baseline of capabilities.

youennf commented 3 years ago

Best Practice 1: Avoid unnecessary or severe increases to fingerprinting surface, especially for passive fingerprinting.

The flag is necessary to satisfy the use cases outlined above.

AIUI (but see my last comment), the main use-case is about using realtime power efficient encoders/decoders. A UA is free to give those details using MediaCapabilities powerEfficient, taking into account resolution if they desire so. Using webrtc MediaDecodingType/MediaEncodingType is a good-enough approximation for realtime codecs. If this is not good enough, we could think of adding a new "realtime" MediaEncodingType to Media Capabilities.

What this does not cover is the case where HW encoder slots are already booked and UA might automatically degrade to SW encoder. I think it is nice to not cover this case as this is a potential cross-site information channel, hence bad for privacy.

My tentative conclusion is the flag is not necessary for the use case.

the software decoder of WC still can not output a frame based on a single packet.

@zhlwang, can you clarify whether this behavior is by design, a bug of a particular SW decoder implementation or a behavior that is sometimes useful and sometimes not useful? We are thinking of adding a 'realtime' mode to WC. Would it make sense to ensure this behavior in case 'realtime' mode is on?

dalecurtis commented 3 years ago

@youennf @zhlwang I think there's some confusion here. https://github.com/w3c/webcodecs/issues/206 resolves the decoder latency issues with software decoding. The reasons for hardwareAcceleration are orthogonal to that issue, so lets move any discussion on the issue of decoder latency to the linked issue.

What's relevant to this issue though: @zhlwang can you clarify your needs for hardwareAcceleration beyond the linked issue? In our prior discussions you indicated that you wanted to prefer your internal software decoders and encoders instead of those provided by WebCodecs if hardware codecs are unavailable. Can you confirm this point?

aboba commented 3 years ago

@youennf said:

What this does not cover is the case where HW encoder slots are already booked and UA might automatically degrade to SW >encoder.

[BA] That's the allow case (the default). I'm more concerned about whether we fully understand the implications of required. For example, what happens if:

HW encoder differs significantly in capabilities from the SW implementation (e.g. no support for scalable video coding)?
HW encoder slots are initially available, but are subsequently consumed?

Problem #1 can be handled by handled by providing distinct capabilities (and configuration) for hw profiles.

Problem #2 is more complicated. If hw is required, then it is more likely for errors to occur after successful configuration. For example, in WebRTC, a HW-only profile can be advertised in getCapabilities() and then successfully negotiated. However, this doesn't prevent a subsequent setParameters() call from failing with an RTCError.errorDetail value of hardware-encoder-not-available.

Have we worked through the error model implied by required?

dalecurtis commented 3 years ago

I don't think the second problem is unique to WebCodecs, that's true of even native implementations. The nature of hardware codecs is always that something else on the system may consume it before use.

chcunningham commented 3 years ago

A UA is free to give those details using MediaCapabilities powerEfficient, taking into account resolution if they desire so.

The UA is not free to do this. The APIs overlap but are not equivalent. Resolution is one of many factors in the WC config that may make or break compatibility with hardware acceleration.

Using webrtc MediaDecodingType/MediaEncodingType is a good-enough approximation for realtime codecs.

I disagree, given my point above. Moreover, this overindexes on "realtime". WebCodecs is not an just concerned with RTC.

If this is not good enough, we could think of adding a new "realtime" MediaEncodingType to Media Capabilities.

IIUC your idea is for MC to say when powerEfficient = true, but leave the blurred line around when that actually means hardware acceleration vs just power efficient software (under some resolution threshold). This does not preserve the user's ability to fallback to their own WASM codec in cases where the UA is otherwise going to use software.

My tentative conclusion is the flag is not necessary for the use case.

It follows from my points above that this flag is necessary to address the use case.

What this does not cover is the case where HW encoder slots are already booked and UA might automatically degrade to SW encoder. I think it is nice to not cover this case as this is a potential cross-site information channel, hence bad for privacy.

If the UA's hardware acceleration slots are booked, it should indicate non-support when hardwareAcceleration=required. This is not a useful cross-site information channel. It may be that you are 10 sites which have booked your slots, or 1 site with 10 players, or zero sites but with 10 native apps, etc. It is not possible for a bad actor to infer which of these scenarios you are in, much less anything to identify the specific site. Additionally, the booked slots are extremely volatile over short windows of time (closed tabs, closed apps, navigations, etc...) such that this does not provide anything stable enough to be used for fingerprinting.

I don't think the second problem is unique to WebCodecs, that's true of even native implementations. The nature of hardware codecs is always that something else on the system may consume it before use.

I agree.

Have we worked through the error model implied by required?

In the event of a race like Dale outlined, the configure(hardwareAccel = required) would fail, producing a NotSupportedError. I'd be happy to clarify / highlight this possibility in the spec.

zhlwang commented 3 years ago

@dalecurtis Yes, in some cases we prefer to use our internal software decoders and encoders instead of those provided by WebCodecs if hardware codecs are unavailable.

youennf commented 3 years ago

@zhlwang, can you detail in which cases you prefer using your internal SW codec but use HW if available? Can you describe the expected benefits if the information is available/downsides if it is not?

It would also be interesting to get the perspective of web developers asking for 'deny'.

jbkempf commented 3 years ago

I'm in the same position as @zhlwang. Software decoders from the OS/browsers are very often buggy, and less tested than the hw ones.

I would prefer a clear deny, and use my codecs over the sw decoders from the OS/browsers. Else, I will use the hint.

However, I don't really understand the debate here since it will be soooo much easier to fingerprint by sending data and checking the output (or the crashes)

youennf commented 3 years ago

Software decoders from the OS/browsers are very often buggy, and less tested than the hw ones.

This somehow contradicts past discussions that led to the idea of using a powerEfficient (hence HW)/compatibliity (hence SW) hint. Also, it seems to me like you would need more than a single HW/SW flag to make that kind of "this decoder is broken" decision. Or you might need to evaluate the decoder output to make sure it is good.

I would prefer a clear deny, and use my codecs over the sw decoders from the OS/browsers. Else, I will use the hint.

My understanding is that MediaCapabilities powerEfficient + the hint would cover most of your concerns. Is that true? The more you can be specific about pros/cons, the more it will help driving WG decision process.

I don't really understand the debate here since it will be soooo much easier to fingerprint by sending data

Exposing decoded data to a web page is necessary to the end user benefit. Exposing implementation details of how decoding is done is not necessary to the end user benefit. Also, we do not want to expose new fingerprinting surfaces at zero fingerprinter cost. https://github.com/w3c/webrtc-stats/issues/550 has some additional information.

marcello3d commented 3 years ago

I’ve been following this project/thread from the sidelines (I built the various video decoding/encoding bits of Descript in Electron using native ffmpeg).

Most of this discussion is on decoding, but I haven’t seen this point raised: I’ve found that ffmpeg’s software encoder (x264) gives much higher quality than hardware encoder for non-realtime flows (at least on macOS, using ffmpeg’s h264_videotoolbox). I’m not sure if this is due to VBR vs CBR, but I see the trade off as:

software: high quality per bitrate, high cpu, low gpu, faster with more cores
hardware: low quality per bitrate, low cpu, med gpu

w3c / webcodecs

Is exposing https://w3c.github.io/webcodecs/#enumdef-hardwareacceleration a good idea #239