w3c / compute-pressure

A web API proposal that provides information about available compute capacity
https://www.w3.org/TR/compute-pressure/

Add an implementation-defined change threshold exceeded check #176

Closed (anssiko closed this issue 1 year ago)

anssiko commented 1 year ago

We've received Dev Trial feedback from a major ISV that the state changes between the "serious" and "fair" pressure states are too rapid, as illustrated below:

[Figure: cp-demo]

This suggests the spec should add an informative note recommending that implementers add hysteresis to the notify pressure observers algorithm, and should amend the algorithm steps accordingly so that an implementation can bail out without notifying if the threshold has not been reached. Based on the feedback received, this seems particularly important at the top end of the range.

Adding hysteresis in the implementation, as opposed to in JS logic, would also improve privacy by lowering the entropy of the overall system. This improvement could be noted in the security and privacy considerations section.

(The mathematical model is formalized as the Preisach model of hysteresis, which is used in various fields of science and engineering.)

kenchris commented 1 year ago

There was already a plan to add average values over the last 10 seconds, 30 seconds, and so on, similar to what Linux pressure stall information (PSI) provides.
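For illustration, Linux PSI reports exponentially decaying averages over 10, 60 and 300 seconds (avg10, avg60, avg300). The sketch below shows the general idea in floating point; the class name `DecayingAverage`, the 1-second sampling interval and the sample values are hypothetical, and PSI's own kernel implementation differs in detail (fixed-point arithmetic, its own update period).

```typescript
// Hypothetical sketch of a PSI-style running average: the weight of the
// previous value decays as exp(-dt / window), so longer windows react
// more slowly to changes in the input.
class DecayingAverage {
  private value = 0;
  private initialized = false;
  private readonly windowSeconds: number;

  constructor(windowSeconds: number) {
    this.windowSeconds = windowSeconds;
  }

  // Fold in a sample observed dtSeconds after the previous one.
  update(sample: number, dtSeconds: number): number {
    if (!this.initialized) {
      // Assumption: seed with the first sample (PSI itself starts at 0).
      this.value = sample;
      this.initialized = true;
      return this.value;
    }
    const decay = Math.exp(-dtSeconds / this.windowSeconds);
    this.value = this.value * decay + sample * (1 - decay);
    return this.value;
  }

  get current(): number {
    return this.value;
  }
}

// Hypothetical pressure input, sampled once per second, flip-flopping
// between a low and a high value.
const horizons = [10, 60, 300].map((w) => new DecayingAverage(w));
const samples = [20, 95, 20, 95, 20, 95, 20, 95];
for (const s of samples) {
  for (const avg of horizons) avg.update(s, 1);
}
console.log(horizons.map((a) => a.current.toFixed(1)));
```

Because the averages are computed incrementally, each update is constant time regardless of the window length.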

anssiko commented 1 year ago

I think one way to engineer a solution could be to model this as rate-independent hysteresis, i.e. we only care about the (past) state and not about time.

Maybe this is easier to explain in pseudocode:

When we're "serious", transition to "fair" when cpu is less than 85:

if ((state == "serious") && (cpu < 85))
  state = "fair";

When we're "fair", transition to "serious" when cpu is more than 90:

if ((state == "fair") && (cpu > 90))
  state = "serious";
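Combining the two rules gives a complete transition function. The sketch below is runnable; `nextState`, `LOWER` and `UPPER` are hypothetical names, and the thresholds are the same placeholder values used in the pseudocode, applied to an arbitrary placeholder input sequence.

```typescript
// Sketch of the two-threshold transition from the pseudocode above.
// `cpu` stands for whatever metric the platform collector provides.
type State = "fair" | "serious";

const LOWER = 85; // leave "serious" only when the input drops below this
const UPPER = 90; // enter "serious" only when the input rises above this

function nextState(state: State, cpu: number): State {
  if (state === "serious" && cpu < LOWER) return "fair";
  if (state === "fair" && cpu > UPPER) return "serious";
  // Inside the band [LOWER, UPPER] the previous state is kept.
  return state;
}

// Many samples hovering around the thresholds produce only two changes.
let state: State = "fair";
const transitions: string[] = [];
for (const cpu of [80, 88, 91, 87, 89, 86, 84, 88]) {
  const next = nextState(state, cpu);
  if (next !== state) transitions.push(`${state} -> ${next} at ${cpu}`);
  state = next;
}
console.log(transitions);
```

The width of the band (UPPER minus LOWER) controls the trade-off: a wider band suppresses more flip-flopping but delays legitimate state changes.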

Note that cpu is a placeholder for some input received from the platform collector (likely not CPU utilization, which has known issues). The values 85 and 90 are also just placeholders for illustration purposes.

This idea was inspired by a nonideal relay:

In this visual, cpu is the x-axis and state is the y-axis. The cpu thresholds α and β map to 85 and 90, respectively, in the pseudocode example. The state "serious" maps to y = 1 and "fair" to y = 0.
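Written out, the standard textbook definition of such a nonideal relay with thresholds α < β is as follows, with x = cpu and y the state; here y(t⁻) denotes the state just before time t, so inside the band the relay keeps its previous output.

```latex
y(t) =
\begin{cases}
1 & \text{if } x(t) > \beta \quad (\text{"serious"}) \\
0 & \text{if } x(t) < \alpha \quad (\text{"fair"}) \\
y(t^-) & \text{if } \alpha \le x(t) \le \beta
\end{cases}
\qquad \alpha = 85,\ \beta = 90
```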

With a carefully chosen cpu metric and thresholds, I believe this would avoid flip-flopping between pressure states. We could include a well-known algorithm like this in the spec as an informative note. Implementers are expected to tune their algorithms for various hardware, so keeping this algorithm informative will allow implementers to innovate and improve implementation quality over time, taking evolving workloads into account.

kenchris commented 1 year ago

Yes, I understand the point of this, but it speaks a bit against fine-grained sampling frequency (maybe that is fine), because spikes in utilization (boosting) are quite common. Only a high frequency allows you to catch such cases.

But I think adding averages also makes sense, to know how the system behaved over time, e.g. over the last 10 seconds, the last minute and the last 10 minutes. This should also be more efficient to calculate in the browser than in JS, and we could make it opt-in.

anssiko commented 1 year ago

As I understood this Dev Trial feedback, the issue was too-frequent flip-flopping between pressure states (e.g. once a second), and the preference would be for the native implementation to smooth this out, i.e. to provide less frequent change notifications. Currently these changes might trigger a user-visible UI change once a second unless web developers filter them in JS, which would feel wrong and, I agree, would be less performant. It should be said that this is an edge case where the system is pushed to its limits.

Wouldn't it be better for (most?) mainstream use cases to remove the high-frequency spikes by default (similar to impulse noise in an audio signal) by applying a median filter or similar to the input before it is fed to the notify pressure observers algorithm? That is, insert cpu = med(unfiltered_cpu) as the first step in the pseudocode, where med() is a median filter with an appropriate window size. Should work, right?
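A minimal sketch of what med() could look like as a sliding-window median filter. The class name `MedianFilter`, the window size of 5 and the sample values are hypothetical choices for illustration only.

```typescript
// Sliding-window median filter: each output is the median of the most
// recent `size` raw samples, so an isolated spike cannot reach the output.
class MedianFilter {
  private readonly window: number[] = [];
  private readonly size: number;

  constructor(size: number) {
    if (!Number.isInteger(size) || size < 1) {
      throw new RangeError("size must be a positive integer");
    }
    this.size = size;
  }

  // Push a raw sample and return the median of the current window.
  next(sample: number): number {
    this.window.push(sample);
    if (this.window.length > this.size) this.window.shift();
    const sorted = [...this.window].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    // Even-length windows (e.g. while filling up) average the two middle values.
    return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  }
}

// A single-sample spike (99) is filtered out.
const med = new MedianFilter(5);
const raw = [60, 61, 99, 62, 60, 61, 63];
const filtered = raw.map((x) => med.next(x));
console.log(filtered);
```

The filtered value would then be fed to the hysteresis check in place of the raw input. Note that a median filter also delays genuine step changes by roughly half the window, which is the trade-off against catching short boosts.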

Maybe I misunderstood what you meant by fine grained frequency, but let me ask:

Are we aware of important use cases that require high-frequency notification updates, e.g. ones where a single dropped frame means a fatal failure? If there are such use cases, maybe a high-frequency mode should be a configuration option. I believe the rate-limiting mitigation for change notifications would not apply in such cases, so we'd need to think about the privacy impact.

How the system behaved over, say, the last 10 minutes sounds like an interesting additional input to feed to the algorithm.

anssiko commented 1 year ago

It looks like there's an even better place to inject this "change threshold exceeded" check: augment step 3 in https://www.w3.org/TR/compute-pressure/#dfn-has-change-in-data

I will submit a PR for this proposal so we can engineer the details there.

anssiko commented 1 year ago

Here's the PR with this idea written out in spec terms: https://github.com/w3c/compute-pressure/pull/180