w3c / compute-pressure

A web API proposal that provides information about available compute capacity
https://www.w3.org/TR/compute-pressure/

High-level metrics to improve web developer ergonomics #24

Closed anssiko closed 2 years ago

anssiko commented 2 years ago

I had a chat with @kenchris to propose we revisit both the existing use cases and new use cases that have recently emerged (e.g. https://github.com/WICG/compute-pressure/issues/14) to understand whether the current cpuSpeed and cpuUtilization metrics are still the best fit.

I think there's an opportunity to make the API even more ergonomic for web developers who are not experts in computing performance and tuning, and not familiar with related concepts.

I'd like us to assess whether the current use cases could be served with an API that instead of (or in addition to) the current cpuSpeed and cpuUtilization numerical pair would expose a finite set of human-readable compute pressure states that have semantics attached to them.

What I'm interested in exploring is to see if we could raise the level of abstraction (bonus: more privacy-preserving, future-proofing) and make the underlying low-level metrics implementation details. The low-level metrics are harder to explain to web developers and might evolve and in some cases become misleading. I suspect they could be more easily misinterpreted as well.

In this proposal, the low-level metrics to high-level metrics mapping would become an implementation detail, and implementations could also take into consideration other factors that may influence the compute pressure state such as device form factor, thermal budget, and so on when making the decision.

Here's a strawman proposal, plugging into the existing API for illustrative purposes:

enum ComputePressureState { "nominal", "fair", "serious", "critical" };

dictionary ComputePressureEntry {
  ComputePressureState state = "fair";
};
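
To illustrate how an application might consume such states, here is a minimal TypeScript sketch. It assumes the placeholder state names from the strawman above; `VideoSettings`, `SETTINGS` and `settingsFor` are hypothetical application-side names and are not part of the proposal.

```typescript
// Mirrors the strawman enum; the state names are placeholders.
type ComputePressureState = "nominal" | "fair" | "serious" | "critical";

// Hypothetical application-side settings (assumption, not part of the API).
interface VideoSettings {
  maxHeight: number;       // maximum video height in pixels
  effectsEnabled: boolean; // e.g. background blur
}

// Hypothetical policy table: one set of settings per state.
// The concrete values are made up for illustration.
const SETTINGS: Record<ComputePressureState, VideoSettings> = {
  nominal: { maxHeight: 1080, effectsEnabled: true },
  fair: { maxHeight: 720, effectsEnabled: true },
  serious: { maxHeight: 480, effectsEnabled: false },
  critical: { maxHeight: 240, effectsEnabled: false },
};

// Look up the settings the app would apply for a given state.
function settingsFor(state: ComputePressureState): VideoSettings {
  return SETTINGS[state];
}
```

The point of the sketch is that the application only branches on a small, named set of states and never needs to interpret raw speed or utilization numbers.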

Thoughts?

Related, I think this blog post https://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html arguing that the CPU utilization metric is misleading should be reviewed. It triggered quite a long discussion among developers (also on HN), so there are probably good nuggets of information hidden in the comments as well. Leaving it here for interested folks to digest.

[Edit: The state names in the strawman proposal were tweaked a bit. The names should be considered as placeholders to illustrate the idea. These names are subject to change based on feedback received.]

kenchris commented 2 years ago

I dislike these names though :-)

"Low" implies that there will be a "high", so I think something like "nominal" makes more sense.

"Normal" also doesn't really make sense: what is normal load? I think "fair" is great because the system is in use, but the load is fair and sustainable.

On the other hand, moving from "fair" directly to "critical" seems a bit abrupt, so we would need a level before that, meaning it's not critical yet but getting there, like "serious".

nominal, fair (sustainable?), serious (significant), critical

anssiko commented 2 years ago

Naming aside (🚲😁), I think working out this state machine and the conditions for transitioning from one state to another would be a helpful exercise to figure out if this proposal has merit. That should shed light on what the semantics of each state should be, and how many states there should be.

I'll document some of my additional thoughts below for discussion.

My wish list for this proposed high-level API based on what I think web developers expect:

From a privacy perspective (with a chair hat on):

For future-proofing (a browser implementer's hat on):

kenchris commented 2 years ago

In theory, we should not spec what exact values map into these states, as this can differ per hardware platform and in many other cases. For example, a platform might become critical due to thermals without being under heavy CPU load (looking at clock speed and utilization), and clock speed boosts can work quite differently depending on whether the device is connected to mains power (AC) or running off battery (DC).

I also think it would be great if silicon vendors could be innovative in this area on their platforms.

anssiko commented 2 years ago

Based on my initial assessment the spec should not normatively define a mapping from high-level states to any low-level metrics (such as speed or utilization value ranges) but leave that to the implementation. Otherwise, the high-level abstraction would get anchored into low-level metrics that can be misleading.

The abstraction should be defined such that it can be layered atop existing low-level metrics such as instructions per cycle (IPC) or its multiplicative inverse, cycles per instruction (CPI). It should also be possible for an implementation to make use of methods that better consider performance bottlenecks, such as top-down microarchitecture analysis.

For a concrete example, it'd be up to the implementation to interpret what IPC < 1.0 or IPC > 1.0 means in terms of compute pressure states. The former is likely CPU memory-bound, the latter CPU instruction-bound. If memory-bound, different software tuning strategies apply than in an instruction-bound scenario. This suggests the spec should perhaps have informative content for implementers around the low-level metrics and their interpretation.
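
As a rough illustration of that interpretation, the following TypeScript sketch computes IPC and CPI from two counter values and applies the 1.0 rule of thumb mentioned above. The function names and the `Bottleneck` type are hypothetical, and treating an IPC of exactly 1.0 as instruction-bound is an arbitrary assumption. A real implementation would be free to use entirely different heuristics.

```typescript
// Hypothetical labels for the two cases discussed above.
type Bottleneck = "memory-bound" | "instruction-bound";

// Instructions per cycle, from retired-instruction and cycle counts.
function ipc(instructions: number, cycles: number): number {
  if (cycles <= 0) throw new RangeError("cycles must be positive");
  return instructions / cycles;
}

// Cycles per instruction: the multiplicative inverse of IPC.
function cpi(instructions: number, cycles: number): number {
  if (instructions <= 0) throw new RangeError("instructions must be positive");
  return cycles / instructions;
}

// Rule of thumb only: IPC < 1.0 is likely memory-bound, otherwise
// likely instruction-bound. IPC === 1.0 falls in the second branch
// by assumption.
function classify(ipcValue: number): Bottleneck {
  return ipcValue < 1.0 ? "memory-bound" : "instruction-bound";
}
```

How either label would feed into a compute pressure state is deliberately left out, since the proposal leaves that mapping to the implementation.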

To summarize my thinking:

Most importantly, I believe this layering would follow the priority of constituencies principle.

fideltian commented 2 years ago

Hi Anssi & Kenneth,

I think it would make sense if the CPU manufacturer could tell the application the value of CPU pressure. Currently, we just do some adaptation according to CPU usage. For example:
>85%: we might downgrade the resolution to protect audio.
60%-85%: we might upgrade the resolution.

So if we consider only usage (setting aside the other factors you mentioned, such as cooling, battery, memory, etc.), we think we could define more buckets beyond 50%, such as (just a concept example):
normal: <55%
fair: 55%-70%
serious: 70%-85%
warning: 85%-95%
critical: >95%
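
The concept buckets above could be sketched in TypeScript as follows. The ranges in the example share their edge values (55%, 70%, 85%, 95%), so this sketch assumes each lower bound is inclusive and that exactly 95% still counts as "warning". `UsageBucket` and `bucketFor` are hypothetical names.

```typescript
// Bucket names taken from the concept example; not a proposed API.
type UsageBucket = "normal" | "fair" | "serious" | "warning" | "critical";

// Map a CPU usage percentage (0-100) to a bucket.
// Assumption: lower bounds are inclusive; "critical" is strictly above 95.
function bucketFor(usagePercent: number): UsageBucket {
  if (usagePercent < 55) return "normal";
  if (usagePercent < 70) return "fair";
  if (usagePercent < 85) return "serious";
  if (usagePercent <= 95) return "warning";
  return "critical";
}
```

For example, `bucketFor(60)` yields `"fair"` and `bucketFor(97)` yields `"critical"`.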

anssiko commented 2 years ago

@fideltian thanks for the discussion and this feedback from the Zoom PWA perspective!

Hearing that this high-level metrics proposal is getting support, we're now investigating how many states would strike the right balance between the needs of web developers and privacy. I feel 4-5 would be a good starting point, but this will be clarified as the work progresses.

Some additional considerations:

We'll keep on refining this proposal and will loop you in for review. Thanks for your contributions!

anssiko commented 2 years ago

WebKit recently restored navigator.hardwareConcurrency (Bug 233381, ships in Safari TP 138), and based on comments in that WebKit bug there would be a preference for a higher-level API instead (if there was one). When this proposal is more mature, I think it’d be good to reach out to WebKit friends for review, and make a connection to that WK bug for context.

kenchris commented 2 years ago

The specification and explainer have been updated with this new approach.

anssiko commented 2 years ago

@kenchris thanks for updating the spec. When I opened this issue, I honestly did not expect my proposal to be turned into spec prose this fast! But given that consensus emerged quickly and the proposal resonated with folks, including implementers, moving fast was appropriate.

Before you close this issue, I suggest you spend some time to update https://wicg.github.io/compute-pressure/#security-and-privacy-considerations -- currently it contains references to the old deprecated API.

Security and privacy considerations are very important for new work that is expected to advance to standardization. This new approach brings substantial improvements in these areas in addition to developer ergonomics improvements and design that is future-proof. My recommendation is to be explicit about these improvements, because some implementers may have reviewed the old API and have formed an opinion based on the old design. Concerns raised earlier have been addressed by the new API design but that may not be obvious to people who are not following this work closely.

anssiko commented 2 years ago

@kenchris thanks for https://github.com/WICG/compute-pressure/pull/51 -- this is very helpful for reviewers.

I suggest you reference https://github.com/WICG/compute-pressure/blob/main/security-privacy-self-assessment.md from https://wicg.github.io/compute-pressure/#security-and-privacy-considerations and reword this:

Exposing hardware related events related to low level details such as exact CPU utilization or clock speed increases the risk of harming the user's privacy.

To minimize this risk, only the absolute minimal amount of information needed to support the use-cases is exposed.

Proposal:

To mitigate this risk, no such low level details are exposed.

kenchris commented 2 years ago

This has been done. I think we can close this now.

anssiko commented 2 years ago

Thanks!