w3c / sensors

Generic Sensor API
https://www.w3.org/TR/generic-sensor/

Define processing model #198

Open tobie opened 7 years ago

tobie commented 7 years ago

Some of our earliest issues involved tying sensor polling to animation frames (see #4).

Initial implementor feedback pushed in the same direction, for performance reasons, see: https://github.com/w3c/sensors/issues/152#issuecomment-267967852.

So we designed a system around an animation frame funnel, where polling frequencies > animation frame rate were possible, but new readings wouldn't be delivered until the next animation frame. This implied creating an API to handle buffered readings (see #171).

Now the latest feedback from implementors seems to at least partially reconsider this, see the thread at https://bugs.chromium.org/p/chromium/issues/detail?id=715872.

Reducing latency, gathering data at high frequencies, etc., are important requirements for sensors. The WG needs to understand better how these implementation design choices influence these requirements, and then needs to design a processing model that is usable by developers, delivers on these requirements, permits interoperability and allows vendors to implement sensors in a performant way.

As a first step, it might be useful to resurface the use case document, improve it (a lot) and move use cases and requirements to the spec.

tobie commented 7 years ago

Note there are also some use cases and requirements in the motion sensor explainer @kenchris and @alexshalamov wrote: https://w3c.github.io/motion-sensors/#usecases-and-requirements

Would be good to get some feedback from actual industry players there.

anssiko commented 7 years ago

Who are the actual industry players we need additional feedback from? We have received feedback from Chrome engineers in https://crbug.com/715872 and this proposed solution is applicable to any modern browser engine.

TL;DR: Binding notification to an animation frame is not recommended, since it will cause redundant redraws. In such a case, even if the page is not requesting a new frame to be redrawn, we still need to schedule new animation frames in order to provide notifications from the sensor. All this is explained in more detail in the above crbug.

tobie commented 7 years ago

Well, for WebVR, for example, there will pretty much be a redraw for every frame, so I'm not sure avoiding redundant redraws is something we should optimize for here. Which is not to say it might not be an issue (as mentioned in the crbug) for other use cases.

You say that this proposed solution is applicable to any modern browser engine, but we haven't heard from anyone but Chrome engineers on this topic. Furthermore, during the initial TAG review, @slightlyoff expressed serious performance concerns about the design which is now again being proposed. I had imagined that the animation-frame-based design would help alleviate these issues. Now that we're back to the previous design, we need to take this feedback into account again.

Maybe it turns out that it's really not an issue, but we still need to do due diligence.

The crbug also seems to discuss latency issues, and I'd like to make sure we're really hitting industry requirements here, because if we don't, we'll literally make people sick.

All in all, pondering such a massive redesign is no small matter when we've spent the last 6 months on a solution that was supposed to fix the problems of the solution we now claim doesn't have any.

Maybe going back to the first solution is the right choice, but I don't think this is something that should be decided on a whim without seriously considering all its aspects.

anssiko commented 7 years ago

You say that this proposed solution is applicable to any modern browser engine, but we haven't heard from anyone but Chrome engineers on this topic.

The reason why we haven't heard from others is likely that Chrome engineers are the first ones to implement the specification, and as such have invested a significant amount of time into the work and reviewed the spec more closely than others. Why not make the Editor's Draft spec reflect reality, i.e. the first implementation? The implementation is subject to change as long as the spec is considered a draft.

The spec update in #197 makes it easier for other browser vendors to scrutinise the design without peeking into the Chromium codebase and reverse-engineering its behaviour. The spec as it currently stands does not document any implementation, so it's not helping other browser vendors understand what we're up to, or provide constructive feedback that would also help change Chrome's implementation for the better.

pozdnyakov commented 7 years ago

Please note, #197 does not add or remove anything related to tying sensor polling to animation frames -- it just fixes existing issues in the abstract operations, improving the current ED.

tobie commented 7 years ago

Can we please keep the conversation about #197 in #197?

anssiko commented 7 years ago

The rationale for the proposed design to decouple the processing model from animation frames (via crbug.com/715872):

Queuing an animation task every frame is making a large amount of the system churn even if nothing in the UI is changing in response to the sensor data. We generally consider an infinite raf loop where nothing is changing in the UI a bug in apps because it uses a lot of power. Creating any continuously polling sensor is pretty much the same in the current design.

Note that the raf system is sending plenty of IPCs around the system at 60fps due to the BeginFrameSource and the way the browser/gpu process and renderer talk to each other. I'd really like to see data on the 2ms you're quoting: that's 2ms for what, in which situations?

Also, the system you're coupled to currently (raf) is throttled for lots of reasons, for example third-party iframes and off-screen iframes; it doesn't run at all inside frames that are still loading, it doesn't run in background tabs, it'll block on the GPU being busy, ... So this means that if you insert a bunch of DOM into the page that takes 100ms to raster, you're losing 100ms of sensor samples even if the main thread was idle.

I think you want a separate system that schedules the tasks unless you want all those side effects for sensor events.

I'm also curious to see an example sensor data stream and how important it is to get the changes in real time at 60fps. For lots of sensors (ex. ALS) getting it periodically and in batches is often enough. Never letting the app sleep because it creates an instance of the AmbientLightSensor or AccelerometerSensor might not be the best design. It's so easy to end up in a situation where your page is using infinite power.

As noted earlier, this proposed solution is applicable to any modern browser engine with a multi-process architecture, and is not a Chrome-specific optimization.

@tobie mentioned the WebVR example:

Well, for WebVR, for example, there will pretty much be a redraw for every frame, so I'm not sure avoiding redundant redraws is something we should optimize for here. Which is not to say it might not be an issue (as mentioned in the crbug) for other use cases.

A WebVR implementation redraws a new frame at the native refresh rate of the VR display, and reads the latest available pose synchronously (note, there's no onvrposechanged). However, sensors have many use cases where redundant redraws are an issue: for example, any use case in which sensor data is collected and analyzed without visible changes to the UI would be inefficient to implement if coupled with animation frames.

In other words, a new sensor reading should not cause a side-effect of scheduling a new animation frame.
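
As a rough sketch of that kind of use case (using the thread's generic Sensor constructor shorthand; logReading is just a placeholder for whatever analysis or storage the page does):

const sensor = new Sensor({ frequency: 30 });
sensor.onchange = () => {
  // Record the reading for later analysis; nothing visible changes,
  // so no animation frame is needed.
  logReading(sensor.x);
};
sensor.start();

If delivery were tied to animation frames, the UA would have to keep scheduling frames just to run this callback, which is exactly the redundant-redraw and power problem described in the crbug.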

Reducing latency, gathering data at high frequencies, etc., are important requirements for sensors.

The proposed solution to decouple from animation frames reduces latency (because we're not synchronizing with animation frames, and can fire the event immediately) and allows polling at higher frequencies in comparison to animation frame coupling (because we're not limited by the refresh rate of the rendering pipeline). In addition, it improves the power efficiency of implementations, which is an important factor for battery-powered devices (because we're not forcing redundant redraws).

I consider the consensus position of the Chrome engineers and Generic Sensor API implementers in Chrome a strong case in favour of decoupling the processing model from animation frames.

tobie commented 7 years ago

any use case in which sensor data is collected and analyzed without visible changes to the UI would be inefficient to implement if coupled with animation frames.

I absolutely understand that now and I don't think anyone disagrees with this. As mentioned above I'm mostly concerned about the following things:

  1. Latency between polling the sensor and requestAnimationFrame. It's unclear to me whether this new design changes that latency or not. Your explanation above treats the latency between polling the sensor and firing a "change" event as the key metric. It's not. The key is how that's tied to rAF. My question here is simple: does the design change here imply implementation changes that increase this latency, and if so, by how much?
  2. Will firing events at polling speed hinder our ability to increase the polling speed in the near future (my understanding from reading the Oculus Rift paper on the subject is that the Rift polls its gyroscope at 1,000 Hz)?
  3. Won't these rapidly firing events interrupt rendering and cause jank as @slightlyoff suggests in https://github.com/w3ctag/design-reviews/issues/115#issuecomment-236365671?
  4. The previous architecture allowed us to increase polling frequency (to reduce latency) without providing more readings to the application layer. This was one of our mitigation strategies: https://w3c.github.io/sensors/#limit-number-of-delivered-readings. Do we no longer consider providing > 60 Hz frequencies an issue, or are we planning to handle this another way? If so, how?
  5. Given the case where a sensor is polled at max frequency to lower latency, but only its latest reading is used during rAF, isn't all the extra copying and data transfer wasteful (i.e. also costly in terms of battery, etc.)?

and allows polling at higher frequencies in comparison to animation frame coupling (because we're not limited by the refresh rate of the rendering pipeline).

Sensor polling is actually totally decoupled from the pipeline either way, so I'm not sure I understand this. What's at stake here is how and when the polled readings are transferred to the application layer.

I consider the consensus position of the Chrome engineers and Generic Sensor API implementers in Chrome a strong case in favour of decoupling the processing model from animation frames.

I will too once I hear clear answers to the questions above. Depending on these answers, I might also suggest investigating making rAF coupling opt-in (as initially suggested in #4), or alternatively prioritizing worker support (though I don't know what latency/perf issues relying on workers would induce).

alexshalamov commented 7 years ago

  1. Anssi mentioned latency due to synchronization with rAF, and he is correct: if the UA gets a new sensor reading between frames, the new reading will be delivered with the next frame in the queue. At 60fps, latency due to synchronization can be up to 16ms, and if the web page has a complex UI and the frame rate drops, the latency would be even worse. When we decouple the 'onchange' notification from rAF, latency will be lower.

  2. We don't have to fire 'onchange' events at HW sensor sampling rates, and decoupling from rAF will not hinder the ability to use higher sampling rates. Sensors may sample data at high rates, e.g., 8kHz for OIS. The widely used BMI160 samples its gyro at 3600Hz and its accel at 1600Hz. As an application developer, you can choose the ODR (output data rate); different solutions can also be used, such as polling or 'waiting for interrupt'.

  3. Do you know any other APIs that are not directly involved in rendering / layout, and are rAF synchronized?

  4. If by polling frequency you mean HW sampling frequency, nothing changes. We can control both the sampling rate on the platform side and the output data rate via the provided frequency hint.

  5. With or without rAF synchronization, we still have to cache the latest sensor reading; we don't see where extra copying might come from. Due to the multi-process architecture of modern browsers, there is unfortunately a drawback: the privileged process needs to share sensor data with the process that renders the page, so faster IPC would be beneficial to reduce IPC load and hence power consumption. In any case, the same architecture applies regardless of synchronization with rAF.

Syncing sensor reading changes with rAF is kind of a chicken-and-egg problem. If a web developer subscribes to the 'onchange' event and the engine is not scheduling new frames, 'onchange' will never be fired even though the data has changed, and 'onchange' should not trigger new frames to be rendered.

tobie commented 7 years ago

  1. Anssi mentioned latency due to synchronization with rAF, and he is correct: if the UA gets a new sensor reading between frames, the new reading will be delivered with the next frame in the queue. At 60fps, latency due to synchronization can be up to 16ms, and if the web page has a complex UI and the frame rate drops, the latency would be even worse. When we decouple the 'onchange' notification from rAF, latency will be lower.

I think you're not taking into consideration the fact that in such scenarios, developers won't be relying on onchange events, but will instead simply get the value off of the corresponding attribute with each rAF. So the speed at which an event is delivered in the meantime is irrelevant.

My question wasn't about this, however, but about whether this change in architecture implied a slower IPC solution to transfer the readings to the application layer which would increase latency.

  2. We don't have to fire 'onchange' events at HW sensor sampling rates, and decoupling from rAF will not hinder the ability to use higher sampling rates. Sensors may sample data at high rates, e.g., 8kHz for OIS. The widely used BMI160 samples its gyro at 3600Hz and its accel at 1600Hz. As an application developer, you can choose the ODR (output data rate); different solutions can also be used, such as polling or 'waiting for interrupt'.

Agreed. So we need an API to do that: start differentiating polling frequency from reporting frequency, and find a developer-friendly way to express that and group readings together.
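
For illustration only, one hypothetical shape such an API could take (the option names and the readings buffer are invented for this sketch, not something the WG has agreed on; process() is a placeholder):

const sensor = new Sensor({ pollingFrequency: 1000, reportingFrequency: 60 });
sensor.onchange = () => {
  // Hypothetically, 'sensor.readings' would hold the samples gathered since
  // the previous report (about 16 of them here), each with its own timestamp,
  // rather than only the latest value.
  for (const reading of sensor.readings) {
    process(reading.timestamp, reading.x);
  }
};
sensor.start();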

  3. Do you know any other APIs that are not directly involved in rendering / layout, and are rAF synchronized?

I think you're missing the point I'm trying to make here. I assumed (as that was the understanding shared by everyone here until a week ago) that rAF sync would mitigate jank issues by staying out of the rendering path. Now I'm told rAF won't (or doesn't work for other reasons), so the jank concern stays a concern until it is provably not one. Or are you suggesting @slightlyoff's concerns are just completely out of touch?

  4. If by polling frequency you mean HW sampling frequency, nothing changes. We can control both the sampling rate on the platform side and the output data rate via the provided frequency hint.

Sure, but again, we need an API that allows distinguishing this. I guess the solution you're suggesting is that we specify both a polling frequency and an update frequency. This isn't very developer-friendly, to say the least.

  5. With or without rAF synchronization, we still have to cache the latest sensor reading; we don't see where extra copying might come from.

Well, if you're polling at 1000 Hz and the web developer only uses 60 samples per second, then you're doing 940 extra copies per second. The rAF-sync design avoided this issue altogether. Now we have to find a different way to handle this. Maybe offering rAF sync as an option (e.g. new Sensor({ freq: 1000, rAFsync: true })), maybe something completely different. This will need looking into.

Syncing sensor reading changes with rAF is kind of a chicken-and-egg problem. If a web developer subscribes to the 'onchange' event and the engine is not scheduling new frames, 'onchange' will never be fired even though the data has changed, and 'onchange' should not trigger new frames to be rendered.

I wasn't aware of these issues. The HTML spec leaves a lot unspecified about this. To be honest, I'm sort of surprised that either no one knew this before or that this architecture was chosen despite it, but that's sort of irrelevant at this point.

As I said above, I'm not saying this change is not the right decision from an engineering perspective. I'm just pointing out that it opens up a whole new set of questions we had answers to up until a week ago and that we now have to figure out from scratch.

alexshalamov commented 7 years ago

I think you're not taking into consideration the fact that in such scenarios, developers won't be relying on onchange events, but will instead simply get the value off of the corresponding attribute with each rAF. So the speed at which an event is delivered in the meantime is irrelevant.

We take both cases into consideration: when data is read from an 'onchange' event handler and when it is read directly in a rAF callback. Both code paths should introduce as little latency as possible; therefore, it would be good to decouple 'onchange' from rAF.

My question wasn't about this, however, but about whether this change in architecture implied a slower IPC solution to transfer the readings to the application layer which would increase latency.

In the Chrome implementation we use the same shared-memory IPC; there is no change to the architecture at all.

I think you're missing the point I'm trying to make here. I assumed (as that was the understanding shared by everyone here until a week ago) that rAF sync would mitigate jank issues by staying out of the rendering path. Now I'm told rAF won't (or doesn't work for other reasons), so the jank concern stays a concern until it is provably not one. Or are you suggesting @slightlyoff's concerns are just completely out of touch?

I just asked whether you know about other non-UI-bound APIs that do rAF sync; maybe we can learn from them.

Well, if you're polling at 1000 Hz and the web developer only uses 60 samples per second.

I think there was consensus that we are not delivering data faster than was requested by the sensor instance. A high sampling rate does not necessarily imply the same data rate; as I mentioned before, a sensor can run at 8kHz while data is delivered at 60Hz, so we will only make 60 memcpy calls per second. If a developer needs a higher output data rate, a higher 'frequency hint' can be provided (the frequency hint is more or less the ODR, in HW sensor terms).

tobie commented 7 years ago

In the Chrome implementation we use the same shared-memory IPC; there is no change to the architecture at all.

Cool. That was my question (sorry it took me quite some time to formulate it in a way that was understandable), and this answer settles it.

I just asked whether you know about other non-UI-bound APIs that do rAF sync; maybe we can learn from them.

Oh, sorry! It sounded like a rhetorical question. :)

Thing is, it's incorrect to consider that sensor usage isn't rAF-bound. It is in some cases and isn't in others. It also happens that the cases where it is have high-frequency and low-latency requirements. Hence my suggestion to look into making rAF syncing opt-in.

I think there was consensus that we are not delivering data faster than was requested by the sensor instance.

Yes. That's not what I'm talking about here at all.

A high sampling rate does not necessarily imply the same data rate; as I mentioned before, a sensor can run at 8kHz while data is delivered at 60Hz,

Yes, that's precisely my point. The rAF-sync architecture required only one frequency setting from the developer, which was used both to set the frequency at which the sensor was polled AND the frequency at which the data was delivered to the developer, capped at the rAF rate. The limit on memcpy calls was thus opaquely enforced by the API.

so we will only make 60 memcpy calls per second.

Well, no. That worked in the previous solution because the 60 copies were rAF-sync'ed. If they're not, developers will require 1000 memcpy calls per second to obtain the same latency.

If a developer needs a higher output data rate, a higher 'frequency hint' can be provided (the frequency hint is more or less the ODR, in HW sensor terms).

Well, that's not what's at stake in this scenario. Here, developers are essentially interested in getting a single low-latency reading at every rAF. This isn't a far-fetched scenario; it's how the original Oculus paper describes the solution implemented in the Oculus Rift. To obtain that you need a high polling rate on the sensor and the ability to grab the latest reading during rAF. With rAF sync, you only need to copy one sample per rAF. Without, you need all 1000 to be copied to the application layer. I can't imagine this has no perf consequences.

Again, I'm not suggesting changing course is not necessary. I'm just pointing out issues that need to be resolved, perhaps by new APIs, perhaps by extra implementation.

alexshalamov commented 7 years ago

Without, you need all 1000 to be copied to the application layer. I can't imagine this has no perf consequences.

You need the sensor running at a high sampling rate and then grab data from the sensor's registers when needed; if the output data rate is non-deterministic (rAF), access on every frame can be used. If that cannot be done synchronously, as in the browser case, the privileged process can use many different tricks: memory-mapped I/O, or simply polling at a faster rate than the refresh rate and then copying the latest reading to shared memory. Currently, a sensor reading is 48 bytes, and memory I/O bandwidth is gigabytes per second. Updating the shared buffer, even at 240Hz, would have no performance impact.
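
For scale, using those figures: 48 bytes copied 240 times per second is 11,520 bytes/s, roughly 11 KB/s, several orders of magnitude below memory bandwidths measured in gigabytes per second.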

anssiko commented 7 years ago

It seems we're reaching consensus. Thanks for your contributions!

Proposed resolution: decouple the processing model from animation frames, investigate rAF sync opt-in option

Please use :thumbsup: to indicate agreement with the proposal.

anssiko commented 7 years ago

@tobie, do you also agree with the first part of the proposed resolution to "decouple the processing model from animation frames", in addition to the latter part "investigate rAF sync opt-in option" which you +1'd in https://github.com/w3c/sensors/pull/197#issuecomment-304232492?

I'd be happy to get closure on this issue :-)

tobie commented 7 years ago

I'd like clarity on (3) first: won't these rapidly firing events interrupt rendering and cause jank as @slightlyoff suggests in https://github.com/w3ctag/design-reviews/issues/115#issuecomment-236365671?

alexshalamov commented 7 years ago

@tobie 'onchange' events do not trigger re-rendering, so not sure how it can 'jank' the rendering thread. What I can say is that if 'onchange' forced rAF, we could have plenty of 'jank' :)

tobie commented 7 years ago

not sure how it can 'jank' the rendering thread.

By occupying resources during rendering. Especially if complex filtering is done on them.

pozdnyakov commented 7 years ago

By occupying resources during rendering. Especially if complex filtering is done on them.

That's quite a generic problem: if you do too much work in a regularly invoked JS callback, you can 'jank' the renderer using any API :)

If a developer wants to sync a sensor with rAF, they can always use requestAnimationFrame directly:

const sensor = new Sensor({ frequency: 60 });

function draw(timestamp) {
  // Schedule the next frame, then read the latest reading off the sensor attribute.
  window.requestAnimationFrame(draw);
  let x = sensor.x;
  // Drawing ...
}

// Kick off the rAF loop once the sensor is active.
sensor.onactivate = () => { window.requestAnimationFrame(draw); };
sensor.start();

tobie commented 7 years ago

That's absolutely not what we talk about when we talk of syncing with animation frames. Actually it's kind of frightening that you're even suggesting that this would be an option.

alexshalamov commented 7 years ago

By occupying resources during rendering.

Resources? As I mentioned before, one sensor reading in Chromium is 48 bytes, and it would not be bigger in other browsers. Event handler invocation is required regardless of design. Do you have something else in mind that might require heavy resource allocation?

Especially if complex filtering is done on them.

We don't have filtering on the renderer side; if we had any, it would be outside of the process that renders the page.

pozdnyakov commented 7 years ago

That's absolutely not what we talk about when we talk of syncing with animation frames. Actually it's kind of frightening that you're even suggesting that this would be an option.

This is a workaround that can actually work atm (even before we've investigated the rAF sync opt-in); not sure why you are frightened.

tobie commented 7 years ago

This is a workaround that can actually work atm (even before we've investigated the rAF sync opt-in);

Well, a workaround that disregards all the critical low-latency requirements is not a workaround. You know as well as I do that sampling at 60 Hz gives absolutely no guarantee of lining up with animation frames. So you're going to get latency of up to PIPELINE_LATENCY + ~17ms and a latency delta of ±17ms. Just about enough to make people in VR throw up.
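
(For reference, the ~17ms is simply the 60 Hz sampling period: 1000 ms / 60 ≈ 16.7 ms. Since the sampling clock and the display refresh aren't locked together, the reading a frame sees can be anywhere from brand new to a full period old, and that offset drifts over time, hence the ±17ms delta.)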

not sure why you are frightened

Well, it is sort of frightening that we don't seem to have aligned goals to meet the requirements.

tobie commented 7 years ago

@alexshalamov:

Event handler invocation is required regardless of design.

Well, it's really not necessary for rAF sync'ed stuff, which instead is just going to get the values from the sensor attributes on each new frame.

One rational proposal could have been to allow only rAF-synced sensors in the main window (maybe even without events?), and to force all other non-rAF, event-based sensor uses into workers.
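
A rough sketch of what that split might look like (assuming, hypothetically, that the sensor interfaces were exposed in workers, which the spec does not do today; updateUI and filter are placeholders):

// main.js: the page only consumes results; no sensor events on the main thread.
const worker = new Worker('sensor-worker.js');
worker.onmessage = (e) => {
  // e.data carries whatever the worker computed from the readings.
  updateUI(e.data);
};

// sensor-worker.js: evented, non-rAF sensor use plus any heavy JS filtering.
const sensor = new Sensor({ frequency: 120 });
sensor.onchange = () => {
  postMessage(filter(sensor.x));
};
sensor.start();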

Especially if complex filtering is done on them.

We don't have filtering on the renderer side; if we had any, it would be outside of the process that renders the page.

I was referring to filtering happening in JS.

alexshalamov commented 7 years ago

Well, it's really not necessary for rAF sync'ed stuff, which instead is just going to get the values from the sensor attributes on each new frame.

Sure, to clarify, I was talking about decoupling 'onchange' from rAF; the time spent invoking 'onchange' event handlers would be the same either way.

One rational proposal could have been to only allow rAF-synced sensors in the main window (maybe even without events?), and forced all other non-rAF, event-based sensor uses into workers.

Sure, or we can research opt-in rAF sync, i.e. exposing the latest reading to the rAF event handler (Anssi's proposal).

tobie commented 7 years ago

Sure, or we can research opt-in rAF sync, i.e. exposing the latest reading to the rAF event handler (Anssi's proposal).

Or bundling all the intermediary readings, depending on the use case.

anssiko commented 7 years ago

Updated proposed resolution:

Silence is considered consent to the proposal.

tobie commented 7 years ago

Well, we still don't really have clarity on (3): won't these rapidly firing events interrupt rendering and cause jank as @slightlyoff suggests in https://github.com/w3ctag/design-reviews/issues/115#issuecomment-236365671?

I'd find it worthwhile to have a conversation about the benefit of having only a rAF-sync version in the main thread (maybe with no events at all) and evented stuff only in workers. Or maybe some different frequency caps depending on the context.

That said, the processing model is not currently linked to animation frames in the spec, and I don't intend to change that until we have consensus on how to move ahead with this.

I don't think we have consensus on this topic beyond a common understanding of the issues of the different solutions and the will to explore different options in order to find the ones that work the best.

anssiko commented 7 years ago

We'll seek Chromium engineers' consensus position on the jank concern @slightlyoff noted in the context of the Ambient Light Sensor TAG review from Mar 2016 at https://github.com/w3ctag/design-reviews/issues/115#issuecomment-236365671, and will report back. There has been new information brought to the table since that concern was raised.

tobie commented 7 years ago

Can we make sure to have these conversations here, please?

pozdnyakov commented 7 years ago

@slightlyoff do you think the solution described at https://github.com/w3c/sensors/issues/198#issuecomment-304635062 (decoupling onchange from rAF at the moment) is acceptable, given the arguments at https://bugs.chromium.org/p/chromium/issues/detail?id=715872#c10 ?