w3c / sensors

Generic Sensor API
https://www.w3.org/TR/generic-sensor/

Allowing data batching when poll frequency < sensor frequency #13

Open tobie opened 9 years ago

tobie commented 9 years ago

Even when doing realtime processing (e.g. at 60 Hz), there is a benefit to having more than one new value every time you process (e.g. Oculus polls a gyroscope at 1000 Hz for head tracking). In this case, every time you process, you'll have 1000/60 ≈ 16 new sensor readings.

This seems like a desirable feature. How would you handle this though?

Is this (yet) an extra option you set? How does that work with sensors with multiple values, etc.?

/cc @borismus

Proposed resolutions: Some use cases have high data frequency requirements which might cause performance and/or memory constraints. Further research is needed to assess and determine how to satisfy those requirements.

Further actions:

borismus commented 9 years ago

I described a possible approach at the end of my blog post:

// Handle sensor events.
function onMagnetometer(event) {
  var data = event.data[0];
  // Get the timestamp (in millis).
  var t = data.timestamp;
  // Get the data (in this case µT, as per spec).
  var x = data.values[0];
  var y = data.values[1];
  var z = data.values[2];
  // Process the data.
  superAdvancedSensorFusionThing.addData(t, x, y, z);
}
tobie commented 9 years ago

OK, so have event.data always be a reverse-chronological array of sensor values (all values since the previous event, I suppose?). Isn't that unnecessarily wasteful in case only the last value is used/needed? Also, do you only expose the last value on the sensor object itself? Or do you also expose an array of values there? Or nothing at all?

rwaldron commented 9 years ago

I don't understand how the example illustrates an approach that matches the title of this issue.

As a side note, I'm curious to know what magnetometer device (or any sensor, really) actually supports data register history. Generally, the IC will write to the data register at whatever frequency it's been programmed to, but if the data isn't read before the next register write, it will be overwritten by that write and no longer accessible.

borismus commented 9 years ago

@tobie Yes, in this snippet event.data is a reverse-chronological array of sensor values since the previous event. Are you saying it's wasteful to have an array with one value in it?

Perhaps there are two modes here: batched (you get all data since the last callback), and sampled (you get only one data point).

@rwaldron Do you need IC support for this? A native thread can poll at sensor rate and cache the values, exposing batches of data to the JS context. My concern is that for very fast sensors (i.e. sensor frequency >> 60 Hz), entering JS for every single value is going to really harm performance.

tobie commented 9 years ago

Yes, in this snippet event.data is a reverse-chronological array of sensor values since the previous event. Are you saying it's wasteful to have an array with one value in it?

No, I was saying that passing around an array with all the batched data for the cases where only the current value was needed seemed wasteful.

Perhaps there are two modes here: batched (you get all data since the last callback), and sampled (you get only one data point).

As a constructor option? (And maybe even .value and .values distinction with only the latter returning an array in such cases?) Could work if the use cases justify this added complexity.
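
For illustration, one rough sketch of how that could look; the batch option, the value/values attributes, and the fuse() helper below are hypothetical, not anything currently specified:

// Hypothetical sketch: "batch" opts into batching; "value" always holds the
// latest reading, while "values" is the reverse-chronological array of all
// readings since the previous event and is only populated in batch mode.
let accel = new Accelerometer({ frequency: 60, batch: true });
accel.on("data", function(event) {
  let latest = this.value;   // cheap path: just the current reading
  let history = this.values; // batch path: everything since the previous event
  history.forEach(r => fuse(r.timestamp, r.x, r.y, r.z));
});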

rwaldron commented 9 years ago

Do you need IC support for this?

No, that's not what I meant—that's way too low level

cache the values

Got it, this is interesting and I agree with the motivation. A constructor options param could switch on a "batch" feature where the event object delivered to change events would contain a Set of readings (readings can be single or multiple value data) since the last change event.

tobie commented 9 years ago

There seem to be solid use cases and broad agreement for providing batches of sensor values when the frequency of onchange events is below the frequency at which data is emitted by the sensor (for example, onchange events emitted at 60 Hz and a sensor emitting values at 1,000 Hz). This seems to be a good compromise between the need for fine-grained data and perf requirements.

There are various means to expose these data batches but none that jump out as being the obvious, most elegant and efficient way to do so. There's also the possibility of making that an opt-in process for performance reasons.

Proposed resolutions: A batch mode must be specified. The use cases justifying said batch mode must be documented.

Further actions: understand the performance constraints better to see if this batch mode should be opt-in or always on.

rwaldron commented 9 years ago

(for example, onchange events emitted at 60 Hz and a sensor emitting values at 1,000 Hz)

So for this example, there would be ~17 values between each ondata (or whatever it's called) event. A 200 Hz accelerometer would have ~3 values between each ondata. Is it smart to expose a feature that will inevitably lead to code that does something like this:

let { Accelerometer } = sensors;
let accel = new Accelerometer({ frequency: 60, batch: true });
accel.on("data", function(event) {
  // For illustration only...
  // "event.batch" is an array of objects containing frozen "snapshots" of 
  // all the readings in the last 16.666666ms
  event.batch.length; // 17 for a 1000 Hz sensor
  event.batch.length; // 3 for a 200 Hz sensor
});

Ok, that's fine, right? But what happens when it's an onchange event and the user sets the phone down on a table and has the screen sleep set to 30 minutes? 29 minutes later, they pick up the phone, which triggers the onchange, which now delivers an array containing 1,740,000 (1,000 readings/s × 60 s/min × 29 min) frozen snapshot objects.

This will lock up your browser:

var batch = [];
for (var i = 0; i < 1740000; i++) {
  batch[i] = {x: 1, y: 2, z: 3};
}
console.log(batch);

Another issue is that developers may write code that's somehow dependent on the number of records in a batch for the ondata event, e.g. 17 for the 60 Hz reporting of the 1000 Hz sensor, or 3 for the 60 Hz reporting of the 200 Hz sensor.

Unless the batch only ever covers the most recent reporting period (≈16.67 ms in our example)?

What if no batching is included, but the API is designed in a way that allows later introduction if there is demand? I suggest this because batching could be done in user code:

let { Accelerometer } = sensors;
let accel = new Accelerometer(); // Important!! This will default to the sensor's read resolution
let batch = [];
let start, end;
accel.on("data", function(event) {
  if (batch.length === 0) {
    start = event.timestamp;
    end = start + (1000 / 60);
  }
  if (event.timestamp < end) {
    // I can pick what information is useful to me...
    batch.push(this.acceleration);
  } else {
    doSomethingWithBatch(batch.slice());
    // Start a new window from the current reading so this sample isn't dropped.
    batch.length = 0;
    start = event.timestamp;
    end = start + (1000 / 60);
    batch.push(this.acceleration);
  }
});

If this becomes common, we can extend the API to alleviate the pain.

tobie commented 9 years ago

My understanding from https://github.com/w3c/sensors/issues/9#issuecomment-63326257 is that a batch mode is necessary for performance reasons (due to the cost of re-entering the JS context every time the sensor emits a value). This implies that the batching is done in native code and then transferred to the JS context all at once. Now whether that batched data is then emitted all together in a single ondata event (e.g. sensor.emit("data", batch);) or separately in n different events (e.g. batch.forEach(value => sensor.emit("data", value));) can be debated, though there might also be performance considerations there.

rwaldron commented 9 years ago

This implies that the batching is done in native code and then transferred to the JS context all at once.

Yes, no dispute.

My concern still stands: what if the batch is 1,740,000 accelerometer snapshot objects?

tobie commented 9 years ago

My concern still stands: what if the batch is 1,740,000 accelerometer snapshot objects?

There are various solutions to that, I guess, e.g. a configurable maximum amount of historical sensor data stored, as @borismus suggested.
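
For example, a hedged sketch of what such a cap could look like as a constructor option; maxBatchSize is hypothetical, and the exact trimming policy (drop oldest vs. newest) would need to be specified:

// Hypothetical: bound the native-side buffer so a long idle period can't
// produce an unbounded batch; oldest readings are dropped once the cap is hit.
let accel = new Accelerometer({ frequency: 60, batch: true, maxBatchSize: 1000 });
accel.on("data", function(event) {
  // event.batch never holds more than maxBatchSize snapshots,
  // even after 29 minutes face-up on a table.
  processBatch(event.batch); // processBatch() stands in for the app's own handler
});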

tobie commented 9 years ago

Updated the proposed resolution and next actions following your comments, @rwaldron:

Proposed resolutions: Some use cases have high data frequency requirements which might cause performance and/or memory constraints. Further research is needed to assess and determine how to satisfy those requirements.

Further actions:

rwaldron commented 9 years ago

This is great, thanks for the follow up :)

robman commented 5 years ago

Heya - I can see this has been added as a Level 2 issue and I'm just wondering if there's been any progress on it?

It seems like all these issues are related to this issue (#13):

And possibly:

Unfortunately the periodic (monotonic) feature/discussion that @tobie referred to in #98 seems to have disappeared - https://github.com/w3c/sensors/issues/98#issuecomment-206515815

There are quite a few use cases that would benefit from having access to the full set of readings provided by the Accelerometer. But at the moment extra load on the main browser thread (e.g. 3D rendering) can block onreading() updates, and once they're gone there's no way to recover the lost readings at all.

One well-known use case where the full set of readings is important is inertial-navigation-style algorithms (as discussed in #98). @tobie, you asked a couple of questions in this comment https://github.com/w3c/sensors/issues/98#issuecomment-279981550

The sampling intervals need to be consistent. And it's critical that no samples are lost.
[Ed: I'd love to understand more about why that's important.]

and

[Ed: it would be useful to get a better understanding of the kind of analysis that's being done on the readings to justify the high sampling-rate requirements and the necessity of not dropping samples, and to have an idea of the batch sizes.]

For Inertial Navigation, SLAM/VIO and a range of other use cases, the accelerometer readings are integrated so their values add up over time to update the current velocity, which is then combined with the time slice to define a motion vector. Happy to provide pointers to algorithms here if you want more detail.
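
To make that concrete, here is a deliberately naive integration sketch (no gravity removal, bias estimation or drift correction), just to show why a single missing sample corrupts everything downstream:

// Naive strapdown-style double integration: acceleration -> velocity -> position.
// A dropped reading leaves a hole in the sums, and the resulting error persists.
let velocity = { x: 0, y: 0, z: 0 };
let position = { x: 0, y: 0, z: 0 };
function integrate(a, dt) { // a in m/s^2, dt in seconds (from consecutive timestamps)
  velocity.x += a.x * dt;
  velocity.y += a.y * dt;
  velocity.z += a.z * dt;
  position.x += velocity.x * dt;
  position.y += velocity.y * dt;
  position.z += velocity.z * dt;
}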

This approach has its own problems: because it's a double integration, it's well known that errors build up over time and drift can become a major issue.

There are a lot of ways to work around this double-integration problem, but if you don't have all of the readings because some are simply missing, then this approach is not usable at all.

All of the discussions in the issues I listed above kinda dance around this point - but none seem to have completely covered it 8)

It would be great to hear if there has been any recent movement on any of this?

anssiko commented 5 years ago

@robman, thanks for your comments. I put this issue on our upcoming F2F agenda for discussion.

robman commented 5 years ago

Awesome - thanks @anssiko

tangobravo commented 1 year ago

@anssiko - I know your comment was now a few years ago, but can you remember if there was any movement on this at the F2F?

As @robman explained really well, I think the main issue for me is missing readings when the main thread is too busy for a period of time. With WebGL on the main thread, things like loading model files or uploading large textures can block the main thread for hundreds of milliseconds.

Another case to consider: if your regular WebGL render function takes 20 ms or so, you end up regularly missing readings.

Exposing sensors to Workers might allow a better QoS for these cases, but having a dedicated worker for it seems like overkill if a batch delivery / recent reading history API could be added to the spec.

On iOS Safari, DeviceMotionEvents and DeviceOrientationEvents are queued for each reading, so if the main thread is busy you'll still get all the callbacks fired when the thread frees up again. If the rate is going to stay limited to 60 Hz for privacy reasons, then just adding the data into the events and queueing them would be a fine solution.
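
As a rough way to observe that queueing (assuming event.timeStamp reflects when the reading was generated rather than when the handler finally ran, which may differ between browsers):

// If readings are queued while the main thread is blocked, a burst of
// devicemotion events with old timestamps arrives once the thread frees up.
window.addEventListener("devicemotion", function(event) {
  var lag = performance.now() - event.timeStamp; // ms between reading and handling
  if (lag > 50) {
    console.log("delivered " + lag.toFixed(1) + " ms late, interval " + event.interval + " ms");
  }
});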

Otherwise a rolling buffer of the latest readings accessible from the Sensor instance would cover all of these "main thread too busy" issues. The buffer wouldn't need to be huge, so I don't think the memory concerns would be a problem in practice - 200 samples or so should be more than enough for most temporary disturbances.
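
One hypothetical shape for that, with a readings attribute (not in the spec) holding the most recent samples oldest-first:

// Hypothetical rolling buffer maintained off the main thread: a busy frame
// only delays access to readings, it no longer loses them.
const accel = new Accelerometer({ frequency: 60 });
let lastProcessed = 0;
accel.onreading = () => {
  // accel.readings (hypothetical) = up to ~200 recent { timestamp, x, y, z } entries
  accel.readings
    .filter(r => r.timestamp > lastProcessed)
    .forEach(r => process(r)); // process() stands in for the app's own handler
  lastProcessed = accel.timestamp;
};
accel.start();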

anssiko commented 1 year ago

@tangobravo the WG made the following resolution at the 2019 F2F:

RESOLVED: Allowing data batching when poll frequency < sensor frequency #13 maintains its Level 2 status, awaits use cases

Here's the pointer: https://www.w3.org/2019/09/19-dap-minutes.html#x07

If you're interested, feel free to help document the use cases in a concise form: a synthesis of what is discussed in this issue plus any new ones not yet raised. I can't promise that'll move the needle, but it'll surely help explain the feature to folks such as implementers, web developers, and privacy researchers whose input is important to get things moving.