w3c / sensors

Generic Sensor API
https://www.w3.org/TR/generic-sensor/

Batching API for sensor readings #171

Open pozdnyakov opened 7 years ago

pozdnyakov commented 7 years ago

A batching API is needed to collect sensor readings that would otherwise be lost between onchange calls when the sensor polling frequency is higher than the current frame rate. This API is already a subtopic of #98, but since it is a rather big item in itself, it deserves a dedicated issue. Please use this issue to discuss the use cases and proposals for a sensor readings batch API.

pozdnyakov commented 7 years ago

I would like to propose the following approach:

interface MotionSensor : Sensor {
  /**
   * Populates the passed array with the latest readings recorded right before the next
   * repaint.
   * 
   * The array length must be equal to the reading size * N, where N is the desired
   * number of readings to collect.
   *
   * Note: calling this method can be expensive, as it might require starting a
   * high-frequency timer.
   */
  void collectLatestReadings(Float64Array array);
};

example usage:

let gyro = new Gyroscope({frequency: 240});

const readingSize = 4; // timestamp, x, y, z
let buffer = new Float64Array(readingSize * 4);
gyro.start();
gyro.collectLatestReadings(buffer);

function step(timestamp) {
  // Now 'buffer' contains recorded readings.
  gyro.collectLatestReadings(buffer); // Collect further.
  window.requestAnimationFrame(step);
}

window.requestAnimationFrame(step);

Also, I would drop the 'onchange' event for motion sensors, as it does not bring much benefit there and, on the other hand, it is called too often, draining CPU and battery.

pozdnyakov commented 7 years ago

@tobie @anssiko PTAL

tobie commented 7 years ago

Thanks for getting us started on this key conversation.

let gyro = new Gyroscope({frequency: 240});

const readingSize = 4; // timestamp, x, y, z
let buffer = new Float64Array(readingSize * 4);
gyro.start();
gyro.collectLatestReadings(buffer);

function step(timestamp) {
  // Now 'buffer' contains recorded readings.
  gyro.collectLatestReadings(buffer); // Collect further.
  window.requestAnimationFrame(step);
}

window.requestAnimationFrame(step);

We want to make sure the API design doesn't force the implementation to preemptively store all intermediary samples in case collectLatestReadings gets called in the following animation frame.

I think that's what you're suggesting here, but that's unclear from the API name alone. It's something we need to express clearly through the API names. Worth checking other BYOB APIs on the Web for ideas here.

What the use cases point towards so far is an API where the consumer can (a rough sketch follows this list):

  1. Provide a buffer of the right size. Note it might be interesting to have a way to programmatically know the buffer size you need for a given number of readings, e.g.: let buffer = new Float64Array(gyro.getReadingSize() * n);
  2. Know when the buffer has been updated.
  3. Maybe know when the buffer is full (promise or event-based? During rAF or as soon as it's full?).
  4. Possibly decide on a policy to handle what happens once the buffer is full (FIFO, FILO, throw, swap buffers, other?).
  5. Read from the buffer.
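
A very rough sketch of such a consumer flow; every name here (getReadingSize, fillPolicy, buffer, onbufferupdated, onbufferfull) is illustrative only, not proposed spec text:

// Hypothetical BYOB-style flow covering the points above; all member names
// are illustrative only.
let gyro = new Gyroscope({frequency: 240});
const n = 4;
let buffer = new Float64Array(gyro.getReadingSize() * n); // (1) right-sized buffer

gyro.fillPolicy = 'fifo';        // (4) policy once the buffer is full
gyro.onbufferupdated = _ => {    // (2) buffer has been updated
  processReadings(buffer);       // (5) read from the buffer (application code)
};
gyro.onbufferfull = _ => {       // (3) buffer is full; swap in a larger one
  buffer = new Float64Array(gyro.getReadingSize() * n * 2);
  gyro.buffer = buffer;
};
gyro.buffer = buffer;
gyro.start();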

Also, I would drop the 'onchange' event for motion sensors, as it does not bring much benefit there and, on the other hand, it is called too often, draining CPU and battery.

That requires removing them from Sensor too and thus creating an extra NonMotionSensor class inheriting from Sensor. Not necessarily a bad thing, but worth considering.

Another option would be to clearly distinguish between polling and reporting frequencies and throttle the latter to animation frame. We really need to understand what the use cases are for this, though.

pozdnyakov commented 7 years ago

We want to make sure the API design doesn't force the implementation to preemptively store all intermediary samples in case collectLatestReadings gets called in the following animation frame.

I thought that if the actual number of readings exceeds the given buffer capacity, the older records would be dropped and newer ones still added, so that the buffer always contains the latest recorded readings.
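
A minimal illustration of that drop-oldest behavior (not implementation code, just the intended semantics):

class LatestReadingsBuffer {
  constructor(capacity) {
    this.readings = new Array(capacity);
    this.next = 0;
  }
  push(reading) {
    // Overwrite the oldest slot once capacity is exceeded, so the buffer
    // always holds the latest `capacity` readings.
    this.readings[this.next] = reading;
    this.next = (this.next + 1) % this.readings.length;
  }
}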

tobie commented 7 years ago

We want to make sure the API design doesn't force the implementation to preemptively store all intermediary samples in case collectLatestReadings gets called in the following animation frame.

I thought that if the actual number of readings exceeds the given buffer capacity, the older records would be dropped and newer ones still added, so that the buffer always contains the latest recorded readings.

That's not what I meant. Let me try and clarify.

The API name suggested above gives the impression you'll be able to fill up the buffer with existing readings pre-stored somewhere and read from it in the same event turn.


I thought that if the actual number of readings exceeds the given buffer capacity, the older records would be dropped and newer ones still added, so that the buffer always contains the latest recorded readings.

That said, that wouldn't meet the Navisens use case for which dropping samples is a deal breaker.

pozdnyakov commented 7 years ago

That said, that wouldn't meet the Navisens use case for which dropping samples is a deal breaker.

Agreed, this use case won't be met: a long enough array could solve the problem for samples reported while an animation frame is being painted, but some samples could still be lost in between collectLatestReadings calls.

To meet this use case we need uninterrupted recording all the time; note that samples can be lost even during the notification that the given buffer is full and while the buffer is being refreshed.

tobie commented 7 years ago

To meet this use case we need uninterrupted recording all the time; note that samples can be lost even during the notification that the given buffer is full and while the buffer is being refreshed.

Would you write to the buffer directly as you get the samples from the HAL (or equivalent) or would you write to it with each animation frame?

If it's the former, we indeed need a buffer swapping mechanism of some sort. If it's the latter, the buffer can be swapped synchronously by the API consumer during rAF (or right before? I'm not sure about the particulars of this) if they're advised the buffer is about to be filled, no?

let gyro = new Gyroscope();
let bufferA = new Float64Array(gyro.getReadingSize() * n);
let bufferB = new Float64Array(gyro.getReadingSize() * n);
gyro.onbufferfull = _ => {
  if (gyro.buffer === bufferA) {
    gyro.buffer = bufferB;
  } else {
    gyro.buffer = bufferA;
  }
};

pozdnyakov commented 7 years ago

I was planning to have an internal data structure that is filled periodically (using a timer) with the N latest readings, starting from the moment when collectLatestReadings() is called and until the moment when the next frame paint is scheduled (i.e. right before rAF callbacks are invoked); at that later moment the internal data structure's contents would be copied into the given Float64Array. Thus the client has a view of previous readings obtained at a frequency higher than the current frame rate.

In Chromium sensor data is fetched from the underlying system in one process and then it is put to a shared memory buffer. JS runs within a different process.

Considering that the JS engine can arbitrarily pause or defer script execution, I doubt that we can completely avoid losing samples, whatever API we invent.

tobie commented 7 years ago

I was planning to have an internal data structure that is filled periodically (using a timer) with the N latest readings, starting from the moment when collectLatestReadings() is called and until the moment when the next frame paint is scheduled (i.e. right before rAF callbacks are invoked); at that later moment the internal data structure's contents would be copied into the given Float64Array. Thus the client has a view of previous readings obtained at a frequency higher than the current frame rate.

That seems perfectly reasonable.

Would the buffer get overwritten each turn? Or just used once?

Would you handle the use case of collecting large amounts of data (e.g. 500 samples at 120 Hz) using the same API?

If so would we fire specific events when the buffer is full? What would happen to extra samples? Would they stay in your data structure until a new buffer came along?

In Chromium sensor data is fetched from the underlying system in one process and then it is put to a shared memory buffer. JS runs within a different process.

Considering that the JS engine can arbitrarily pause or defer script execution, I doubt that we can completely avoid losing samples, whatever API we invent.

I guess in practice that depends on the size of the underlying shared buffer, no? I think if we can limit this to being really rare and/or fire readinglost events (but would we even know we lost data?), that would be neat.

pozdnyakov commented 7 years ago

Would the buffer get overwritten each turn? Or just used once?

Only once per collectLatestReadings() call (this is gonna be an expensive method so IMO it's better to call it explicitly on every frame)

Would you handle the use case of collecting large amounts of data (e.g. 500 samples at 120 Hz) using the same API?

The given array will contain only the readings recorded while a single frame is being painted, meaning that for a 120 Hz polling frequency (and assuming 60 fps) space for 2 readings should suffice.

If so would we fire specific events when the buffer is full? What would happen to extra samples? Would they stay in your data structure until a new buffer came along?

Inside the internal data structure the oldest readings will be replaced by the new ones, so if the given buffer is shorter than needed it will contain the latest readings and the older ones will be lost.

Any new event will be delivered in a different call chain (at an undefined future moment), which means that some samples would be lost in the meanwhile. Instead, maybe we could require an array size equal to [READING_SIZE * COUNT + 1], where the last element is used as a flag indicating overflow, so that the user can allocate a bigger buffer next time.
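
A hypothetical consumer-side sketch of that convention (the flag position in the last element and its non-zero meaning are assumptions):

const READING_SIZE = 4; // timestamp, x, y, z
let count = 4;
let buffer = new Float64Array(READING_SIZE * count + 1); // last slot = overflow flag

let gyro = new Gyroscope({frequency: 240});
gyro.start();
gyro.collectLatestReadings(buffer);

function step() {
  if (buffer[buffer.length - 1] !== 0) {
    // Overflow: some readings were dropped, so grow the buffer for next time.
    count *= 2;
    buffer = new Float64Array(READING_SIZE * count + 1);
  }
  gyro.collectLatestReadings(buffer);
  window.requestAnimationFrame(step);
}
window.requestAnimationFrame(step);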

I guess in practice that depends on the size of the underlying shared buffer, no? I think if we can limit this to being really rare and/or fire readinglost events (but would we even know we lost data?), that would be neat.

JS is not aware of whether it got suspended or deferred, so there is always a likelihood of lost samples.

tobie commented 7 years ago

The given array will contain only the readings recorded while a single frame is being painted, meaning that for a 120 Hz polling frequency (and assuming 60 fps) space for 2 readings should suffice.

What happens when we fall below 60 Hz, though? I think we need to account for that in how we size buffers (or the underlying data structure used to populate them).

Instead, maybe we could require an array size equal to [READING_SIZE * COUNT + 1], where the last element is used as a flag indicating overflow, so that the user can allocate a bigger buffer next time.

I imagine that if we can write this info to the consumer's buffer, we can equally write it somewhere else and use that info to fire a "bufferoverflow" event, no? Seems more web-y. Events are synchronous, so the consumer would be informed within the same AF.

Would you handle the use case of collecting large amounts of data (e.g. 500 samples at 120 Hz) using the same API?

The given array will contain only the readings recorded while a single frame is being painted, meaning that for a 120 Hz polling frequency (and assuming 60 fps) space for 2 readings should suffice.

So that doesn't fit our Navisens use case. It would be interesting to think about it separately, maybe even in terms of a ServiceWorker-based API that would emit events at much longer intervals, once a large buffer is full.

pozdnyakov commented 7 years ago

The previous proposal looks a bit clumsy and is not sufficient for some use cases (https://github.com/w3c/sensors/issues/171#issuecomment-282032472), so here is another approach:

interface MotionSensor : Sensor {
  /**
   * Populates the passed array with the sensor readings.
   *
   * The returned promise is resolved when the buffer gets full or when the sensor
   * stops.
   *
   * The returned promise is rejected if the sensor wasn't started yet or if an error
   * has occurred.
   */
  Promise<void> populateBuffer(Float64Array array);
};

To implement this we could start a new thread collecting readings in the background until the required amount of data is collected.
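
A possible usage sketch (hypothetical; processBatch stands for application code):

let gyro = new Gyroscope({frequency: 240});
const readingSize = 4; // timestamp, x, y, z
let batch = new Float64Array(readingSize * 500); // e.g. a batch of 500 readings

gyro.start();

function collect() {
  gyro.populateBuffer(batch)
    .then(() => {
      processBatch(batch); // application code: store, analyze, or upload
      collect();           // immediately start collecting the next batch
    })
    .catch(err => console.error('Batch collection failed:', err));
}
collect();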

@tobie, PTAL

pozdnyakov commented 7 years ago

Also, as for the onchange event removal, we could have:

interface EnvironmentalSensor : Sensor {
  attribute EventHandler onchange;
};

interface AmbientLightSensor : EnvironmentalSensor {
};

tobie commented 7 years ago

As mentioned before, I think we need to do a bit more research in use cases, requirements and how similar cases are handled across the platform before we write down APIs.

kenchris commented 7 years ago

Did anyone look at whether the new SharedArrayBuffer might make sense? https://github.com/tc39/ecmascript_sharedmem/blob/master/TUTORIAL.md

tobie commented 7 years ago

Did anyone look at whether the new SharedArrayBuffer might make sense?

Vaguely. I don't think it matches our use case at all, but it's probably easy to prove me wrong.

pozdnyakov commented 7 years ago

Recently I studied possible ways to implement reliable delivery of sensor readings received at a rate higher than the current frame rate (https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/35QSwrRazKY).

From what I learned, the only feasible way is to collect readings into batches on the platform side (in the same context where these readings were obtained) and then send the filled batches to the JS side sequentially, making sure nothing is missed. This approach implies significant latency.

On the other hand, the existing Sensor API is designed to fetch the most recent data with minimal latency, so it looks quite problematic to integrate both approaches in the same API.

This means that the batching API should be split from the existing Sensor API (onchange + getters), as the two will be implemented differently: we either need to extend the Sensor interface with new methods, or maybe it is better to have a separate SensorWatcher interface.
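
A very rough usage sketch of what a separate SensorWatcher could look like (only the name comes from the paragraph above; every other member is illustrative, not proposed spec text):

// Hypothetical: a dedicated batching object wrapping a sensor, accepting
// extra latency in exchange for not dropping readings.
let watcher = new SensorWatcher(new Gyroscope({frequency: 240}));
let batch = new Float64Array(watcher.readingSize * 500);

watcher.start();
watcher.populateBuffer(batch).then(() => {
  uploadForAnalysis(batch); // application code
});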

tobie commented 7 years ago

Just to clarify, it seems, from the use cases gathered so far, that we're fine getting those batches delivered on rAF. How would that increase the latency of the freshest of these samples compared to the current situation?

Basically, what we want to support is polling at a greater frequency than the frame rate, not reporting this data at a greater frequency than the frame rate. (Or at least that's the plan given the technical constraints and the use cases gathered so far.)

pozdnyakov commented 7 years ago

Just to clarify, it seems, from the use cases gathered so far, that we're fine getting those batches delivered on rAF. How would that increase the latency of the freshest of these samples compared to the current situation?

Right, we propagate data in sync with rAF in any case, but currently we propagate the freshest data obtained directly from the platform side (via a shared buffer), so latency is very low. In the case of the batch API we'd propagate a batch that was collected on the platform side and sent, some time earlier, via a different IPC mechanism (which provides better data synchronization and integrity); this is where the extra latency comes from.

tobie commented 7 years ago

In the case of the batch API we'd propagate a batch that was collected on the platform side and sent, some time earlier, via a different IPC mechanism (which provides better data synchronization and integrity); this is where the extra latency comes from.

I see. Care to elaborate on the distinctions between the two IPC mechanisms and their respective limitations?

Basically, why can't we just get a larger shared buffer? To collect maybe 4 samples instead of 1?

pozdnyakov commented 7 years ago

Basically, why can't we just get a larger shared buffer? To collect maybe 4 samples instead of 1?

We can. Then we'll always provide the 4 latest readings, but if the frame rate goes down for some reason and, for example, 6 readings are missed (i.e. not propagated), then only the latest four will be delivered and the previous two lost. Is that acceptable?

pozdnyakov commented 7 years ago

Care to elaborate on the distinctions between the two IPC mechanisms and their respective limitations?

In the best-case scenario the extra latency is 2 ms (on a Nexus), but it can increase if the context running JS slows down for some reason and suspends handling of the received batches.

tobie commented 7 years ago

We can. Then we'll always provide the 4 latest readings, but if the frame rate goes down for some reason and, for example, 6 readings are missed (i.e. not propagated), then only the latest four will be delivered and the previous two lost. Is that acceptable?

Well, that's an interesting tradeoff I frankly don't have a good answer to until we have clearer use cases and requirements.

How large can such a buffer be? I.e. can you get some extra space to account for, say, the 95th percentile of frame delays?


Going back to one of your previous comments, I agree we might also want another API that trades off latency for integrity, perhaps for bigger sample sets.

The requirements I'm roughly seeing so far are (take those with a grain of salt for now):

  1. latest sample, low latency.
  2. latest few samples, low latency, OK to lose some samples.
  3. batch of samples, important not to lose any, latency not so much of an issue.

I'm wondering whether (1) and (2) could be combined in a single API, though (2) requires you to somehow integrate timestamps into the output, which (1) can (possibly) handle differently.

tobie commented 7 years ago

In the best-case scenario the extra latency is 2 ms (on a Nexus), but it can increase if the context running JS slows down for some reason and suspends handling of the received batches.

Yikes! We certainly don't want to ruin latency for VR by doing this. Thanks for these data points, that's super useful.

pozdnyakov commented 7 years ago

I'm wondering whether (1) and (2) could be combined in a single API,

Yes, I do not see any issues from the implementation side, but we should figure out the required number of latest readings to keep and explain it somehow (i.e. why 4 and not 40 :-) ).

tobie commented 7 years ago

Yes, I do not see any issues from the implementation side, but we should figure out the required number of latest readings to keep and explain it somehow (i.e. why 4 and not 40 :-) ).

From an implementor's perspective, would it make sense and be useful to have this be author-settable (and capped to a max value, obviously)?

pozdnyakov commented 7 years ago

Would it make sense to have this be author-settable (and capped to a max value, obviously)?

It might be hard to find a good estimate; we could try to base it on 60 Hz as the assumed frame rate, but the actual frame rate varies.

pozdnyakov commented 7 years ago

Another thing is that we have multiple sensor instances using the same buffer infrastructure, and new instances can appear arbitrarily at runtime, so the buffer size should not be something settable by the user.

tobie commented 7 years ago

Sorry, edited my comment in the meantime, which is generally a bad idea.

I'm making the (perhaps incorrect) assumption that authors (god—I hate that name) would be in the right position to know the number of readings required to run whatever computation they want to in each new frame.

tobie commented 7 years ago

Another thing is that we have multiple sensor instances using the same buffer infrastructure, and new instances can appear arbitrarily at runtime, so the buffer size should not be something settable by the user.

I guess this is a risk. Couldn't this still let you optimize for the most common cases (e.g. perhaps default to only one sample) and swap for a bigger buffer if/when the need arises?

pozdnyakov commented 7 years ago

I guess this is a risk. Couldn't this still let you optimize for the most common cases (e.g. perhaps default to only one sample) and swap for a bigger buffer if/when the need arises?

I'd prefer simplicity :-) meaning we allocate enough memory beforehand; a single reading takes 48 bytes, so I don't think it's going to be an issue if we allocate space for, say, 4 or 5 readings from the beginning.

pozdnyakov commented 7 years ago

btw the missed readings (if any) could be the arguments for onchange event

tobie commented 7 years ago

I'd prefer simplicity :-) meaning we allocate enough memory beforehand; a single reading takes 48 bytes, so I don't think it's going to be an issue if we allocate space for, say, 4 or 5 readings from the beginning.

OK - I don't have a good sense of scale for this considering multiple origins, multiple sensors, and potentially higher frequencies in the future.

tobie commented 7 years ago

btw the missed readings (if any) could be the arguments for onchange event

I'm not 100% sure what you mean by this, but I assume you're suggesting we need an event-based (or promise-based, as IIRC you suggested elsewhere) system for case (3) above and a rAF-based solution for (1) and (2) with no events at all.

If that's what you mean, then I think we're in perfect agreement.

Note we might still want to channel (3) in the rAF (to avoid the critical path), but that would be for perf reasons only.

pozdnyakov commented 7 years ago

If that's what you mean, then I think we're in perfect agreement.

Right. cool :)

Note we might still want to channel (3) in the rAF (to avoid the critical path), but that would be for perf reasons only.

yes, any (frequent) notification should be in sync with rAF whatever it contains. Thanks for the discussion!

tobie commented 7 years ago

Likewise.

I think the next steps here are to validate some of the assumptions we've made above. @kenchris, we need your help with this. And then start sketching an API that would work for this.

With regard to the latter, I think the consensus at this point is that we're aiming for a lower-level, probably less user-friendly, but more powerful API for this stuff.

kenchris commented 7 years ago

OK, let me read up on this. A bit busy with other stuff :)

kenchris commented 7 years ago

Could you share some of the use-cases for option 3?

Given the problem of knowing which sensors will be used and of sensors appearing at runtime, it might make sense to make this less flexible for users, as most users will know beforehand which sensors they are going to use during their app's life cycle.

When do you set up the size of the shared memory? I guess we could replace it when adding new sensors and just document that adding new sensors may affect data delivery.

BTW [slightly off topic], something is not clear from the API point of view. Say I want to use a RelativeOrientationSensor and a Magnetometer to calibrate the heading every 15 seconds to save power. Right now it is not clear to me whether I should set the frequency very low for the Magnetometer or turn it on/off every 15 seconds. Will I actually save power on the magnetometer or not?

kenchris commented 7 years ago

If we end up dropping data, it would make sense to send the timestamp difference from the last delivered reading, plus a sequence number. This is, for instance, used by the Daydream controller, which uses Bluetooth and can lose readings.

tobie commented 7 years ago

Could you share some of the use-cases for option 3?

Basically anything that involves storing data for later analysis, or near real time, but on remote servers (e.g. for AI). See notably https://github.com/w3c/sensors/issues/98#issuecomment-279981550.

Given the problem of knowing which sensors will be used and of sensors appearing at runtime, it might make sense to make this less flexible for users, as most users will know beforehand which sensors they are going to use during their app's life cycle.

When do you set up the size of the shared memory? I guess we could replace it when adding new sensors and just document that adding new sensors may affect data delivery.

This optimization seems not to be worth pursuing from an implementation perspective.

BTW [slightly off topic], something is not clear from the API point of view. Say I want to use a RelativeOrientationSensor and a Magnetometer to calibrate the heading every 15 seconds to save power. Right now it is not clear to me whether I should set the frequency very low for the Magnetometer or turn it on/off every 15 seconds. Will I actually save power on the magnetometer or not?

This will probably stay implementation-specific. Setting the magnetometer to a low frequency should be your best bet here, as it lets the UA/OS optimize when to poll. Might be worth opening a dedicated issue to discuss this further.

If we end up dropping data, it would make sense to send the timestamp difference from the last delivered reading, plus a sequence number. This is, for instance, used by the Daydream controller, which uses Bluetooth and can lose readings.

Can you elaborate a bit on this topic, and also think about how we could represent this in the API? Would it be info at the back of the buffer? A dedicated prop on the object? An event?

kenchris commented 7 years ago

Can you elaborate a bit on this topic, and also think about how we could represent this in the API? Would it be info at the back of the buffer? A dedicated prop on the object? An event?

It is per event data, so it should be part of the buffer. Also, for any protocol based on Bluetooth or USB, it will most likely be part of the buffer as well.

tobie commented 7 years ago

It is per event data, so it should be part of the buffer.

Well, given the "event" is a rAF, it could as well sit as a prop on the object which changes on every rAF.

kenchris commented 7 years ago

Well, you wanted to have more than one data set per reading; thus, if 6 data sets exist since the last reading and we can only store 4, then I want the sequence number for each of those latter 4.

tobie commented 7 years ago

Well, you wanted to have more than one data set per reading; thus, if 6 data sets exist since the last reading and we can only store 4, then I want the sequence number for each of those latter 4.

Oh! Sequence numbers are per reading. Duh.

So when batched, a quaternion reading would then hold 6 floats: the 4 values, a timestamp/time delta, and a sequence number.
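
A sketch of how a consumer might unpack such a batched buffer (the 6-float layout is the one just described; the field order within a reading is an assumption):

const READING_SIZE = 6; // x, y, z, w, time delta, sequence number (assumed order)

function unpackBatch(batch) {
  const readings = [];
  for (let i = 0; i + READING_SIZE <= batch.length; i += READING_SIZE) {
    readings.push({
      quaternion: Array.from(batch.slice(i, i + 4)),
      timeDelta: batch[i + 4],
      sequence: batch[i + 5],
    });
  }
  return readings;
}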

pozdnyakov commented 7 years ago

@tobie maybe it's worth reanimating SensorReading? Then we could operate with a sequence of reading instances.

tobie commented 7 years ago

@tobie maybe it's worth reanimating SensorReading? Then we could operate with a sequence of reading instances.

I was imagining that an approach that would allow BYOB would be much more sound. It also seems we were converging on TypedArrays for other reasons.

As expressed in https://github.com/w3c/sensors/issues/153#issuecomment-287915334 (which I re-opened), I'm still fairly unconvinced by the testing done so far that we would avoid triggering GC pauses in real-life scenarios. The GC sweep times mentioned in the related blink-dev thread are also a concern.

pozdnyakov commented 7 years ago

In another blink-dev thread, which I started precisely to figure out the implementation approach for the sensors batch API, I was pointed to the PointerEvent.getCoalescedEvents API, which returns a sequence of objects. We could do something similar with sequence<SensorReading> for batching.
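
A rough sketch of what a getCoalescedEvents-style accessor could look like for sensors (the method name and reading shape are purely illustrative):

let gyro = new Gyroscope({frequency: 240});
gyro.start();

function step() {
  // Hypothetical: returns every reading recorded since the previous frame,
  // analogous to PointerEvent.getCoalescedEvents().
  for (const reading of gyro.getCoalescedReadings()) {
    integrate(reading.timestamp, reading.x, reading.y, reading.z); // application code
  }
  window.requestAnimationFrame(step);
}
window.requestAnimationFrame(step);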

I understand the performance concerns, but on the other hand, considering that the user cannot predict the required size of the buffer, the BYOB concept might be too complex here. Maybe we could start with a user-friendly API and optimize it if needed? It's usually harder to make an "optimized" API more user-friendly than the other way around :-)

pozdnyakov commented 7 years ago

BTW, I'm not proposing to reconsider the existing sensor getters; SensorReading could be present only in the batch API, which is supposed to be used in less performance-sensitive scenarios.

tobie commented 7 years ago

BTW, I'm not proposing to reconsider the existing sensor getters; SensorReading could be present only in the batch API, which is supposed to be used in less performance-sensitive scenarios.

Oh! That's interesting.

tobie commented 7 years ago

On the call we had today, @pozdnyakov suggested we completely drop the API that collects the latest few readings while still keeping low latency. That's our option 2 described in https://github.com/w3c/sensors/issues/171#issuecomment-288044722, which relies on collecting multiple readings in a single buffer.

IIRC we had use cases and requirements for this API, but I wasn't sure what they were precisely. @kenchris, @rwaldron, do you have any in mind, or is it OK to limit the number of readings exposed in real time to a maximum of 1 reading per rAF?