w3c / mediacapture-depth

the Media Capture Depth Stream Extensions specification
https://w3c.github.io/mediacapture-depth/

Setting unit in constraints #100

Closed dontcallmedom closed 8 years ago

dontcallmedom commented 8 years ago

The way constraints for MediaStreamTrack have been defined is to allow browser-mediated negotiation of the allowable ranges in which an application can operate on the track.

Using this to indicate the unit of the reported values seems a poor application of that concept, since there is no particular reason why one would negotiate between meters and millimeters (or at least, I can't see one); in general, I'm not sure there is any value in saving developers from a division or multiplication by a fixed number (1000) here.

huningxin commented 8 years ago

@dontcallmedom , thanks for opening this issue!

From what I know, some depth cameras allow setting the depth value unit, which lets applications configure the depth camera's precision and accuracy.

As it is currently device specific, I am fine with removing it from constraints. But it still makes sense to keep it in settings. WDYT? @dontcallmedom @anssiko

anssiko commented 8 years ago

This seems similar to the enableHighAccuracy of Geolocation API.

Suggestion: rename it to something other than unit (reuse geo's name?), standardize on a single unit, and keep it in settings only. Implementations would do the conversion internally.

jan-ivar commented 8 years ago

@anssiko +1, though maybe just highAccuracy since this is a constrainable property.

Definitely standardize on one unit, as a constrainable global unit might mess up values in another tab!

jan-ivar commented 8 years ago

Actually, since highAccuracy would default to true, we should use enum to "avoid having true as default value".

How about depthPrecision = "mm"?

anssiko commented 8 years ago

Thanks @dontcallmedom, @jan-ivar, and @huningxin for your feedback.

I've rolled your feedback into PR #111, but still have a couple of open issues regarding depth precision that need further feedback and suggestions, namely:

var gotten = navigator.mediaDevices.getUserMedia({depth: {
  depthPrecision: "mm" // this should be a hint
}});

If the above cannot be done with the constraints model, we'd need an alternative way for the web developer to figure out whether the getUserMedia() invocation with certain depthPrecision might fail beforehand. So we get to the second issue:

var supports = navigator.mediaDevices.getSupportedConstraints();
if (supports["highDepthPrecision"]) {
  // high depth precision supported \o/
}

As the last resort, we could nix the depthPrecision constraint entirely.

Feedback? I'd like to address either one of the above issues and amend the PR accordingly before landing. Or nix the problematic constraint and leave it in settings only.

anssiko commented 8 years ago

(Pinged the ML for feedback: https://lists.w3.org/Archives/Public/public-media-capture/2016Mar/0079.html)

alvestrand commented 8 years ago

Personal opinion: I think naming precisions as "mm" or "m" is extremely user-unfriendly, and will lead to very many situations that are unclear to the user. The only thing setting the depth map measurement unit buys you is the ability to write "100.0" instead of "0.1" when setting "near" to 10 cm - and that is just not something I see as valuable. Doubles have precision to spare.

I see no use in setting precision to "mm" or "m" - if you want to have a precision, set it as a double.

jan-ivar commented 8 years ago

The only thing setting the depth map measurement unit buys you is the ability to write "100.0" instead of "0.1" when setting "near" to 10 cm - and that is just not something I see as valuable.

If that's true then I agree, however, I got the impression that these were two different modes that the hardware could be in, sort of like a focal length thing?

alvestrand commented 8 years ago

If there's a specific piece of hardware with 2 modes, you can bet $0.02 that tomorrow there'll be a new version with four modes. Let's not box ourselves in.

jan-ivar commented 8 years ago

It's an enum, which can be expanded?

anssiko commented 8 years ago

@alvestrand In the abstract, depthPrecision is similar to enableHighAccuracy, but it is used to configure the depth camera's precision and accuracy rather than geolocation accuracy. Given the confusion, I'm leaning toward dropping this constraint altogether from v1 and exposing it as a readonly property via getSettings() only.

jan-ivar commented 8 years ago

All constraints are optional by default, so no need for a special case here. The code you write above will be interpreted as optional. You'd have to write depthPrecision: { exact: "mm" } for it to be mandatory.
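For illustration, a minimal sketch contrasting the two forms (using the depthPrecision name proposed in this thread, which is not a shipped constraint):

navigator.mediaDevices.getUserMedia({
  depth: { depthPrecision: "mm" }            // optional: treated as a hint/ideal value
});

navigator.mediaDevices.getUserMedia({
  depth: { depthPrecision: { exact: "mm" } } // mandatory: rejects with OverconstrainedError
});                                          // if no source can satisfy it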

anssiko commented 8 years ago

@jan-ivar Thanks, I was confused about the constraints model -- your clarification addresses my "always an optional constraint" issue. How about the capability detection based on constraint value?

jan-ivar commented 8 years ago

@anssiko getSupportedConstraints is a global method that just tells you what constraint names you can ask your browser about, it says nothing about your device. It exists solely to handle arcane "mandatory must fail" cases, since by default dictionary members a browser hasn't heard of before get ignored.

jan-ivar commented 8 years ago

@anssiko you are correct that you can't peek at what hardware the user has until after the user has granted cam or mic permission.

Before permission has been granted, the closest you can get is:

let noSupport;
if (navigator.mediaDevices.getSupportedConstraints()["depthPrecision"]) {
  navigator.mediaDevices.getUserMedia({ depth: { depthPrecision: { exact: "mm" } } })
  .then(stream => { /* got camera, yay... */ })
  .catch(e => {
    if (e.name == "OverconstrainedError" && e.constraint == "depthPrecision") {
      noSupport = true;
    }
  });
}

If the user has no depth camera, then noSupport will be true almost immediately without prompting the user. If the user has a depth camera, on the other hand, then the user just got prompted for access. This is by design (for fingerprinting reasons).

jan-ivar commented 8 years ago

I don't see how making it readonly sidesteps much. If we're going to have it at all, then we should get it right to begin with imho.

alvestrand commented 8 years ago

@anssiko enableHighAccuracy is also (in my opinion) a bad choice - it tells the platform that the app thinks it knows what model of device it is dealing with, and that it therefore knows what "high accuracy" means. If you have an app that will only work when accuracy is better than 1%, there should be an "accuracy" constraint you can constrain to smaller than 0.01 (or larger than 100 if you choose the opposite representation).

Once you have a track, getCapabilities() will give you the range of values available from that source. There is no cross-device getCapabilities() function.
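For illustration, a rough sketch of what that could look like once a track exists, assuming a hypothetical numeric accuracy constraint (in meters) along the lines suggested above; getCapabilities(), applyConstraints() and getSettings() are standard MediaStreamTrack methods, but the accuracy name is purely illustrative:

navigator.mediaDevices.getUserMedia({ depth: true }).then(stream => {
  var track = stream.getTracks()[0];

  // Per-source value ranges become visible only after we have the track.
  console.log(track.getCapabilities());

  // Hypothetical: ask for an uncertainty no worse than 1 cm (0.01 m).
  return track.applyConstraints({ accuracy: { max: 0.01 } })
    .then(() => console.log(track.getSettings()));
});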

anssiko commented 8 years ago

This is all great input. Concrete proposals on how to spec this the right way are wanted. The first stab is #111.

jan-ivar commented 8 years ago

Well, we do have http://w3c.github.io/mediacapture-main/getusermedia.html#widl-InputDeviceInfo-getCapabilities-MediaTrackCapabilities

martinthomson commented 8 years ago

@alvestrand, I agree wholly about enableHighAccuracy. That is and was a bad idea. But in this case, I think that what you want to say is "how" accurate the depth measurements are. That is something quantifiable and measurable, and if so, you should be able to say what the cutoff is. If you go there, though, you have to define what it means to be more accurate than a given threshold, which is harder than you think. Just because you can report in 0.001 m increments doesn't mean you are that accurate.

(The term used in metrology is uncertainty, and if you want a primer, I'd recommend the other GUM)

[ISO.GUM] ISO/IEC, "Guide to the expression of uncertainty in measurement (GUM)", Guide 98:1995, 1995.

alvestrand commented 8 years ago

@jan-ivar yes, that's what getCapabilities() returns.

@martinthomson great reference! It's even publicly available: http://www.bipm.org/en/publications/guides/gum.html

anssiko commented 8 years ago

I skimmed through the other GUM.

I think what we want to do is be pragmatic and re-evaluate how the relevant native APIs that ship widely express uncertainty of the measurement, and take an intersection of that as a starting point for our feature discussion.

@huningxin How do Kinect/Tango/RealSense native APIs express uncertainty?

jan-ivar commented 8 years ago

(not my domain but) it seems to me this is not just about accuracy, but also about range, right? I.e. I naively imagine I'd probably want a different setting for pointing my kinect out my window vs. pointing it at my couch?

martinthomson commented 8 years ago

@jan-ivar, that would be a separate constraint, probably two: minimumRange: 1, maximumRange: 100. If that's a real concern, then it's a separate issue. For this one, I think that @anssiko has the right approach: find out how the devices actually measure and work from there.

@anssiko, if there are devices that measure with their accuracy dependent on distance (that seems totally plausible) and others that aren't distance-dependent, then you have an interesting problem to solve in the constraints space. You could have separate "accuracy@1m" and "accuracy@10m" constraints; or "accuracy@10m" plus "accuracyIsDistanceDependent":boolean, or you could define what distance accuracy is reported at and accept that some devices are better or worse depending on distance.
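To make those options concrete, a sketch of how such constraints might look from an application's point of view (the names come from the paragraph above and are not part of any spec; values in meters are an assumption):

navigator.mediaDevices.getUserMedia({
  depth: {
    "accuracy@10m": { max: 0.05 },       // at most ~5 cm of uncertainty at 10 m
    accuracyIsDistanceDependent: false   // only accept sources with distance-independent accuracy
  }
});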

As for representation; we should ensure that the representation is consistent. We support i420 and RGB for video without burdening applications with the distinction between the two. In the browser itself, those two might be carried differently, but when they hit a screen or API, they are converted. In terms of the "pixels" that are exposed, I would hope that depth cameras surface the same measurement regardless of type.

Update: After reviewing the spec, I see that there is an active unit. I don't think that's a good idea, for the above reasons, I'll open a separate issue.

huningxin commented 8 years ago

@jan-ivar , the Kinect SDK allows configuring the depth camera range (Default or Near) [1]. librealsense [2] allows configuring several ranges for the RealSense front-facing depth camera as well (Common, ShortRange, LongRange, etc.). I am not aware that the Tango SDK allows configuring the working range of its depth camera.

@anssiko , as far as I know, Kinect exposes TooFarDepth, TooNearDepth and UnknownDepth properties to represent depth values with low confidence [3]. The RealSense SDK provides the QueryDepthLowConfidenceValue API [4]. The Tango SDK just mentions that "There may be empty sections in the grid without a depth value. These are denoted in the IJ array by a value of -1." in its TangoXYZij structure [5].

@martinthomson , +1 for a consistent depth value representation. I am writing down the native depth value formats for your reference:

|        | Kinect                    | Tango                       | RealSense                  |
|--------|---------------------------|-----------------------------|----------------------------|
| Format | 13/16-bit unsigned short  | 32-bit float                | 16-bit unsigned short      |
| Unit   | millimeters [6]           | meters (same as PCL) [7]    | millimeters (default) [8]  |
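For reference, a sketch of how a user agent or library might normalize those native formats to a single representation (float meters here, which is an assumption for illustration rather than something the spec mandates):

// Normalize the native depth sample formats listed above to float meters.
function toMeters(rawSample, source) {
  switch (source) {
    case "kinect":     // 13/16-bit unsigned short, millimeters
    case "realsense":  // 16-bit unsigned short, millimeters (default)
      return rawSample / 1000;
    case "tango":      // 32-bit float, already in meters
      return rawSample;
    default:
      throw new Error("unknown depth source: " + source);
  }
}
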
anssiko commented 8 years ago

@huningxin Thanks for the information on various native APIs, this is very helpful!

@jan-ivar I opened a new issue #114 to discuss the range constraint. You have good use cases, and the data provided by @huningxin seems to suggest we might be able to find an intersection that is implementable across various native APIs.

@martinthomson I'll defer to @huningxin to comment on the feasibility of implementing an "accuracy@1m" constraint across native platforms. Note we also lose accuracy in the lossy conversion from depth to the 8-bit grayscale representation and back, which should be accounted for too. I created a Google sheet to demonstrate the error of measurement from this conversion with some example data, see:

@martinthomson I followed up on the consistent depth value representation in #113.

anssiko commented 8 years ago

Re "accuracy@1m" constraint:

Looking at the Kinect, Tango, RealSense implementation references provided by @huningxin it seems the intersection of the known implementations is the following:

Given this, it appears it would be impossible to implement an "accuracy@1m" constraint. What we could do is expose the error of measurement due to the lossy depth data conversion, if deemed useful (see the example data). My hunch is web developers would expect "accuracy" to bake in all uncertainty components, be it due to hardware limitations, lossy conversion, etc.

Suggested resolution:

We (as a group) try to find better names for the DepthPrecisionEnum enum and its values, as well as for the depthPrecision dictionary member (because naming discussions are fun!).

Current proposal staged in PR #111 is the following (see it in context):

enum DepthPrecisionEnum {
    "mm",
    "m"
};

The "mm" value indicates the depth precision is millimeters, the "m" value indicates the depth precision is meters.

I agree precision is not the best name and is a misuse of the formal definition, and the descriptions are not particularly clear either.

Please let us know your name suggestions. I'll update #111 when the consensus emerges, land it, and we can close this issue (and continue range constraint discussion in #114 that branched from here).

Thank you for your continued contributions and feedback.

jan-ivar commented 8 years ago

I was trying to simplify, not complicate; I obviously failed. Just trying to comprehend how it works. Take Kinect: 13 bits of depth data where each pixel value is from 0 to 8192 millimeters means a range of 82 meters, correct?

That tells me the only time I need to worry about switching from the default is if I'm pointing the device out my window, tracking cars or airplanes, at which point I'm probably OK with pixel values incrementing in units of meters rather than millimeters, in return for my range boosting up to 8192 meters, correct? Someone who knows this stuff please cut me off.

If this is near industry standard, then we should perhaps heed that before we overabstract.

jan-ivar commented 8 years ago

Importantly, this bit of information also seems necessary for interpreting the bitmap correctly.

anssiko commented 8 years ago

@jan-ivar You're being very helpful and asking the right questions. The range in your example using 13 bits and 1 mm precision would be 8.192 meters, or 8192 millimeters.

The reason why we cannot expose the native bit depth directly if greater than 8 bits is that an essential part of the pipeline, canvas, is limited to 8 bits per channel. Thus for bit depths greater than 8 bits we use a range inverse conversion that uses more bits closer to the camera (thus smaller error close to the camera) before quantizing to 8 bits; see mediacapture-depth error of measurement from conversion. This works in a similar fashion to the z-buffer in GPUs. Since depth measurements farther away will likely have larger error, fewer bits can be used to represent values farther from the camera.
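For intuition, a minimal sketch of one possible range-inverse quantization (illustrative only; the exact formula is defined in the spec/PR, not here). It allocates more of the 8-bit budget to depths close to near, similar to a GPU z-buffer:

// near/far in meters; depth values are clamped into [near, far].
function quantizeInverse(depth, near, far) {
  var normalized = (1 / depth - 1 / near) / (1 / far - 1 / near); // 0 at near, 1 at far
  return Math.round(255 * Math.min(Math.max(normalized, 0), 1));
}

function dequantizeInverse(value, near, far) {
  var normalized = value / 255;
  return 1 / (normalized * (1 / far - 1 / near) + 1 / near);
}

Because the mapping is steepest close to the camera, quantization steps there correspond to smaller physical distances, i.e. smaller error near the near range.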

This raises a question whether we should make the conversion (inverse or linear) developer configurable, and not conditional to the native bit depth of the depth map, provided by the native API.

#112 explains how to interpret the bitmap.

The introduction section has some more context on the known Web platform limitations. In a worker environment these limitations can be overcome (see mediacapture-worker).

martinthomson commented 8 years ago

13 bits of depth data where each pixel value is from 0 to 8192 millimeters means a range of 82 meters, correct?

That is what might, in other contexts, be called dynamic range (the range of values that can be expressed). It has no bearing on the accuracy of the actual measurements.

Does the value have to be expressed as an 8-bit number?

anssiko commented 8 years ago

@martinthomson If we want to play nice with the video element (to allow developers to easily visualize the depth map in grayscale) and canvas (to allow manipulation of depth data using its APIs) without extending/patching them significantly, we're limited to 8 bits. An alternative (more invasive) approach would be to, e.g., encode the 16-bit depth value into the three 8-bit channels (RGB). @huningxin discussed some further alternatives in #11.

The proposal to extend canvas with support for 16-bit depth/image format was not well received by the community responsible for the canvas spec, so we had to back off from that.
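For context, a sketch of the kind of packing the "encode the 16-bit depth value into the three 8-bit channels" alternative would imply (illustrative only; this is not what the spec does):

// Pack a 16-bit depth value into two 8-bit channels (R = high byte, G = low byte).
function packDepth16(value) {
  return { r: (value >> 8) & 0xff, g: value & 0xff, b: 0 };
}

function unpackDepth16(rgb) {
  return (rgb.r << 8) | rgb.g;
}

One practical caveat is that any lossy encoding or RGB-to-YUV conversion along the pipeline would corrupt the packed bits.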

martinthomson commented 8 years ago

I guess that we can leave improvements in that area for future work.

It suggests value in min and max range constraints. With only 256 steps available, I can imagine adjusting the near and far values to get greater resolution for the features that are important.

Given the typical near/far range on cameras, 256 steps is worse than 1 cm (maybe 1/2 inch), which isn't that good if the actual underlying measurement units are any indication. Allowing an app to focus on a narrower range might improve quality without loosening the constraints that you describe.

I suppose you could get creative and use colour channels to encode more bits, but I imagine that would be hard to process. (for instance, imagine if Cr was always reported in mm or less, regardless of the range available to Y).

anssiko commented 8 years ago

@martinthomson Agree on the importance of near/far range configurability -- let's continue discussing the specifics of that in #114. Using the range inverse conversion, we get very good accuracy close to the near range, so if the range is adjustable the resolution can be improved where it matters, on a use case basis.

That said, let's test how the following resonates with people:

Suggested resolution (updated):

- Drop the depthPrecision constraint and associated DepthPrecisionEnum that are hard to spec right.
- Make depthNear and depthFar constraints instead (currently exposed via getSettings() only).

This allows the web developer to adjust the range (if the depth camera in use supports that, which is not common today AFAIK) and, by doing so, improve the resolution within the defined range, especially close to its near range. The expectation is that the web developer knows the optimal range for the use case at hand.
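As an illustration, a sketch of what requesting a narrower working range could look like under that resolution, using the depthNear/depthFar names from the PR discussion (values shown in meters, though the unit choice is still open at this point in the thread):

navigator.mediaDevices.getUserMedia({
  depth: {
    depthNear: { ideal: 0.4 }, // nothing closer than ~40 cm matters for this use case
    depthFar:  { ideal: 3.0 }  // ...nor anything farther than ~3 m
  }
}).then(stream => {
  var track = stream.getTracks()[0];
  console.log(track.getSettings()); // e.g. { depthNear: 0.4, depthFar: 3, ... }
});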

Seeking feedback from @huningxin @martinthomson @jan-ivar @alvestrand @dontcallmedom, and anyone else who feels like chiming in.

jan-ivar commented 8 years ago

(I've clearly been in the US for too long when I confuse centimeters and millimeters...)

@anssiko thanks for the detailed explanation! This looks good to me, though in full disclosure, beyond playing Fruit Ninja, my hands-on depth camera experience is limited.

huningxin commented 8 years ago

Suggested resolution (updated):

Drop the depthPrecision constraint and associated DepthPrecisionEnum that are hard to spec right.

It makes sense to me.

Make depthNear and depthFar constraints instead (currently exposed via getSettings() only).

It looks pretty good! Normally, a depth camera only provides a limited set of range options, like 2 options for Kinect (default, near) and 3 options for RealSense R200 (common, shortRange, longRange). It would be good to expose the range via the getCapabilities API. The user agent may translate the native range settings to {near: number, far: number} arrays, like this for Kinect [1] (if we use millimeters):

{
  depthRange: [ {near: 800, far:  4000}, {near: 400, far: 3000} ]
}

What do you think?

martinthomson commented 8 years ago
  1. use metres
  2. the form is { depthNear: { min: 0, max: 2 }, depthFar: { ideal: 5 } }
  3. we should be able to use the constraints in two ways:
    • to pick the camera mode that best fits, and
    • to adjust the range over which the value is converted to the 8-bit value
alvestrand commented 8 years ago

Adjusting near and far as constraints seems good to me. The main purpose of near and far is driving the conversion algorithm.

The camera driver would then use the settings on the particular device that make the most sense with those constraints - i.e., it would set "shortRange" on the RealSense if a "far" of less than (say) 1 m was acceptable to the user.

In a later version, I would think that adding a way to get something deeper than an 8-bit map makes sense - since most video codecs are growing 10, 12 and even 14 bit modes these days, there will be other people clamoring for such an option too. But that's a different topic.
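For illustration, a rough sketch of the kind of internal mapping a user agent could do, using the RealSense mode names mentioned in this thread (the thresholds are made up for the example):

// Map a requested "far" value (in meters) to a native RealSense range preset.
function pickRealSenseRangeMode(requestedFar) {
  if (requestedFar <= 1.0) return "shortRange";
  if (requestedFar <= 4.0) return "common";
  return "longRange";
}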

anssiko commented 8 years ago

All - thanks for the great feedback. With your help we finally get to close this issue with the following resolution that reflects the consensus position of the group.

Resolution:

To clarify, this issue will now be closed and the depthNear and depthFar constraints will be discussed in a separate issue, #114.