immersive-web / real-world-geometry

Additions to WebXR Device API for exposing real world data (Lead: Piotr Bialecki). Plane detection: https://immersive-web.github.io/real-world-geometry/plane-detection.html
Other
72 stars 14 forks source link

Comments on the planes-explainer #2

Open blairmacintyre opened 5 years ago

blairmacintyre commented 5 years ago

A few high level comments, rather than doing detailed editing.

1) Is the intent of the having objects per "kind of world knowledge" be to allow relatively complex configuration? For example, we could eventually pass in significant configuration for things like object and image trackers/detectors? (this seems reasonable)

2) I'm less keen on having the planes be a field in returned value; this implies that any kind of info will be in it's own area. Why aren't planes a kind of mesh? If all things that are meshs are meshes, then an app that cares only about meshes (for occlusion, for example, or drawing pretty effects on anything known in the world) can just deal with that.

meshes.forEach(mesh => {
   let pose = mesh.getPose(xrReferenceSpace); // origin of mesh or plane
   if (mesh.isPlane) {
       // mesh.polygon is an array of vertices defining the boundary of 2D polygon
       // containing x,y,z coordinates
       let planeVertices = mesh.polygon; 
      // ...draw plane_vertices relative to pose...
   } else {
      // draw more general mesh that isn't a special kind of mesh
      // the plane mesh would also have 
      let vertices = mesh.vertices  // vertices for the mesh.  For planes, might be the same
                    // as mesh.polygon plus one at the origin
      let triangles = mesh.triangles // the triangles using the vertices

      // ... draw mesh relative to post
   }
 });

3) I agree with synchronous, but why are things only valid in the rAF? As your example shows, the first thing people will do is copy it. This seems massively wasteful for non-trivial amounts of world info. Why not just say it's valid through the start of the next rAF?

4) why is this pull instead of push? Forcing folks to poll and step through the data to see if anything has changed seems awkward. In our polyfill we used two things:

bialpio commented 5 years ago

Thanks for the comments!

  1. The idea here was to be flexible and allow in the future for other kinds of “world knowledge” that might not be related to geometry. We’d like to return something that’s extensible with other kinds of RW information like lighting estimations / occlusion data, or meshes, but in a way that doesn’t require those types to be mutually exclusive or dependent on each other. That way each type can have its own configuration options (as you point out) but also can allow feature detection (as not all systems will support all types of real world knowledge).

  2. See above, not all things that are related to world knowledge are meshes (although in this repo we’ll focus on RWG). Meshes are definitely less redundant (when they are available) but we feel they need to be separate detectable features to allow flexibility in the future.

    There is also a question of confidence differences between different types of real-world data. For example ARKit and ARCore may have different confidence requirements for returning a mesh or a plane. If we say we can represent a plane as mesh that is kind of true but not really since an accurate reflection of the data returned by ArKit/ArCore would require attaching metadata to the mesh for things like plane center/ extents or even surface normal. Which basically puts us kind of back in the situation of having multiple types of data.

  3. Good question. The primary assumption with this design is that the geometry will be used for drawing, and drawing is only allowed in rAF callback. We discussed this internally and while some use cases (e.g. using planes for occlusion, or rendering planes) are explicitly related to rendering, even non-rendering use cases (e.g. scanning a room) are likely to provide a visual indicator of state, which effectively makes them rendering use cases.

    We also recalled some of the conversations around hit testing which considered linking geometry results to the camera frame (on visual SLAM systems), and there were concerns at that time that the geometry could too easily be out of sync with the rendering and could cause jittering or other artifacts.

    If the app still needs to use the information after rAF, it can do so by copying the data. My examples were only doing so for demonstration purposes - just to have variables that can be passed in to other functions from rAF callback. I’ll modify them to make it more clear.

  4. Polling is proposed in hit-test explainer so I’ve decided to follow suit. I do think that adding an updated property on objects in the returned list will be useful - this is why we’d like to incubate the API and check where it’s awkward to use. Once there’s such property, it will be trivial to add a list with just the updated objects.

    • Events could work as well as long as we could guarantee that the subscriber is notified in the context that could draw, otherwise apps would have to maintain a list of all objects themselves and keep updating it. Instead of requiring users of the API to do so, we might just do it for them.
    • Touched on that a bit above, I think difference lists are a good idea. Just exposing an updated property should allow us to polyfill it.
blairmacintyre commented 5 years ago

The idea here was to be flexible and allow in the future for other kinds of “world knowledge” that might not be related to geometry. We’d like to return something that’s extensible with other kinds of RW information like lighting estimations / occlusion data, or meshes, but in a way that doesn’t require those types to be mutually exclusive or dependent on each other. That way each type can have its own configuration options (as you point out) but also can allow feature detection (as not all systems will support all types of real world knowledge).

I would agree with those goals. Good.

See above, not all things that are related to world knowledge are meshes (although in this repo we’ll focus on RWG). Meshes are definitely less redundant (when they are available) but we feel they need to be separate detectable features to allow flexibility in the future.

That's orthogonal to my point. Obviously there will be world knowledge that has nothing to do with meshes.

But the idea of having a "planes" member, with a convex boundary, is too specific. In this case, for example, I would posit it will be obsolete before this proposal is finished -- it's pretty easy to imagine that some time soon planes (in ARKit and/or ARCore) will support arbitrary concave boundaries, with holes in them. The current planes are woefully inadequate.

My suggestion to have (when the underlying object is compatible, obviously) a generic mesh type, which has a relatively simple representation, but can have additional fields based on what it actually represents, has all the advantages of having a specific "plane" field, with a bunch of other advantages. As we move toward additional capabilities (e.g., world meshes like on Hololens/ML1), they can be represented similarly; future things (like segmenting moving objects out of a static world) will also fit. But for applications that DON'T know (or care) about these new types, they can fall back to just using the mesh, if they want.

There is also a question of confidence differences between different types of real-world data. For example ARKit and ARCore may have different confidence requirements for returning a mesh or a plane. If we say we can represent a plane as mesh that is kind of true but not really since an accurate reflection of the data returned by ArKit/ArCore would require attaching metadata to the mesh for things like plane center/ extents or even surface normal. Which basically puts us kind of back in the situation of having multiple types of data.

You are right we need additional metadata (that's what I proposed, right?), but adding it absolutely does not put us back in the same place. If I don't know what a "segmentedObject" or (in this case) "plane" or some future "concavePlane" is, but they are all "mesh" with the required "mesh" data provided, I can use them.

I'm very concerned about evolution and headway over time. Having simple convex planes with a simplistic geometric boundary as a base data type doesn't really work.

Good question. The primary assumption with this design is that the geometry will be used for drawing, and drawing is only allowed in rAF callback.

It will also be used for physics, for example. And there are already demonstrations of using workers for physics; there have been demonstrations of drawing in workers, but Spectre/Meltdown put a stop to that (by forcing vendors to eliminate the necessary shared memory, I seem to recall).

And when people what to do analysis of the geometry (e.g., to detect planes or other objects), that will be very amenable to being done in workers.

We discussed this internally and while some use cases (e.g. using planes for occlusion, or rendering planes) are explicitly related to rendering, even non-rendering use cases (e.g. scanning a room) are likely to provide a visual indicator of state, which effectively makes them rendering use cases.

Rendering is the obvious use, I agree. But limiting data use arbitrarily just because of this seems like a poor choice. Is there are good reason to do it, aside from this?

We also recalled some of the conversations around hit testing which considered linking geometry results to the camera frame (on visual SLAM systems), and there were concerns at that time that the geometry could too easily be out of sync with the rendering and could cause jittering or other artifacts.

I agree that we absolutely need to know the current state of the geometry in the rAF. I was questioning the "not valid outside the rAF" point.

If the app still needs to use the information after rAF, it can do so by copying the data. My examples were only doing so for demonstration purposes - just to have variables that can be passed in to other functions from rAF callback. I’ll modify them to make it more clear.

But, my question still stands: is there a reason to require this? Yes, if I am passing data to a worker, I very likely will copy it, but I'd actually prefer to do this in the worker, not in the main thread, for performance reasons.

What we actually need is the opposite: a way to pin the object until you are done with it, and then release it so it can be reused internally.

These data structures may get very big. As I walk around my house with a Hololens on, or ML1, the size of the world geometry continues to grow, and future devices will generate finer and finer meshes. We cannot design and API that requires this data be copied each frame for non-trivial use cases.

Polling is proposed in hit-test explainer so I’ve decided to follow suit.

Is that the only reason? I agree that following other parts of the API is good, without a reason not to (which is why I pointed at the Anchor API explainer).

But, here, as the amount of data grows, forcing pages to iterate and compare seems unwise. Again, consider ARKit and ARCore, but also Hololens and ML1, as I walk around my house, going from room to room and floor to floor, only the local parts of the data are changing. Other distant parts do not change. Having applications perform at a consistent rate as the amount of data available to them grows requires that we avoid forcing them to scan through all the data more than they have to.

I do think that adding an updated property on objects in the returned list will be useful - this is why we’d like to incubate the API and check where it’s awkward to use.

That's why I pointed to what we've been using for a year with the WebXR Viewer and our older proposed version of WebXR. :)

It may very well be that having a "updated" property on the objects is sufficient, especially if the objects have unique IDs associated with them (or are otherwise identifiable). It really depends on how "big" the list gets. A few dozen, or even a few hundred, is probably fine; after all, drawing them will be more expensive than peeking at a single property on each of them. I mostly want a way to know which objects have changed.

Touched on that a bit above, I think difference lists are a good idea. Just exposing an updated property should allow us to polyfill it.

Agreed, that's probably a good compromise.

blairmacintyre commented 5 years ago

@johnpallett @bialpio comments welcome. I am probably going to start fiddling with an implementation later today (I hope)

johnpallett commented 5 years ago

This is a great discussion, thanks @blairmacintyre for the analysis.

To make life easier, I'm breaking this issue into several smaller topics to make things easier to track and organize.

AFAICT it looks like we have several areas of consensus, and three topics still being discussed. For those, I've created #4 #5 and #6. I've copied text over from here to those issues, so I think this issue can be closed in lieu of those more focused topics, @blairmacintyre can you confirm?

blairmacintyre commented 5 years ago

@johnpallett can you summarize what areas of consensus you saw? :)

johnpallett commented 5 years ago

@blairmacintyre apologies for the long delay in getting back to you on this. You're absolutely right that we should capture the areas of agreement in this thread. I see two specifically:

1) Agreement on a synchronous API. For context, synchronous is also supported in the hit testing explainer.

2) Agreement on the goal of an extensible API that allows for other types of world knowledge (e.g. lighting estimation, occlusion data, meshes) without making the types mutually exclusive or dependent on each other.

@blairmacintyre did I capture these correctly?

Also, are there any other unresolved topics in this thread that we should create new issues for? If so (or if you disagree with either of the two conclusions above) feedback is very welcome. Otherwise, I'd suggest we close this issue. WDYT?