Making the `<model>` element compatible with modern web frameworks.

trusktr commented 1 year ago

Hello! I first wrote about this here.

https://twitter.com/trusktr/status/1717795902272974942

I want to open a conversation about how we could make the <model> element more compatible with today's web frameworks, and about the ways in which it currently is not.

Note: I work on Lume, HTML elements for 3D rendering, and I have spent a lot of time imagining what native 3D elements in browsers could be like. Try inspecting and hovering over elements in the devtools element inspector when viewing the examples.

I'll re-post what I wrote on Twitter here, updated and improved a little:

@apple's HTML <model> element in #visionOS Safari is incredibly neat (see the video, at 12:36), hinting at an awesome future of high-level 3D web capabilities that don't require intimate programming knowledge to use.

However, it does not hint at a full HTML 3D future compatible with React, Vue, Svelte, Solid, and all the other DOM tools we've evolved over the years. Let me explain why, and perhaps we can fix this.

<model> enables amazing native features for the web in a simple way and starts to paint an awesome future for the web, but I believe we can plan our HTML future better based on the present (cc @jensimmons as the curator of our #webdev needs for @WebKit, and @marcoscaceres as editor of the spec):

The current model.getCamera() API returns (via a Promise) a plain JavaScript object rather than an HTMLElement. This means none of our DOM-manipulation tools (@buildWithLit, @reactjs, @vuejs, @sveltejs, @solid_js, and more) can manipulate a <model> element's camera out of the box with their declarative-reactive templating systems; instead, a new, non-standard way of programming interactivity is required specifically for <model>. Web devs will end up building unnecessary wrappers (for every framework!) to bridge this gap and tie <model> camera interactivity back into our paradigms.

As an example of the kind of wrapper this leads to, react-three-fiber wraps Three.js to add declarative-reactive support for Three.js in React, because Three.js is not DOM-based. That wrapper is justified for Three.js, but <model> aims to be a DOM API, and yet part of its API necessarily requires non-DOM wrappers. The wrappers people make for <model> will need to map attributes/properties from their templates to the <model>'s non-DOM camera objects.

With HTML elements (and well-designed Custom Elements), there is no need to make wrappers for every framework: all elements typically work out of the box in any web framework (there are a few edge cases, but we've learned to avoid those in the design of new HTML elements).
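The difference can be sketched in TypeScript. This is a minimal illustration under stated assumptions: `FakeElement` is a hypothetical stand-in for a DOM element (so the code runs outside a browser), `applyProps` plays the role of a framework's generic attribute-patching step, and `PlainCamera` with its `yaw`/`pitch` fields is a made-up shape standing in for a non-DOM camera object, not the interface defined by the <model> spec.

```typescript
// Minimal stand-in for the attribute surface of a DOM element, so this sketch
// runs outside a browser. Real frameworks target HTMLElement; this is a
// deliberate simplification.
interface AttributeTarget {
  setAttribute(name: string, value: string): void;
  getAttribute(name: string): string | null;
}

class FakeElement implements AttributeTarget {
  readonly tagName: string;
  readonly attrs = new Map<string, string>();

  constructor(tagName: string) {
    this.tagName = tagName;
  }

  setAttribute(name: string, value: string): void {
    this.attrs.set(name, value);
  }

  getAttribute(name: string): string | null {
    return this.attrs.get(name) ?? null;
  }
}

// Generic patch step shared by every element: it knows nothing about
// cameras, it only writes template props onto attributes.
function applyProps(el: AttributeTarget, props: Record<string, string>): void {
  for (const [name, value] of Object.entries(props)) {
    el.setAttribute(name, value);
  }
}

// Hypothetical plain-object camera; the field names are illustrative and are
// NOT taken from the <model> specification.
interface PlainCamera {
  yaw: number;
  pitch: number;
}

// A plain object has no attributes, so each framework needs bespoke glue
// like this just to reach the camera.
function applyCameraProps(camera: PlainCamera, props: Partial<PlainCamera>): void {
  if (props.yaw !== undefined) camera.yaw = props.yaw;
  if (props.pitch !== undefined) camera.pitch = props.pitch;
}
```

`applyProps` works unchanged for any element, including a hypothetical `<camera>`; `applyCameraProps` is the kind of adapter that would otherwise have to be rewritten per framework.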

The following HTML interface is an example that would be compatible with the DOM future all frameworks have been building:

<model ...>
  <camera type="perspective" position="10 20 30"></camera>
</model>

This is compatible because by making the camera configurable via DOM with HTML support, every web framework can manipulate it out of the box (and not just on the client side, but also on the server side!).

Alternatively, camera attributes on the <model> element would also be a compatible pattern, and could later be superseded by HTML elements that override the attribute behavior:

<model camera="perspective" camera-position="10 20 30" ...></model>

These designs are compatible with all of today's frameworks because they embrace the DOM tree model that all DOM tools are designed to work with.
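To make the attribute-based designs more concrete, here is one possible way a value such as position="10 20 30" or camera-position="10 20 30" could be parsed. The `parseVec3`/`stringifyVec3` helpers and the grammar they accept (whitespace-separated, exactly three finite numbers, falling back on malformed input) are assumptions for illustration; no spec defines them.

```typescript
// A 3D vector as parsed from an attribute value.
type Vec3 = [number, number, number];

// Hypothetical parser for space-separated vector attributes. The grammar
// (whitespace separators, exactly three finite numbers, fallback on error)
// is an assumption, not something the <model> spec defines.
function parseVec3(value: string | null, fallback: Vec3 = [0, 0, 0]): Vec3 {
  if (value === null) return fallback;
  const parts = value.trim().split(/\s+/).map(Number);
  if (parts.length !== 3 || !parts.every((n) => Number.isFinite(n))) {
    // Malformed values are ignored rather than throwing, similar to how
    // browsers tolerate invalid values for many built-in attributes.
    return fallback;
  }
  return [parts[0], parts[1], parts[2]];
}

// Serializing back to a string lets the same value be emitted as markup,
// e.g. during server-side rendering.
function stringifyVec3(v: Vec3): string {
  return v.join(" ");
}
```

Because the value is plain text, a framework or a server-side renderer only needs to produce a string; no camera object has to exist yet.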

The getCamera() API, though extremely nifty for what it can do, is not compatible because it side-steps the DOM trees that these tools manipulate, which will make our code more WET (less DRY) due to those wrappers we'll need for every framework to connect to <model> element features (cameras). If the API were to give us elements and attributes (which elements, and which attributes, can be debated), we'd get these connections concisely and for free.

Imagine <model> ships in all browsers, and out of the box anyone using any framework can manipulate every aspect of a <model> scene right away, without escaping their framework to write custom JavaScript. With a DOM-based API, the Promise returned by getCamera() would no longer be necessary: the camera would be controlled through plain DOM objects (elements and attributes), while the rendering effects are abstracted underneath. The camera DOM can internally proxy to an underlying renderer (out of process or not).
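A rough sketch of how a camera element could proxy to a renderer. It mirrors the real Custom Elements hooks (`observedAttributes` and `attributeChangedCallback`) but does not extend `HTMLElement`, so it runs without a DOM; `RendererBackend` and its methods are hypothetical placeholders for whatever the browser would use internally. The point is that page code only sets attributes and never awaits a Promise.

```typescript
type Vec3 = [number, number, number];

// Hypothetical interface to whatever actually draws the scene, in-process or
// out-of-process. This is not a real browser API.
interface RendererBackend {
  setCameraType(type: "perspective" | "orthographic"): void;
  setCameraPosition(position: Vec3): void;
}

// Sketch of a <camera> element's internals. It only mirrors the Custom
// Elements callback names so it can run without a DOM.
class CameraElementSketch {
  static observedAttributes = ["type", "position"];
  private readonly backend: RendererBackend;

  constructor(backend: RendererBackend) {
    this.backend = backend;
  }

  // In a real custom element the browser invokes this when an observed
  // attribute changes; the page itself just sets attributes.
  attributeChangedCallback(name: string, _oldValue: string | null, newValue: string | null): void {
    if (name === "type") {
      this.backend.setCameraType(newValue === "orthographic" ? "orthographic" : "perspective");
    } else if (name === "position") {
      const parts = (newValue ?? "").trim().split(/\s+/).map(Number);
      // Ignore malformed values instead of forwarding garbage to the renderer.
      if (parts.length === 3 && parts.every((n) => Number.isFinite(n))) {
        this.backend.setCameraPosition([parts[0], parts[1], parts[2]]);
      }
    }
  }
}
```

A native <camera> would do this forwarding inside the browser; a userland polyfill would need a hyphenated name, since custom element names must contain a hyphen.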

Today, most frameworks practice one-way data flow (except in edge cases such as <input>s), but the getCamera() API, besides not being compatible with DOM tools, is a more cumbersome two-way data flow design:

First, the user has to get a Promise and wait for it. With DOM APIs, users never have to wait on browser-returned Promises for sub-objects (children) to be ready. For elements that load assets (e.g. <img>), browsers instead fire a load event once the element has loaded its asset(s).
Then the user can start to interact with the camera object, but not in a standard way (JS only, no compatibility with tools).

Instead, with a DOM-centric approach, users would simply map their desired state to the DOM (whether attributes on <model> or a new element like <camera>). The user does not need to know when the underlying internal camera is ready for rendering, just as the user does not need to know when a <select> is ready for rendering: they simply map their state to the DOM and expect the display to update. Load events can be emitted to tell users when assets are ready. No Promise is necessary.
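The one-way flow described above can be sketched as a pure function from state to attributes plus a diff step, roughly what a framework's patch step does. `CameraState`, `cameraAttributes`, and `diffAttributes` are hypothetical names, and the attribute names follow the <camera type position> example from this issue rather than any spec.

```typescript
type Vec3 = [number, number, number];

// Hypothetical application state for a camera.
interface CameraState {
  type: "perspective" | "orthographic";
  position: Vec3;
}

// Pure mapping from state to the attributes a hypothetical <camera> element
// would accept. Nothing is awaited: the same state always yields the same
// attributes.
function cameraAttributes(state: CameraState): Record<string, string> {
  return { type: state.type, position: state.position.join(" ") };
}

// Works out which attributes to set or remove when state changes, roughly
// what a framework does before touching the DOM.
function diffAttributes(
  prev: Record<string, string>,
  next: Record<string, string>,
): { set: Record<string, string>; remove: string[] } {
  const set: Record<string, string> = {};
  for (const [name, value] of Object.entries(next)) {
    if (prev[name] !== value) set[name] = value;
  }
  const remove = Object.keys(prev).filter((name) => !(name in next));
  return { set, remove };
}
```

Only the changed attribute reaches the DOM; when the element later finishes loading assets it can report that through a load event instead of a Promise.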

The current getCamera() essentially creates a new and alternative object model that requires new tooling for all frameworks, yet does not provide a clear benefit that would make this tradeoff worth it.

We can also get creative here, while not veering away from the really great web developer paradigms we already have:

In a further future (based on my experience making Lume, and knowledge of A-Frame, Trois for Vue, Threlte for Svelte, react-three-fiber for React, and other similar libraries for various frameworks), <camera> and <model> could be decoupled from each other and used inside a 3D <scene>. There, a <model>'s own camera would be implicitly disabled, and rendering would instead be relative to a user-defined <scene> containing custom graphics interspersed with any number of <model> elements:

<scene fog="linear" fog-color="pink" ...>
  <camera type="perspective" position="..."></camera>
  <model rotation="..." ...></model>
  <model scale="..." ...></model>
  <box ...></box>
  <sphere ...></sphere>
</scene>

A scene like the one above is fully compatible with the web tools that have evolved over many years of thought.

Changing the <model> API now would be easy, but changing it later without breaking apps would be extremely difficult. This is why we need to imagine, right now: how do we ensure that the future brings us elements that immediately fit in with the tools we've worked hard to create?

I would love to help ideate this future. I've started to ideate over at https://lume.io.

I would love to continue this conversation to ensure that we can create a set of 3D elements that meet the needs of the vast majority of web developers in the simplest way possible, with the highest compatibility with the existing ecosystem.

marcoscaceres commented 1 year ago

It's definitely worth exploring the design of the API (e.g., #12 already raised the issue around the promises; see also #41 and many other issues).

I think the problem here is that this conflates what the format provides with what <model> provides. The <model> element, as it stands, is just a simple wrapper for a 3D format (USDZ or glTF) so that a 3D file can be rendered on a page.

In designing an API, what we need to figure out is exactly what overlap exists between the two formats in accessing their common internals (e.g., both may define an initial camera position in the format itself). I don't think we can do much more than that... but that still allows for a lot, as model-viewer has shown.

I don't think we want to end up with <box ...></box> <sphere ...></sphere> etc. We'd basically be recreating VRML. USD and glTF already specify all the 3D object/world information, as well as textures, etc. Doing that again in HTML would be reinventing the wheel, which is definitely something to avoid.

I guess what would be nice for existing content on the Web is some way to simply light a scene using environmental lighting information and present it in a stereoscopic view (like visionOS does), while taking away the page's ability to capture the canvas (for privacy, as the UA would be reflecting the user's environment).

I don't know if that's possible/feasible, but it seems that's what you'd want here: to continue to support all the legacy content, but just have it be rendered/lit correctly in the new context.

At the same time, it's nice to just put a <model> on the page, just like one does with an <img> tag and have it do simple things... for anything more fancy, there is WebGPU, WebGL, WebXR, Canvas etc.

zachernuk commented 7 months ago

I totally agree about establishing some manner of composing and manipulating model contents - I'm just conscious that the more that goes into an initial specification, the harder it becomes to agree upon it and the more implementation work must be done before it's complete. To that end, in the interests of finding the absolute MVP, I'm suggesting that we defer any scene inspection until a post-V1. @trusktr Would you be okay to close this?

immersive-web/model-element, issue #73