Is it time for browsers to standardize 3D rendering?

As the maintainer of <model-viewer>, I'm humbled to have Apple referencing it in a web standards proposal. I've had a number of conversations now in various standards bodies about the <model> proposal, as well as various internal conversations at Google about whether we should propose something similar in Chrome going back at least three years. I figured I should summarize those conversations here publicly to stimulate further discussion.

As much as it would have been good for my career to push <model-viewer> into Chrome and the standards process, I have instead argued against it because I think it would hinder innovation in what is currently a rapidly-evolving field. I'll list out some pros and cons below of standardizing a <model> element vs. using a JS library like <model-viewer>, SketchFab, babylon.js, etc. Please add comments with any pros and cons I've missed, as well as discussion of those mentioned.

Pro: I'm just going to quote the only pro given in the explainer:

Do not add a new element. Pass enough data to WebGL to render accurately. As noted above, this would require any site that wants to use an AR experience to request and have the user trust that site enough to allow them access to the camera stream as well as other information. A new element allows this use case without requiring the user to make that decision.

First, this is largely false. AR within the browser today is accomplished via the WebXR standard (which iOS Safari has not implemented) and it was explicitly developed with privacy in mind. WebXR in fact works without giving the website access to the camera feed, hence the distinction between the XR permission and the camera permission. It does give access to the camera pose in order to make canvas rendering possible, but all of this has gone through numerous rounds of privacy and security review. Even the precision of available data is capped to limit fingerprinting.
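To make the permission distinction concrete, here is a minimal sketch of how a page might enter AR through WebXR. `isSessionSupported` and `requestSession` with the `"immersive-ar"` mode and the `"hit-test"` feature come from the WebXR specs; the `XRSystemLike` types and the helpers `arSessionInit` and `enterAr` are hypothetical names introduced only for this illustration. The sketch never calls `getUserMedia` and never requests camera pixels.

```typescript
// Minimal structural types so the sketch compiles without WebXR typings.
// They mirror the shape of navigator.xr but are assumptions made for this example.
interface XRSessionInitLike {
  requiredFeatures?: string[];
  optionalFeatures?: string[];
}

interface XRSystemLike {
  isSessionSupported(mode: string): Promise<boolean>;
  requestSession(mode: string, init?: XRSessionInitLike): Promise<unknown>;
}

// Hypothetical helper: the options a simple model viewer might ask for.
// Hit testing lets the page place a model on a real-world surface;
// nothing here asks for the camera image itself.
function arSessionInit(): XRSessionInitLike {
  return { optionalFeatures: ["hit-test"] };
}

// Hypothetical helper: resolves to a session, or null when AR is unavailable.
// In a real page this has to run inside a user gesture such as a button click.
async function enterAr(xr: XRSystemLike | undefined): Promise<unknown | null> {
  if (!xr) {
    return null; // this browser does not expose WebXR
  }
  if (!(await xr.isSessionSupported("immersive-ar"))) {
    return null; // WebXR exists, but AR sessions are not supported here
  }
  // The browser shows its own XR consent prompt here, separate from the camera permission.
  return xr.requestSession("immersive-ar", arSessionInit());
}
```

In a browser, the call would look something like `enterAr((navigator as any).xr)` from a click handler; the page then renders into the session using the poses the browser hands back.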

Con: Device/browser compatibility & consistency. The various JS libraries for 3D give uniform rendering and universal support for the file formats and extensions of their choice across devices and browsers today (including Safari). And when they implement a new extension, it is available on all browsers simultaneously. The only exception is AR QuickLook on iOS, which has neither the format support nor the customizability to achieve rendering consistency, something our users constantly notice. First, <model> appears excited to follow the debacle of the <audio> tag regarding format support across browsers. However, even if a format were agreed upon, I would love to hear the plan for keeping extension support and rendering quality consistent across browsers over time. This is a rapidly-evolving field; Khronos has been releasing several new PBR extensions per year for some time, and that looks unlikely to slow. There is more competition between JS libraries than between browsers because the cost of switching is so much lower; the last thing we want to do is hand an innovative field over to a duopoly.

Con: Scale of the API to standardize. The current <model> API proposal is deceptively simple. This may be because it is so focused on the AR use case and proposes to also solve 3D-in-the-DOM as a side-effect. glTF's usage across e-commerce has demonstrated clearly that while AR has some great niche value, 3D-in-the-DOM is actually the dominant use case. And it requires a lot more customization than AR, especially around camera controls, limits, interactions, and prompts. You can get a taste of the critical APIs <model> is currently missing here. Never mind the arbitrary choices like model framing, movement behaviors, etc. that Apple proposes be left up to browsers to create totally inconsistent experiences.

The bigger problem I foresee, though, is requirements creep. I know this well from maintaining MV; I am constantly pushed to expose more and more of the underlying three.js API. I resist in order to keep my product differentiated at a higher level of abstraction, but it's a very fuzzy line. By natively supporting a 3D model in the browser, I predict no one will be satisfied until a Unity-sized API has been web-standardized around it. This is the same problem VRML ran into decades ago. Standards bodies are powerful, but slow - I fear to think how many years it would take to agree on a standard so complex.

In conclusion, I would say Apple's use case can be well solved with today's JS rendering libraries if they simply add WebXR support to iOS. Even if that privacy barrier is somehow insurmountable, they could also propose a standard way to launch native AR experiences from the browser without the need to standardize a new DOM element, which would keep the proposal much simpler, but sadly without any JS-based customization opportunity.

@elalish wrote:

Never mind the arbitrary choices like model framing, movement behaviors, etc. that Apple proposes be left up to browsers to create totally inconsistent experiences.

I think that this inconsistency would be a significant blocker for adoption in the 3D-in-DOM use case. Given the choice between using a JS library with predictable results[1] versus using a tag with varying semantics across browsers, I think that the tag solution would need to offer very significant benefits to make that appealing. If I'm understanding it right, the main benefit would be a slightly smoother transition to AR mode, i.e. not having to re-download the model and maintaining some state from the current view.

WebXR leaves full flexibility for the 3D-in-DOM use case while also allowing a smooth transition to AR mode.

[1] Rendering bugs can potentially affect results. Libraries can often find workarounds for those and generally have a faster release cycle than browsers, and it's much harder to compensate for issues if rendering is done directly by the browser short of avoiding use of features entirely.

Thanks @elalish for articulating these concerns. This is extremely useful feedback, and precisely the kind of input we were hoping for! 🙏

...I think it would hinder innovation in what is currently a rapidly-evolving field.

Just acknowledging that this is a valid concern and certainly no one wants to stifle innovation in this space. Quite the opposite: the proposal attempts to lower the barrier of entry for anyone wanting to put 3D content on the web by not requiring the inclusion of a JS library.

However, <model> doesn't preclude the inclusion of any library or continued innovations of those JS libraries.

Con: Scale of the API to standardize. This may be because it is so focused on the AR use case and proposes to also solve 3D-in-the-DOM as a side-effect.

It's a bit unfortunate if the proposal was read that way. The primary focus has been on 3D-in-the-DOM. The ability to enter into an AR view is a complementary feature (but a highly desirable one). That an “AR browser” could allow viewing models in place, while being completely privacy-preserving, without requiring any special permissions is also a huge plus - and a key differentiator from WebXR.

glTF's usage across e-commerce has demonstrated clearly that while AR has some great niche value, 3D-in-the-DOM is actually the dominant use case.

Agree that is by far the dominant use case, and should be a driver of what functionality we need to specify.

And it requires a lot more customization than AR, especially around camera controls, limits, interactions, and prompts.

You are right in the above respects. The proposed API surface has been kept small just to see if we can agree on a minimal amount of functionality.

You can get a taste of the critical APIs <model> is currently missing here.

This is good input. And yeah, looking at that list, and as we start to play with actual content, we can see a need for functionality analogous to what model-viewer provides. But that's totally fine. The point of this incubation exercise is to figure out what a model element would need - it was never to suggest that what we proposed was everything we need or in any way feature complete. Put differently: we are “incubating”, not “standardizing” at this point.

Never mind the arbitrary choices like model framing, movement behaviors, etc. that Apple proposes be left up to browsers to create totally inconsistent experiences.

As above, please be mindful that it's just a proposal: having consistent framing, movement, and so on is definitely a goal, and these are things that we, as a group, need to figure out (i.e., it's not because Apple believes it should be left to each browser; we just didn't have answers for what it should be and want to figure that out together in the incubation process).

The bigger problem I foresee though, is requirements creep. ... By natively supporting a 3D model in the browser, I predict no one will be satisfied until a Unity-sized API has been web-standardized around it.

I assure you standards folks are well aware of and prepared for this problem: we know how to say “no” a lot, and both the W3C and WHATWG have robust processes / working modes in place to prevent such requirements creep (particularly given the hard requirement of multiple implementation commitments before anything is included in an actual standard!). This is also why we are starting with barely anything in the spec: the intent is to set a really high bar for inclusion of any new feature/requirement.

This is the same problem VRML ran into decades ago. Standards bodies are powerful, but slow - I fear to think how many years it would take to agree on a standard so complex.

Historically this hasn’t been the case at the W3C or WHATWG (VRML being an exception of a bygone era): we don’t need a ratified standard before browsers ship things. So, although it may take years to get a “W3C Recommendation”, all browser vendors can ship interoperable implementations way before that. Case in point: WebXR, which reached Candidate Recommendation status in 3 years.

Surely if WebXR can be mostly standardized in that timeframe then so can <model>? They are comparable in complexity, wouldn't you say? Same goes for any feature we've added to browsers in an interoperable manner across browsers the last 15+ years. Yes, standards are hard, complex, and take time, but that's not a reason to not do them - that's what makes them worthwhile endeavors.

In conclusion, I would say Apple's use case can be well solved with today's JS rendering libraries if they simply add WebXR support to iOS.

WebXR is in development, though I have no information about support in iOS. Nevertheless, we'd like to make adding 3D models to a page easier if possible, with <model>.

@klausw wrote:

I think that this inconsistency would be a significant blocker for adoption in the 3D-in-DOM use case.

Agree. Let's figure those out.

It sounds like Safari will use the macOS/iOS built-in 3D renderer for rendering <model>.

If that's the case, how will <model> be rendered in other WebKit-based browsers such as GNOME Web? Would <model> only display if the host OS has a 3D renderer or will WebKit ship with a 3D renderer built-in?

It sounds like Safari will use the macOS/iOS built-in 3D renderer for rendering <model>.

It uses a built-in renderer today, yes.

If that's the case, how will <model> be rendered in other WebKit-based browsers such as GNOME Web?

We need to figure that out with that community (i.e., why we are having this incubation process).

Would <model> only display if the host OS has a 3D renderer or will WebKit ship with a 3D renderer built-in?

I honestly can't predict the future, sorry (i.e. I honestly don't know... part of this incubation process is to figure out the feasibility of that).

@marcoscaceres Thanks, those are all fair points. Agreed that standardizing this is possible; I'm more making the point that it'll be costly. That's fine if there's adequate benefit. Thus far I haven't heard any users complain that putting a 3D model on their site with <model-viewer> was not easy enough. Do you have some specifics to share on how <model> will make 3D easier for web devs?

Likewise, can you share exactly what the privacy blockers are for WebXR, since it's not actually the camera feed? I'd also love to know more about the "AR Browser" you envision; I actually worked on something like this a few years ago. That might be a big benefit, but we'll need some reasonably detailed use cases / vision to evaluate A) what it'll take for this proposal or others to support it and B) how much benefit it will bring.

Rendering bugs can potentially affect results. Libraries can often find workarounds for those and generally have a faster release cycle than browsers, and it's much harder to compensate for issues if rendering is done directly by the browser short of avoiding use of features entirely.

For some specifics, I've needed to reference this list of limitations in AR Quick Look regularly over the past three years, and another list by Sketchfab here. These are features that work correctly in WebGL-based engines like three.js, babylon.js, or model-viewer, but which fail in the 3D renderer provided by iOS. In some cases workarounds are possible, in others not — including support for compressed geometry and textures.

Unfortunately, I'm not aware that any of those limitations have been fixed since the list was published. There have been new features added to glTF and USD (e.g. improved PBR materials) in the time since then, which also are not available in iOS. I understand that no one here wants the <model> proposal to inherit or standardize those limitations, but as a developer that's an area of concern.

@marcoscaceres stated:

... They [WebXR & <model>] are comparable in complexity, wouldn't you say?

I am not sure I would. WebXR addresses the programmatic means of handling XR display environments. <model> addresses declarative means for displaying multi-media, animated, interactive content. They are really fundamentally different.

If <model> becomes a new element, then what happens when a new feature for glTF (or whatever model format is used) is released? Does there need to be a model codec that needs regular updating? What happens if the specified glTF file requests features that are not supported in <model>?
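For context on the last question, glTF 2.0 already lets a file declare which extensions it uses and which it strictly requires, through the top-level `extensionsUsed` and `extensionsRequired` arrays. Below is a minimal sketch of the check any renderer, native or JS, would have to make. The type names and the `checkExtensionSupport` helper are hypothetical; the extension names are real Khronos extensions chosen for illustration.

```typescript
// Subset of the top-level glTF 2.0 JSON that matters for extension negotiation.
interface GltfRoot {
  asset: { version: string };
  extensionsUsed?: string[];
  extensionsRequired?: string[];
}

interface SupportReport {
  canLoad: boolean;          // false when any required extension is unsupported
  missingRequired: string[]; // the asset cannot be loaded correctly without these
  missingOptional: string[]; // the asset loads, but may look different
}

// Hypothetical helper: compare a file's declared extensions to what a renderer supports.
function checkExtensionSupport(gltf: GltfRoot, supported: ReadonlySet<string>): SupportReport {
  const required = gltf.extensionsRequired ?? [];
  const used = gltf.extensionsUsed ?? [];
  const missingRequired = required.filter((name) => !supported.has(name));
  const missingOptional = used.filter(
    (name) => !supported.has(name) && required.indexOf(name) === -1
  );
  return { canLoad: missingRequired.length === 0, missingRequired, missingOptional };
}

// Illustrative asset: Draco-compressed geometry is required, clearcoat is optional.
const exampleAsset: GltfRoot = {
  asset: { version: "2.0" },
  extensionsUsed: ["KHR_draco_mesh_compression", "KHR_materials_clearcoat"],
  extensionsRequired: ["KHR_draco_mesh_compression"],
};

// A renderer that supports neither extension has to refuse the file.
const olderRenderer = checkExtensionSupport(exampleAsset, new Set<string>());

// A renderer that supports only Draco can load it, minus the clearcoat effect.
const newerRenderer = checkExtensionSupport(
  exampleAsset,
  new Set(["KHR_draco_mesh_compression"])
);
```

The second case is the one that bears on cross-browser consistency: the file loads in both browsers but renders differently, depending on which optional extensions each built-in renderer has picked up.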

It's great to state that this is the incubation for a new element; however, early statements did not make that clear. I'm happy that there is more clarity now.

@elalish wrote:

I'm more making the point that it'll be costly. That's fine if there's adequate benefit.

Yes, that's what we are collectively trying to figure out, hence also seeking wide input from a range of implementers who will bear the cost. I'm also treating this as a cost-benefit analysis, which is why a wide range of input is so critical here. Like @mrdoob alluded to, Apple gets the renderer "for free" across various platforms, but that doesn't account for what's going to happen on Windows, Android, Linux, etc. in other browser engines. And then there are the limitations/bugs of AR Quick Look that @donmccurdy pointed to.

Do you have some specifics to share on how <model> will make 3D easier for web devs?

If you mean numbers, no. But I think it will be the same specifics that brought about model-viewer (or that makes model-viewer easy to use).

As you know, to create a WebXR experience is quite challenging and a lot of code - like with canvas, that's totally ok if you are making a game, simulation, or anything beyond the basics. Even with model-viewer, it's still a few hundred kb for the library to have a 3D model show up (yes, I know 3D models can be orders of magnitude larger than the JS library, but still… I’m sure that will get better with time + formats). If a developer can just put a <model> in place, it takes all the hard work out of it while saving a few kb over the wire.

Now that users/developers increasingly have access to readily available tools like Apple's Object Capture, it takes a lot of the hard work out of creating 3D models. So, imagine a user is able to use a simple app to export a 3D object and just put it straight into their website. Preaching to the choir here, but coming to this as someone who spent years in 3D Studio Max, Lightwave, and Maya, this is phenomenal.

Likewise, can you share exactly what the privacy blockers are for WebXR, since it's not actually the camera feed?

I wouldn't say "privacy blockers". It's more about wanting a solution that doesn't require any special permissions or JS to "just work". As was said in the explainer, it’s the same as with the motivations behind <img>, <video>, <audio>, etc. They just work.

I'd also love to know more about the "AR Browser" you envision;

Safari, which is already an AR browser: if you visit Apple’s product pages, you can view our products in AR. However, you currently can’t see <model> directly in the browser, which is not amazing. It'd be cool to just have objects in the page, and you could seamlessly transition to AR.

But I’m also thinking of something like the Oculus browser or the browser you can load in Steam VR. I'm also really motivated by the stuff that Josh Carpenter wrote about in The Composable 3D Web.

I actually worked on something like this a few years ago. That might be a big benefit, but we'll need some reasonably detailed use cases / vision to evaluate A) what it'll take for this proposal or others to support it and B) how much benefit it will bring.

I can only speak to the Safari case (and the tooling I talked about above).

@DR-xR wrote:

They are really fundamentally different.

Yes, as they should be: we don't want to standardize a competing technology, but a complementary one. I'm instead asking about the time and complexity of the standardization effort (compared to WebXR), and @elalish is asking if it's worth the cost of that time and complexity.

If <model> becomes a new element, then what happens when a new feature for glTF (or whatever model format is used) is released? Does there need to be a model codec that needs regular updating? What happens if the specified glTF file requests features that are not supported in <model>?

It's the same as with any other media format that browsers deal with, which also evolve over time: we deal with it either in the browser or in the format itself.

But I’m also thinking of something like the Oculus browser or the browser you can load in Steam VR.

The Quest browser team is very interested in this proposal because it allows us to extend our flat 2D panels with 3D objects without needing more permissions from users or the need for large JavaScript libraries.

I'm also really motivated by the stuff that Josh Carpenter wrote about in The Composable 3D Web.

+1 to this. Josh Carpenter's slide decks are visionary.

immersive-web / model-element

Is it time for browsers to standardize 3D rendering? #55