metadata in head to enhance webxr indexed content

HyroVitalyProtago commented 2 years ago

Hello everyone, I suggest a new metadata element to enable better indexing of WebXR content in web browsers. Maybe something like <meta name="webxr" content="vr ar">, with vr and/or ar in the content attribute. (From https://github.com/immersive-web/webxr/issues/1252; toji redirected me here.)
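To make the suggestion concrete, here is a minimal sketch of how a crawler might read such a tag, using only Python's standard library. The tag name `webxr` and the space-separated `content` tokens come from the proposal above; the class and function names are hypothetical, and nothing here reflects an agreed format.

```python
from html.parser import HTMLParser


class WebXRMetaParser(HTMLParser):
    """Collects modes declared in the proposed <meta name="webxr"> tag."""

    def __init__(self):
        super().__init__()
        self.modes = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "webxr":
            # Assumption: content is a space-separated token list, e.g. "vr ar".
            self.modes.update((attributes.get("content") or "").lower().split())


def declared_webxr_modes(html):
    """Return the set of modes a page declares, or an empty set if none."""
    parser = WebXRMetaParser()
    parser.feed(html)
    return parser.modes


page = '<head><meta name="webxr" content="vr ar"></head>'
print(sorted(declared_webxr_modes(page)))
```

A page without the tag yields an empty set, so an indexer could treat that as "no declared WebXR support" rather than an error.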

brianpeiris commented 2 years ago

I'm glad there's interest in this from other people as well. I was toying with the idea while building wwxr.io, which is meant to be a search engine for XR content on the web. (It is still a work in progress). I wrote some thoughts about crawling for XR content on the web here: https://archive.is/xPQUh (Archive link, since wwxr.io has a habit of going down occasionally at the moment).

Another option might be to follow the spec for the Open Graph protocol, specifically the extensible CURIE Object Type part of the spec. The benefit of using Open Graph, in my mind, is that it is already widespread on the Web (for example, see Twitter's use), and libraries already exist for finding and parsing that format of meta tag. If we went with this option, ideally the Immersive Web group would endorse, claim and own the namespace (assuming we can come to some consensus in the community). It might look something like this, with sub-types for VR, AR and XR (if an app supports both modes):

<head prefix="webxr: https://www.w3.org/immersive-web/ns#">
<meta property="og:type" content="webxr:vr" />
<meta property="og:type" content="webxr:ar" />
<meta property="og:type" content="webxr:xr" />
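As a rough sketch of how an indexer could consume the Open Graph variant, the following standard-library Python reads the `prefix` attribute on `<head>` and any `og:type` values in the `webxr` namespace. The namespace URL is the illustrative one from the snippet above, not a registered namespace, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphTypeParser(HTMLParser):
    """Collects CURIE prefixes from <head> and og:type values from <meta>."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}
        self.types = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "head":
            # The prefix attribute holds "name: iri" pairs separated by whitespace.
            tokens = (attributes.get("prefix") or "").split()
            for name, iri in zip(tokens[0::2], tokens[1::2]):
                self.prefixes[name.rstrip(":")] = iri
        elif tag == "meta" and attributes.get("property") == "og:type":
            self.types.append(attributes.get("content") or "")


def webxr_og_types(html, namespace="webxr"):
    """Return the sub-types (e.g. "vr") declared under the given prefix."""
    parser = OpenGraphTypeParser()
    parser.feed(html)
    if namespace not in parser.prefixes:
        return []
    marker = namespace + ":"
    return [t[len(marker):] for t in parser.types if t.startswith(marker)]


page = (
    '<head prefix="webxr: https://www.w3.org/immersive-web/ns#">'
    '<meta property="og:type" content="webxr:vr" />'
    "</head>"
)
print(webxr_og_types(page))
```

Requiring the prefix declaration before trusting the `webxr:` types is a design choice in this sketch; a more lenient crawler might accept the types without it.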

When I put this out to the community on Twitter, I also got some other good ideas:

Why not use schema.org instead? The benefits being that it is an active spec, more formalized, already part of the W3C, and is able to accommodate complex scenarios (like a single page that contains multiple WebXR experiences). If we went with this option, I think we'd still have to reach consensus on how exactly we'd apply it to WebXR content and how much of the spec we would use, since it can be quite complex.
What about Google's Structured Data, which is currently one way Google indexes and displays 3D models? See https://developers.google.com/search/docs/advanced/structured-data/intro-structured-data and https://samuelschmitt.com/google-serp-3d-augmented-reality/. IMO, this may not necessarily help with general interactive WebXR content, but it may be a good example to follow, especially since it uses JSON-LD, which is another W3C spec.
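For comparison, a schema.org description embedded as JSON-LD could be generated along these lines. This is only a sketch under stated assumptions: the choice of the `WebApplication` type and the use of `applicationCategory` to carry the VR/AR modes are guesses at how schema.org might be applied, not something the community or schema.org has settled on, and the function name is hypothetical.

```python
import json


def webxr_json_ld(name, description, url, thumbnail_url, modes):
    """Build a JSON-LD <script> element describing a WebXR experience."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebApplication",
        "name": name,
        "description": description,
        "url": url,
        "thumbnailUrl": thumbnail_url,
        # Assumption: modes are carried as free-text categories.
        "applicationCategory": [f"WebXR {mode.upper()}" for mode in modes],
    }
    # Escape "</" so a value can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(
    webxr_json_ld(
        name="Demo scene",
        description="A small WebXR demo.",
        url="https://example.com/",
        thumbnail_url="https://example.com/thumb.png",
        modes=["vr", "ar"],
    )
)
```

Because the description and thumbnail live in the same object, this form also covers the descriptive metadata, at the cost of being more verbose than a single meta tag.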

Personally, for the purposes of finding WebXR content on the wider web, I'd prefer to take as minimal an approach as possible. I think a single meta tag would be sufficient, whether it is in the Open Graph format, the schema.org format, or some custom format we decide on. In addition to identifying the content, it would also be ideal to include metadata containing a description of the content, and perhaps a thumbnail of it as well. This information ought to follow existing practices, but it would be great if we could actively encourage the community to adopt those practices, since WebXR content often does not include descriptive metadata.

Other meta-level questions we may need to answer:

Is this discussion even in the scope of the Immersive Web group?
Does it fall under the Working Group, or the Community Group?
Would the Immersive Web group be willing to make an official endorsement for a metadata format, if the community agrees?
How do we actually reach enough of the community to form a consensus?

HyroVitalyProtago commented 2 years ago

Thanks a lot for all the info!

I'm completely in favor of your Open Graph protocol proposal; it's minimal and straightforward. I'm also in favor of more advanced meta tags, like a 360° view thumbnail, that could be used later for 3D hyperlinks (portals) between WebXR sites.

schema.org is probably a good way to enhance interoperability between WebXR sites, so I would say that it can be used in addition to the meta tag to describe the content more precisely.
Personally, I'm against Google's Structured Data (GSD). Even though I like the JSON-LD format, this approach just duplicates the content of the page. I prefer schema.org, as it adds metadata to content that is already there.
But this reasoning assumes 3D web code that "looks" like current 2D web code, such as A-Frame. If someone builds something with WebGL, Three.js or similar tech, where all the content lives only in JavaScript or is even compiled to Wasm, schema.org is not really usable, or, as with GSD, the content ends up duplicated somewhere alongside the description.

For the meta-level questions, they are completely relevant but I have no idea of the answers.

cabanier commented 2 years ago

@HyroVitalyProtago What would be the usage of this metadata? Is it strictly for indexing or do you have other cases in mind?

HyroVitalyProtago commented 2 years ago

This metadata would be primarily for indexing, even if I think that later on, other metadata could be useful for interoperability (sadly we're not there yet).

Wellan-Arthur commented 2 years ago

Another use for this metadata that I see is accessibility: informing screen readers that the main content might not even be text, but 3D immersive graphics that require specific tooling (ML, CV...) to be understood or parsed.

brianpeiris commented 1 year ago

For anyone interested, I've open-sourced my experimental search engine, based on Common Crawl data. I don't plan to continue working on it, though, so this is more for educational purposes: https://github.com/brianpeiris/wwxr

cabanier commented 1 year ago

@brianpeiris thanks! It would indeed be nice if there was a way to query for immersive content. @toji, do you know of any efforts to make pages report their content type (i.e. media)?

Wellan-Arthur commented 1 year ago

Hi, what about exposing the scene in a stringified, verbose form that would be included in the HTML? Every engine would have to generate it either beforehand or on the fly when asked for it. It would allow screen readers and web crawlers to better understand what there is to see.
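One way to picture this idea: an engine walks its scene graph and emits an indented text outline that can be placed in the page. The sketch below assumes a hypothetical scene shape (nested dicts with `type`, optional `label`, and optional `children`); real engines represent scenes differently, and the element id is made up for illustration.

```python
from html import escape


def describe_scene(node, depth=0):
    """Return one indented text line per node, visiting depth-first."""
    line = "  " * depth + node["type"]
    if node.get("label"):
        line += f": {node['label']}"
    lines = [line]
    for child in node.get("children", []):
        lines.extend(describe_scene(child, depth + 1))
    return lines


def scene_to_html(scene):
    """Wrap the outline in a <pre> element, escaping any markup in labels."""
    text = "\n".join(describe_scene(scene))
    return f'<pre id="scene-description">{escape(text)}</pre>'


scene = {
    "type": "scene",
    "label": "Gallery",
    "children": [
        {"type": "mesh", "label": "Red cube"},
        {"type": "light"},
    ],
}
print(scene_to_html(scene))
```

Whether such an outline should be generated at build time or on request, and where in the document it should live, is left open here, as in the comment above.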

immersive-web / proposals

metadata in head to enhance webxr indexed content #73