w3c / miniapp-addressing

MiniApp Addressing
https://w3c.github.io/miniapp-addressing/

Discussion of the feasibility of using HTTPS as the MiniApp URI scheme #2

Closed: Sharonzd closed this issue 2 years ago

Sharonzd commented 4 years ago

Background

This discussion originated in the TAG review: https://github.com/w3ctag/design-reviews/issues/478

The TAG recommended reusing the HTTP(S) scheme wherever possible rather than inventing a new scheme.

In previous discussions, we gave two main reasons for using the miniapp:// scheme:

  1. To allow direct access to MiniApp packages that the user agent may already have installed locally, without a network request.
  2. To let the user agent recognize a MiniApp URI from its scheme alone and start the MiniApp runtime early, which improves the performance of opening a MiniApp.

Goal 1 can be met through the browser's existing caching mechanisms or an extension of them.

Goal 2 cannot be met by the new content type that the TAG suggested. The content type is returned by the server, so the user agent only learns that the resource is a MiniApp after it receives the server's response, which is too late.

Why Quick Apps Use HTTPS for MiniApp URIs

In the explainer, we listed how current MiniApp platforms format their URIs; each vendor uses its own unique scheme. Quick Apps, however, also support the https scheme, with the URI syntax https://hapjs.org/app/<package>/[path][?key=value].

This is because most other MiniApps are hosted by native apps, whereas Quick Apps are hosted by the operating system. When the Quick App host environment receives a URI that starts with http://hapjs.org/, it dispatches the URI to the Quick App runtime for resolution instead of passing it to the browser. The Quick App URI therefore acts as an intermediate (relay) resource identifier, rather than a traditional HTTPS URL used to interact directly with a server.
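The dispatch rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Quick App implementation: `parse_quick_app_uri` is a hypothetical helper, it accepts both http and https on hapjs.org (the text mentions both), and it only checks the `/app/<package>/` prefix of the syntax quoted above.

```python
from urllib.parse import parse_qs, urlsplit

# Host claimed by the Quick App host environment (from the syntax above).
QUICK_APP_HOST = "hapjs.org"


def parse_quick_app_uri(uri):
    """Return (package, page_path, query) for a Quick App URI.

    Returns None when the URI is not a Quick App URI and should be
    handed to the browser as an ordinary web URL.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or parts.hostname != QUICK_APP_HOST:
        return None
    # Expected path layout: /app/<package>/[path]
    segments = parts.path.split("/")
    if len(segments) < 3 or segments[1] != "app" or not segments[2]:
        return None
    package = segments[2]
    page_path = "/" + "/".join(segments[3:])  # "/" means the default page
    return package, page_path, parse_qs(parts.query)


# The host environment decides before any network request is made.
for uri in ("https://hapjs.org/app/com.example.demo/detail?id=42",
            "https://example.com/index.html"):
    target = parse_quick_app_uri(uri)
    print(uri, "->", "Quick App runtime" if target else "browser", target)
```

Because the decision uses only the host and the path prefix, it is made before any network request. The catch is that it relies on one well-known host, which other MiniApp vendors do not share.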

Difference from PWAs in terms of resource identification

Traditional web pages and PWAs use resource identifiers that usually correspond to resources returned by the server. For example, https://example.com/index.html usually refers to an index.html file returned by the server or a cache. A MiniApp, by contrast, is a multi-page mini app. It is stored and transmitted over the Internet as a single package (such as a zip file), and when it is accessed, the MiniApp runtime resolves the URI to a specific page inside the MiniApp.

In this respect, MiniApp resource identification may be closer to the widget URI scheme.

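Possible issues if we use HTTP(S) as the MiniApp URI scheme

If browsers also used HTTP(S) URIs as the MiniApp access scheme, the following problems would arise: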

  1. The user agent needs to intercept all HTTP(S) URLs and determine whether each one is a MiniApp URI, which adds time and performance cost to HTTP requests.
  2. The URL needs a field that identifies it as a MiniApp URL. There is currently no central package server shared among vendors (and there may never be one); package services are provided by various third parties. Therefore, the MiniApp type cannot be signalled by a specific domain in the URL.

A possible HTTPS-based syntax is: https://example.com/miniapp/<appid>/<miniapp-path>?key=value#title

This design places the MiniApp's sub-path and query in URL components that would normally identify the HTTPS resource itself (perhaps the zip package). I am not sure whether this is consistent with the design principles of HTTP(S) URLs.
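To make the question concrete, here is a minimal sketch of how a user agent might split the candidate HTTPS syntax into its parts. The `miniapp` path marker comes from the example syntax; the function name and the returned dictionary are hypothetical. Unlike the hapjs.org case, there is no fixed host to test, so the only signal is a path convention that the URL's own server may know nothing about.

```python
from urllib.parse import parse_qsl, urlsplit


def split_https_miniapp_url(url, marker="miniapp"):
    """Split https://<host>/miniapp/<appid>/<miniapp-path>?key=value#title.

    Returns None if the URL does not follow this (hypothetical) convention.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None
    segments = parts.path.split("/")
    # The convention only works if the marker is the first path segment.
    if len(segments) < 3 or segments[1] != marker or not segments[2]:
        return None
    return {
        "package_origin": f"{parts.scheme}://{parts.netloc}",  # who serves the package
        "appid": segments[2],
        "page": "/" + "/".join(segments[3:]),  # page inside the MiniApp
        "query": dict(parse_qsl(parts.query)),  # forwarded to the page
        "fragment": parts.fragment,
    }


print(split_https_miniapp_url("https://example.com/miniapp/app123/pages/detail?key=value#title"))
```

Every https URL the user agent sees would have to pass through a check like this, and an ordinary web page that happens to live under /miniapp/ would be misclassified, which is where the URI opacity concern comes in.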

Summary of the issues

  1. How can we ensure that the user agent recognises in advance that a URI is a MiniApp URI, while keeping the performance cost as low as possible?
  2. Does storing MiniApp information, such as the page path, in HTTPS URL fields follow the design principles of HTTPS URLs?

Thank you for your valuable comments and suggestions.

siusin commented 4 years ago

@marcoscaceres we were wondering why the editors decided to create a new URI scheme for Widgets back in 2012. Do you remember the thinking at that time?

marcoscaceres commented 4 years ago

The web depends on things running on an origin - the URI scheme gave us a way of running widgets in a way that was compatible with HTTP and other web technologies that would otherwise break if run on file://.

marcoscaceres commented 4 years ago

I'd also suggest doing what the TAG suggests: don't invent a new URI scheme, and don't invent new URI semantics. If this work wants to distribute things signed by an origin, it might be worth looking at signed exchanges, or just using Web Manifest and a service worker.

The bottom line is this: those "packages" need to be downloaded regardless, so you might as well just download them from the Web, using a proper URL (HTTPS)... and if they need to work offline, they might as well use Service Workers.

That sidesteps the whole discussion around URI, packaging, etc. Consider how this could just be solved using Web Manifest instead... like, it could be a special "display mode": "miniapp" or whatever. Then the WG doesn't need to waste time reinventing the widget wheel, and stuff just works with existing technology.

ylafon commented 4 years ago

"How to ensure that the user agent can recognise in advance that the URI is a MiniApp URI, while lowering performance cost as much as possible?" One possibility would be to use a <link> tag, with the type of the package in the type attribute. It might even be possible to define a 'mini app' rel type if needed.

ylafon commented 4 years ago

I see that @marcoscaceres already answered about technologies that are being defined (and may benefit specific use cases you may have). It all started with the question of being able to identify resources in a package, which led to https://www.w3.org/TR/2018/NOTE-web-packaging-20180130/ and continued in WICG after new use cases were added. So yes, there are bricks in the works.

Sharonzd commented 4 years ago

Thank you for your suggestions. We will continue to study the above technologies carefully. If appropriate, we will refer to and join the discussions on the technologies of manifest, service worker, signed exchanges and web packaging.

The <link> tag can be used to reference resources in HTML, but it does not help with programmatic navigation (such as location.href = xxx, or navigation triggered by the operating system), so it may not fully meet the requirement of identifying a MiniApp in advance.

And the information for the MiniApp (page path, query, etc) must be stored in the URI, not the manifest. Because that also represents specific MiniApp page that the person using the URI wants to visit.

We will give more flowcharts to explain the current MiniApp URI dereferencing process.

marcoscaceres commented 4 years ago

And the information for the MiniApp (page path, query, etc) must be stored in the URI, not the manifest.

Why? This seems very arbitrary.

Because that also represents specific MiniApp page that the person using the URI wants to visit.

Why would it be any different from any regular web application?

Sharonzd commented 4 years ago

Here is the flowchart: The flow A is closer to the current solution in practice. The flow B uses https, but it seems to be a little redundant, and more violates the semantics of http url. The flow C is maybe the ideal design, but we have no centralized service which afford all packages of any user agent. What's more, it restricts us to only get package via https/http.

[Figure: MiniApp URI flowchart]

zhangyongjing commented 4 years ago

I think the problem lies in the fact that a MiniApp doesn't always assume the existence of an origin (the app content and services are not necessarily web-based), and it may NOT be delivered/launched only via web access (but rather through app markets and other alternative means). Therefore, 'http' is not a safe/efficient baseline scheme for the case of MiniApp.

Sharonzd commented 4 years ago

And the information for the MiniApp (page path, query, etc) must be stored in the URI, not the manifest.

Why? This seems very arbitrary.

The meaning of "the information" in this sentence is the locator of the resource that the user wants to access. When no page path is specified in the URI, the user agent usually opens the home page (or index page, or other default page) of the MiniApp. When a page path is specified in the URI, the user agent opens that page of the MiniApp. And when a query is specified in the URI, the MiniApp runtime passes it to the page.

Of course, there is a lot of information in the manifest, but that is information about the package, such as which pages it has and which page is its home page.
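The behaviour described above (no path opens the default page, a path opens that page, and the query is passed to the page) can be sketched as follows. The manifest shape is an assumption for illustration only: a `pages` list whose first entry is treated as the home page. `resolve_page` is a hypothetical helper name.

```python
from urllib.parse import parse_qs


def resolve_page(manifest, page_path="", query=""):
    """Map the page path and query taken from a MiniApp URI to (page, params).

    Assumption: manifest["pages"] lists the package's pages and the first
    entry is the home page.
    """
    pages = manifest["pages"]
    page = page_path.strip("/") or pages[0]  # no path: open the home page
    if page not in pages:
        raise LookupError(f"{page!r} is not a page of this MiniApp")
    # The runtime does not interpret the query; it just forwards it to the page.
    return page, parse_qs(query)


manifest = {"pages": ["pages/index", "pages/detail"]}
print(resolve_page(manifest))                           # home page
print(resolve_page(manifest, "/pages/detail", "id=7"))  # specific page and query
```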

marcoscaceres commented 4 years ago

@Sharonzd, thanks for the flowchart:

The flow A is closer to the current solution in practice.

Right, miniapp:// is basically the same as the widget URI scheme and the app: URI scheme. None of them is great, because they need to fake HTTP semantics and break things in various ways.

The flow B uses https, but it seems to be a little redundant, and more violates the semantics of http url.

Correct. This is also not great because it's creating a special case of HTTP URLs. This is likely to be fragile.

The flow C is maybe the ideal design, but we have no centralized service which afford all packages of any user agent. What's more, it restricts us to only get package via https/http.

Right, but it doesn't need to be centralized. If all you are trying to achieve with this effort is to say "this app can be run in some mini-mode", then maybe something like Web Manifest is a more suitable solution... you can just install them, and then the app just shows up as a mini-app (and you don't need any packaging, new URI scheme, etc.).

@zhangyongjing wrote:

I think the problem lies in the fact that a MiniApp doesn't always assume the existence of an origin (the app content and services are not necessarily web-based), and it may NOT be delivered/launched only via web access (but rather through app markets and other alternative means). Therefore, 'http' is not a safe/efficient baseline scheme for the case of MiniApp.

Ok, but all web features and APIs assume an origin to work - and require an origin to make security determinations. Without an origin, things simply won't work or will work in weird and unexpected ways (e.g., how they work with "file://" is totally random and mostly undefined).
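To illustrate the point about origins, here is a simplified version of how the URL Standard derives an origin: a few network schemes yield a (scheme, host, port) tuple, while file: is left implementation-defined (typically opaque) and any other scheme, including a custom one such as miniapp:, gets an opaque origin. The helper names are illustrative, and blob: URLs and other edge cases are ignored.

```python
from urllib.parse import urlsplit

# Schemes whose URLs get a tuple origin, with their default ports.
TUPLE_ORIGIN_SCHEMES = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}


def origin_of(url):
    """Return (scheme, host, port), or None to stand for an opaque origin."""
    parts = urlsplit(url)
    if parts.scheme not in TUPLE_ORIGIN_SCHEMES or not parts.hostname:
        # file: (implementation-defined) and custom schemes such as miniapp:
        return None
    return parts.scheme, parts.hostname, parts.port or TUPLE_ORIGIN_SCHEMES[parts.scheme]


def same_origin(a, b):
    oa, ob = origin_of(a), origin_of(b)
    # Each opaque origin is unique, so two such URLs are never same-origin here.
    return oa is not None and oa == ob


print(origin_of("https://example.com/app/"))   # tuple origin
print(origin_of("miniapp://foo/pages/index"))  # opaque origin, shown as None
print(same_origin("https://example.com/a", "https://example.com:443/b"))
```

Same-origin checks of this kind underlie CORS, storage partitioning and permissions, which is why a scheme without a tuple origin leaves those features with nothing to work with.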

@Sharonzd wrote:

The meaning of "the information" in this sentence is the locator of the resource that the user wants to access. When no page path is specified in the URI, the user agent usually opens the home page (or index page, or other default page) of the MiniApp. When a page path is specified in the URI, the user agent opens that page of the MiniApp. And when a query is specified in the URI, the MiniApp runtime passes it to the page. Of course, there is a lot of information in the manifest, but that is information about the package, such as which pages it has and which page is its home page.

Again, this sounds like it overlaps with Web Manifest... in particular, the "start_url" and "scope" members.
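For comparison, here is how `start_url` and `scope` in a Web App Manifest play a similar role: relative values are resolved against the manifest URL, `start_url` is the default page, and a URL is within scope when it has the same origin and its path starts with the scope's path. This is a simplified sketch of the manifest spec's "within scope" check; the manifest values and URLs are made-up examples.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(parts):
    return parts.scheme, parts.hostname, parts.port or DEFAULT_PORTS.get(parts.scheme)


def within_scope(url, scope_url):
    """Same origin, and the URL's path starts with the scope's path."""
    target, scope = urlsplit(url), urlsplit(scope_url)
    return _origin(target) == _origin(scope) and target.path.startswith(scope.path)


manifest_url = "https://example.com/shop/manifest.webmanifest"
manifest = {"name": "Example Shop", "start_url": "./index.html", "scope": "./"}

start_url = urljoin(manifest_url, manifest["start_url"])  # the default page
scope_url = urljoin(manifest_url, manifest["scope"])

for link in (start_url, "https://example.com/shop/cart?item=3", "https://example.com/blog/"):
    print(link, within_scope(link, scope_url))
```

In this model a deep link such as https://example.com/shop/cart?item=3 already carries the page path and query in an ordinary URL, and the manifest only describes the app as a whole.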

cynthia commented 4 years ago

We've discussed this in our TAG meeting today and agree with the feedback from @marcoscaceres noted above. It's also worth noting that the very person who edited those (Widget/API URI) specs is on record suggesting this is not a great idea.

Sharonzd commented 4 years ago

Thank you for your responses and recommendations.

Right, but it doesn't need to be centralized

Why doesn't it need to be centralized? If it is not centralized, MiniApp packages will live on the servers of different vendors, and we would have to use flow B to identify both the MiniApp and the package server :( Or should we use DIDs (Decentralized Identifiers)? But DIDs may be too advanced to implement in MiniApps.

If all you are trying to achieve with this effort is to say "this app can be run in some mini-mode", then maybe something like Web Manifest is a more suitable solution...

We want to say not only that this app is a MiniApp but, more importantly, to say it in advance, so that runtimes such as the JS engine (logic layer) and the WebView (view layer) can be prepared early. That is a very important reason why MiniApps are so fast.
So declaring a "mini-mode" only in the Web Manifest is too late. Of course, a content type and a "mini-mode" are also needed.

Ok, but all web features and APIs assume an origin to work - and require an origin to make security determinations. Without an origin, things simply won't work or will work in weird and unexpected ways (e.g., how they work with "file://" is totally random and mostly undefined).

Yes, I agree that miniapp requires an origin. For the case without hostname in the MiniApp URI proposal (such as miniapp://foo), our original intention is actually to give the user agent more flexibility, and it doesn't mean that there is no origin. In this case, the way to get the miniapp package is determined by the default action of a user agent, which can be a default fetch address (for example, https://some-user-agent.com?id=123), or it can be local file address during local debugging. Of course, local debugging should not be part of this URI proposal.

zhangyongjing commented 4 years ago

Ok, but all web features and APIs assume an origin to work - and require an origin to make security determinations. Without an origin, things simply won't work or will work in weird and unexpected ways (e.g., how they work with "file://" is totally random and mostly undefined).

Yes, I agree that miniapp requires an origin. For the case without hostname in the MiniApp URI proposal (such as miniapp://foo), our original intention is actually to give the user agent more flexibility, and it doesn't mean that there is no origin. In this case, the way to get the miniapp package is determined by the default action of a user agent, which can be a default fetch address (for example, https://some-user-agent.com?id=123), or it can be local file address during local debugging. Of course, local debugging should not be part of this URI proposal.

I would disagree with the assumption that a MiniApp has an origin. It may have one in web-based scenarios, but it may also work as a local app (which could connect to some remote services, but not necessarily web-based ones).

marcoscaceres commented 4 years ago

I would disagree with the assumption that a MiniApp has an origin. It may have one in web-based scenarios, but it may also work as a local app (which could connect to some remote services, but not necessarily web-based ones).

The problem is that this won't work in practice. Please understand that the web platform's primitives assume http, https, etc. in order to "fetch" data (HTTP GET/POST/etc.), perform CORS checks, and so on. If you start changing those assumptions, all web security breaks down.

ylafon commented 4 years ago

Relevant to the discussion: https://www.w3.org/wiki/UriSchemes and specifically the "Why shouldn't I create a new scheme for XYZ?" section.

xfq commented 4 years ago

I think the goal of the CG is not to create a new URI scheme, but to have a way to create deep links to MiniApp contents in a performant (i.e., recognizing the URI in advance to reduce network request overhead and to preload the MiniApp) and cross-platform (works in HTML, JavaScript, MiniApps from different vendors, and native apps in different operating systems) way.

Some of the existing solutions (manifest & service worker, HTML link element, intent filter, associated domains) don’t seem to solve the problem above.

I wonder if we can create a MIME type for MiniApps, and use HEAD to reduce the performance overhead of a usual HTTP request. @ylafon Do you have any specific suggestions on how this might work?
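A minimal sketch of the HEAD idea, assuming a MiniApp media type existed: the user agent sends a HEAD request, which returns the same headers as GET but no body, and compares the Content-Type essence with that type. `application/x-miniapp` and the helper names are hypothetical; no MiniApp media type has been registered.

```python
import urllib.request

MINIAPP_MEDIA_TYPE = "application/x-miniapp"  # hypothetical, not registered


def build_probe(url):
    """A HEAD request: same response headers as GET, but no body."""
    return urllib.request.Request(url, method="HEAD")


def is_miniapp(headers):
    """Compare the Content-Type essence (parameters stripped) with the MiniApp type."""
    content_type = headers.get("Content-Type") or ""
    return content_type.split(";")[0].strip().lower() == MINIAPP_MEDIA_TYPE


def probe(url, timeout=5.0):
    """Ask the server for the media type before downloading the package."""
    with urllib.request.urlopen(build_probe(url), timeout=timeout) as response:
        return is_miniapp(response.headers)
```

HEAD saves the body transfer but not the round trip: the user agent still waits for a response before it can pick a runtime, which is the timing concern raised at the start of the issue.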

ylafon commented 4 years ago

First, where are links to those URLs supposed to appear? If they are in an HTML document, <a> can have either a type attribute hinting at the expected media type, or a link relation indicating that the target is part of a mini-app; either way, you hint in advance that the mini-app runtime should be used. Intent filters seem to be a way to implement a shared service worker at the OS level, but I haven't checked closely enough whether that is entirely possible.
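A minimal sketch of the HTML hint idea: scan a document for <a> and <link> elements whose type attribute names a MiniApp media type, or whose rel list contains a MiniApp relation, so that the runtime can be prepared before the link is followed. Both values are hypothetical; neither a MiniApp media type nor a MiniApp link relation has been registered.

```python
from html.parser import HTMLParser

MINIAPP_TYPE = "application/x-miniapp"  # hypothetical media type
MINIAPP_REL = "miniapp"                 # hypothetical link relation


class MiniAppLinkFinder(HTMLParser):
    """Collect hrefs of <a>/<link> elements that hint at a MiniApp target."""

    def __init__(self):
        super().__init__()
        self.hinted = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rels = (attributes.get("rel") or "").lower().split()
        media_type = (attributes.get("type") or "").split(";")[0].strip().lower()
        if href and (media_type == MINIAPP_TYPE or MINIAPP_REL in rels):
            self.hinted.append(href)


def find_miniapp_links(html):
    finder = MiniAppLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hinted


page = """
<a href="https://example.com/pkg/shop" type="application/x-miniapp">Shop</a>
<a href="/about">About</a>
<link rel="miniapp prefetch" href="https://example.com/pkg/news">
"""
print(find_miniapp_links(page))
```

This only helps for markup: a navigation such as location.href = ... never passes through an <a> or <link> element, which is the limitation raised earlier in the thread.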

xfq commented 4 years ago

Yes, <a> with type/rel looks like a possible solution in HTML, but we also need to look at other scenarios.

ylafon commented 4 years ago

So, for the first use case, which is early identification of a mini-app package:

First, I will quote https://www.w3.org/DesignIssues/Axioms.html#opaque

Axiom: Opacity of URIs

The only thing you can use an identifier for is to refer to an object. When you are not dereferencing, you should not look at the contents of the URI string to gain other information.

The usual way of dispatching on the Web is through media types. This happens after the URL is processed but before the payload is received. In the case of mini-apps, processing of the received package can start only after the payload has been fully received (see the packaging issues). It is not clear whether the metadata in the HTTP response (which precedes the body) can be used efficiently in their design, but the time needed to get the metadata is much shorter than the time needed to get the actual content and process it.

The dereferencing process looks like this:

  1. URL
  2. URL processing (in the general case DNS resolution, breaking down port/path; see also deep-linking below)
  3. HTTP request
  4. HTTP response headers (metadata including media type, size, etc.); a specific media type triggers dispatching to the relevant app
  5. HTTP response body
  6. End of processing

When a media type is used, a package with a mini-app-specific media type may be dispatched to a mini-app runtime even before the entire package has been received.
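A minimal sketch of dispatching on the media type while the body is still in flight: the handler is chosen as soon as the headers are available, and the body is then streamed into it chunk by chunk. `response` stands for anything with a `headers` mapping and a `read(n)` method, such as `http.client.HTTPResponse`; `FakeResponse`, the handler callables and the media type are hypothetical stand-ins for illustration.

```python
import io

MINIAPP_MEDIA_TYPE = "application/x-miniapp"  # hypothetical, not registered


def dispatch(response, start_miniapp_runtime, start_default_handler, chunk_size=64 * 1024):
    """Pick a handler from the response headers, then stream the body into it."""
    content_type = response.headers.get("Content-Type") or ""
    if content_type.split(";")[0].strip().lower() == MINIAPP_MEDIA_TYPE:
        # Headers are in, body is not: the runtime can start warming up now.
        sink = start_miniapp_runtime()
    else:
        sink = start_default_handler()
    while chunk := response.read(chunk_size):
        sink.write(chunk)
    return sink


class FakeResponse:
    """Stand-in for an HTTP response whose body is read in chunks."""

    def __init__(self, content_type, body):
        self.headers = {"Content-Type": content_type}
        self._body = io.BytesIO(body)

    def read(self, n=-1):
        return self._body.read(n)
```

Choosing the handler from headers alone is what lets runtime start-up overlap with the package download; it still cannot start before the first response arrives.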

Links to a mini-app, or to files contained in a mini-app, could be labelled to hint preemptively that the target is a mini-app; alternatively, sites could have a .well-known/ entry listing which subtrees belong to a particular mini-app.
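A rough sketch of the .well-known idea, under loudly stated assumptions: the path /.well-known/miniapps and its JSON format are hypothetical (nothing like this is registered), and the helper names are illustrative. The user agent would fetch the document once per origin, then classify links to that origin locally, longest prefix first, without a request per link.

```python
import json
from urllib.parse import urlsplit

# Hypothetical document a site might serve at /.well-known/miniapps
# (the path and the format are illustrative, not registered).
WELL_KNOWN_DOC = """
{"miniapps": [
  {"id": "shop", "paths": ["/shop/"]},
  {"id": "news", "paths": ["/news/", "/archive/news/"]}
]}
"""


def build_index(doc):
    """Flatten the document into (path prefix, app id) pairs, longest prefix first."""
    pairs = [(prefix, app["id"])
             for app in json.loads(doc)["miniapps"]
             for prefix in app["paths"]]
    return sorted(pairs, key=lambda pair: len(pair[0]), reverse=True)


def miniapp_for(url, index):
    """Return the id of the mini-app that owns this URL's path, or None."""
    path = urlsplit(url).path
    for prefix, app_id in index:
        if path.startswith(prefix):
            return app_id
    return None


index = build_index(WELL_KNOWN_DOC)  # in practice: fetched once per origin and cached
print(miniapp_for("https://example.com/shop/cart?item=3", index))
```

The index must only be applied to URLs on the same origin that served it; otherwise any site could claim another site's paths.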

ylafon commented 4 years ago

For the second use case, which is addressing the content of an installed mini-app package:

For the second use case, there are two distinct issues: deep-linking within a web-app (even if local links would work better there), and deep-linking between different mini-apps. For regular web apps, this is a non-issue, as links resolve as regular http calls, with optional caching through Service Workers for limited offline interactions. For apps in general, linking to another app depends on the available dispatching options.

In the case of native apps, the default for many years was to define a per-app URL scheme, but this is being deprecated in favour of Universal Links, introduced in 2015. Universal Links let an app register a subtree of http URLs that are handled by that specific application. To do so, you need to be able to trust that those URLs are indeed served by the owner of the app; this is done via the app store, where you register those URLs, plus a post-validation using .well-known URLs, roughly the same kind of mechanism some SSL providers use with ACME. Universal Links are system-wide and are resolved by system libraries, allowing proper deep-linking even when an app is not installed.

On Android, intent filters allow filtering on different characteristics, URLs being one of them. They are app-specific and so not directly suited to doing what Universal Links do, but App Links are a way to register links that pertain to a specific app and allow deep linking in the same way as Universal Links. A similar validation mechanism is used for App Links. Note that some companies provide services to automate this verification step.
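The post-validation step comes down to checking a statement file that the site publishes. For App Links, that file is /.well-known/assetlinks.json in the Digital Asset Links format, and the sketch below performs a simplified version of the check the OS does (it ignores redirects, multiple hosts and caching). The package name and the shortened fingerprint are placeholders; real SHA-256 certificate fingerprints have 32 colon-separated bytes.

```python
import json

HANDLE_ALL_URLS = "delegate_permission/common.handle_all_urls"

# Example statement list, as a site would serve it at /.well-known/assetlinks.json.
ASSETLINKS = """
[{
  "relation": ["delegate_permission/common.handle_all_urls"],
  "target": {
    "namespace": "android_app",
    "package_name": "com.example.shop",
    "sha256_cert_fingerprints": ["AB:CD:EF"]
  }
}]
"""


def site_delegates_to(assetlinks_json, package_name, cert_fingerprint):
    """True if the site lets the app with this package and signing cert handle its URLs."""
    for statement in json.loads(assetlinks_json):
        target = statement.get("target", {})
        fingerprints = [fp.upper() for fp in target.get("sha256_cert_fingerprints", [])]
        if (HANDLE_ALL_URLS in statement.get("relation", [])
                and target.get("namespace") == "android_app"
                and target.get("package_name") == package_name
                and cert_fingerprint.upper() in fingerprints):
            return True
    return False


print(site_delegates_to(ASSETLINKS, "com.example.shop", "AB:CD:EF"))
```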

Also, per-app URL schemes lead to broken links when the app is not installed, whereas with OS-level dereferencing techniques like those above, if the app is not installed, the link is simply a regular https request. Note that an HTTP Link header could also indicate that the retrieved content came from a package, and give an incentive to install that package.
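A rough sketch of reading such a hint from an HTTP Link header (RFC 8288 syntax). The `miniapp` relation type and the media type in the example header are hypothetical, and the parser is deliberately simplified: it assumes no commas or semicolons inside quoted parameter values.

```python
import re

# Simplified RFC 8288 parsing: no commas or semicolons inside quoted values.
LINK_RE = re.compile(r'<([^>]*)>([^,]*)')
PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*"?([^";]*)"?')


def parse_link_header(value):
    """Return a list of (target, {param: value}) pairs."""
    links = []
    for target, params in LINK_RE.findall(value):
        attrs = {name.lower(): val.strip() for name, val in PARAM_RE.findall(params)}
        links.append((target, attrs))
    return links


def package_links(value, rel="miniapp"):
    """Targets whose rel list contains the (hypothetical) MiniApp relation."""
    return [target for target, attrs in parse_link_header(value)
            if rel in attrs.get("rel", "").lower().split()]


header = ('<https://example.com/pkg/shop.zip>; rel="miniapp"; type="application/x-miniapp", '
          '<https://example.com/style.css>; rel=preload')
print(package_links(header))
```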

xfq commented 3 years ago

See also https://github.com/WICG/pwa-url-handler/blob/master/explainer.md