mgiuca commented 7 years ago

There isn't really a formal specification of the manifest data structure. Instead, each member has its own section which describes the format of that member in prose. This makes it hard to discuss (e.g., in Ken's shortcut proposal, we'd like to be able to say "sequence<IconInfo> icons;" but there's no name for "the dictionary that describes a Manifest icon" because it's described in prose).

I think we should add a WebIDL section that contains a dictionary definition for the full manifest, as a first step (I can do it).

As a second step, I think this would allow us to simplify the language in each individual member. The "steps for processing a manifest" algorithm could be replaced with "Parse the JSON into an IDL dictionary value, then convert it into an ECMAScript object" (which automatically applies all of the IDL type conversion rules).

Then, for example, this text:

The steps for processing the short_name member is given by the following algorithm. The algorithm takes a manifest as an argument. This algorithm returns a string or undefined.

Let value be the result of calling the [[GetOwnProperty]] internal method of manifest with argument "short_name".

If Type(value) is not "string":

If Type(value) is not "undefined", issue a developer warning that the type is not supported.

Return undefined.

Otherwise, Trim(value) and return the result.

could be replaced by this (assuming short_name is a DOMString):

The steps for processing the short_name member is given by the following algorithm. The algorithm takes a manifest [referring to the processed ECMAScript object] as an argument. This algorithm returns a string or undefined.

Let value be the result of calling the [[GetOwnProperty]] internal method of manifest with argument "short_name".

If Type(value) is not "string", return value.

Otherwise, Trim(value) and return the result.

Were it not for the call to Trim, this would make almost all the processing steps trivial. Unfortunately, almost all of them do Trim their strings, which makes things more complicated. Perhaps we could make a global rule that Manifest strings are trimmed, then we wouldn't really need any processing steps at all.

g-ortuno commented 5 years ago

FWIW, Chrome already has an implementation of JSON Schema. We currently use it to parse Policies but in the past we used it to parse Extension Manifests haha.

mgiuca commented 5 years ago

I really don't want to go down the route of either a) inventing a new schema language, b) forking a subset of WebIDL or another schema language, or c) using a schema language that isn't already part of the web platform. All three of those seem like a tremendous amount of work compared to just tweaking WebIDL for our needs.

FYI currently Chromium parses the manifest with a hand-written algorithm which has all of the same issues as the old Manifest spec (very hard to read and easy to make mistakes). I would be hopeful that we can modify Chromium's manifest parser to use WebIDL just like we did in the spec, but this needs these fixes first.

OK, here are all the features of WebIDL that we use in the Manifest spec, or Web Share Target (which is an extension of Manifest):

Simple types DOMString, USVString, boolean.
enum
sequence
dictionary
partial dictionary (for easily extending in other specs)
Optional and required dictionary members.
Default values for dictionary members.
"or" type clauses.
Extended attributes / annotated types (to declaratively specify the extent of invalidation caused by type errors).
Conversion from an ECMAScript value to an IDL value, of all of the above types. Includes a lot of type checking logic.

That's a huge chunk of WebIDL.

There's also quite a lot of functionality in WebIDL that we don't need to define the Web App Manifest. A lot of the above discussion focuses on the fact that we aren't using the entire WebIDL spec, just the type conversion rules. I don't see this as any sort of argument to not use WebIDL. Specs depend on other specs all the time and use only a tiny fraction of the dependent spec's functionality. Pick any given web API and you'll probably find it uses a small subset of the WebIDL features. I don't see that as a reason for each web API to invent its own specification language with just the features it needs.

@domenic :

As an example, on a page where someone runs Object.prototype.short_name = "foo", using Web IDL would imply that parsing {} gives back a short name of "foo", not undefined. Again, you could add yet another extended attribute to fix that, but...

Yeah, as I said above, I think adding a [StrictString] extended type attribute would not be any different to the existing [EnforceRange] attribute for integers. TAG advice is to always use [EnforceRange] for integers, so why not add [StrictString] and recommend that all new interfaces use that? A desire to not convert {} to "[object Object]" is not unique to the manifest JSON file, and seems orthogonal to the suitability of WebIDL as a format for validating JSON.

Something I want to clear up with what I'm trying to do here:

@annevk :

A type-mismatch is equal to the field being set to undefined.

No, not always. At the top level, we want each member to only invalidate itself, but at lower levels, we have the flexibility to decide (note in #750 I don't add [CatchTypeErrors] to every member in every dictionary, just the top-level members).

I want to be able to annotate dictionary members and sequence elements on a case-by-case basis, so that we have the flexibility of deciding the granularity of errors (when a type error occurs, what level of the manifest does it invalidate?).

For example, in Web Share Target, we are currently debating whether an error in a files.accept element should invalidate a) just that accept type, b) the entire accept array, c) the entire file, d) the entire files array, or e) the entire share target.

With [CatchTypeError], we can easily (and readably) specify one of those five options by placing the [CatchTypeError] annotation on a) the USVString inside the accept sequence, b) the accept member of ShareTargetFiles, c) the ShareTargetFile inside the files sequence, d) the files member of ShareTargetParams, or e) just the share_target member of WebAppManifest.

So I'm not trying to universally change how type checking works in WebIDL. I'm trying to add a feature to WebIDL to let us override type checking behaviour as desired.

I think maybe @domenic you and I could have a VC offline to sync on this instead of a long public thread, then we can summarize here.

annevk commented 5 years ago

I think we agree on reusing IDL types and perhaps on using most of the dictionary syntax. The question with respect to the syntax is whether it's best to abstract that somehow or fork it. Where we might disagree is conversion.

I think you're equating JSON with APIs a bit too much. [StrictString] makes no sense for APIs (using ToString on arguments is expected and used throughout the ecosystem), but makes perfect sense for a JSON resource. Trying to shoehorn those different validation expectation into the same declarative system is not a good idea as they might leak. A shared abstraction might be warranted though.

mgiuca commented 5 years ago

OK. I can agree in principle to factoring out the type definitions and ECMAScript conversion logic modulo strictness requirements into a shared abstraction. It sounds like a huge amount of work though.

This is a fairly urgent issue for Manifest since right now, we are not properly specifying how errors are checked. What I might try and do in the mean time is add some hand-wavy text to the processing steps like "During IDL conversion, if the conversion of any top-level member of the WebAppManifest throws a TypeError, set that member to its default value." (And so on.)

This isn't terribly formal, but at least it will unambiguously describe the expected error handling.

annevk commented 5 years ago

The thing is that if something expects a string and you give it a number or an array, IDL conversion wouldn't throw, whereas I think that's what we want for JSON conversion. For JSON I'd imagine we don't want much conversion at all so that's why I thought specifying that bit again wouldn't be too much effort.

The other problem with using IDL directly I think is that these dictionaries shouldn't be in the same namespace.

marcoscaceres commented 5 years ago

I'm still unsure what the problem is with just having a single property extraction algorithm as per https://github.com/mozilla/gecko-dev/blob/master/dom/manifest/ValueExtractor.jsm#L34? And the example I gave. They object definitions are concise and get the job done in the same way that that defining an extended attribute would.

As @annevk mentioned, if we add things like [StrictString] and [CatchTypeError], they are going to leak into everyday WebIDL interfaces. I really don't think we want a special set of WebIDL things for a JSON format VS having a thing just Web APIs.

annevk commented 5 years ago

@marcoscaceres how would you envision that working? Basically an algorithm, that given a name and a type, returns the value? It seems you'd still need to describe the general JSON format, but that might work if flushed out a bit more.

marcoscaceres commented 5 years ago

Basically what we originally had, just tightened up a bit more that we understand what we want for each member type (and how to get values from each property).

So:

Use just the JSON types - not the Web IDL types. And just hook directly into ES6 and skip WebIDL entirely.
define a ExtractValue(object, property, expectedType, [trim, normalization]) abstract operation in terms of ES6. It throws TypeErrors that the caller then deals with. Optionally, normalization can be "lcase" or whatever - or we can do a breaking change and stop normalizing stuff.
- We can even do specialized extractors: ExtractStringValue() if we want even more consistency with type checks and coercion (e.g., ExtractColorValue()).

annevk commented 5 years ago

I like that approach as a low-level building block. It shouldn't be as much work as IDL given that the space is much more constrained (i.e., we only have object, array, number, string, boolean, or null, and we can trust the input to not cause side effects). And you basically want the same kind of algorithm for each (as @marcoscaceres's code also demonstrates):

Let value be the result of getting property from object.
If value is undefined, then return undefined.
If value's type is not expectedType, then return undefined.
Return value.

And then you can define higher-level helpers around that, which perform whitespace normalization, range checks, lone surrogate replacement, array to list concept mapping, etc.

mgiuca commented 5 years ago

I added a new attempt to fix this in #753 (This is the "low tech" approach of just adding some fairly non-formal language around behaving differently in manifest parsing than the normal WebIDL spec allows).

I'm less in favour of this than #750 (which would require upstream changes to WebIDL). Because if we did heycam/webidl#597 + #750, then we implementations could easily follow along with these changes and the manifest parser implementations could literally use the IDL parser. With #753, I think it specifies what we want, but it's impractical to implement it without changing the implementation's WebIDL parser.

(Of course, there are other discussions about reverting the use of WebIDL altogether, but in my opinion, I would rather keep it.)

marcoscaceres commented 5 years ago

753 still does the IDL conversion tho, which still leaves us in a strange place. I'd prefer we just go back to defining this in terms of pure ECMAScript and not do any IDL at all.

marcoscaceres commented 5 years ago

I'd like to pick this up again and rewrite the spec in terms of ECMAScript.

@mgiuca, would you be ok with that?

marcoscaceres commented 5 years ago

Ok, we now have a means to convert JSON into infra types: https://github.com/whatwg/infra/issues/159#issuecomment-491338205

My plan is to rewrite the spec to use the above and operate on the infra types.

kenchris commented 5 years ago

Can we still get something like the IDL overview at the end?

marcoscaceres commented 5 years ago

No, but we can just improve the TOC.

marcoscaceres commented 5 years ago

(alternative, we have a non-normative JSON Schema we can link to... I think we already have that)

dominickng commented 5 years ago

It looks like there are more use cases which want an ImageResource IDL dictionary (e.g. Content Indexing, see https://github.com/rayankans/content-index/issues/2, and Background Fetch, see https://wicg.github.io/background-fetch/#background-fetch-manager). Maybe that should be spun out of Manifest, which then reduces the need to define Manifest itself in terms of IDL?

There's also discussion on the image selection front in https://github.com/w3c/manifest/pull/710 (@jakearchibald). Is there a way to move forward here @marcoscaceres @mgiuca?

@rayankans FYI

mgiuca commented 5 years ago

@marcoscaceres Do you think you'll proceed with converting this to a JSON schema type thing (and not IDL)? We have clients trying to use our ImageResource dict which means now is a good inflection point to decide either a) it's OK to use dicts from Manifest in other parts of the web platform, or b) no, these are specific to the Manifest format and can't be used in JS APIs.

I still think that being able to use them in other JS APIs is a good thing (reuse of those definitions, instead of having to define parallel dicts in two different languages), but if you want to change over to another language, we should do so before others start depending on it.

(I'm also hesitant about this specific usage, in Background Fetch, since the ImageResources used there are unrelated to the ones we use in Manifest.)

marcoscaceres commented 5 years ago

@marcoscaceres Do you think you'll proceed with converting this to a JSON schema type thing (and not IDL)?

Yes

We have clients trying to use our ImageResource dict which means now is a good inflection point to decide either a) it's OK to use dicts from Manifest in other parts of the web platform, or b) no, these are specific to the Manifest format and can't be used in JS APIs.

We should keep ImageResource as a thing for APIs tho (we've tried to do this previously also with @jakearchibald). But we can pull that into a new spec if needed.

marcoscaceres commented 4 years ago

Ok, first step: I've sent a PR to include the JSON schema into the spec https://github.com/w3c/manifest/pull/928

marcoscaceres commented 3 years ago

In 32b497c4ca34c9069aa5e545126c252724f370fe, I (finally) got rid of the IDL and converted the "processed manifest" to an (Infra) ordered map.

w3c / manifest

Describe the manifest data structure #611

753 still does the IDL conversion tho, which still leaves us in a strange place. I'd prefer we just go back to defining this in terms of pure ECMAScript and not do any IDL at all.