Closed mgiuca closed 3 years ago
FWIW, Chrome already has an implementation of JSON Schema. We currently use it to parse Policies but in the past we used it to parse Extension Manifests haha.
I really don't want to go down the route of either a) inventing a new schema language, b) forking a subset of WebIDL or another schema language, or c) using a schema language that isn't already part of the web platform. All three of those seem like a tremendous amount of work compared to just tweaking WebIDL for our needs.
FYI currently Chromium parses the manifest with a hand-written algorithm which has all of the same issues as the old Manifest spec (very hard to read and easy to make mistakes). I would be hopeful that we can modify Chromium's manifest parser to use WebIDL just like we did in the spec, but this needs these fixes first.
OK, here are all the features of WebIDL that we use in the Manifest spec, or Web Share Target (which is an extension of Manifest):
DOMString
, USVString
, boolean
.enum
sequence
dictionary
partial dictionary
(for easily extending in other specs)or
" type clauses.That's a huge chunk of WebIDL.
There's also quite a lot of functionality in WebIDL that we don't need to define the Web App Manifest. A lot of the above discussion focuses on the fact that we aren't using the entire WebIDL spec, just the type conversion rules. I don't see this as any sort of argument to not use WebIDL. Specs depend on other specs all the time and use only a tiny fraction of the dependent spec's functionality. Pick any given web API and you'll probably find it uses a small subset of the WebIDL features. I don't see that as a reason for each web API to invent its own specification language with just the features it needs.
@domenic :
As an example, on a page where someone runs Object.prototype.short_name = "foo", using Web IDL would imply that parsing {} gives back a short name of "foo", not undefined. Again, you could add yet another extended attribute to fix that, but...
Yeah, as I said above, I think adding a [StrictString]
extended type attribute would not be any different to the existing [EnforceRange]
attribute for integers. TAG advice is to always use [EnforceRange]
for integers, so why not add [StrictString]
and recommend that all new interfaces use that? A desire to not convert {} to "[object Object]" is not unique to the manifest JSON file, and seems orthogonal to the suitability of WebIDL as a format for validating JSON.
Something I want to clear up with what I'm trying to do here:
@annevk :
- A type-mismatch is equal to the field being set to undefined.
No, not always. At the top level, we want each member to only invalidate itself, but at lower levels, we have the flexibility to decide (note in #750 I don't add [CatchTypeErrors]
to every member in every dictionary, just the top-level members).
I want to be able to annotate dictionary members and sequence elements on a case-by-case basis, so that we have the flexibility of deciding the granularity of errors (when a type error occurs, what level of the manifest does it invalidate?).
For example, in Web Share Target, we are currently debating whether an error in a files.accept
element should invalidate a) just that accept
type, b) the entire accept
array, c) the entire file
, d) the entire files
array, or e) the entire share target.
With [CatchTypeError]
, we can easily (and readably) specify one of those five options by placing the [CatchTypeError]
annotation on a) the USVString
inside the accept
sequence, b) the accept
member of ShareTargetFiles
, c) the ShareTargetFile
inside the files
sequence, d) the files
member of ShareTargetParams
, or e) just the share_target
member of WebAppManifest
.
So I'm not trying to universally change how type checking works in WebIDL. I'm trying to add a feature to WebIDL to let us override type checking behaviour as desired.
I think maybe @domenic you and I could have a VC offline to sync on this instead of a long public thread, then we can summarize here.
I think we agree on reusing IDL types and perhaps on using most of the dictionary syntax. The question with respect to the syntax is whether it's best to abstract that somehow or fork it. Where we might disagree is conversion.
I think you're equating JSON with APIs a bit too much. [StrictString] makes no sense for APIs (using ToString on arguments is expected and used throughout the ecosystem), but makes perfect sense for a JSON resource. Trying to shoehorn those different validation expectation into the same declarative system is not a good idea as they might leak. A shared abstraction might be warranted though.
OK. I can agree in principle to factoring out the type definitions and ECMAScript conversion logic modulo strictness requirements into a shared abstraction. It sounds like a huge amount of work though.
This is a fairly urgent issue for Manifest since right now, we are not properly specifying how errors are checked. What I might try and do in the mean time is add some hand-wavy text to the processing steps like "During IDL conversion, if the conversion of any top-level member of the WebAppManifest
throws a TypeError
, set that member to its default value." (And so on.)
This isn't terribly formal, but at least it will unambiguously describe the expected error handling.
The thing is that if something expects a string and you give it a number or an array, IDL conversion wouldn't throw, whereas I think that's what we want for JSON conversion. For JSON I'd imagine we don't want much conversion at all so that's why I thought specifying that bit again wouldn't be too much effort.
The other problem with using IDL directly I think is that these dictionaries shouldn't be in the same namespace.
I'm still unsure what the problem is with just having a single property extraction algorithm as per https://github.com/mozilla/gecko-dev/blob/master/dom/manifest/ValueExtractor.jsm#L34? And the example I gave. They object definitions are concise and get the job done in the same way that that defining an extended attribute would.
As @annevk mentioned, if we add things like [StrictString] and [CatchTypeError], they are going to leak into everyday WebIDL interfaces. I really don't think we want a special set of WebIDL things for a JSON format VS having a thing just Web APIs.
@marcoscaceres how would you envision that working? Basically an algorithm, that given a name and a type, returns the value? It seems you'd still need to describe the general JSON format, but that might work if flushed out a bit more.
Basically what we originally had, just tightened up a bit more that we understand what we want for each member type (and how to get values from each property).
So:
ExtractValue(object, property, expectedType, [trim, normalization])
abstract operation in terms of ES6. It throws TypeErrors that the caller then deals with. Optionally, normalization can be "lcase" or whatever - or we can do a breaking change and stop normalizing stuff.
ExtractStringValue()
if we want even more consistency with type checks and coercion (e.g., ExtractColorValue()). I like that approach as a low-level building block. It shouldn't be as much work as IDL given that the space is much more constrained (i.e., we only have object, array, number, string, boolean, or null, and we can trust the input to not cause side effects). And you basically want the same kind of algorithm for each (as @marcoscaceres's code also demonstrates):
And then you can define higher-level helpers around that, which perform whitespace normalization, range checks, lone surrogate replacement, array to list concept mapping, etc.
I added a new attempt to fix this in #753 (This is the "low tech" approach of just adding some fairly non-formal language around behaving differently in manifest parsing than the normal WebIDL spec allows).
I'm less in favour of this than #750 (which would require upstream changes to WebIDL). Because if we did heycam/webidl#597 + #750, then we implementations could easily follow along with these changes and the manifest parser implementations could literally use the IDL parser. With #753, I think it specifies what we want, but it's impractical to implement it without changing the implementation's WebIDL parser.
(Of course, there are other discussions about reverting the use of WebIDL altogether, but in my opinion, I would rather keep it.)
I'd like to pick this up again and rewrite the spec in terms of ECMAScript.
@mgiuca, would you be ok with that?
Ok, we now have a means to convert JSON into infra types: https://github.com/whatwg/infra/issues/159#issuecomment-491338205
My plan is to rewrite the spec to use the above and operate on the infra types.
Can we still get something like the IDL overview at the end?
No, but we can just improve the TOC.
(alternative, we have a non-normative JSON Schema we can link to... I think we already have that)
It looks like there are more use cases which want an ImageResource
IDL dictionary (e.g. Content Indexing, see https://github.com/rayankans/content-index/issues/2, and Background Fetch, see https://wicg.github.io/background-fetch/#background-fetch-manager). Maybe that should be spun out of Manifest, which then reduces the need to define Manifest itself in terms of IDL?
There's also discussion on the image selection front in https://github.com/w3c/manifest/pull/710 (@jakearchibald). Is there a way to move forward here @marcoscaceres @mgiuca?
@rayankans FYI
@marcoscaceres Do you think you'll proceed with converting this to a JSON schema type thing (and not IDL)? We have clients trying to use our ImageResource
dict which means now is a good inflection point to decide either a) it's OK to use dicts from Manifest in other parts of the web platform, or b) no, these are specific to the Manifest format and can't be used in JS APIs.
I still think that being able to use them in other JS APIs is a good thing (reuse of those definitions, instead of having to define parallel dicts in two different languages), but if you want to change over to another language, we should do so before others start depending on it.
(I'm also hesitant about this specific usage, in Background Fetch, since the ImageResources used there are unrelated to the ones we use in Manifest.)
@marcoscaceres Do you think you'll proceed with converting this to a JSON schema type thing (and not IDL)?
Yes
We have clients trying to use our
ImageResource
dict which means now is a good inflection point to decide either a) it's OK to use dicts from Manifest in other parts of the web platform, or b) no, these are specific to the Manifest format and can't be used in JS APIs.
We should keep ImageResource
as a thing for APIs tho (we've tried to do this previously also with @jakearchibald). But we can pull that into a new spec if needed.
Ok, first step: I've sent a PR to include the JSON schema into the spec https://github.com/w3c/manifest/pull/928
In 32b497c4ca34c9069aa5e545126c252724f370fe, I (finally) got rid of the IDL and converted the "processed manifest" to an (Infra) ordered map.
There isn't really a formal specification of the manifest data structure. Instead, each member has its own section which describes the format of that member in prose. This makes it hard to discuss (e.g., in Ken's shortcut proposal, we'd like to be able to say "
sequence<IconInfo> icons;
" but there's no name for "the dictionary that describes a Manifest icon" because it's described in prose).I think we should add a WebIDL section that contains a dictionary definition for the full manifest, as a first step (I can do it).
As a second step, I think this would allow us to simplify the language in each individual member. The "steps for processing a manifest" algorithm could be replaced with "Parse the JSON into an IDL dictionary value, then convert it into an ECMAScript object" (which automatically applies all of the IDL type conversion rules).
Then, for example, this text:
could be replaced by this (assuming
short_name
is aDOMString
):Were it not for the call to Trim, this would make almost all the processing steps trivial. Unfortunately, almost all of them do Trim their strings, which makes things more complicated. Perhaps we could make a global rule that Manifest strings are trimmed, then we wouldn't really need any processing steps at all.