WebIDL, yea or nay?

rdeltour commented 5 years ago

The Publication Manifest specification relies on WebIDL to define the internal data structure of manifests.

One of the reasons we chose WebIDL is that we were influenced by its use in Web App Manifest. Now, it seems that the Web App Manifest folks are reconsidering that, see the entire discussion in w3c/manifest#611. (Thanks @marcoscaceres 🎂 for hinting at this discussion at TPAC).

The TAG also commented on our use of WebIDL, to which @iherman replied.

Are we sure that WebIDL is the right approach and won't create more future issues than it solves? (I’m not an expert in WebIDL myself, and still need to digest the issues raised for Web App Manifest).

In the context of the TAG issue "the proliferation of manifests at W3C" by @tantek, I believe that how to describe data structures is typically the kind of thing worth looking into, to see whether these specs can or need to adopt a unified approach.

I know it’s late in our editing process, and we’ve had lengthy discussions on using WebIDL, and even more time spent on actual editing. But we'd better make sure we're on the right track before moving to CR.

iherman commented 5 years ago

My answer that you quoted still stands. Although the subsequent discussion in that thread was diverted into issues about the usage of WebIDL and not whether to use WebIDL or not (which also led to substantive changes), the remark I made there:

All that being said, I understand and share your unease about the usage of WebIDL; I think we would be happy to consider an alternative. We just did not find any...

still stands.

Although I admit I only had a cursory look at https://github.com/w3c/manifest/issues/611 (it is a very long thread), I did not see any alternative emerging either. At some point I raised the (half-serious) idea of using TypeScript for the same purpose; I guess it would be perfectly readable and concise (I actually did, for my own learning, a version of the data structure definition), but we also agreed that it would not be wise to bind to an existing programming language.

That being said: if there is an accepted formalism coming to the fore in the coming months to replace WebIDL, switching to it should be an editorial change and not a substantive one. If that is the case, I think it would be acceptable to switch to it while we are in CR. I agree we should watch this space but, in my view, we should not consider this issue as a roadblock.

mattgarrish commented 5 years ago

We don't rely on WebIDL in quite the same way that web app manifest does (i.e., they have webidl written into their definitions and processing). I can see why they'd want to refactor all of that out. Whether we use WebIDL is, in some ways, largely irrelevant to our specification since it's just a general reference to how the processed data gets structured internally; it's swapped in and out with anything else quite easily. (Our processing steps only use pseudo code with JSON examples, after all.)

I'm fine with alternatives, but given there is no consensus, whatever we pick has an uncertain future. We might just want to punt on this and see if the landscape changes before we move on from CR - swapping in something else at CR wouldn't materially change our specification, so doesn't seem like it would be a controversial change.

One strong argument against WebIDL, though, is that it gives the appearance of an API to our specification, where one doesn't exist (something we do try to clarify, though).

One tangential concern I've had growing in my mind lately is the prominence of the WebIDL. Given that it's just a reference for developers, having it so high in the spec may lead to authoring confusion -- i.e., what is required in the internal representation not always matching up with what is required to be authored. I don't believe in the ability of readers to not just look at the WebIDL and expect that it describes what has to be in their manifests. It might be better to return it to an appendix.

mattgarrish commented 5 years ago

It might be better to return it to an appendix.

Or perhaps moving it into the processing section might be the most appropriate place.

rdeltour commented 5 years ago

Although I admit I only had a cursory look at w3c/manifest#611 (it is a very long thread), I did not see any alternative emerging either.

It seems that they have a plan to describe the structure using the types from the Infra standard.

That being said: if there is an accepted formalism coming to the fore in the coming months to replace WebIDL, switching to it should be an editorial change and not a substantive one. If that is the case, I think it would be acceptable to switch to it while we are in CR. I agree we should watch this space but, in my view, we should not consider this issue as a roadblock.

Good! This at least makes me feel a bit less guilty of having raised this issue 😅

mattgarrish commented 5 years ago

I think it would be acceptable to switch to it while we are in CR.

Heh, I didn't even notice we were saying the same thing. That's reading on the weekend for you... :)

iherman commented 5 years ago

It might be better to return it to an appendix.

Or perhaps moving it into the processing section might be the most appropriate place.

I just wanted to make this proposal:-)

mattgarrish commented 5 years ago

to describe the structure using the types from the Infra standard

We might want to go all in on infra for the processing. We use the general language, but fall back on some loosely defined concepts.

It might not be much harder than adding the conversion to infra types step and switching /object/Map/ and /array/list/.

iherman commented 5 years ago

But, as far as I can see, infra is trying to unify the processing step language. Which is very useful. But it does not give a general view of the data structure like the current WebIDL does

marcoscaceres commented 5 years ago

But, as far as I can see, infra is trying to unify the processing step language.

It also defines general types... and how to convert JSON into those types.

Which is very useful.

Indeed :)

But it does not give a general view of the data structure like the current WebIDL does

It doesn't provide a syntax for defining those structures. But it does give the data types.
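
As a rough illustration of the point above (Infra supplies the data types and a JSON conversion, but no declaration syntax), here is a minimal TypeScript sketch of converting parsed JSON into Infra-like values. The names `InfraValue`, `toInfraValue` and `parseJsonIntoInfraValues` are made up for this sketch; a JavaScript `Map` stands in for an Infra ordered map and an array for an Infra list. It is not text from either specification.

```typescript
// Infra-like value type (hypothetical name): primitives stay as they are,
// JSON arrays become lists, JSON objects become ordered maps.
type InfraValue =
  | null
  | boolean
  | number
  | string
  | InfraValue[]
  | Map<string, InfraValue>;

// Recursively convert a JSON-derived JavaScript value.
function toInfraValue(value: unknown): InfraValue {
  if (
    value === null ||
    typeof value === "boolean" ||
    typeof value === "number" ||
    typeof value === "string"
  ) {
    return value;
  }
  if (Array.isArray(value)) {
    // JSON array -> list of converted items
    return value.map((item) => toInfraValue(item));
  }
  if (typeof value === "object") {
    // JSON object -> map; Map preserves insertion order
    const source = value as Record<string, unknown>;
    const map = new Map<string, InfraValue>();
    for (const key of Object.keys(source)) {
      map.set(key, toInfraValue(source[key]));
    }
    return map;
  }
  throw new TypeError("not a JSON-derived value");
}

// Parse text as JSON, then convert the result. A syntax error from
// JSON.parse propagates to the caller in this sketch.
function parseJsonIntoInfraValues(text: string): InfraValue {
  return toInfraValue(JSON.parse(text));
}
```

Note that nothing here describes which keys a manifest map is expected to hold; that overview is the part the WebIDL currently provides.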

iherman commented 5 years ago

I was actually wondering about using, simply, the original OMG IDL. After all, this is at the basis of WebIDL but, if we use this, we take away the ambiguities around the fact that we do not define any API, the data structure can be used by a Web processor but, also, by something else, etc. On the other hand, it is not a big departure from WebIDL.

I have not fully absorbed the spec, but I noticed one thing. What OMG IDL calls char (and strings, which consist of char-s) are strictly 8-bit, essentially ASCII characters. No good for us. It also has, however, "wide" characters, called wchar (and, consequently, wstring). It does not really say what a "wide" character is, but it can consist of several bytes, i.e., it could be used to store Unicode code points encoded in, e.g., UTF-8. Unfortunately, the spec does not refer to Unicode at all (for a spec that was updated in 2018, it is a bit surprising). This is a detail we must check if we go down that route.

mattgarrish commented 5 years ago

But it does not give a general view of the data structure like the current WebIDL does

It doesn't provide a syntax for defining those structures. But it does give the data types.

Ya, I'm not suggesting we drop the webidl, at least not yet. All I'm suggesting is that for the processing steps we use the infra spec more completely.

For example, instead of these two steps to parse the json:

Let manifest be the result of parsing text as JSON [ecmascript]. If parsing throws an error, this is a fatal error. Return failure.

If typeof(manifest) is not Object [ecmascript], this is a fatal error. Return failure.

By the infra spec we could instead use something like:

Let manifest be the result of parsing JSON into Infra values given text.

If manifest is not a map, this is a fatal error. Return failure.

After that, there's only a few instances where we use different data type names, which we assume are widely understood but wouldn't hurt to tie to the infra datatypes instead.
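
The two Infra-style steps above might look roughly like this in TypeScript. This is only a sketch under assumptions: `Failure`, `ManifestMap` and `startManifestProcessing` are hypothetical names, a JavaScript `Map` stands in for an Infra map, and only the top level is converted (nested values are left as plain JavaScript values).

```typescript
// Hypothetical result type; the step text only says "Return failure".
type Failure = { kind: "failure"; reason: string };
type ManifestMap = Map<string, unknown>;

function startManifestProcessing(text: string): ManifestMap | Failure {
  // Step 1: parse text as JSON; a parse error is a fatal error.
  let value: unknown;
  try {
    value = JSON.parse(text);
  } catch {
    return { kind: "failure", reason: "text is not valid JSON" };
  }

  // Step 2: if the result is not a map, this is a fatal error.
  // A bare typeof check is not enough here, because in JavaScript
  // typeof [] and typeof null are both "object".
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { kind: "failure", reason: "manifest is not a map" };
  }

  // Copy the top-level members into a Map (insertion order is kept).
  const source = value as Record<string, unknown>;
  const manifest = new Map<string, unknown>();
  for (const key of Object.keys(source)) {
    manifest.set(key, source[key]);
  }
  return manifest;
}
```

One practical difference from the `typeof` wording: a top-level JSON array passes a `typeof` check in JavaScript, whereas "is not a map" rejects it outright.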

iherman commented 5 years ago

I am fine rewording the steps if it is not tooooo much trouble (I do not know how stable infra is). But this looks orthogonal to the original issue.

dauwhe commented 5 years ago

In the context of the TAG issue "the proliferation of manifests at W3C" by @tantek, I believe that how to describe data structures is typically the kind of thing worth looking into, to see whether these specs can or need to adopt a unified approach.

I worry about the proliferation of manifests, too. I've been experimenting with manifests that are both pub manifests and web app manifests (with link rel="manifest publication"). At least online web app manifest validators don't seem bothered by all our contexts and extra information; they just return the members they recognize. But I haven't tried to get such a manifest to actually install on Android.

I also think it is important for ordinary web developers to be able to do useful things with our manifest. Having things defined in terms of infra might help?

marcoscaceres commented 5 years ago

I do not know how stable infra is

Very. Its sole purpose is to be the bedrock on which other standards are built.

marcoscaceres commented 5 years ago

I also think it is important for ordinary web developers to be able to do useful things with our manifest. Having things defined in terms of infra might help?

I think this conflates separate concerns. Infra just gives us generic data types. We still need for those things to be processed in a logical way into some canonical form.

I think I need to do the conversion to Infra with web manifest to show how this works, then the pub specs can leverage the data processing algorithms to piggyback on-top of web manifest. That is, assuming the pub spec can be used on top of web manifest.

iherman commented 5 years ago

This issue was discussed in a meeting.

No actions or resolutions
WebIDL yea or nay?
Romain Deltour: this is mostly an editorial issue?
… after talking to marcos at TPAC
… for web app manifest spec they are going to stop using WebIDL
… they think it’s a poor fit for data structures
… there’s an issue in their github
… so I wondered if we should align with what they are doing
… and there’s also a TAG issue about the proliferation of manifest formats
… so I think it’s good we use the same kind of specification process
Ivan Herman: for those who did not read the thread
… there are two issues
… one, the community is coming up with an agreement to use similar language when describing processing for these things
… matt has looked at it, and we could adopt this language
… the other problem is that those approaches do not provide one place where you can look at the whole data structure
… and WebIDL does
… nobody really likes webidl, neither do I
… but I don’t have an alternative
… if an alternative comes up, then we can use a new thing
… but I think we should go ahead with CR
Romain Deltour: a couple things
… since we first discussed things, the specification landscape evolved
… we now know how to parse JSON into infra parts
… and that’s what the app manifest folks will use
… the question to matt, our use of webidl
… we don’t rely on it in the same way, I’m told, and I don’t understand
Matt Garrish: what I’m getting at is that it’s integrated–they define via their dictionary
… we don’t do that–our properties are not defined via webidl dictionaries
… and our processing is separate
… so we’re independent, it’s just a way of visualizing our data structure
… what they’re going to do in WAM is to take the interleaving of the webidl, and depend less on that and more on infra
Ivan Herman: if I remember well, use of WebIDL is not only for data structure but for functions
Dave Cramer: I’m a little puzzled by Ivan saying that there is no alternative for WebIDL but that is only as a way to visualize the data structure? I am confused that Matt says we’re defining it in the spec but we depend on it as expressing vocabulary?
Ivan Herman: it would be equivalent to using Typescript interface
Dave Cramer: And where do things like JSON schemas fit into this question?
… I don’t know what other kind of formalism would I use
Ivan Herman: json schemas are orthogonal to this
Romain Deltour: one of the similarities we have with app manifest
… the result of processing the manifest is what we define with webidl
… so this algo defines how a string is converted into webidl
… I’m not an expert in webidl, and I’ve only read their issue a couple of times
… the issues are in defining the details of the conversion logic
… and then we have to define the conversion of json to web idl
Matt Garrish: I wouldn’t say that what we’re processing is intended to be webidl
… it was canonical json
… we’ll probably end up with infra types
… which is problematic
… and the fact that it gives the impression we have a web api
… I don’t know what else there is that easily describes the objects that are going to be created
… possibly the solution is that we don’t define the webidl
… and web devs figure it out
Romain Deltour: it does seem that it’s mostly an editorial issue? It doesn’t affect normative language?
… we can deal after CR?
Wendy Reid: can we call this postponed?
Ivan Herman: I would love to have a replacement for WebIDL
Romain Deltour: in some ways it’s editorial, on the other hand it’s a big thing to change how things are described
… it’s a profound editorial change; not lightweight
… the sooner we know the better
Ivan Herman: yes, it is a profound editorial change
… I don’t see what we are changing for
… and I don’t see alternatives
… I’ve looked at many things
… using typescript is not a good idea
… it’s weird that there’s no standard formalization for data structures
Romain Deltour: one thing worth clarifying before CR, is to clarify what is the type of the result of processing algo
… we say we convert to an internal representation, but we don’t say what the internal representation is
… what does the algo produce? a json object?
… we should clarify that?
Matt Garrish: that’s part of what I hope to clean up with the infra types
… the normalization is more complicated than I thought
… our outcome will be an infra map
… if it’s purely about visualizing the data, maybe it doesn’t belong in the spec, maybe it should be in a wiki
… so we have the processing steps and then a separate numbering
Wendy Reid: could we see a PR with the infra in it?
Matt Garrish: it’s coming
… I hope it’s done later today

iherman commented 5 years ago

Another insane idea...

The reason we have WebIDL is to have a programming-language independent, but easily graspable overview of the data structure that is generated. I believe it is important to have this. The problem using something like TypeScript is that it is a specific programming language and could be misunderstood.

However, what about embracing the TypeScript option and putting, side by side, the same data structure in other programming languages that do have typing? So we could put into an informative appendix the same data structure in TypeScript, Rust, Java... I do not know about typing in Swift; unfortunately, JavaScript or Python would not qualify because we cannot really express types for those. But if we have at least those three, this would take care of the possible misunderstanding of a single language, and we could drop WebIDL.

I know it is insane... but maybe it works nevertheless
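
To make the idea concrete, the TypeScript column of such an informative appendix might look something like the sketch below. Everything in it is illustrative: the interface names and members (`PubManifestSketch`, `LinkedResourceSketch`, `readingOrder`, and so on) are assumptions chosen for this example and are not copied from the specification's WebIDL, and the helper `readingOrderUrls` exists only to show the structure being used.

```typescript
// Illustrative only: names and members are assumptions for this sketch,
// not the normative data structure.
interface LocalizableStringSketch {
  value: string;
  language?: string;
}

interface LinkedResourceSketch {
  url: string;
  encodingFormat?: string;
  name?: LocalizableStringSketch[];
  rel?: string[];
}

interface PubManifestSketch {
  type: string[];
  id?: string;
  name: LocalizableStringSketch[];
  readingOrder: LinkedResourceSketch[];
  resources?: LinkedResourceSketch[];
}

// Hypothetical helper: list the URLs of the reading order, in order.
function readingOrderUrls(manifest: PubManifestSketch): string[] {
  return manifest.readingOrder.map((resource) => resource.url);
}

const sampleManifest: PubManifestSketch = {
  type: ["Book"],
  name: [{ value: "Moby Dick", language: "en" }],
  readingOrder: [
    { url: "c001.html", encodingFormat: "text/html" },
    { url: "c002.html", encodingFormat: "text/html" },
  ],
};
```

The interfaces only describe the shape of the processed data; they say nothing about what authors may write in the JSON, which is the same caveat that applies to the WebIDL.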

marcoscaceres commented 5 years ago

The reason we have WebIDL is to have a programming-language independent, but easily graspable overview of the data structure that is generated.

I think this was an original goal, but now it defines how JS and C++ (and maybe Rust a tiny fraction of the time) pass data between each other in a somewhat secure, type-coercing, error-handling manner... amongst other things.

But if we have at least those three, this would take care of the possible misunderstanding of a single language, and we could drop WebIDL.

I think we may be getting ahead of ourselves here. I think we need to answer: Who, exactly, is supposed to process the data in this specification?

If it’s a text editor, then JSON-Schema or maybe TypeScript might be very useful. If it’s a browser (and the data never ends up in a JS environment), then Infra types are best, for instance.

Let’s start by answering the question above. What is the primary conforming user agent you are targeting?

As an example, note that Web Manifest links to a non-normative JSON-Schema that was created for use with Visual Studio.

mattgarrish commented 5 years ago

Let’s start by answering the question above. What is the primary conforming user agent you are targeting?

There isn't a single answer to that question, which is what complicates a lot of our decision making. It could be a browser. It could be a JS-based reading app. It could be a standalone reading app.

We define an informative json schema, as well, for authoring, but there's greater flexibility in authoring so it doesn't show the expected internal representation (not clearly, anyway).

A major chunk of processing is normalizing the data so we can be flexible to the kinds of patterns people use with schema.org metadata (i.e., not following strict typing). The idea being that you don't have to define SEO metadata separately from the manifest metadata.

I'm still partial to presenting this kind of data visualization aid outside the specification. It may be helpful, but when it's not critical and causes confusion it's probably not worth the hassle.
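
To illustrate the kind of normalization mentioned above, here is a minimal TypeScript sketch in which a value authored loosely (a bare string, a single object, or an array mixing both) is turned into one strict internal shape. The names `LocalizedText` and `normalizeLocalizable`, and the `value`/`language` members, are assumptions for this example; this is not the specification's actual algorithm.

```typescript
// Hypothetical strict internal shape for a piece of localizable text.
interface LocalizedText {
  value: string;
  language?: string;
}

// Accepts "Title", { value: "Title", language: "en" }, or an array of
// either, and always returns an array of LocalizedText.
function normalizeLocalizable(authored: unknown): LocalizedText[] {
  // A single value is treated as a one-item array.
  const items: unknown[] = Array.isArray(authored) ? authored : [authored];
  const result: LocalizedText[] = [];
  for (const item of items) {
    if (typeof item === "string") {
      // A bare string becomes an object with only a value.
      result.push({ value: item });
    } else if (typeof item === "object" && item !== null) {
      const record = item as Record<string, unknown>;
      const value = record["value"];
      const language = record["language"];
      if (typeof value === "string") {
        const entry: LocalizedText = { value };
        if (typeof language === "string") {
          entry.language = language;
        }
        result.push(entry);
      }
    }
    // Anything else is silently skipped in this sketch.
  }
  return result;
}
```

This is why an authoring schema and a description of the internal representation do not line up: the first has to admit all three authored forms, while the second only ever shows the array.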

iherman commented 5 years ago

I think this was an original goal, but now it defines how JS and C++ (and maybe Rust a tiny fraction of the time) pass data between each other in a somewhat secure, type-coercing, error-handling manner... amongst other things.

Yep, that is the problem we are fighting with: WebIDL has outgrown that original usage, hence the awkwardness of using it here.

To answer your question, @marcoscaceres: the processing of the data is supposed to be done by what this community calls a "reading system". That can be a separate application rendering audiobooks (or other forms of publications), a plugin in a browser, or the browser itself (as was the case, for EPUB, in the now defunct Edge functionality on EPUB-s).

Our fearless editor, @mattgarrish, is working on transforming the text to use the terminology/style of infra. But I still believe that something like (the original:-) WebIDL would be useful, if for nothing else to make the spec, and the "target" of the processing step more understandable to whoever reads the spec. Hence my (insane:) proposal to possibly use several examples on how that data structure is represented in specific languages (that would not be normative).

B.t.w., we also have a json schema for the manifest...

iherman commented 5 years ago

This issue was discussed in a meeting.

No actions or resolutions
update manifest processing to use infra types
Garth Conboy: See Issue #101
Garth Conboy: See Issue #98
Ivan Herman: See PR #103
Garth Conboy: There are 2 issues in this pull request - this is also one that Ivan and Matt have…
… Does anyone on the call want to run through this and hope to get to a consensus?
Ivan Herman: The work was done by Matt.
… I would prefer Romain to comment on this as he was the one initiating this whole work and can give a more appropriate background.
Romain Deltour: Basically it originated from trying to get rid of the WebIDL as a way to describe a data structure. In trying to harmonize the various manifests in the W3C, we wanted to use Infra types to represent the data structure. Matt’s editing the spec to replace WebIDL with Infra types.
Ivan Herman: See Infra standard
Romain Deltour: The WebIDL representation is moved to informative edits. A few different issues were raised when we looked at the algorithm. I think it’s clearer now, but we need more reviews there. It will be interesting to have more.
Garth Conboy: Looking at the PR, Romain you have one requested change outstanding and Ivan has approved it.
Ivan Herman: I am now repeating what Dave said earlier on the other PR. This was a pretty large change that made the document better and Matt did great work. Now it becomes difficult to discuss other issues and details with this as a PR.
Romain Deltour: I think my requests have been addressed in Matt’s latest changes
Ivan Herman: I would be in favor of merging this ASAP (as soon as Matt is comfortable) then we can comment and modify once it’s in. Again, this is something he should do…
Romain Deltour: +1 on merging and doing more reviews/changes based on Matt’s work
Garth Conboy: Sounds good for me. If there are subsequent changes, they will be smaller. Barring objection, why don’t we go ahead and put that as a request for Matt…

iherman commented 5 years ago

Closed by virtue of the merge of PR #103

w3c / pub-manifest

WebIDL, yea or nay? #98