whatwg / webidl

Web IDL Standard
https://webidl.spec.whatwg.org/
Other
410 stars 164 forks source link

No seemingly consistent Shepherd story #240

Open annevk opened 7 years ago

annevk commented 7 years ago

The way Bikeshed annotations are used is rather weird.

Some types are exported (enumeration types), some are not (dictionary types, callback function types).

Some types are labeled dfn, some are interface, some are dictionary. E.g., sequence is an interface, but record is a dictionary.

Some types are co-located with their values (records, it seems like), others are in different places (dictionaries).

What's the strategy here?

@tabatkins any advice?

tobie commented 7 years ago

Whatever we agree on should be documented in the README.

tabatkins commented 7 years ago

Some types are exported (enumeration types), some are not (dictionary types, callback function types).

Recall that nothing needs to be explicitly exported except for "dfn" type definitions. The rest export implicitly unless you use noexport.

Some types are labeled dfn, some are interface, some are dictionary. E.g., sequence is an interface, but record is a dictionary.

This is a little hand-wavey. Unless there's a better definition type (like dictionary), I prefer everything that defines a type name to be interface, just for regularity. But "record" and "sequence" don't really do that; they're not really IDL names, they're closer to IDL keywords, like the "interface" in interface Foo. You wouldn't ever write {{record}}, either. I'd probably mark up "record" and "sequence" as dfn, same as the keyword "interface" is.

annevk commented 7 years ago

@tabatkins dictionaries, sequences, and interfaces are all types though. I don't think you can say one is a name and another is syntax. (Not really sure what an IDL name even is.)

tobie commented 7 years ago

Yeah--I've found the mapping to be exceptionally challenging when doing the conversion to Bikeshed. Happy to help fix this once we settle on something that seems clearer.

tabatkins commented 7 years ago

dictionaries, sequences, and interfaces are all types though. I don't think you can say one is a name and another is syntax. (Not really sure what an IDL name even is.)

By "names", I'm referring to actual types. record isn't a type, record<Foo> is; record is a syntax keyword used to construct a type. (In Haskell terms, it would literally be a type constructor, which is a first-class value of a higher kind than types.) So the record keyword should be marked up as a "dfn" type definition, same as the interface keyword is.

Yeah--I've found the mapping to be exceptionally challenging when doing the conversion to Bikeshed.

Yeah, this is a particularly meta situation, it's not surprising that confusion abounds. ^_^

annevk commented 7 years ago

I'm not sure that's really helping.

We have record<Foo> in syntax. Whenever a value is passed to an algorithm that uses that syntax in its IDL description, that value is converted into what we call a "record". Same as sequence<Foo> is syntax and passing a JavaScript Array to an algorithm that uses the sequence syntax results in a "sequence". They are not just keywords, they are not just types, they are also used to refer to instances.

Currently IDL defines "record type" and "record" as things and does various other things for other types. It's still not clear to me what the strategy is for getting this all straightened out.

tabatkins commented 7 years ago

What I'm saying is just that the word "record" is not one of the IDL definition types. Those are for things defined with IDL, not the concepts of IDL. In particular, the concept of a "record" is not an interface that is defined using WebIDL.

(Technically neither are the primitive types, but I don't think it's sufficiently worthwhile for me to create a "primitive" definition type; we can just lump them in with interfaces because they act basically the same for our purposes.)

(If I could do it over again I wouldn't have created separate types for interface/dictionary/other types that you can use in an arglist, as the point of separate types is to carve out separate namespaces, and those really live in the same namespace; you're not allowed to create a dictionary and an interface with the same name. I might still go ahead and merge them as aliases, but I need to develop some machinery to handle definition aliases first.)

Given the above asides, the fact that "record" isn't an interface is even more obvious - it's 100% okay for me to write interface record { ... } in a spec. Nothing breaks at all. Thus, the "record" dfn currently in the WebIDL spec can't be interface type, by definition - if it was, it would be preventing me from defining my interface. It's just a dfn type.

Same with "record type" - it's a dfn.

Nobody's trying to write <dfn>record&lt;Foo></dfn> yet; if they do we'll address that, but for now it's not a relevant part of the topic.

annevk commented 7 years ago

Thanks, that helps a lot! So I guess a remaining question I have is whether we should have separate definitions for types and instances. Generally when we define concepts we don't draw that distinction. E.g., the URL Standard defines URL records. It doesn't define a URL record type.

And the new Infra Standard defines things like lists, maps, etc., but not list type, map type, etc.

@domenic @bzbarsky thoughts?

If we settle the type/instance question somehow I think I'm able to drive this to some kind of conclusion (and be finally able to land a patch to Fetch).

domenic commented 7 years ago

I think it's fine for IDL to define types like "record type" or "long long".

domenic commented 7 years ago

Reviewing #whatwg there seems to be a lot of confusion as to why "record types" and "dictionary types" and such are defined. Let me try to spell it out, as @bzbarsky has done previously.

Web IDL defines a type system. In a type system there are at least two distinct entities: the types themselves, and instances of the type. The easy example is long long is a type, and 5 is an instance of the long long type. (More precisely: "the IDL long long value that represents the same numeric value as 5 is an instance of the long long type." Less precisely: "5 is a long long".)

Web IDL also conveniently groups certain types into categories. The most obvious of these are "numeric types". But there are also a bunch of other categories, listed in https://heycam.github.io/webidl/#idl-types. Finally there are certain categories which you might not realize as categories, such as "dictionary type" and "record type" and "interface type" and "promise type" and "sequence type". See all the entries under https://heycam.github.io/webidl/#idl-types in the table of contents that end in "types".

For example, the interface declaration interface Foo { } creates the type "the interface type Foo", often abbreviated to just Foo. This is an interface type. An instance of the interface type Foo, often abbreviated to "a Foo instance" or "a Foo", is an instance of the type. This is exactly analogous to the long long + 5 situation above.

The same goes for dictionaries and records. Each dictionary or record declaration creates a specific dictionary or record type. Dictionary types are given a name like dictionary Foo {} and records are given a name like record<K, V>. Any given dictionary or record will be an instance of a specific dictionary or record type.

The same goes for promise types and sequence types and nullable types and.... The sequence type sequence<long> is distinct from the sequence type sequence<Node>. Any given sequence will be an instance of one sequence type.

This might be confusing because of the abbreviations we sometimes use (of which I've mentioned several above). But the actual theoretical foundation is well-founded and straightforward. I'm not sure what any of this has to do with Bikeshed, but I don't see why we would try to abolish the distinction between types and instances, or remove the <dfn>s from one category or another. Especially while writing the Web IDL spec, it's extremely useful to refer to the types separately from the instances, as you can see from a quick review of the spec's algorithms.

annevk commented 7 years ago

What I'm saying is that everywhere else in standards we seem to be using terminology just like "5 is a long long". E.g., "« y, z » is a list". I don't like the inconsistency and I think we should either define a type system and consistently apply that everywhere (probably driven by Infra) or use the pattern that has worked well thus far outside of IDL.

domenic commented 7 years ago

I guess I just have to disagree then. There's no inconsistency between my above outline and those two terminology uses.

annevk commented 7 years ago

Disagree with what? Do you agree that we don't have a dfn for "list type" or "URL record type"?

domenic commented 7 years ago

We indeed don't, because talking about them is not useful. In IDL, it's very useful to talk about the type system, to group the types together, to contrast them, etc. I disagree that we need consistency here.

annevk commented 7 years ago

There's two reasons why I think it matters:

  1. I'd like them to converge. I don't think we need both sequence and list. I don't think we need both record and map.
  2. Once we start formalizing internal slots, and potentially having some syntax for them, we're going to get even more convergence as these internal slots can hold URL records and such.
domenic commented 7 years ago

I don't think we need both sequence and list. I don't think we need both record and map.

I strongly disagree. We need separate specification types, which can contain vague things, and IDL types, which can only contain things from the IDL type system.

Once we start formalizing internal slots, and potentially having some syntax for them, we're going to get even more convergence as these internal slots can hold URL records and such.

Yes. This seems unrelated to the discussion here.

annevk commented 7 years ago

I strongly disagree. We need separate specification types, which can contain vague things, and IDL types, which can only contain things from the IDL type system.

Why though? Neither enforces the constraints and both operate in the same way.

This seems unrelated to the discussion here.

Not really. I'm saying we need a general strategy since there will be convergence over time. You're saying it's fine to have separate strategies for IDL and the rest of the ecosystem.

domenic commented 7 years ago

Neither enforces the constraints and both operate in the same way.

The constraints are enforced by the Web IDL type system! Yes, people can write specs that violate the Web IDL type system. But then it's just an invalid spec that's not implementable in browsers. In contrast, an Infra list can contain things that aren't Web IDL types, or multiple things of different types, or...

You're saying it's fine to have separate strategies for IDL and the rest of the ecosystem.

In general it's fine to have separate strategies for separate ecosystems. E.g. JavaScript, IDL, Dart, WASM, and spec-land all can have different type systems, and in each of them we can use the terminology we find most useful. There's no gain in trying to shoehorn everything into the same mold.

annevk commented 7 years ago

The constraints are enforced by the Web IDL type system!

Only at the boundaries (and that would continue to be true if we just use lists). If you create a new sequence in an algorithm step its items are not typed whatsoever.

In general it's fine to have separate strategies for separate ecosystems.

I guess I don't really those as separate. And as I said, I see them converging.

domenic commented 7 years ago

If you create a new sequence in an algorithm step its items are not typed whatsoever.

That's just not true. It's not possible to create a sequence without choosing which sequence type you are using. And then any manipulation of it must follow the rules of the type system.

annevk commented 7 years ago

From HTML:

When the getLineDash() method is invoked, it must return a sequence whose values are the values of the object's dash list, in the same order.

There's no type there.

domenic commented 7 years ago

Yes there is; it's unrestricted double.

annevk commented 7 years ago

Again, that's at the boundaries. Nothing in prose indicates that and that is where the sequence is created. If we used list there instead it wouldn't matter.

tobie commented 7 years ago

Adding @tabatkins's comments on irc below:

The distinction y'all are looking for is the notion of "kinds" from type theory. Values are the lowest kind, types are the next highest (they can be instantiated to produce a value), and type constructors are the next highest after that (they can be instantiated to produce a type). The kind hierarchy is infinite, of course. If values are kind 1, types are kind 2, etc, then "dictionary" or "record" are kind-3 things. The IDL definitions types in Bikeshed are meant to capture kind-2 things. There's nothing official for kind-3, because there's no call for it, so they instead fall into the generic "dfn" bucket. Then you've also got the kind-2 things like "sequence<DOMString>". They're definitely kind-2, types, because when they're instantiated they produce values (an array of strings, in this case). They're just anonymous types, which aren't handled by the Bikeshed linking model. As far as I can tell, you don't really need it, either. You instead want to link to "sequence" (the kind-3 thing) and "DOMString" (the kind-2 thing). If you use typedef to give a name to it, tho, like "typedef sequence<DOMString> NameList;", then "NameList" is a kind-2 thing, a type, with a name that can be referred with Bikeshed's linking model.

tabatkins commented 7 years ago

@tobie You'll want to properly code-span some of those bits; there's several "sequence"s in there that are actually sequence<DOMString>, and losing that bit confuses my entire point. ^_^

tabatkins commented 7 years ago

I think it's fine for IDL to define types like "record type" or "long long".

Note that "long long" is a type (kind-2), but "record type" is not - it's a kind-3 type, and instances of it (like record<DOMString, DOMString> are kind-2 types. That's the exact distinction I was drawing - all the IDL definition types are for kind-2 types, but the confusion this thread is talking about is how to mark up the kind-3 types.

I make no judgement on whether or not it's useful to refer to the kind-3 types; I suspect it is. The point of this thread was just to establish what definition type they should have, and if possible establish generic rules for determining this sort of thing. I think that's been accomplished.

tobie commented 7 years ago

@tabatkins fixed. sorry.

annevk commented 7 years ago

I don't think record<DOMString, DOMString> is a type. It's just a coercion strategy at the boundary. The resulting record names and values will be a DOMString, but there's nothing that guarantees that will remain to be the case. Prose has to do the due diligence there.

domenic commented 7 years ago

Anne, what you're proposing would be a complete overhaul of Web IDL, which has a strong foundation in type theory right now and relies on the guarantees it provides during all of its algorithms, not just at the boundary. The whole point of Web IDL having IDL -> JS conversion rules is because it's converting from one well-defined type system to the other. Without that the whole enterprise becomes ill-founded.

bzbarsky commented 7 years ago

but there's nothing that guarantees that will remain to be the case

There certainly is in implementations: the actual thing representing that record is typed as a map from DOMString to DOMString.

I thought that this was also the case for spec-land in the sense that adding something other than a DOMString->DOMString mapping to such a record was an invalid thing to do, just like adding a Document to a sequence<Element> is an invalid thing to do. But it sounds like you're arguing that the latter is somehow OK too?

annevk commented 7 years ago

It's certainly not something you should do, but as far as I can tell there's nothing that prevents it. We don't define operations for manipulating sequences and records and such that ensure these things.

We typically do this kind of thing by convention. E.g., we have "ASCII string" for a string where we "guarantee" to not put anything in that is outside the range U+0000 to U+007F. You can implement that as a typed thing, but the specification doesn't really. It just says it's an ASCII string and treats it like a string for all intents and purposes.

Anne, what you're proposing would be a complete overhaul of Web IDL, which has a strong foundation in type theory right now and relies on the guarantees it provides during all of its algorithms, not just at the boundary.

There's nothing that would observably change, as far as I can tell. And we'd end up with less concepts and more reusable operations.

bzbarsky commented 7 years ago

but as far as I can tell there's nothing that prevents it.

Well, yes, but...

We don't define operations for manipulating sequences and records and such that ensure these things

If we did, using those would be just something someone "should" do. Nothing would prevent a spec from directly poking the sequence/record guts and putting the wrong thing in there, right?

In other words, at some point we always bottom out in "spec authors should not do the things they should not do". We just need to make it clear what those things are.

annevk commented 7 years ago

The whole point of Web IDL having IDL -> JS conversion rules is because it's converting from one well-defined type system to the other.

I haven't suggested anywhere we should drop the conversion rules.

I've asked two things:

  1. Whether we should be more consistent in how we talk about "kinds" throughout platform specifications. I think it's confusing that the same prose in an IDL-interface-member algorithm can make the distinction between record types and records, but has no such way of talking about URL records.
  2. Whether we should consolidate lists and sequences, maps and records, and similar things. That sometimes you deal with a list of strings and sometimes with a list of elements is currently just something we state somewhere. Personally I think that's enough and allows implementations to make the same kind of implementation decisions as we provide with annotations such as ASCII string. I don't think we need to go all the way and define specific append/remove operations for sequences/records/etc.