tc39 / ecma262

Status, process, and documents for ECMA-262
https://tc39.es/ecma262/

Editorial: ambiguity of "collection of either A or B" #3004

Open acutmore opened 1 year ago

acutmore commented 1 year ago

Spun off from https://github.com/tc39/ecma262/pull/2777#discussion_r1089707394


Working on a PR that had "a List of either Objects or Symbols" prompted the following conversation with ChatGPT:

Me: In the phrase "A bag of either apples or oranges", does this mean that the bag can contain either only apples or only oranges? Or does it mean that the bag can contain a mix of both apples and oranges?

AI: The phrase "A bag of either apples or oranges" means that the bag can contain either only apples or only oranges, not a mix of both.

Me: are you sure?

AI: It is possible that the phrase "A bag of either apples or oranges" is being used to indicate that the bag could contain a mix of both apples and oranges, however, without more context it is impossible to know for certain. The phrase "A bag of either apples or oranges" is often used to indicate that the bag can only contain one type of fruit, either apples or oranges.

The intention of the PR is that the possibly empty list may contain both objects and symbols and may also contain only one of those types.
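
To make the two readings concrete, here is a minimal TypeScript sketch of each as a runtime check. The helper names (`isObjectOrSymbol`, `isMixedList`, `isHomogeneousList`) are hypothetical and not taken from the spec or the PR, and treating "Object" as "non-null object or function" is an assumption made for illustration.

```typescript
// Hypothetical helpers for illustration only; not spec text.
// Assumption: "Object" means a non-null JS object or a function.
type Elem = object | symbol;

function isObjectOrSymbol(v: unknown): v is Elem {
  return (
    typeof v === "symbol" ||
    (v !== null && (typeof v === "object" || typeof v === "function"))
  );
}

// Reading 1 (the PR's intent): each element is individually an Object or a Symbol,
// so a mix of both is allowed.
function isMixedList(list: readonly unknown[]): boolean {
  return list.every(isObjectOrSymbol);
}

// Reading 2 (the "only apples or only oranges" reading): all elements are Objects,
// or all elements are Symbols.
function isHomogeneousList(list: readonly unknown[]): boolean {
  return (
    list.every((v) => typeof v === "symbol") ||
    list.every((v) => isObjectOrSymbol(v) && typeof v !== "symbol")
  );
}

const mixed: unknown[] = [{}, Symbol("s")];
console.log(isMixedList(mixed), isHomogeneousList(mixed)); // true false
console.log(isMixedList([]), isHomogeneousList([])); // true true
```

A list containing both an object and a symbol satisfies only the first check, while an empty list or a single-type list satisfies both. The ambiguity matters only for lists that mix the two types.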

@jmdyck pointed out:

The status quo isn't entirely consistent on how to describe such lists:

4 times a List of either X or Y:

  • 16.2.1.6.2 (return type): a List of either Strings or null
  • 22.2.7.8 (type of indices): a List of either Match Records or undefined
  • 22.2.7.8 (type of groupNames): a List of either Strings or undefined
  • 29.5.4 (type of Ws): a List of either WriteSharedMemory or ReadModifyWriteSharedMemory events

1 time a List of X or Y (but see below re ambiguity):

  • 29.5.5 (assertion on Ws): a List of WriteSharedMemory or ReadModifyWriteSharedMemory events ...

2 times a List ... whose elements ...

  • 22.1.3.18.1 (type of captures): a possibly empty List, each of whose elements is a String or undefined
  • 29.6.2 (condition): a List ... whose elements are WriteSharedMemory or ReadModifyWriteSharedMemory events ...

2 times break it out as a separate sentence (which isn't much of an option in the above contexts):

  • 6.1.7.3 ([[OwnPropertyKeys]]): The Type of each element of the returned List is either String or Symbol.
  • 10.5.11 (Note re [[OwnPropertyKeys]]): The Type of each result List element is either String or Symbol.

Being clear and consistent about this category of type will help with automated type analysis of the spec.

Personally I quite like the idea of: a List, where each element is either a T1 or a T2.

Thoughts?

ptomato commented 1 year ago

I've run into this and other ambiguities in the prose expressions of types described in structured headers several times. I'd bet the choice of using prose for types has cost us collectively a lot of time in thinking, nitpicking, discussing, and implementing rules in Ecmarkup. At what point is it worth it to switch to a precise, non-prose notation, such as List<Object | Symbol>? The exact choice of notation aside, this is what we likely have in mind anyway while we struggle to find a way to describe it in prose.
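
As a rough analogy only, TypeScript's type syntax shows how a non-prose notation separates the two readings. The aliases and the `countSymbols` function below are hypothetical names, and using TypeScript's `object` and `symbol` as stand-ins for the spec's Object and Symbol types is an assumption.

```typescript
// Assumption: TypeScript `object` and `symbol` stand in for spec Object and Symbol.
// Roughly List<Object | Symbol>: each element may be either type.
type MixedList = Array<object | symbol>;
// Roughly List<Object> | List<Symbol>: the whole list is one type or the other.
type HomogeneousList = object[] | symbol[];

const a: MixedList = [{}, Symbol("k")]; // accepted: elements may differ in type
// const b: HomogeneousList = [{}, Symbol("k")]; // rejected by the type checker

const c: HomogeneousList = [Symbol("x"), Symbol("y")];
const d: MixedList = c; // every homogeneous list is also a valid mixed list

// Hypothetical consumer that relies only on the per-element guarantee.
function countSymbols(list: MixedList): number {
  return list.filter((v) => typeof v === "symbol").length;
}

console.log(countSymbols(a)); // 1
console.log(countSymbols(d)); // 2
```

The two aliases differ only in where the union sits, which is exactly the distinction the prose "a List of either X or Y" fails to pin down.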

jmdyck commented 1 year ago

Some history: The spec used to have a less-prosey style that was (inconsistently) used for the declared type of record fields and internal slots. E.g., instead of an Object or *undefined*, you might see Object | *undefined*.

In PR #545, I suggested using this notation for the types of AO parameters + returns. @bterlson liked the idea, but the eventual decision was to use "assertion-style prose, rather than using an ad-hoc 'type signature'". (One benefit of this was that the ecmarkup processor could relatively easily generate a prosey preamble similar to what the spec already had.)

After the merge of PR #545 (and of follow-up PRs that spread structured headers to other kinds of operations), assertion-style prose became the dominant syntax for expressing 'types', so in PR #2602 and PR #2691, I used it to replace the less-prosey style mentioned above for declaring record fields and internal slots.

I think the only remnant of the less-prosey style is the notation for method signatures in the Essential Internal Methods table, which has been there since ES5.