qwertie / ecsharp

Home of LoycCore, the LES language of Loyc trees, the Enhanced C# parser, the LeMP macro preprocessor, and the LLLPG parser generator.
http://ecsharp.net

LES3: Backquoted and single-quoted unary operators #106

Closed qwertie closed 4 years ago

qwertie commented 4 years ago

I think it's important to be able to express that expressions have units, so I've spent some time thinking about a notation for expressing them. For a long while I was thinking about using the single quote to begin one:

dist = speed 'm/s * time 's

The problem is that both numbers and identifiers can already contain apostrophes, so this notation requires either (1) a space before the apostrophe or (2) disallowing apostrophes in numbers and identifiers. I really like the Haskell-like apostrophe notation table' instead of table2, so I wanted to avoid option 2. Besides which, there is often a need for a space after the "operator" if punctuation is included in the operator as shown here. Eventually I realized I could use backquoted strings to denote units:

dist = speed`m/s` * time`s`

This happens to be the same notation I used for units in my unit inference engine for Boo back in 2006. I guess this possibility slipped my mind because it also conflicts with the current grammar, in which m/s is considered to be a type marker on the number. However, this is easily solved by requiring numeric literals with unusual type markers to be printed in string form (`_m/s`"7").

Proposal A

  1. Backquoted type markers will no longer be allowed on numbers, e.g. 7x remains legal but 7`x` will no longer be equivalent.
  2. Backquoted strings will have two meanings: (1) an identifier, (2) a suffix operator.
  3. This operator will have the same precedence as other suffix operators, and, similar to other suffix operators, a space before the backquoted string is not needed.
  4. When a backquoted string is used as a suffix operator, it will be prefixed by 'suf like all other suffix operators. Thus x`++` is equivalent to x++, which is equivalent to 'suf++(x).

The rationale for the last part is that LES should avoid prescribing specific features (like unit support) for programming languages and, since the syntax is operator-like, it makes sense to encode it as a suffix operator in the Loyc tree.
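To illustrate rule 4, here is a minimal sketch in Python (not LES, and not LoycCore's actual API) of how a backquoted suffix could be encoded. The function name encode_suffix and the (name, args) tuple standing in for a Loyc call node are assumptions made for this example only.

```python
def encode_suffix(operand, backquoted):
    """Encode operand`name` as a call named 'sufname with one argument.

    `backquoted` is the source text including its backquotes, e.g. "`++`".
    The returned (name, args) tuple is a hypothetical stand-in for a
    Loyc call node; it is not the real LNode type.
    """
    if len(backquoted) < 3 or backquoted[0] != "`" or backquoted[-1] != "`":
        raise ValueError("expected a non-empty backquoted string")
    name = backquoted[1:-1]            # strip the surrounding backquotes
    return ("'suf" + name, [operand])  # same naming scheme as x++ => 'suf++(x)


print(encode_suffix("x", "`++`"))      # ("'suf++", ['x'])
```

Under this encoding x`++` and x++ produce the same call name, which is the equivalence rule 4 asks for.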

Discussion

I'm not entirely happy with proposal A by itself. The problem is that there is no elegantly-symmetric way to write prefix operators. LES3 doesn't support arbitrary prefix operators right now (it only supports punctuation-based operators). I'm thinking of allowing them in the form of a single-quoted name, e.g.

'sin x

but of course, this is not symmetrical - the syntax is different from suffix operators. On the other hand, this notation has the advantage that the name stored in the Loyc tree literally matches the code as written. So, while not perfect, the idea of having prefix and suffix operators that have a rather different syntax is not without advantages.

A more ambitious possibility is to include both 'apostropheOperators and `backquoted operators` in the language. In this case I'm inclined to treat backquoted suffix operators as having a different meaning than apostrophe-based suffix operators. For example, what if 7`m/s` was shorthand for 7 unit `m/s` (which in turn is shorthand for `'unit`(7, `m/s`))? Although I just said that LES should avoid prescribing specific features, I do believe that unit inference is so useful that it will be widely supported in programming languages someday.

I like this idea, so along with this proposal I will make a second one:

Proposal B

  1. This includes proposal A except that the last rule of proposal A is deleted.
  2. Define a single-quoted operator format 'id where id is a normal identifier (not backquoted)
  3. 'id expr, where expr is a subexpression, is a call to 'id with one argument (expr)
  4. expr 'id, where expr is a subexpression, is a call to 'sufid with one argument (expr). Note: the example from above, speed 'm/s, means (speed 'm)/s in this proposal!
  5. These "apostrophe-based" prefix operators will have the same precedence as most other prefix operators such as -, ++ and *.
  6. These "apostrophe-based" suffix operators will have the same precedence as most other suffix operators such as ++ and --.
  7. Identifiers can end with an apostrophe, so when using an apostrophe-based suffix operator, a space is required between the number/identifier and the operator. (This statement is meant only as an observation, and there are exceptions, e.g. foo/*comment*/'op should also parse fine.)
  8. Numbers will not be allowed to end with an apostrophe, but apostrophe can still be used as a digit group separator, e.g. 262'144 is fine but 262'144' will be illegal.
  9. expr `unitId` will be shorthand for expr Unit `unitId`, except that the precedence will be different.
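The lexical consequences of rules 7 and 8 can be sketched with a toy tokenizer. This is a Python illustration under simplifying assumptions (ASCII-only identifiers, no punctuation operators, no backquoted strings); the token names and the tokenize function are hypothetical and do not reflect the real LES3 lexer.

```python
import re

# Toy token rules (assumed, ASCII-only):
TOKEN = re.compile(r"""
    (?P<number>[0-9]+(?:'[0-9]+)*)              # ' only between digit groups (rule 8)
  | (?P<ident>[A-Za-z_][A-Za-z0-9_']*)          # identifiers may contain or end with '
  | (?P<quoted_op>'[A-Za-z_][A-Za-z0-9_']*)     # 'id, where id is a normal identifier
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(src):
    """Split src into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected {src[pos]!r} at offset {pos}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("speed 'm"))   # identifier followed by an apostrophe operator
print(tokenize("speed'm"))    # a single identifier, because the space is missing
print(tokenize("262'144"))    # one number with a digit group separator
```

In this sketch tokenize("262'144'") raises SyntaxError, since the trailing apostrophe belongs neither to the number nor to an operator name.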

Why expr Unit `unitId`?

Rationale 1: I chose Unit instead of unit to raise the precedence above that of +. This allows users to write an expression like flag && baseSpeed + dDist/dTime Unit `m/s` < max and get a useful structure, flag && baseSpeed + (dDist/dTime Unit `m/s`) < max. In contrast, a lowercase unit would give (flag && baseSpeed + dDist/dTime) unit (`m/s` < max), which would be useless.

Rationale 2: A complex unit may not be defined explicitly (as an identifier) in a given language (e.g. perhaps m and s are defined but not m/s), so it is tempting to define expr `unitId` as expr Unit "unitId", which is probably a more appropriate definition in many languages. However, it is important to make LES easy to learn, and it is easier to learn expr `unitId` ≡ expr Unit `unitId` than expr `unitId` ≡ expr Unit "unitId".
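The precedence argument in Rationale 1 can be checked with a small precedence-climbing parser. This Python sketch is only an illustration: the numeric precedence levels are assumptions chosen to reproduce the relative ordering described here (uppercase Unit between / and +, lowercase unit below &&), not the actual LES3 precedence table, and the tuple trees are a hypothetical stand-in for Loyc trees.

```python
def parse(tokens, prec, min_prec=1):
    """Precedence climbing over a token list; all operators left-associative.

    Consumes tokens from the front of the list and returns nested
    (op, left, right) tuples. `prec` maps operator text to its level.
    """
    left = tokens.pop(0)                       # operand (identifier or `unit`)
    while tokens and prec.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        right = parse(tokens, prec, prec[op] + 1)
        left = (op, left, right)
    return left


# Assumed levels: higher number binds tighter.
UPPER = {"&&": 1, "<": 2, "+": 3, "Unit": 4, "/": 5}
LOWER = {"unit": 1, "&&": 2, "<": 3, "+": 4, "/": 5}

src = "flag && baseSpeed + dDist / dTime Unit `m/s` < max"
print(parse(src.split(), UPPER))
print(parse(src.replace("Unit", "unit").split(), LOWER))
```

With the UPPER table the Unit node wraps only dDist / dTime and sits under +; with the LOWER table the unit node becomes the root, with `m/s` < max as its right operand.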

qwertie commented 4 years ago

I'm now thinking it would be best if the syntax of 'apostropheOperators was a sequence of IdStartChar instead of a normal identifier, i.e. digits and apostrophes would not be allowed in the operator name after the initial apostrophe. A couple of reasons for this restriction:

  1. Fewer spaces are required, e.g. 'sin'abs x would be okay and 'foo7 would mean 'foo 7.
  2. It becomes more practical to detect incorrect JavaScript-style use of single quotes, e.g. an IDE could detect that x = 'Hello' + x has incorrect syntax and that if it were changed to x = "Hello" + x it would have correct syntax.
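Both points can be illustrated with a toy tokenizer in which an apostrophe operator is an apostrophe followed only by identifier-start characters. This is a Python sketch under simplifying assumptions (ASCII letters and underscore stand in for IdStartChar, and only a handful of punctuation characters are recognized); the token names and the tokenize function are hypothetical, not the real LES3 lexer.

```python
import re

TOKEN = re.compile(r"""
    (?P<number>[0-9]+(?:'[0-9]+)*)          # digits, ' only as a group separator
  | (?P<ident>[A-Za-z_][A-Za-z0-9_']*)      # normal identifier
  | (?P<quoted_op>'[A-Za-z_]+)              # ' then IdStartChar only (assumed ASCII)
  | (?P<punct>[=+\-*/<>]+)
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(src):
    """Split src into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected {src[pos]!r} at offset {pos}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("'sin'abs x"))   # two operators, then an identifier
print(tokenize("'foo7"))        # operator 'foo followed by the number 7

try:
    tokenize("x = 'Hello' + x")
except SyntaxError as err:
    print("rejected:", err)     # the closing apostrophe has nothing to attach to
```

In this sketch the JavaScript-style string fails at the closing apostrophe, which is the kind of signal an IDE could use to suggest double quotes instead.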

qwertie commented 4 years ago

Also, rather than Unit, perhaps Type is a better name? A unit is a kind of type, but perhaps in some languages it might be attractive to express other kinds of types with the backquoted notation. Is also seems attractively generic... I imagine a language where you could specify 'dependent type' relationships:

.constraint positive: value > 0;
(x`positive`, y Is int >= 0);

I like this enough to use it in the initial commit today, but there's a problem: CodeSymbols.Is is already defined and represents lowercase 'is. So instead I'll use all-uppercase IS and define CodeSymbols.IS = (Symbol)"'IS".

Also, I forgot in Rationale 1 that Unit would be immiscible with + - * /. I guess I'll solve this by removing the immiscibility on uppercase word operators. Immiscibility is good to avoid confusion... right up until there's a reason to allow mixing.

Right now I have two uses of uppercase word operators in mind (Mod/Remainder and Unit/IS), and for both of them it is mostly OK to have the current precedence which is between * / and + -. It would be slightly better for Mod if its precedence was exactly equal to % but given that I have no idea what else people might use uppercase ops for, I'll just leave the precedence alone.

qwertie commented 4 years ago

Added backquoted and single-quoted unary operators for v28.0, where A`B` means A IS B.