nim-lang / RFCs

support for infix operators without requiring unicode: quoted operators #390

Closed timotheecour closed 3 years ago

timotheecour commented 3 years ago

This is my counter proposal to https://github.com/nim-lang/RFCs/issues/388, which I find problematic.

proposal

Parser now accepts operators starting with a single quote, eg 'kron, called quoted operators.

motivation

example 1

taken from https://github.com/nim-lang/RFCs/issues/388

# current nim
assert hadamard(kronecker(A,B), kronecker(C,D)) == kronecker(hadamard(A,C), hadamard(B,D))

# unicode operator RFC
assert (A ⊗ B) ∘ (C ⊗ D) == (A ∘ C) ⊗ (B ∘ D)

# this RFC
assert (A 'kron B) 'hadamard (C 'kron D) == (A 'hadamard C) 'kron (B 'hadamard D)
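For readers who want to try the "current nim" line, here is a minimal sketch of the two products on small integer matrices. The `Matrix` alias, the seq-of-seq representation and the sample values are my own assumptions for illustration, not code from either RFC; it only uses syntax that exists today.

```nim
type Matrix = seq[seq[int]]  # hypothetical representation, rows of ints

proc hadamard(a, b: Matrix): Matrix =
  ## Element-wise product; assumes both inputs have the same shape.
  result = newSeq[seq[int]](a.len)
  for i in 0 ..< a.len:
    result[i] = newSeq[int](a[i].len)
    for j in 0 ..< a[i].len:
      result[i][j] = a[i][j] * b[i][j]

proc kronecker(a, b: Matrix): Matrix =
  ## Kronecker product; assumes non-empty rectangular inputs.
  let (ar, ac, br, bc) = (a.len, a[0].len, b.len, b[0].len)
  result = newSeq[seq[int]](ar * br)
  for i in 0 ..< ar * br:
    result[i] = newSeq[int](ac * bc)
    for j in 0 ..< ac * bc:
      result[i][j] = a[i div br][j div bc] * b[i mod br][j mod bc]

let
  A = @[@[1, 2], @[3, 4]]
  B = @[@[0, 1], @[1, 0]]
  C = @[@[2, 0], @[1, 3]]
  D = @[@[5, 6], @[7, 8]]

# the identity from example 1, written with plain calls
doAssert hadamard(kronecker(A, B), kronecker(C, D)) ==
         kronecker(hadamard(A, C), hadamard(B, D))
# the same left-hand side with method call syntax, which already works today
doAssert A.kronecker(B).hadamard(C.kronecker(D)) ==
         kronecker(hadamard(A, C), hadamard(B, D))
```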

example 2

taken from https://github.com/nim-lang/RFCs/issues/388

redefining and/or doesn't make sense (no pointless aliasing); however, there are valid cases where redefining boolean logic makes sense, see https://github.com/nim-lang/Nim/pull/13541 which suggested and/or with short-circuit semantics in VM, allowing:

when declared(Foo) and T is Foo: discard

# current nim
let truth = a and (b or c)

# unicode operator RFC
let truth = a ∧ (b ∨ c)

# this RFC
let truth = a 'and (b 'or c)
# so, if we were to revisit https://github.com/nim-lang/Nim/pull/13541, we could introduce 'vmand, 'vmor:
when declared(Foo) 'vmand (T is Foo): discard

example 3

bitand, bitor can now be defined as operators without confusion with the boolean logic

# current nim
let a = bitand(a, bitor(b, c))

# this RFC
let a = a 'bitand (b 'bitor c)
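As a point of comparison, the "current nim" spelling can be checked against the keyword operators. This is a small sketch assuming `bitand` and `bitor` come from `std/bitops`; the variable names and values are made up for the example.

```nim
import std/bitops

let (x, y, z) = (0b1100'u8, 0b1010'u8, 0b0001'u8)

# keyword operators: parens are needed because `and` binds looser than `==`
doAssert (x and (y or z)) == 0b1000'u8
# named functions from std/bitops: same result, no overlap with boolean `and`/`or`
doAssert bitand(x, bitor(y, z)) == 0b1000'u8
```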

note that the existing precedence rules already require parens in many cases, eg the common gotcha:

# assert 0xFF'u8 and 0xFA'u8 == 250  # Error: type mismatch: got <uint8, bool>
assert (0xFF'u8 and 0xFA'u8) == 250

with this RFC you'd be able to write:

assert (0xFF'u8 'bitand 0xFA'u8) == 250

which is clearer and avoids conflating logical and bit logic, which is error prone:

# this code deletes both files instead of just 1, because `execCmd` returns an `int` status
# instead of `bool` (success/fail), so bitwise semantics are used instead of logical semantics,
# and `and` `or` don't short-circuit in this case.
import osproc
echo execCmd("grep DELETEME foo.txt") and execCmd("rm foo.txt") or execCmd("rm backup.txt")
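The difference in evaluation can be seen without touching any files. The sketch below uses two hypothetical stand-ins for `execCmd` (`status` and `succeeded`, both invented here) that count how often they are called.

```nim
var calls = 0

proc status(code: int): int =
  ## Stand-in for `execCmd`: returns an exit code and counts invocations.
  inc calls
  code

proc succeeded(code: int): bool =
  ## Converts an exit code to a bool (0 means success) and counts invocations.
  inc calls
  code == 0

# int operands: `and` is bitwise, so both sides are always evaluated
discard status(1) and status(0)
doAssert calls == 2

# bool operands: `and` short-circuits, so the right side is skipped
calls = 0
discard succeeded(1) and succeeded(0)
doAssert calls == 1
```

Comparing the exit code to 0 explicitly (as `succeeded` does) is one way to get the logical semantics back in today's Nim.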

example 4

[EDIT for clarification] just like in regular nim which allows let ⊗λ = 123, this RFC doesn't prevent using unicode inside the operator, so you could still write this:

assert (A '⊗ B) '∘ (C '⊗ D) == (A '∘ C) '⊗ (B '∘ D)

but it wouldn't be required to use unicode as operators, more of a possibility (although i prefer not to use unicode for stated reasons)

precedence rules

The simplest is to give all quoted operators the same precedence, higher than ==, and require parens to disambiguate (and otherwise give CT error with a helpful msg). Exact precedence TBD, I can update this RFC to make it precise.

parser

there is no possible confusion, a quoted operator must follow these rules:

[space or newline] ['] [identifier] [space or newline]

discard 'a' # a char
discard '\a' # a char
discard '\x1' # a char
discard 1'u8 # a builtin literal
discard 1'big # a user defined literal
discard 1 'foo 2 # a quoted operator

alternative I had considered

I had also considered encoding precedence as follows:

discard 1 '*bitand 2 '+bitor 3
# parsed using precedence given by 1st symbol after quote, eg:
discard (1 '*bitand 2) '+bitor 3

but this is noisier than the simpler:

discard (1 'bitand 2) 'bitor 3

why not MCS a.kron b?

see https://github.com/nim-lang/RFCs/issues/390#issuecomment-867952508

haxscramper commented 3 years ago

It is in line with current nim design, does not force any decisions wrt. "giving users more unreadable operators" vs "giving users too few operators" (most of the discussion in #388 revolves around this), nor does it raise any questions about possible ways of actually typing these symbols (a large portion of the discussion on the nim forum). Readability also improves, even for simple search on operators (as well as support for ancient terminals/fonts). Overall, I think this proposal is much better, since it addresses most of the concerns from #388.

I'm not sure if disallowing user-defined precedence is a good idea, maybe default to ==-precedence, and allow things like '*kron to work as well?

Vindaar commented 3 years ago

I don't see how this is a counter proposal to #388. It quite literally doesn't solve anything that #388 aims to do.

That said, I would like to have better disambiguation between bit operators and boolean operators. But that's something that doesn't need this proposal to be improved on.

I don't know why I would want to define my own operator like this over just a regular proc definition to be honest. There's too much visual noise going on that the added advantage over a.kron(b) isn't worth it to me.

Clyybber commented 3 years ago

While I don't personally need or necessarily want unicode operators;

nor what would be precedence rules for those

#388 is very hard to misinterpret regarding precedence, the unicode operators are literally listed under the ascii symbol that they should be considered equal to when determining precedence.

IMO (a 'kron b) isn't much better than (a.kron b) or a.kron(b), which is already possible.

HugoGranstrom commented 3 years ago

One thing this RFC misses that #388 does address (and it's one of the most important reasons it was created to begin with) is that the code should mimic the equations you would write on paper. Even people not fluent in Nim would understand what a × b means (assuming they know the theory of course), while the meaning of a 'cross b would be much harder to figure out.

timotheecour commented 3 years ago

IMO (a 'kron b) isn't much better than (a.kron b) or a.kron(b), which is already possible.

a.kron b runs into https://nim-lang.github.io/Nim/manual.html#templates-limitations-of-the-method-call-syntax which can prevent using it in templates, and causes ambiguities if there's a field of the same name, which is why in templates we usually have to forgo MCS; quoted operators don't have this issue and the symmetry in arguments is cleaner for binary operators.
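To make the linked limitation concrete, here is a small sketch adapted from the manual's own example (the names `answer` and `answer2` are mine). The dot expression has to be checked before the compiler knows it is a template call, so an undeclared identifier on the left-hand side is rejected.

```nim
template declareVar(name: untyped) =
  # Declares a constant with the given (not yet existing) name.
  const name {.inject.} = 45

declareVar(answer)      # regular call syntax: compiles
doAssert answer == 45

# answer2.declareVar    # method call syntax: does not compile, because
#                       # `answer2` is undeclared when the dot expression is checked
```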

the code should mimic the equations you would write on paper Even people not fluent in Nim would understand what a × b means

note that in my terminal, axi and a×i are indistinguishable: (image)

Besides that, sorry, but writing math isn't the same as writing code and nim is not perl or APL; I'd rather have explicit names than save a few keystrokes to be able to write the game of life via

life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}

∙ ∘ × ★ and others are very much a convention and, depending on the area of math/physics/engineering domain/language you work with, they can have different, incompatible meanings. There are many types of matrix products, eg element wise multiplication, matrix multiplication, face splitting product, Khatri–Rao product, etc. Are nim users likely to know which one to use in each case? Or not clash in multiple libraries?

haxscramper commented 3 years ago

This solution would allow to define custom infix operators using regular identifiers. Nim identifiers can be Unicode. Therefore, with this proposal one could write as an operator. Which means it is not an either-or solution, but more generalized one that allows to switch between compact operators and more readable ones. Also a '× 10 could be distinguished from regular "x" because of the quote.

konsumlamm commented 3 years ago

While I like the general concept, I don't think using 'infix is a good idea, especially since that is already used for user defined numeric literals. Something like a `infix` b (which mimics the syntax that is used for defining builtin infix operators and is also used in Haskell) would be way better imo.

IMO (a 'kron b) isn't much better than (a.kron b) or a.kron(b), which is already possible.

That is also a good point, Nim's MCS makes this RFC kinda unnecessary.

timotheecour commented 3 years ago

This solution would allow to define custom infix operators using regular identifiers

indeed, I didn't want to mention it to avoid confusion but I've added Example 4 to clarify.

I don't think using 'infix is a good idea, especially since that is already used for user defined numeric literals.

there is no ambiguity, as described in RFC; and it's also not visually confusing thanks to the requirement for spaces:

let a = 123'big 'bitand x

Nim's MCS makes this RFC kinda unnecessary.

I've explained why MCS isn't as good in above comment https://github.com/nim-lang/RFCs/issues/390#issuecomment-867952508

haxscramper commented 3 years ago

a `infix` b is already handled by nim parser as a(infix(b)) and this would be a breaking change (a small one, but still breaking). Quote is somewhat overloaded with meaning, that is true, but that behavior is quite similar to how a +b vs a + b is handled.

dumpTree:
  a `infix` b
  a +b
  a + n
StmtList
  Command
    Ident "a"
    Command
      AccQuoted
        Ident "infix"
      Ident "b"
  Command
    Ident "a"
    Prefix
      Ident "+"
      Ident "b"
  Infix
    Ident "+"
    Ident "a"
    Ident "n"

timotheecour commented 3 years ago

a `infix` b is already handled by nim parser as a(infix(b)) and this would be a breaking change

on that note, there seems to be an ambiguity with https://github.com/nim-lang/RFCs/issues/388 but that's a minor point:

proc ⊗(a: int): auto = a * 2
template fn(a = 7): untyped = a
echo fn ⊗ 3 # legal today, prints 6, parsed as (fn(⊗(3)))

still, https://github.com/nim-lang/RFCs/issues/388 should explain how to handle this case

HugoGranstrom commented 3 years ago

note that in my terminal, axi and a×i are un-distinguishable:

You could also write code in an environment that has decent fonts ;) Plus you wouldn't write it as a×i but rather a × i and in that context the only thing that fits is that the "x" is in fact an operator. (Or just badly written code but that could happen to your proposal as well).

Besides that, sorry, but writing math isn't the same as writing code and nim is not perl or APL; I'd rather have explicit names than save a few keystrokes to be able to write the game of life via

There is no reason why we couldn't make coding in some senses more like math 😄 Plus no one is forcing you to write code using these operators, use your explicit function. Any library which only offers unicode operators will limit their audience so they have an incentive to offer both options. The main use of unicode operators I imagine isn't writing obscure programs in as few symbols as possible but rather for science-facing situations like presentations and code meant to be shared with other science-people, not poor ordinary Nim users afraid of being flooded with unicode operators.😉

Varriount commented 3 years ago

Aside from the syntax, is there any functional difference between the proposed syntax and backtick-quoted identifiers (a `infix` b)?

timotheecour commented 3 years ago

is there any functional difference

yes, see https://github.com/nim-lang/RFCs/issues/390#issuecomment-867961757

Varriount commented 3 years ago

Out of the two proposals (this one and #388), I prefer #388.

Here are my thoughts:

(as a side note, anyone know why GitHub expands the third mention of issue 388, but not the first or second?)

Araq commented 3 years ago

I'm fairly sure that the limitations regarding method-call syntax can be overcome by improving the compiler (@Araq feel free to correct me here).

You're right and the limitations need to be addressed regardless. (The situation is constantly improving too.)

Araq commented 3 years ago

While #388 adds a real value for some of our users and hopefully does not affect most of our users, #390 is just more syntactic sugar that everybody has to deal with. And we have plenty of syntactic sugar already.

Araq commented 3 years ago

I implemented #388 instead.

hamidb80 commented 10 months ago

I like the current state of infix operators but only wish nimpretty would not convert this:

a .nor b

into this

a.nor b

this is the definition of nor

template nor(a, b): untyped =
  not (a or b)
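
For completeness, a short sketch of how that template can be used today; the variable names are made up, and it sticks to the `a.nor b` spelling that nimpretty produces.

```nim
template nor(a, b): untyped =
  not (a or b)

let (a, b) = (false, false)
let viaCall = nor(a, b)   # regular call
let viaMcs = a.nor b      # method call syntax reads like an infix operator
doAssert viaCall and viaMcs
doAssert not true.nor(false)
```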