unisonweb / unison

A friendly programming language from the future
https://unison-lang.org
Other
5.75k stars 269 forks source link

Add precedence rules to Unison #3468

Open runarorama opened 2 years ago

runarorama commented 2 years ago

The fact that all binary operators associate to the left is a source of surprise for programmers (and probably bugs).

We should do something about this. Options include:

  1. Hard-code some precedence rules based on the symbols in the name of the operator (like Scala does).
  2. Implement user-definable precedence rules (like Haskell) does.

The latter is of course much more robust, but the former is by far the easier to implement.

ceedubs commented 2 years ago

One thing that's super cool about Unison is that we could implement 1 for now and swap it out for 2 in the future. This is purely a syntactic construct only relevant to the parser and pretty-printer; the underlying AST is what will be stored so we don't have to worry about changing the meaning of programs.

Option 1 could potentially lead to people choosing bizarre names just to get the precedence that they want. But I think that it's actually a little less overwhelming to both library authors and consumers. Library authors aren't faced with an upfront "hmm what's the right precedence value?". And library consumers probably get a feel for the precedence rules that's consistent across libraries (even if they don't consciously realize).

My vote would probably be to go for 1 for now and reconsider 2 in the future if 1 feels like it's lacking.

ceedubs commented 2 years ago

here @atacratic brings up an interesting point about the potential downside of the pretty-printer having expert-level knowledge of the precedence rules. A simple option would be for the pretty-printer to always add parentheses for binary ops; but I don't know this might look really bad in practice. One of these days structural viewing/editing might help :)

ceedubs commented 3 months ago

Alternative proposal: just create special rules for +, -, *, and / to avoid issues with basic arithmetic (which newcomers often hit early on and makes Unison look silly). And come up with a fancier solution later if we feel the need.

pchiusano commented 3 months ago

Another option:

  1. It's a parse error to mix operators in the same expression without parens. So x + y * z requires parens. x + y + z or foo |> blah |> blah2 are fine and parse left associative.

Basically, you don't need there to be a total order on all operators.

This could be combined with option 1, so the standard arithmetic and boolean operators could have expected precedence and could be mixed, but mixing other operations requires parens.

Discord thread

sellout commented 2 months ago

I definitely think Unison shouldn’t do the same as Haskell does – a fixed set of levels, with operators hardcoded at various points. I’ve personally felt a lot of pain with it.

Another way to approach it – relative precedence (this is not my idea, but I can’t recall where I’ve come across it). You can define the relationship between any two operators:

(^) `hasHigherPrecedenceThan` (*)
(*) `hasHigherPrecedenceThan` (+)
(-) `hasSamePrecedenceAs` (+)
(.) `hasHigherPrecedenceThan` (<=<)
(||) `hasSamePrecedenceAs` (&&)
(==) `hasHigherPrecedenceThan` (&&)
(!=) `hasHigherPrecedenceThan` (&&)
(+) `hasHigherPrecedenceThan` (==)
(+) `hasHigherPrecedenceThan` (!=)

which forms a partial order

graph RL;
    A[+ / -]--> B[*];
    B --> C[^];
    D[<=<] --> E[.];
    F["|| / &&"] --> G[==];
    F --> H[!=];
    G --> A;
    H --> A;

This implies that x + y - z is fine, but x == y != z needs parens. A library that defines a new operator can do

(><) `hasHigherPrecedenceThan` (+)
(*) `hasHigherPrecedenceThan` (><)

resulting in it being injected at the right place without invalidating any other precedences

graph RL;
    A[+ / -]--> B[><];
    B--> C[*];

Any precedence rule that introduces a cycle given the rules that already exist in its dependencies is an error (this could cause an issue when you pull in two dependencies that have conflicting rules).

sellout commented 2 months ago

Another option:

3. It's a parse error to mix operators in the same expression without parens. So `x + y * z` requires parens. `x + y + z` or `foo |> blah |> blah2` are fine and parse left associative.

I think there are some interesting options in this vein.

[^1]: I’ve done something like this before by using parse forests – if the typechecker failed, we tried other possible parses before giving up. It was neither performant nor comprehensible to users (but maybe some edit.debug command that made all grouping explicit – i.e., moar parens)

[^2]: Although a linter might tell you to use (x * y) ↑ z instead 😁

sellout commented 2 months ago

Any precedence rule that introduces a cycle given the rules that already exist in its dependencies is an error (this could cause an issue when you pull in two dependencies that have conflicting rules).

Alternatively, cycles could just invalidate the precedences in the cycle, resulting in a warning and disjoint precedence (requiring parens) between those operators.

Disjoint precedence could also be made first class, so that

(==) `hasHigherPrecedenceThan` (&&)
(!=) `hasHigherPrecedenceThan` (&&)
(+) `hasHigherPrecedenceThan` (==)
(+) `hasHigherPrecedenceThan` (!=)

from the earlier example could instead be

(==) `hasDisjointPrecedenceFrom` (!=)
(==) `hasHigherPrecedenceThan` (&&)
(+) `hasHigherPrecedenceThan` (==)

so that == and != are at the same level, but can’t be mixed without parens.

kylegoetz commented 2 months ago

I'm assuming (almost) all the confusion arises out of simple arithmetic operators. 1.0 + 2.0 / 3.0 being evaluated from L to R rather than according to PEMDAS.

So if anything, it seems like there's a lot more value to implementing #1 where the arithmetic operators (plus ^ if it's aliased to Math.pow and mayyyyybe) than doing #2. How often would someone be confused that a ^-!&&->~ on the left is not evaluated before ++--!+ on the right because L to R evaluation seemed wrong?

alvaroc1 commented 2 months ago

My preference: hard-coded PEMDAS (not quite what scala does regarding prefix/suffix... just a fixed list of symbols), everything else is left to right

This allows correct math rules, which everyone knows and is used to, and no surprises when reading unfamiliar code.

sellout commented 2 months ago

To mitigate my previous comments a bit – I’m in favor of the “hard-code PEMDAS, leave the remaining behavior as-is” approach. Would definitely be an immediate improvement.