sharkdp / numbat

A statically typed programming language for scientific computations with first class support for physical dimensions and units
https://numbat.dev
Apache License 2.0
1.16k stars 50 forks source link

The `|>` operator can cause parsing errors where normal function calls do not #549

Open Goju-Ryu opened 1 month ago

Goju-Ryu commented 1 month ago

I have continued working on the list representation of mixed units and came upon this odd behavior. In my function, inside an if expression, calling a function with two arguments worked fine, but piping in one of the gave an error. This seems unintentional since I see no reason why it would make it ambiguous.

Below is the two function definitions and their output.

# works as expected
>>> fn unit_list_impl<A: Dim>(val: A, units: List<A>, acc: List<A>) -> List<A> =
    if len(units) == 1 then
        reverse(cons(val -> head(units), acc))
    else
        unit_list_impl(val - unit_val, tail(units), cons(unit_val, acc))
    where unit_val: A =
        if (len(units) > 0)
        then trunc_in(head(units), val)
        else error("Units list cannot be empty")

  fn unit_list_impl<A: Dim>(val: A, units: List<A>, acc: List<A>) -> List<A> = if (
len(units) == 1) then reverse(cons(val ➞ head(units), acc)) else unit_list_impl(val
 - unit_val, tail(units), cons(unit_val, acc))
    where unit_val: A = if (len(units) > 0) then trunc_in(head(units), val) else er
ror("Units list cannot be empty")

# fails due to parseing errors
>>> fn unit_list_impl<A: Dim>(val: A, units: List<A>, acc: List<A>) -> List<A> =
    if len(units) == 1 then
        reverse(cons(val -> head(units), acc))
    else
        unit_list_impl(val - unit_val, tail(units), cons(unit_val, acc))
    where unit_val: A =
        if (len(units) > 0)
        then val |> trunc_in(head(units))
        else error("Units list cannot be empty")
error: while parsing
  ┌─ <input:74>:6:5
  │
6 │     where unit_val: A =
  │     ^^^^^ Expected local variable definition after where/and

error: while parsing
  ┌─ <input:74>:9:9
  │
9 │         else error("Units list cannot be empty")
  │         ^^^^ Expected one of: number, identifier, parenthesized expression, str
uct instantiation, list

I tested it in the online terminal (wasm) from firefox.

sharkdp commented 1 month ago

A smaller example with a similar problem:

>>> if true then 1 |> id else 0
error: while parsing
  ┌─ <input:11>:1:16
  │
1 │ if true then 1 |> id else 0
  │                ^^ Expected 'else' in if-then-else condition

The problem is that |> has lower precedence than if … then … else (https://numbat.dev/doc/operations.html). It can be fixed using parens:

>>> if true then (1 |> id) else 0

But maybe precedence should be changed to allow for this? I guess if we would do that, it would prevent us from doing things like

if … then … else … |> function

and we would have to put the conditional into parens:

(if … then … else …) |> function

which defeats the initial purpose of this operator (that you can always just add it at the end of a line to do an additional function call)

sharkdp commented 1 month ago

Unrelated: I'm excited about this unit_list feature!

Goju-Ryu commented 1 month ago

The problem is that |> has lower precedence than if … then … else

Oh of course, I completely forgot about that. I'm not sure of which is better, but perhaps a terminating token to the if-else expression would solve some of the problem? My thinking is that the then clause is already delimited by the else, making it a logical unit to scope as if in parentheses. If the else clause could be delimited as well it would also have an easy to parse scope. It does bring other challenges but it would disambiguate a pipe after the whole statement as if it's after the delimiter it would apply to the whole if-else expression while if inside it applies to only that scope. My main argument against this approach would probably be changing if-else expressions would become ugly or reintroduce ambiguity. For good measure here is an example syntax for the idea:

if true 
then 1.5m |> floor_in(m) 
else 1.5m |> ceil_in(m)
end

# equivalent to
if (true)
then ( 1.5m |> floor_in(m) )
else ( 1.5m |> ceil_in(m) )
end

# the problem with chaining:
if i < 0
then "negative"
else 
  if i > 0
  then "positive"
  else "zero"
  end
end
Goju-Ryu commented 1 month ago

Unrelated: I'm excited about this unit_list feature!

I really liked the idea of it when I read about it, so I'm excited to try my hand at it!

sharkdp commented 1 month ago

I'm not sure of which is better, but perhaps a terminating token to the if-else expression would solve some of the problem? My thinking is that the then clause is already delimited by the else, making it a logical unit to scope as if in parentheses. If the else clause could be delimited as well it would also have an easy to parse scope. It does bring other challenges but it would disambiguate a pipe after the whole statement as if it's after the delimiter it would apply to the whole if-else expression while if inside it applies to only that scope.

I think that reasoning is correct.

if-else expressions would become ugly

Indeed :-/

# the problem with chaining:

what do you mean by that?

Goju-Ryu commented 4 weeks ago

# the problem with chaining:

what do you mean by that?

I meant that making a chain of if-else expression would no longer be possible without nesting them.

I've had an idea about this. Could the syntax of the if-else be extended? To my understanding the current syntax is something like the following.

IfElse = "if" Cond "then" Expr "else" Expr 

What if chaining was an actual part of the syntax and all chains had to end with a terminating token. Something like the following.

IfElse = "if" Cond "then" Expr "else" Expr  ; Regular if-then-else
IfElseChain = 1*IfElse "end" ; A chain of one or more if-then-else's terminated with a single "end" token

I'm thinking this allows for an optional end token for the normal case that could be used to make the parsing of the pipe operator unambiguous and making the nesting of if-then-else chains easier to determine. As far as I can see it would allow for the current syntax in all cases but enable additional uses without the need for parentheses or many ugly terminating tokens.

Disclaimer: I do not really understand the current implementation of the expressions. I have never used Rust and can't really gauge how big of an undertaking this would be, so if it is a larger change then I don't mind the parentheses that much, but would probably write a paragraph about it in the documentation.