tweag / nickel

Better configuration for less
https://nickel-lang.org/
MIT License
2.43k stars 93 forks source link

Let blocks #2010

Closed jneem closed 2 months ago

jneem commented 4 months ago

This implements let blocks, but only for "plain" bindings that don't do any destructuring.

I looked into handling destructuring. As long as we don't need to allow recursive bindings, I think destructuring can be handled by desugaring into a match block that matches an array of all the right hand sides. I.e.,

let
  a = 1,
  { foo } = { foo = 2 }
in ...

becomes

[1, { foo = 2 }] |> match { [a, { foo }] => ... }

Recursive let blocks seem to be rather more tricky with the current pattern compilation strategy.

jneem commented 3 months ago

We're trading the let-binding structure for something more complicated

Oh, good point. I think a Vec should be fine, and probably even a SmallVec or similar. I'll benchmark the options.

I think it isn't too difficult to support let blocks with patterns, so let me try to do that. If that works, the forbidden combination would be rec + pattern, which is a bit arbitrary but at least it's easy to explain. Ultimately, it would be nice to just not have any restrictions at all. I don't think it's hard to fix it, it will just take some work. The main issue I ran into is that pattern compilation for eval doesn't make it easy to figure out what the bound names are, and therefore we can't insert them into the recursive environment. But the logic for statically finding out the bound names is already in the typechecker, so it should be mostly a matter of reorganizing things...

jneem commented 3 months ago

Ok, I've added support for patterns in let blocks.

I also measured the performance. It looks like the IndexMap hurt performance a bit, and changing it to SmallVec is as fast as the original. It's a little hard to be sure, since the benchmarks are pretty noisy.

yannham commented 3 months ago

So, the semantics of let pat1 = val1, ..., patn = valn in body is let [pat1, ..., patn] = [val1, ..., valn]. I think that's fair, although probably a bit wasteful as an actual desugaring. But it's ok for a new feature. It's also likely very easy to compile to a sequence of patterns directly in a more efficient manner without allocating, building and destructuring an array (I'm thinking implementing a variant of Compile for something wrapping [Pattern] and taking a sequence of arguments, or something like that).

The main issue I ran into is that pattern compilation for eval doesn't make it easy to figure out what the bound names are, and therefore we can't insert them into the recursive environment. But the logic for statically finding out the bound names is already in the typechecker, so it should be mostly a matter of reorganizing things...

I think pattern compilation is already doing something close to what you want. It basically compiles a pattern - given the term to match on - to an expression that evaluates to null or to a record of the bindings introduced by the pattern. The final step is to call %pattern_branch% or sth that evaluates the branch in an environment augmented with the bindings. We can change this last step to something more involved. But I fear this is going to be more complicated, because before we can bind any of the pattern variable in the environment, we first have to evaluate part of the matched value. I suspect we'll need something similar to what we do for recursive records, that is patching the environment afterwards to "tie the knot" properly.

In general, for recursive let-blocks with patterns, both the desugaring and the possible compilation aren't entirely crystal clear to me right now. Looking at a single recursive destructuring let could be interesting, something like let {x=a, y=b} = ...whatever uses a and b.... in body. Once again, before even being able to build the recursive environment for a and b, you first need to evaluate partially the body first. Maybe we can fill the environment with "bombs" initially, like InfinteRecursion errors (because obviously if we need a before we can even bind a, this isn't going to work). But then, we will bind a to a closure with those bombs in the environment, and we'll then need to patch them later with the actual recursive environment. As far as I can tell, this requires a bit more machinery that what's currently available. Or at least to repurpose part of the recursive records' machinery to that end.

jneem commented 3 months ago

What about let rec {x=a, y = {z = c}} = foo in bar becomes

let rec
  %0 = foo,
  a = %0.x,
  %1 = %0.y,
  c = %1.z
in
bar

I think this extends to handle arrays pretty easily. I'm less sure about constants and ADTs. Maybe those could be handled by extending the "primitive" (i.e. non-pattern) let term to allow constants or patterns of the form 'Foo x

yannham commented 3 months ago

What about let rec {x=a, y = {z = c}} = foo in bar becomes

let rec
 %0 = foo,
 a = %0.x,
 %1 = %0.y,
 c = %1.z
in
bar

So, if we have {x=[{y=[c]}]]}, we'll need to have c as a top-level binding I guess. It seems it's like pattern matching except that you bubble up all the bound variables as a top-level let-binding.

I think this illustrates my point: we now have to (re?)define the semantics of patterns for both pattern matching and destructuring separately, and also probably have separate implementations, as we used to, beyond the fact that I trust you can find a reasonable semantics for recursive destructuring. But as before, they might diverge or have different evaluation strategy, which can be surprising. It's not necessarily a no-go, but it's not very satisfying. I just wondered if could unify both in one core compilation scheme (even if there recursive side is actually never used in actual match expression but only in destructuring).

I think what the pattern matching compilation is doing is pretty close in spirit already. It just does that in pure Nickel code, by putting stuff in a record in several steps instead of bubbling up all the bindings at the top-level. One possibility is to make pattern compilation more like what you propose. Another is to do what I propose, and patch the result at the end. The issue with the latter is in case of infinite recursion (the current pattern matching compilation would cause an unbound variable error instead), but I think putting "bombs" in the environment and shadowing them at the end once we can build the recursive environment is actually maybe not that bad.

github-actions[bot] commented 3 months ago

🐰Bencher

ReportTue, September 10, 2024 at 14:44:20 UTC
Projectnickel
Branch2010/merge
Testbedubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!

  • Latency (latency)

Click here to create a new Threshold
For more information, see the Threshold documentation.
To only post results if a Threshold exists, set the --ci-only-thresholds CLI flag.

Click to view all benchmark results
BenchmarkLatencyLatency Results
nanoseconds (ns)
fibonacci 10➖ (view plot)498,870.00
pidigits 100➖ (view plot)3,224,700.00
product 30➖ (view plot)817,630.00
scalar 10➖ (view plot)1,497,000.00
sum 30➖ (view plot)803,420.00

Bencher - Continuous Benchmarking
View Public Perf Page
Docs | Repo | Chat | Help
jneem commented 3 months ago

I took a look at doing recursive pattern bindings by putting "bombs" in the environment, but I had trouble working out when to replace them. Like, in

let rec
  { foo = a, .. } = b,
  b = { foo = 1, c = a },
in b.c

then we want to start off by poisoning "a", then evaluating "b" to normal form, then we need to execute the pattern binding and unpoisoning "a", and then evaluating "c.a". But doing things in a different order could easily make us evaluate "a" too early, and I don't see a general way to do it...

Also, one thing I noticed about recursive pattern bindings is that it might want to be lazier than non-recursive bindings. For example, we currently eagerly evaluate enough of the bound value to check whether it matches the pattern but it isn't too hard to come up with examples of recursive bindings where this leads to infinite recursion that could be avoided.

yannham commented 3 months ago

Hmm. The more we talk about it, it the more it seems like recursive destructuring is just maybe something we should entirely avoid. Because if we have trouble defining the implementation and lazyness of this scheme, it's most likely neither obvious nor natural (in the mathematical sense of canonical) for the user as well.

jneem commented 3 months ago

Yeah, that's fair. It does bother me a little that there are some easy cases that we don't cover. Like, I can write

let rec
  a = { foo = 3 }
  b = a.foo + 1
in b

but then if I want to refactor it to

let rec
  { foo } = { foo = 3 }
  b = foo + 1
in b

then I need to break up the let block into a sequence of lets. But I agree, if we don't have good semantics in general then it doesn't make sense to add it.

jneem commented 3 months ago

Ok, I tried writing tests and there are a bunch of problems with typechecking recursive let blocks. We'll need to do something more like what recursive records do, but I need to understand that first...

yannham commented 3 months ago

Yeah, that's fair. It does bother me a little that there are some easy cases that we don't cover. Like, I can write

One possibility would be to rule out more complicated cases, because it's not obvious that they are really useful in practice. So, for example, we could require that the right hand side of a destructuring let rec must evaluate to a WHNF without needing the recursive bindings. Then we could do the poisoning approach without caring about the order.

Otherwise, I reckon the right semantics for recursive destructuring is probably the old one where it's just desugared into a sequence of lazy let-bindings. However, I think there were non trivial issues around default values, lazyness (something like let 'Foo 5 = value in ... would be transparent because the pattern doesn't bind anything, while today it's desugared to value |> match { 'Foo 5 => ...} which does check that value is 'Foo 5), and error reporting. I just wonder if recursive destructuring is useful enough to 1) go back to this more complicated situation with discrepancies between pattern matching and destructuring, and more line of code in the implementation 2) breaking backward compatibility (again)

jneem commented 2 months ago

The good news is that the wasteful array desugaring is already gone in #2031. I got let blocks supported in tree-sitter-nickel a while back, so I can look at supporting that in topiary too