tweag / nickel

Better configuration for less
https://nickel-lang.org/
MIT License
2.41k stars 92 forks source link

Recursive destructuring #2027

Open jneem opened 2 months ago

jneem commented 2 months ago

We've had some discussion of recursive destructuring in #2010 and #1989. Opening this issue for more discussion.

Nickel currently allows recursive bindings in let blocks (opt-in, using the rec keyword), but only if all bindings are simple. As soon as destructuring is used, recursive bindings are forbidden. Possible use-cases include recursive references for default arguments:

let rec { x, y ? x } = some_record in ...

or referring to other bindings in let blocks:

let rec
  x = f y,
  { a, b } = g x
in ...

The main difficulty with this feature is that we'd need to figure out the semantics of recursive destructuring in general, and it's possible to come up with pretty complicated examples.

Haskell

I spent a little while exploring how Haskell does recursive let bindings. Their let blocks are recursive and lazy by default, and it seems like they desugar the destructuring assignments into one gigantic letrec block. For example, running ghc -ddump-ds-preopt on an input like

  Bar (a, b) = x
  z@(c, [d, [e, Bar (m, _)]]) = g b c y
  y = i z m

gives (after a bit of clean-up, and removing all the error cases in the case blocks)

letrec {
  generatedPairVar = case x of { Bar x -> x; __DEFAULT -> fail }
  a = case generatedPairVar of { (x, y) -> x }
  b = case generatedPairVar of { (x, y) -> y }
} in
letrec {
  generatedTupVar = case g b c y of { (c, generatedListVar) ->
    case generatedListVar of {
      d : ds -> case ds of { e : es -> case es of { e' : rest -> case e' of { Bar (m, _) -> case rest of { [] -> (z, c, d, e, m) }}}}
    }
  }
  z = case generatedTupVar of { (z, c, d, e, m) -> z }
  c = case generatedTupVar of { (z, c, d, e, m) -> c }
  d = case generatedTupVar of { (z, c, d, e, m) -> d }
  e = case generatedTupVar of { (z, c, d, e, m) -> e }
  m = case generatedTupVar of { (z, c, d, e, m) -> m }
  y = i z m
} in
...

I'm not sure how ghc decides to use two letrec blocks instead of one; changing the input around to make it more recursive forces it use only one. I also noticed that ghc likes to produce one tuple binding for each pattern binding in the let block. So rather than recursively desugar z@(c, d) = ... like

letrec {
  z = ...
  c = case z of { (c, d) -> c }
  d = case z of { (c, d -> d }
}

it first generates a pattern than destructures all the bound variables and returns them as the tuple (z, c, d), and only then does it extract them. Since tuples are lazy, I think these two approaches are equivalent.

I'm going to have a look at scala next...

aspiwack commented 2 months ago

If you have questions about how GHC compile “binding groups” as they're called there, ask me again at the end of the week. I've spent a lot of time in this code this year (right now I'm drowning in emails, so not the best time).