haskell-nix / hnix

A Haskell re-implementation of the Nix expression language
https://hackage.haskell.org/package/hnix
BSD 3-Clause "New" or "Revised" License
756 stars 115 forks source link

Resolve indentifiers scope upfront #1058

Open layus opened 2 years ago

layus commented 2 years ago

This is a POC that shows how we can resilve identifiers statically, on the expression tree.

A simple test shows that hnix execution speed increases from 14s to 10.5s when computing pow2 18 with the naive algorithm below.

pow2 = n:
  if n == 0 then 1
  else pow2 (n - 1) + pow2 (n - 1)

There are however some unresolved issues. As-is, this phase does not work because it would require accessing the builtins list (the base environment) during parsing. To alleviate that, the base environment has been turned into a weak scope (as if defined by a with). This is invalid from the point of view of nix semantics. There also appers to be a lot of places where we use environments in a way that makes this optimisation difficult. Ideally there should always be one base env, and then environments from the nix expression. We cannot change arbitrarily the level (offset) of the base environment or the trick does not work.

There exists ways to ensure, at type level, that offsets and scopes are consistent, but they involve tricky type machinery which dit not even attempt to setup.

Any thoughts ?

layus commented 2 years ago

Because each scope is a hash map, looking for a variable in a scope is efficient, but looking for the same variable in a set of scopes is not, as the variable needs to be hashed every time. This PR does not alleviate the linear search in the stack of scopes, but even nix does it that way, as it allows sharing the scopes across evaluation contexts. Going furhter, we could avoid any kind of hashing in most cases, if we could rely on the displacement (== index) of the variable in the scope at a given level. Then usgin Vectors for scopes would make more sense.

Of course, this PAR also changes the AST, which is bad. We may need a better way to encode AST rewrites before evaluation. Some more optimisations could be applied.