teikalang / teika

MIT License
314 stars 6 forks source link

Explicit Substitutions in the context #165

Open EduardoRFS opened 11 months ago

EduardoRFS commented 11 months ago

I'm currently working on moving substitutions to the context, the main goal is compiler performance, but I also believe that it will lead to simpler code to maintain overtime.

Performance

It is not clear to me that this will improve performance in an abstract way, as it seems to be equivalent to just having it on the term tree.

But in practice performance is likely to get way better, allocating substitutions during propagation is definitely slow and will make the GC triggers way more often, even thought it will be mostly minor collections. This design will still allocate, but only when a substitution is introduced.

Lookup's, Stack and Heap

Bound variables use de-bruijin indices, so lookup will be always O(1) by using a mutable stack. Free variables use de-bruijin levels, so lookup will also be O(1) by using a heap, which is bounded by how nested is the deepest term generated.

My belief is that this will make it waaay more cache friendly for reasonable terms, while in theory it can be worse for really big terms with sparse access, most terms are reasonably small and even big terms such as the natural numbers will have more or less linear acccess.

This may be also way less efficient in memory in theory, in practice for the same reasons mentioned above, it will probably use less memory overall. In the future if the demand for really big terms increases, the typer may have 2 modes, where variables above certain threshold are indexed in different ways, maybe by propagating the substitutions.

Composition

As all the substitutions will be in the scope, substitution composition can be done in a way more natural way while hopefully preserving PSN and MC, due to explicit substitutions at a distance.

Lazy composition such as path compression can also be done if the context stores the order that each substitution was registered, as a substitution cannot escape it's scope.

Cleaning

Another performance feature of having it in the context is that by simply tagging which substitutions were used it, irrelevant substitutions can be ignored in the future. It is important to preserve all substitutions whenever a term has holes on it, to preserve all the important properties of explicit substitutions at a distance.

Higher Order Unification

Additionally this seems like it will make higher order unification way easier, as substitutions can be preserved in the context and then inserted back. This needs to be carefully done, but this design more or less should work.

EduardoRFS commented 10 months ago

Test example to be careful when putting the substitutions in the context.

\+3[close +3][open \+3]

A naive implementation will just loop forever. After using a substitution only lower ones can be used.

EduardoRFS commented 10 months ago

Couple insights, yes ES is about modeling and composing environments, but this cannot be done in a naive mutable way due to substitutions in substitutions, circles like the one mentioned above can happen if the substitutions are not used aka not linear, this is true even if only de-bruijin indices / levels are being used.

BUT, most substitutions happen on the first level, so a simple optimization is to make the first level a mutable stack and whenever a substitution in a substitution happens it splits in a readonly mutable stack + immutable stack. The same can also be done for the heap, even thought it is slightly harder overall to be done properly.

Locally nameless makes it especially easy here due to having only a single operation on unbounded variables(open), and as entering a binder introduces an open, it pushes everything further away, similar to a cons in ES.

EduardoRFS commented 10 months ago

As of #172 substitutions are limited and only accumulated on holes, this should change in the future, additionally I should study about substitution composition.

In a practical side, explicit substitutions at a distance seems like the way to go, but it is low priority right now.

EduardoRFS commented 6 months ago

I learned a lot since those, mostly the Linear Substitution Calculus seems to provide all the foundational theory here and the naive implementation, likely works.

The LSC currently lacks a canonical de-bruijin version, so the implementation in Teika will be lacking, but the idea still works.

The core insight is that the non-determinism from LSC makes sense, as it means LSC actually describe all the systems no matter the specific order that we choose to implement, making it waaay more robust.

https://mat.unb.br/~ayala/TCgroup/beniamino2012/rewriting.pdf https://hal.science/hal-01967532/document

Also related to elaboration, substitutions do not "describe the context", THEY ARE THE CONTEXT. Meaning that the following two rules gives some insight in the problem.

-------------------------------------------------
Γ |- M[x := N] : A[x := N] === Γ, x := N |- M : A

Γ |- M ⇒ L<(x : A) -> B>  Γ |- N ⇐ L<A>
--------------------------------------- // apply at a distance
Γ |- M N ⇒ L<B[x := N]>
EduardoRFS commented 1 month ago

This is strongly related and I will be writing more in the future.

https://github.com/teikalang/teika/issues/199