miking-lang / miking

Miking - the meta viking: a meta-language system for creating embedded languages

MLang Pipeline Improvements #849

Open marten-voorberg opened 6 months ago

marten-voorberg commented 6 months ago

Various improvements need to be made to the --mlang-pipeline transformations. This issue lists known/important parts and serves as a point of reference for the discussion.

Better Include Handling

The current include handling in the new mlang pipeline does not support includes of the form "lib:path/to/file.mc", and it does not fully normalize paths before deduplicating includes. The semantics should be checked against boot. https://github.com/miking-lang/miking/blob/dcdbdab33a52312d9e0195dcad994b81c1e59854/src/boot/lib/utils.ml#L126-L167
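To make the two missing pieces concrete, here is a minimal sketch in Python (used for illustration only, not the pipeline's language). It assumes the part before the colon names a library that is looked up in a table of root directories; the `LIB_ROOTS` table, its path, and both function names are hypothetical, and the actual rules are whatever the linked boot code implements. Note that `os.path.normpath` does not resolve symlinks, so full normalization may need more than this.

```python
import os

# Hypothetical mapping from library prefix to root directory. The real
# lookup rules are whatever boot implements, not what is shown here.
LIB_ROOTS = {"stdlib": "/opt/miking/stdlib"}


def resolve_include(include, including_file, lib_roots=LIB_ROOTS):
    """Return a normalized absolute path for an include string."""
    prefix, sep, rest = include.partition(":")
    if sep and prefix in lib_roots:
        # "lib:path/to/file.mc" form: resolve against the library root.
        base, rel = lib_roots[prefix], rest
    else:
        # Plain include: resolve against the including file's directory.
        base, rel = os.path.dirname(including_file), include
    # normpath collapses "." and ".." segments and duplicate separators.
    return os.path.normpath(os.path.join(os.path.abspath(base), rel))


def dedup_includes(includes, including_file):
    """Keep the first occurrence of each include after normalization."""
    seen, result = set(), []
    for inc in includes:
        path = resolve_include(inc, including_file)
        if path not in seen:
            seen.add(path)
            result.append(path)
    return result
```

With this sketch, "b.mc", "./b.mc" and "a/../b.mc" included from the same file all normalize to one path and are therefore included once.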

Improve Performance for Pattern Analysis

Pattern analysis in the new pipeline recomputes the pattern ordering for every fragment; it does not remember results from previous fragments. It should take inspiration from the boot implementation. For reference: https://github.com/miking-lang/miking/blob/dcdbdab33a52312d9e0195dcad994b81c1e59854/src/boot/lib/mlang.ml#L248-L265 https://github.com/miking-lang/miking/blob/dcdbdab33a52312d9e0195dcad994b81c1e59854/src/boot/lib/mlang.ml#L190-L221
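One way to remember results is a cache of pairwise comparisons that is shared across fragments. The Python sketch below is an illustration only and is not how boot does it. `PatternOrderCache` and `toy_compare` are hypothetical names, and patterns are modeled as plain strings so that they can be used as dictionary keys.

```python
class PatternOrderCache:
    """Remembers pairwise pattern comparisons across fragments."""

    def __init__(self, compare):
        self._compare = compare  # the expensive comparison function
        self._cache = {}
        self.misses = 0          # how often the comparison really ran

    def order(self, p, q):
        key = (p, q)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._compare(p, q)
        return self._cache[key]


def toy_compare(p, q):
    """Hypothetical stand-in: the wildcard "_" is more general than anything."""
    if p == q:
        return "equal"
    if p == "_":
        return "superset"
    if q == "_":
        return "subset"
    return "disjoint"


# The same cache object is reused for every fragment, so a pair that
# appears in several fragments is only compared once.
cache = PatternOrderCache(toy_compare)
fragments = [[("TmVar", "_")], [("TmVar", "_"), ("TmApp", "_")]]
for pairs in fragments:
    for p, q in pairs:
        cache.order(p, q)
```

After the loop, `cache.misses` is 2 although three comparisons were requested, because the pair from the first fragment is served from the cache in the second.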

The split between language-composer.mc and symbolize.mc should be reconsidered

The split is convenient for implementation purposes, but language-composer.mc needs to do some namespace handling, which seems like duplicated work. We would like to find a way to keep the convenience but avoid the duplicated semantics. This should also look at some occurrences of String that should maybe be Name instead.

Symbolization will currently put a new Symbol in the Name of a syn that extends another syn

Presumably this should just reuse the same Name.
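As a rough illustration of the intended behaviour, the Python sketch below models a Name as a (string, symbol) pair and only draws a fresh symbol when the syn is not an extension of one already in the environment. The function `symbolize_syn`, its parameters, and the shape of the environment are all hypothetical and do not mirror symbolize.mc.

```python
import itertools

_fresh = itertools.count()  # source of fresh symbols for this sketch


def symbolize_syn(env, ident, extends_existing):
    """Return a (string, symbol) name for a syn declaration.

    env maps identifier strings to names that are already symbolized.
    """
    if extends_existing and ident in env:
        # An extension refers to the same type, so reuse its Name.
        return env[ident]
    name = (ident, next(_fresh))
    env[ident] = name
    return name


env = {}
base = symbolize_syn(env, "Expr", extends_existing=False)
ext = symbolize_syn(env, "Expr", extends_existing=True)
```

Here `base` and `ext` are the same pair, so later passes can treat the base syn and its extension as one type without comparing strings.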

Symbolize for language fragments will call updateEnv many times

This is a mapUnion, so likely a performance issue.

Translation from MLang to MExpr generates everything

It should be possible to only generate sems from fragments that are actually used, which might remove the need for dead code elimination, which is currently a thing in boot. This could potentially also detect when sems in a language fragment are fully equivalent with sems in another language (all branches are the same, and all called sems are the same), ideally even across languages that don't have an inheriting relationship.
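Generating only what is used amounts to a reachability computation over the sems. The Python sketch below shows the idea under strong simplifying assumptions: sems are identified by hypothetical (fragment, sem) pairs, and the call graph is given up front. In the real pipeline a sem in a composed language also pulls in cases from the fragments it includes, which the "calls" relation would have to account for.

```python
from collections import deque


def reachable_sems(call_graph, roots):
    """Return the set of sems reachable from roots in call_graph.

    call_graph maps a sem identifier to the identifiers of the sems it calls.
    """
    seen = set(roots)
    queue = deque(roots)
    while queue:
        sem = queue.popleft()
        for callee in call_graph.get(sem, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Hypothetical example: only Arith.eval is used by the program.
graph = {
    ("Arith", "eval"): [("Arith", "evalBinop")],
    ("Arith", "evalBinop"): [("Arith", "eval")],
    ("Bool", "eval"): [],
}
used = reachable_sems(graph, [("Arith", "eval")])
```

Only the sems in `used` would need to be translated to MExpr; `("Bool", "eval")` is never reached and could be skipped.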

The pipeline duplicates work

Right now the mlang pipeline calls the normal pipeline after translation to mexpr, which ends up re-running, e.g., symbolize.

Remove postprocess.mc

Due to a bug in utest-generate.mc, the mlang pipeline runs a postprocessing step that renames semantic functions to names such as Ast_foo, instead of keeping the name foo and distinguishing between foos from different language fragments only through their Symbol. Once this bug is fixed, we can remove the postprocess.mc pass.

The composition check produces errors with little information

Many of the error values do not carry as much information as they could, e.g., which patterns are involved, which sems, which fragments they originate from, etc. https://github.com/marten-voorberg/miking/blob/fe5840ced7f8dd5c4c24530608136ead455f0388/stdlib/mlang/composition-check.mc#L102
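To illustrate what a more informative error value could carry, here is a Python sketch of a record holding the sem, the fragments, and the patterns involved. The type `CompositionError`, its field names, and the message format are hypothetical and are not taken from composition-check.mc.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompositionError:
    kind: str                   # which check failed
    sem: str                    # the sem in which it failed
    fragments: Tuple[str, ...]  # fragments the offending cases come from
    patterns: Tuple[str, ...]   # the patterns involved

    def render(self):
        return "{} in sem '{}' (fragments: {}; patterns: {})".format(
            self.kind,
            self.sem,
            ", ".join(self.fragments),
            " vs ".join(self.patterns),
        )


err = CompositionError(
    kind="overlapping patterns",
    sem="eval",
    fragments=("A", "B"),
    patterns=("TmVar _", "_"),
)
```

Calling `err.render()` then names the sem, both fragments, and both patterns in a single message, which is the kind of detail the current error values leave out.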