Various improvements need to be made to the --mlang-pipeline transformations. This issue lists known/important parts and serves as a point of reference for the discussion.
The split between language-composer.mc and symbolize.mc should be reconsidered
The split is convenient for implementation purposes, but language-composer.mc needs to do some namespace handling, which seems like duplicated work. We would like to find a way to keep the convenience but avoid duplicating the semantics. This should also look at some occurrences of String that should perhaps be Name instead.
Symbolization will currently put a new Symbol in the Name of a syn that extends another syn
Presumably this should just reuse the same Name.
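A minimal sketch of the presumed fix, written in Python rather than MCore. When a fragment declares a syn whose identifier is already bound by an included fragment, the existing Name is reused instead of a new Symbol being created. The Name class, the integer stand-in for Symbol, and symbolize_syn are hypothetical and do not mirror the actual symbolize.mc API.

```python
import itertools
from dataclasses import dataclass

_fresh = itertools.count()  # hypothetical stand-in for Symbol generation


@dataclass(frozen=True)
class Name:
    string: str
    sym: int  # hypothetical stand-in for a Symbol


def gensym(ident: str) -> Name:
    return Name(ident, next(_fresh))


def symbolize_syn(env: dict, ident: str) -> Name:
    """Return the Name for a syn declaration in the current fragment.

    env maps syn identifiers to the Names visible from included fragments.
    If the syn extends one of those, its Name is reused; otherwise a fresh
    Name is created and bound.
    """
    if ident in env:
        return env[ident]
    name = gensym(ident)
    env[ident] = name
    return name


# The base fragment declares syn Expr; the extending fragment starts from a copy of its env.
base_env = {}
expr_base = symbolize_syn(base_env, "Expr")
ext_env = dict(base_env)
expr_ext = symbolize_syn(ext_env, "Expr")  # extends Expr: same Name
type_ext = symbolize_syn(ext_env, "Type")  # new syn: fresh Name
```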
Symbolize for language fragments will call updateEnv many times
Since updateEnv performs a mapUnion, calling it this many times is likely a performance issue.
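The concern can be illustrated in Python with a copy-based model of a persistent map union. This cost model is an assumption: MCore's mapUnion is implemented differently and may be cheaper per call, so the counts only show the shape of the problem. The sketch contrasts one union per declaration with collecting a fragment's new bindings first and performing a single union; all helper names are hypothetical.

```python
def union(a: dict, b: dict, stats: dict) -> dict:
    """Copy-based model of a persistent map union: builds a fresh map.

    stats["inserted"] counts the entries written, as a rough cost measure.
    Bindings from b take priority over those in a.
    """
    stats["inserted"] += len(a) + len(b)
    out = dict(a)
    out.update(b)
    return out


def update_env_per_decl(env, decls, stats):
    # One union per declaration: the growing env is copied every time.
    for decl in decls:
        env = union(env, decl, stats)
    return env


def update_env_batched(env, decls, stats):
    # Collect all new bindings in a mutable builder, then union once.
    delta = {}
    for decl in decls:
        stats["inserted"] += len(decl)
        delta.update(decl)
    return union(env, delta, stats)


base = {f"x{i}": i for i in range(1000)}
decls = [{f"y{i}": i} for i in range(100)]
s1, s2 = {"inserted": 0}, {"inserted": 0}
env1 = update_env_per_decl(base, decls, s1)
env2 = update_env_batched(base, decls, s2)
```

Both strategies produce the same environment; under this model the per-declaration version writes 105050 entries and the batched version 1200.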
Translation from MLang to MExpr generates everything
It should be possible to generate only the sems from fragments that are actually used, which might remove the need for dead code elimination, which boot currently performs. This could potentially also detect when sems in a language fragment are fully equivalent to sems in another language (all branches are the same, and all called sems are the same), ideally even across languages that do not have an inheriting relationship.
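Both ideas can be sketched over an abstract call graph, in Python rather than MCore. A worklist traversal from the program's entry points decides which sems need to be generated. For equivalence, the sketch swaps in Moore-style partition refinement (as used in DFA minimization): sems start grouped by branch shape and are split until every class agrees on the classes of its callees, which also handles mutually recursive sems. The representation, (fragment, sem) pairs and a hashable "shape" per sem, is an assumption; real sems would need a canonical form of their branches.

```python
from collections import deque


def reachable_sems(entry_points, calls):
    """Collect the sems transitively reachable from the entry points.

    Sems are identified by (fragment, sem) pairs; calls maps each sem to the
    sems its branches call. Only the returned set would need to be generated.
    """
    seen = set(entry_points)
    work = deque(entry_points)
    while work:
        sem = work.popleft()
        for callee in calls.get(sem, ()):
            if callee not in seen:
                seen.add(callee)
                work.append(callee)
    return seen


def equivalent_sems(shape, calls):
    """Map each sem in shape to an equivalence-class id.

    shape maps each sem to a hashable description of its branches with the
    callees abstracted away; calls maps each sem to its callees in order
    (every callee must itself appear in shape). The partition is refined
    until the number of classes stops growing, at which point it is stable.
    """
    cls = dict(shape)
    while True:
        ids = {}
        new = {
            s: ids.setdefault((cls[s], tuple(cls[c] for c in calls.get(s, ()))), len(ids))
            for s in shape
        }
        if len(ids) == len(set(cls.values())):
            return new
        cls = new
```

A usage example: with A.eval and B.eval both calling a helper of shape "h", and C.eval calling a helper of shape "h2", A.eval and B.eval land in the same class while C.eval does not.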
The pipeline duplicates work
Right now the mlang pipeline calls the normal pipeline after translation to MExpr, which ends up re-running passes such as symbolize.
Remove postprocess.mc
Due to a bug in utest-generate.mc, the mlang pipeline does a postprocessing step that renames the semantic functions to names such as Ast_foo, instead of keeping the name foo and distinguishing between the foos from different language fragments only through their Symbol. Once this bug is fixed, we can remove the postprocess.mc pass.
Better Include Handling
The current include handling in the new mlang pipeline does not handle "lib:path/to/file.mc" includes, and it does not fully normalize paths when deduplicating includes. The semantics should be checked against those of boot: https://github.com/miking-lang/miking/blob/dcdbdab33a52312d9e0195dcad994b81c1e59854/src/boot/lib/utils.ml#L126-L167
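A minimal sketch of the intended behavior, in Python: every include string is turned into a normalized absolute path before comparison, so "../b/x.mc" and "../a/../b/x.mc" seen from the same file are deduplicated. The "lib:" handling here (searching a hypothetical LIB_ROOTS list in order) is an assumption standing in for boot's actual rules in the linked utils.ml.

```python
import os

# Hypothetical library roots for "lib:" includes; the real lookup rules
# should follow boot's utils.ml, not this sketch.
LIB_ROOTS = ["/opt/mcore/stdlib"]


def resolve_include(including_file, path, lib_roots=LIB_ROOTS, exists=os.path.isfile):
    """Turn an include string into a normalized absolute path."""
    if path.startswith("lib:"):
        rel = path[len("lib:"):]
        for root in lib_roots:
            candidate = os.path.normpath(os.path.join(root, rel))
            if exists(candidate):
                return candidate
        raise FileNotFoundError(f"no library root contains {rel!r}")
    base = os.path.dirname(os.path.abspath(including_file))
    # normpath collapses "a/../b" lexically; os.path.realpath would also
    # resolve symlinks, at the cost of touching the file system.
    return os.path.normpath(os.path.join(base, path))


def dedup_includes(including_file, paths, **kwargs):
    """Resolve includes and drop those that point to an already-seen file."""
    seen, result = set(), []
    for p in paths:
        resolved = resolve_include(including_file, p, **kwargs)
        if resolved not in seen:
            seen.add(resolved)
            result.append(resolved)
    return result
```

The exists parameter is only there so the lookup can be exercised without real files, e.g. dedup_includes("/src/a/main.mc", ["../b/x.mc", "../a/../b/x.mc", "lib:seq.mc"], exists=lambda p: True) yields two paths.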
Improve Performance for Pattern Analysis
Pattern analysis in the new pipeline recomputes the pattern ordering and does not remember results from previous fragments. It should take inspiration from the boot implementation. For reference: https://github.com/miking-lang/miking/blob/dcdbdab33a52312d9e0195dcad994b81c1e59854/src/boot/lib/mlang.ml#L248-L265 https://github.com/miking-lang/miking/blob/dcdbdab33a52312d9e0195dcad994b81c1e59854/src/boot/lib/mlang.ml#L190-L221
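The reuse idea can be sketched in Python with a cache of pairwise pattern relations shared across fragments, so that an extending fragment only compares the cases it adds. The pattern model is a toy assumption (a pattern is the frozenset of constructor names it matches, and more specific patterns are ordered first); it is not how boot represents patterns, and the class and function names are hypothetical.

```python
class PatternOrderCache:
    """Memoizes pairwise pattern relations across fragments.

    Toy model (assumption): a pattern is a frozenset of the constructor
    names it matches, so strict set inclusion plays the role of specificity.
    """

    def __init__(self):
        self._cache = {}
        self.computed = 0  # comparisons actually performed

    def relation(self, p, q):
        key = (p, q)
        if key not in self._cache:
            self.computed += 1
            if p == q:
                rel = "equal"
            elif p < q:
                rel = "subset"    # p is strictly more specific than q
            elif p > q:
                rel = "superset"
            elif p.isdisjoint(q):
                rel = "disjoint"
            else:
                rel = "overlap"   # neither ordered nor disjoint
            self._cache[key] = rel
        return self._cache[key]


def order_cases(cache, patterns):
    """Order patterns so more specific ones come first: a topological sort
    of the strict-subset relation, using the shared cache."""
    remaining = list(patterns)
    ordered = []
    while remaining:
        for i, p in enumerate(remaining):
            # p can go next if no other remaining pattern is strictly more specific
            if not any(cache.relation(q, p) == "subset"
                       for j, q in enumerate(remaining) if j != i):
                break
        ordered.append(p)
        del remaining[i]
    return ordered


A = frozenset({"Int"})
B = frozenset({"Int", "Float"})
C = frozenset({"Str"})
cache = PatternOrderCache()
base_order = order_cases(cache, [B, A])    # base fragment: 2 comparisons
ext_order = order_cases(cache, [B, A, C])  # extension adds C: 2 new comparisons
```

In the extending fragment, the relations between A and B come from the cache; only the pairs involving C are computed, for 4 comparisons in total.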
The composition check produces errors with little information
Many of the error values do not carry as much information as they could, e.g., which patterns are involved, which sems, which fragments they originate from, etc. https://github.com/marten-voorberg/miking/blob/fe5840ced7f8dd5c4c24530608136ead455f0388/stdlib/mlang/composition-check.mc#L102
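As an illustration of the kind of information an error value could carry, here is a Python sketch of a hypothetical overlap error that records the sem, the two fragments, the two patterns, and a source location. The class name and fields are illustrative assumptions, not the error values defined in composition-check.mc, and the pattern strings are only examples.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PatternOverlapError:
    """Hypothetical composition-check error value (illustrative fields only)."""
    sem: str                     # the sem whose cases conflict
    fragments: Tuple[str, str]   # fragments the two cases come from
    patterns: Tuple[str, str]    # pretty-printed patterns of the two cases
    info: str = "<unknown>"      # source location, if available

    def message(self) -> str:
        (f1, f2), (p1, p2) = self.fragments, self.patterns
        return (f"{self.info}: in sem {self.sem}, the pattern {p1} from {f1} "
                f"overlaps the pattern {p2} from {f2}")


err = PatternOverlapError(
    sem="eval",
    fragments=("AppEval", "VarAppEval"),
    patterns=("TmApp t", "TmApp {lhs = TmVar v}"),
    info="app.mc:12:2",
)
```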