Closed: jonathanvdc closed this issue 7 years ago.
Before I read your message in its entirety, have you tried the MacroMode.PriorityOverride option to override static_if?
Whew! I just committed support for preserving newlines and comments in EC#, a feature that was much harder than I thought it would be, particularly on the printer side. And I fixed a bug in LLLPG that broke the parser and had me baffled for a while. Hopefully trivia preservation will be much easier in LES, but sadly there are two different printers to deal with (I thought I could deal with all three printers and parsers in one day, hahahaha.. er.. nope, not by a long shot).
The problem I have with "modularizing" StdMacros is in deciding where to draw the lines. It is not obvious to me what counts as "high" or "low" level. I mean, for most of the macros in StdMacros you can point to one or more programming languages where the feature implemented by that macro is built into the language, and one or more other languages where it is an "add-on". But I guess I can agree with your examples: static_if feels like a low-level feature, while alt class feels like, and operates as, an add-on, even though ADTs are built into many languages as a low-level feature.
Hopefully for now you can solve the problem with MacroMode.PriorityOverride – but actually you should use PriorityInternalOverride, as PriorityOverride is meant for end-users.
Doesn't implementing IsQuickBindLhs correctly [in a macro] require an enhanced macro system like Nemerle has, with a second macro-processor pass that provides access to more type & member information? I haven't thought about that feature for a long time. (I'd love to have Nemerle-like macro hygiene, btw, but it involves concepts like "imports" and "colored identifiers" or "private symbols" or whatever the kids are calling them these days, and it seems to require closer integration with a compiler, which LeMP is not currently designed to provide.)
Say... is there a reason you're not using macros (quote {}) to write RequiredMacros.cs?
Thanks for your replies! I'll try to respond to each of them in some arbitrary order below.
I didn't know MacroMode.PriorityInternalOverride existed until now. It seems like that enum value could solve most (if not all) of my macro-related problems, so I'll definitely give it a try.

I don't think implementing IsQuickBindLhs requires an enhanced macro system. All it takes is a few builtins, which are evaluated at compile-time. Admittedly, this does require compiler support for the builtins, but – strictly speaking – there's no need for an additional macro processing phase. Like foreach statements, the :: operator can be lowered to static if (#isQuickBindLhs(<lhs>)) { <quickbind> } else { <scope resolution> }. LeMP could implement the hypothetical #isQuickBindLhs node as a macro that evaluates to a Boolean value, and process the static if at macro expansion time. ecsc can map both #isQuickBindLhs and static if to builtins, which are only evaluated during the IRGen phase. ecsc defines builtins that can be used to build hygienic macros, and ecsc's macro library actually uses those builtins to make sure that local variable definitions do not interfere with the outer scope's locals (and vice-versa). Again, these builtins are ecsc-specific, but perhaps LeMP can use a few heuristics to obtain similar results.

I did consider using quote { } in RequiredMacros.cs, but I sort of decided to hold off on that for now. Keeping the *.out.cs files in sync with the *.ecs files is a bit of a pain, so my plan is to just wait a little longer until ecsc can compile its own macro library, and then start using quote { } in the macro library.

Trivia preservation will be great for the code ecsc generates – once the next version of Loyc lands, that is. I've always wanted to include comments from the source programming language in, for example, generated C++ code, and this is the missing piece I needed to do just that. Awesome!

Update: I have defined overrides for the static_if and @[#static] #if macros in RedefinedMacros.cs, and the override mechanism seems to work beautifully. Thanks for helping me out!
(I'll close this issue now, given that my problem has been solved.)
I guess I have waited long enough to comment...
It's pretty interesting how you've stretched out the lexical macro system to execute what would otherwise be conventional compiler passes, but something about this approach makes me uncomfortable. Let's see...

When a user writes #foo() thinking that there is a macro called #foo, it is often not guaranteed that the intended version of #foo is actually called (#foo's namespace might somehow not be imported at the call site, or maybe there's a name collision because there are two macros called #foo).

Your foreach macro builds upon a series of compiler builtins. In this case it works, and potentially it could allow third parties to add their own new constructs with "smart" behavior – as you mentioned, #useSequenceExpressions could use these builtins to act smarter. But I feel like there must be limits to how much one can accomplish this way. And even if those limits were overcome, by making your compile-time builtins Turing-complete (and whatever else they would need), programming this way has a couple of major downsides:
So... I think ultimately we should move away from using lexical macros to do semantic tasks, though I'm not prepared at the moment to suggest what to do instead.
I'd like to share an idea I originally had about how a Loyc compiler would work. As you know, LNode.Name is a Symbol, and while currently Symbol is basically just a string (it has an Id too, which maybe I should delete), it is deliberately not sealed, and in the back of my mind my plan was always to use Symbol for resolved references in source code.

Imagine if your compiler built its symbol tables as usual, but the symbols in those tables were actually Symbols, i.e. instances of classes derived from Symbol. Now, if a method contains this code:
var x = 23;
Foo(x);
The EC# parser, of course, produces a Loyc tree for Foo(x) which refers to a pair of ordinary global symbols. My idea was that a "symbol resolution" pass would scan over the source code, replacing global symbols with resolved symbols. The output would still be a Loyc tree and it would still print out as Foo(x), but the symbol with Name = "x" would actually "be" the local variable x and include type information, so that you could look up the type of x directly from the LNode – something like ((LocalVariable)x.Name).VarType, if that makes sense. Similarly, the Symbol with Name = "Foo" would actually represent the method Foo(int).
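A minimal sketch of what such a resolved symbol could look like. This assumes Symbol can be subclassed via a name-taking constructor; LocalVariable, IType and VarType are hypothetical names invented here, not part of Loyc or Flame as they exist today:

```csharp
// Sketch only: assumes Symbol exposes a subclass-accessible constructor
// taking the symbol's name (the real Loyc API may differ). LocalVariable,
// IType and VarType are invented for illustration.
public class LocalVariable : Symbol
{
    // The resolved type of the local, attached directly to the symbol.
    public IType VarType { get; }

    public LocalVariable(string name, IType varType)
        : base(name)   // assumption: such a constructor exists
    {
        VarType = varType;
    }
}
```

After the resolution pass, the node that prints as x would still have Name equal to "x", but `((LocalVariable)x.Name).VarType` would yield the variable's type without a separate symbol-table lookup.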
Hygiene could also be achieved by a macro simply producing Symbols that are in a "local" SymbolPool (although I wonder if symbol pools are even a useful concept – non-global Symbols could be created without a pool; that's how it works in JavaScript). This doesn't work in the usual LeMP workflow (that I use), since EcsNodePrinter just prints the Symbol.Name without regard for whether it's a global symbol or not, so two different Symbols called x end up with the same name in the output. But inside a compiler that problem need not occur, and when outputting plain C# I could fix the problem by adding a unique-naming pass, before printing, whose job would be to ensure that all non-global symbols get a unique name string.
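A sketch of what that unique-naming pass might boil down to. IsGlobal is a stand-in for however one distinguishes global Symbols (e.g. by their pool); a real pass would also pre-seed the taken-name set with every global name used in the output:

```csharp
// Illustrative sketch of the unique-naming pass; all helper names are
// invented. Each non-global Symbol gets a fresh name string the first
// time it is seen; later occurrences of the same Symbol reuse it.
var taken = new HashSet<string>();
var renamed = new Dictionary<Symbol, string>();

string UniqueName(Symbol sym)
{
    if (IsGlobal(sym))
        return sym.Name;                  // global symbols keep their names
    if (!renamed.TryGetValue(sym, out string name))
    {
        name = sym.Name;
        for (int i = 1; !taken.Add(name); i++)
            name = sym.Name + "_" + i;    // x, x_1, x_2, ...
        renamed[sym] = name;
    }
    return name;
}
```

Two distinct Symbols both named x would then print as x and x_1, so the text output stays unambiguous.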
So... does such a design sound like a good idea to you?
Looks like the discussion is sort of diverging into two separate subjects. I've split my response into two sections accordingly.
Truth be told, semantic macros as a replacement for a lexical macro-based design are a tough sell for me, so I'll argue against them now. (Sorry!)
Let's start off with a point of agreement. I think you're absolutely right that writing a #foreach macro is pretty clunky. It's just that the alternative so far – baking it into the compiler – is so much worse. Perhaps it's worth explaining my two reasons for writing #foreach as a macro:
1. foreach is inherently platform-dependent. For example, the .NET runtime has interface IEnumerable<T> and interface IEnumerator<T>, with foreach being little more than syntactic sugar for a loop that calls methods on these interfaces. But ecsc is not as biased toward the .NET runtime as csc and mcs, and can (at least theoretically) generate code for other platforms. By implementing foreach as a macro, a different platform-specific definition can be written for foreach. This wouldn't have been possible if I had baked foreach into the ecsc source code.

2. Writing foreach as a lexical macro is, for all its clunkiness, still significantly less clunky than writing it as a "normal" expression/statement that the compiler can analyze. ecsc's lock implementation is significantly more complex than the foreach macro, and that complexity buys ecsc very little. Admittedly, error diagnostics for lock are much clearer than those for the foreach macro, but the foreach macro can be platform-specific, whereas the lock implementation is forever dependent on the existence of a type named System.Threading.Monitor, regardless of the platform for which code is generated.

These two points, especially the first, favor neither lexical nor semantic macros. Both can be used to create platform-specific definitions for (essentially platform-agnostic) constructs such as lock, using and foreach. I don't think a semantic macro will be more concise than a lexical macro with builtins (especially if that lexical macro uses quote { }), but I suppose that a semantic macro might be able to provide better diagnostics than the current lexical macros with builtins.
But I'm not convinced by your other criticisms. Specifically:
RequiredMacros.cs defines all dependencies for the #foreach macro in the same namespace, in the same file, in the same assembly. It's all pretty cohesive. There's just no way that #foreach will get imported without its dependencies.

And I don't think it's reasonable to expect #foreach (or any other macro, for that matter) to continue to work fine when someone writes a conflicting definition (with an equal priority) for its dependencies. If that happens, then I expect LeMP to either report an error, or just silently pick one of those macro definitions and have ecsc diagnose any errors that this may cause. Besides, the macros in LeMP.StdMacros are just as vulnerable to this sort of thing as #foreach. I could easily redefine if as a macro that always picks the then-branch, and I'm sure that'd break no small number of macros in LeMP.StdMacros.

The Symbol-based IR you proposed doesn't seem like a bad design, but that IR would still need to be constructed and then lowered to Flame's IR (ecsc translates Loyc trees directly into Flame IR at the moment). The construction/deconstruction passes would quite possibly impact performance far more negatively than a handful of builtin nodes ever could.

Though my main argument against semantic macros is their complexity. They add yet another major pass to the compilation process. That means that anybody who seeks to fully understand how EC# works must also learn how semantic macros work, which constructs they can affect, the order in which they are expanded, what the structure and rules of the IR they operate on are, etc.
Also, semantic macros complicate macro development. Right now, an EC# macro is always a lexical macro. But if semantic macros are introduced as well, then programmers will suddenly have to ask themselves which type of macro they should create. If semantic macros are strictly more powerful than lexical macros, then they might have to re-write entire lexical macros as semantic macros when they realize they need a feature that only semantic macros offer.
Plus, semantic macros are arguably harder for the compiler to verify than lexical macros. Right now, the (Flame) IR generated by ecsc is correct by construction. If the input LNodes contain errors that must be diagnosed at compile-time, then ecsc will do just that, and will subsequently terminate the compilation process; ecsc either reports an error or produces a correct IR tree. But a semantic macro could sneakily insert a semantically invalid construct, without the compiler taking notice. This would result in very hard-to-detect bugs.
As a final point in favor of lexical macros with builtins: this type of thing has precedent in other languages. The D programming language, for example, defines std.traits, a module of function templates that answer questions about the source code. That is eerily similar to ecsc's builtins. For example, isArray is the D equivalent of #builtin_is_array_type. If I recall correctly, C++ has similar metaprogramming libraries.
Going forward, I think that it would be better to wrap ecsc's builtins in a standardized macro library, just like D has done. (Apparently, Phobos uses dmd builtins under the hood as well: isNested seems like a fair example.)
I very much like the idea of using separate SymbolPools to guarantee that symbols don't overlap. It certainly is a lot cleaner than the builtin-based design. I don't have a lot of spare time right now, but I think it's definitely worth implementing. I'll try to implement it in ecsc and its macro library once I get the opportunity to do so.
Or do you think that it makes more sense to just use LeMP's (future) renaming pass in ecsc? That would have the extra advantage of producing less confusing output when ecsc is instructed to print macro-expanded source code (via the -E switch).
Let me start here because I suspect you misunderstand what I was suggesting:
> If the input LNodes contain errors that must be diagnosed at compile-time, then ecsc will do just that, and will subsequently terminate the compilation process; ecsc either reports an error or produces a correct IR tree. But a semantic macro could sneakily insert a semantically invalid construct, without the compiler taking notice. This would result in very hard-to-detect bugs.
I'm suggesting the same kind of semantic macros as Nemerle has. The macros would run after the list of types and methods has been built, but before method bodies have been converted to IR. You can't sneak anything invalid in if macros run before semantic analysis. (I think in Nemerle you might be able to do semantic analysis on local variables too, and create new class members.... I don't know how that works and I haven't easily found details online; if we do this we should install Nemerle and do some experiments to understand the details.)
And the second-stage macros should operate on Loyc trees so that users can easily switch which kind of macro they are writing, and so that the same MacroProcessor can still be used.
> As a final point in favor of lexical macros with builtins: this type of thing has precedent in other languages. The D programming language, for example, defines std.traits, a module that defines function templates which answer questions about the source code.
Isn't that quite different? I didn't try any serious metaprogramming in D but IIRC, you can write code that calls those trait (builtin) functions directly and immediately acts on the results. That's more powerful and general than generating a syntax tree that eventually calls trait functions, as a lexical macro must do.
> They add yet another major pass to the compilation process.
Since some of the compiler's own work could be implemented with them, it's not necessarily an additional pass beyond what you'd do anyway, is it? How many passes do you use already?
> On chaining macros. RequiredMacros.cs defines all dependencies for the #foreach macro in the same namespace, in the same file, in the same assembly. It's all pretty cohesive. There's just no way that #foreach will get imported without its dependencies.
I'll tell you a secret... I originally planned to let people write fully qualified macro names like Namespace.MacroName(...). I don't remember actually implementing that, but I'm seeing some code to support it in MacroProcessorTask.GetApplicableMacros, so, yeah, maybe you can already invoke a macro without its namespace being imported.
> Or do you think that it makes more sense to just use LeMP's (future) renaming pass in ecsc?
I'm confused because this is not an either/or question. If multiple symbol pools are used then a renaming pass is required to avoid name collisions in the text output.
Another thought: perhaps one should be able to write quote(pool) {...} to use a specified symbol pool for all identifiers in a quotation.
One more thought. For hygienic namespace lookup, to accomplish the same thing that Nemerle does, macros could instruct LeMP to look up macros from specific namespaces by wrapping their output in a certain command, say, #macroNamespaceContext().
As an example, let's say macro1 is in NamespaceA and it wants to return a call to macro2 in NamespaceB – without using a fully-qualified name – and there is also a macro2 in NamespaceC. macro1 is called from user code like this:
#importMacros(NamespaceA);
#importMacros(NamespaceC);
class Foo { ... }
macro1(otherMacro(Foo));
The user is also trying to call otherMacro from NamespaceC.
Let's say the macro originally returns something nonhygienic like this:
return quote { macro2($(node[0])) };
That is, macro1 is a useless macro that forwards the call to macro2. But the point is that it wants macro2 from NamespaceB, while otherMacro should still be resolved from NamespaceC. The returned syntax tree in this case is
macro2(otherMacro(Foo));
Whereas the hygienic return would look more like this:
SymbolPool pool = new SymbolPool();
return F.Call("#macroNamespaceContext",
    F.Literal(pool),
    F.Literal(new DList<Symbol> { (Symbol)"NamespaceC" }),
    F.Call(pool["macro2"], node[0]));
so the first argument is a Literal containing a SymbolPool, the second argument is a Literal containing an IReadOnlyCollection<Symbol> of namespaces*, and the third argument is the actual output from the macro. Finally, the macro processor would use each Symbol's SymbolPool to decide in which list of namespaces to search for that Symbol.
(* The macro processor currently doesn't support qualified names stored as LNodes – the qualified name Foo.Bar is just a Symbol with a literal dot in its name – and maybe it should stay that way, since Symbols are dramatically faster than LNodes as dictionary keys, or for comparisons.)
Obtaining hygiene is a bit clunky this way, but a helper function and/or macro could help.
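For instance, a small helper could package the boilerplate. Nothing here is an existing LeMP API: ReturnHygienic, its signature, and the way it threads the pool through a callback are all invented for illustration:

```csharp
// Hypothetical helper that hides the #macroNamespaceContext plumbing.
static LNode ReturnHygienic(LNodeFactory F, Symbol[] callerNamespaces,
                            Func<SymbolPool, LNode> makeBody)
{
    // Fresh pool: identifiers the macro itself introduces live here, so
    // the processor resolves them against the macro's own namespaces
    // rather than against the caller's imports.
    var pool = new SymbolPool();
    return F.Call("#macroNamespaceContext",
        F.Literal(pool),
        F.Literal(new DList<Symbol>(callerNamespaces)),  // assumes such a constructor
        makeBody(pool));
}
```

macro1's hygienic return would then shrink to something like `return ReturnHygienic(F, new[] { (Symbol)"NamespaceC" }, pool => F.Call(pool["macro2"], node[0]));`.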
My reply turned out to be significantly longer than I hoped it would be. Sorry about that.
> The macros would run after the list of types and methods has been built, but before method bodies have been converted to IR. You can't sneak anything invalid in if macros run before semantic analysis.
Right, but isn't that a little contradictory? I mean, suppose that a type Foo has been analyzed, and we now encounter an expression Foo.Bar. Foo in Foo.Bar could be a type, a namespace or a value, and there's really only one way to ascertain what Foo is in the context wherein it appears: by analyzing the expression Foo.Bar. And semantic macros need to be able to answer questions like: "is Foo a type?" You probably also want semantic macros to be able to discover what Foo's type is, if it is a value rather than a type, which requires all preceding statements and expressions to have been analyzed.

So, really, semantic macros would have to be evaluated in the middle of the semantic analysis process.
Which brings up the problem of what exactly a semantic macro returns. For example, a #foreach macro has to know what the type of its collection expression is, so it'll want to evaluate its collection argument node first, check what that collection's type is, build a loop with one or more induction variables, and only then evaluate the loop body node in the context of that loop.

At the moment, that logic is captured by the #foreach lexical macro as a Loyc tree, and the compiler then analyzes the resulting tree.

Now suppose that a semantic macro has to do the same job. There are basically two options here:
Alternatively, the semantic macro could produce some kind of wishy-washy Loyc tree that contains both unanalyzed Loyc trees and IR trees. But that's just option one in disguise, as the macro could easily insert invalid IR trees in that mixed Loyc/IR tree.
I also don't feel comfortable with imposing Flame IR on the macro writer, because that implies that any breaking change to Flame is a breaking change to EC#. But if Flame IR is not used in semantic macros, then they'd require (at the very minimum) an entire type system to insulate the semantic macros from Flame's type system. That'd be a lot of work, would definitely hurt performance, make it harder for (other) people to write an EC# compiler, and would add little value to the language.
Anyway, I just like the (apparent) simplicity of lexical macros. They're fairly easy to write and understand, and I don't feel the same way about semantic macros. Maybe I'm wrong in thinking that, but the only way to know for sure what EC# semantic macros would be like is to build a prototype, and then take things from there. That'd be a massive undertaking, though, and I'd rather just keep on extending ecsc up to the point where it starts becoming a reasonable alternative to csc and mcs for some projects. Can we put semantic macros on hold until we get to that point? There's still a lot of work to be done before ecsc is mature enough to compile itself.
> Isn't that quite different? I didn't try any serious metaprogramming in D but IIRC, you can write code that calls those trait (builtin) functions directly and immediately acts on the results. That's more powerful and general than generating a syntax tree that eventually calls trait functions, as a lexical macro must do.
I don't completely understand what you mean by that. Can you give me an example of "code that calls those trait (builtin) functions directly and immediately acts on the results?"
> Since some of the compiler's own work could be implemented with them, it's not necessarily an additional pass beyond what you'd do anyway, is it? How many passes do you use already?
I was referring to the passes specified by the language itself: preprocessing, lexing, parsing, lexical macro expansion and semantic analysis. In hindsight, I was wrong to say that semantic macros necessitate another pass. They probably just extend the semantic analysis pass, but in an awkward way.
> I'll tell you a secret... I originally planned to let people write fully qualified macro names like Namespace.MacroName(...), I don't remember actually implementing that, but I'm seeing some code to support it in MacroProcessorTask.GetApplicableMacros so, yeah, maybe you can already invoke a macro without its namespace being imported.
Well, I sure didn't see that one coming. I'm glad you're considering hygienic macro imports, though. I wouldn't mind switching to those once they become available.
> I'm confused because this is not an either/or question. If multiple symbol pools are used then a renaming pass is required to avoid name collisions in the text output.
What I meant was that storing Symbol instances in ecsc's symbol tables would solve a problem that has already been solved if LeMP is going to rename all locals anyway.
> Another thought, perhaps one should be able to write quote(pool) {...} to use a specified symbol pool for all identifiers in a quotation.
Sounds like a great idea.
> Obtaining hygiene is a bit clunky this way, but a helper function and/or macro could help.
Yeah, that syntax is sort of clunky. But I suppose that's okay if we can find an appropriate macro to build on top of it. I think a variant of quote (pool) { ... } that implicitly creates a pool and then imports a set of namespaces could work nicely. Here's an example of what I imagine that might look like (I know #quoteWithMacroImports is a bit silly; I haven't put much thought into what to call this macro):
return #quoteWithMacroImports (NamespaceC, OtherMacros)
{
    macro2($(node[0]))
};
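Presumably such a #quoteWithMacroImports call would expand into the #macroNamespaceContext form sketched earlier in this thread. A hypothetical expansion (#quoteWithMacroImports itself is an invented name, and the exact shape is a guess):

```csharp
// Hypothetical expansion of the #quoteWithMacroImports call above into
// the lower-level #macroNamespaceContext form described earlier:
SymbolPool pool = new SymbolPool();
return F.Call("#macroNamespaceContext",
    F.Literal(pool),
    F.Literal(new DList<Symbol> { (Symbol)"NamespaceC", (Symbol)"OtherMacros" }),
    F.Call(pool["macro2"], node[0]));
```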
Hi there. As you may have inferred from the title of this issue, I would like to modularize the standard macro library. I'll start off with some background information on how ecsc handles things, and then try to justify why I think the standard macro library should be split up. So please bear with me.

Background information
As ecsc has evolved, it has increasingly come to rely on a combination of macros and magic node types to lower existing C# constructs to lower-level EC# constructs. For example, an innocent-looking foreach loop gets macro-expanded into a far more verbose form.
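As a rough sketch (based only on the description that follows; the real output differs, and everything here except the #builtin_* names is guessed), the expansion has this shape:

```csharp
// Illustrative shape of the #foreach expansion; only the #builtin_*
// node names are real, the surrounding structure is invented.
#builtin_static_if(#builtin_static_is_array(#builtin_decltype(col), 1),
{
    // col is a one-dimensional array: iterate with a plain for loop.
    #builtin_stash_locals(i);          // hide any pre-existing `i`
    for (int i = 0; i < col.Length; i++)
    {
        var item = col[i];
        // ...loop body...
    }
    #builtin_restore_locals(i);        // restore the hidden `i`, if any
},
{
    // Otherwise: fall back to the MoveNext/Current pattern.
    #builtin_stash_locals(e);
    var e = col.GetEnumerator();
    while (e.MoveNext())
    {
        var item = e.Current;
        // ...loop body...
    }
    #builtin_restore_locals(e);
});
```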
The lowered code is admittedly pretty cryptic, but it actually Does The Right Thing™. And you don't even have to take my word for it; if you stare at the expanded version long enough, then you'll see that first, a static if (#builtin_static_if) determines whether col is a one-dimensional array (#builtin_static_is_array, #builtin_decltype). If this happens to be the case, then a simple for loop is used to iterate over col's values. Otherwise, it will use MoveNext/Current to iterate over col.

The calls to #builtin_stash_locals and #builtin_restore_locals are used to implement hygienic macros. #builtin_stash_locals hides local variable names – if the specified names are already in use by local variables, then they are removed from the local scope and pushed onto a stack – which allows the inner expression to re-use those names safely. #builtin_restore_locals performs the opposite operation: it discards any locals that intersect with the names it specifies, and restores the old variables that were mapped to those names (if any).

A remarkable property of #builtin_static_if, #builtin_static_is_array, #builtin_decltype, #builtin_stash_locals and #builtin_restore_locals is that, from ecsc's perspective, these node types aren't special at all. Like any other node type, they simply get analyzed during the IRGen/semantic analysis phase, and that's all there is to it.

Problem
One thing that annoys me about this, though, is that – by introducing #builtin_static_if – both static_if and #builtin_static_if are now legal, and, more importantly, they behave differently. static_if is defined as a macro in LeMP.StdMacros, and it is far less powerful than #builtin_static_if, because the latter construct can take full advantage of the fact that it is only evaluated during the IRGen/semantic analysis phase.

Had I used static_if in the macro expansion above, then LeMP would have told me that "'static if' is incredibly limited right now. Currently it only supports a literal boolean or (x 'tree==' y)". That's unfortunate, especially since #builtin_static_if doesn't share those limitations.

Ideally, I'd like to "fix" this confusing situation by defining static_if as a macro that expands to #builtin_static_if. Unfortunately, I can't do that right now, because the standard macro library is structured as a single, monolithic binary; I can't just go ahead and define static_if in ecsc's macro library, as that re-definition would conflict with the existing static_if definition from LeMP.StdMacros.

Proposal
So, here's my proposal: divide the standard library into (roughly) two parts – a high-level macro library, containing add-ons such as alt class, and a low-level library of primitive operations, such as static_if.

Separate implementations of the low-level commands can then be created for LeMP and ecsc. The former would probably just be a re-packaging of the current macro implementations in LeMP.StdMacros, while the latter would expand nodes such as static_if to ecsc builtins. The higher-level standard macro library can then depend on the lower-level macros without concerning itself with the specific implementation of the primitive operations.

The move to a low-level/high-level split in the standard macros can be gradual. Initially, we could just try moving static_if and #useSequenceExpressions into a low-level "primops" library. Later on, it might prove useful to implement certain operations as heuristics in the LeMP "primops" and as exact builtins in the ecsc "primops." When I say heuristics, I'm mostly referring to things like this. LeMP can't do much better than that, but ecsc can tell with absolute certainty if the left-hand side is a value, and I think that it's a shame that we're not taking advantage of that information when it's readily available.

Anyway, no part of this proposal is mission-critical, but I do think that it'd be pretty neat if standard macros were able to take advantage of ecsc builtins under the hood. Thanks for considering my proposal.