Open JohelEGP opened 6 months ago
Is my understanding correct that by loading you mean the final user's program will load some library like a DLL? If that's the case, I'll raise a concern about read-only programs, like ROM-able microcontroller programs. I like to use (near-)zero-cost C++ for writing low-level stuff, but with this change there can be cases where you just don't have enough RAM to load an extra executable. In that case, cpp2 becomes a language for big platforms only.
The metaprogramming environment is just a compile-time thing, the very end result that you'd compile and run on your embedded system would be Cpp1 code, which at this stage would not involve any kind of runtime overhead (unless of course, generated by the metafunctions themselves).
In regards to name lookup in DLLs, the secret sauce is indeed to always use extern "C". However, that doesn't stop you from creating specific mangling on top of said C-naming and/or generating type-erased wrapper functions that then call the C++-mangled ones. For example, you could potentially solve the "namespacing" problem by following these steps:
- extern "C" outside of any namespace (potentially its own cpp file / TU if they are "pure"?).
- dlopen/GetProcAddress.
It is likely that all of this could even be done with an extra metafunction that registers somewhere outside and ensures the resulting metafunction is lowered with the correct cppfront-specific mangling (which would then be an implementation detail):
n1: namespace = {
    n2: namespace = {
        my_meta_function: @cpp2_metafunction (inout t: meta::type_declaration) = {
            // ...
        }
    }
}
Could potentially generate code similar to this:
namespace n1 {
namespace n2 {
    auto my_meta_function(meta::type_declaration& t) -> void {
        // ...
    }
}
}
extern "C" void __cppfront_n1_n2_my_meta_function(void* t) {
    n1::n2::my_meta_function(*static_cast<meta::type_declaration*>(t));
}
// Outside code could then, after resolving the full name via the exposed "tree":
void* library = dlopen("libmy_metafunctions.so", RTLD_NOW);  // hypothetical library name
auto* my_func = reinterpret_cast<void (*)(void*)>(dlsym(library, "__cppfront_n1_n2_my_meta_function"));
This of course would still be limiting, but I think that having something greppable that we could get rid of later is very helpful in any case (I'm talking about @cpp2_metafunction).
I have thought about that for supporting an overload with an in parameter.
My main concern is that name lookup couldn't behave as it does everywhere else.
Still, there is much value in this, even if initially we require fully qualifying all @-uses of program-defined metafunctions.
Doesn't extern "C" make it redundant to put the declaration in the global namespace?
Oh yeah, seems you are right. I guess that would simplify generation in that case. From cppreference:
When a function or a variable is declared (in any namespace) with "C" language linkage, all declarations of functions (in any namespace) and all declarations of variables in global scope with the same unqualified name must refer to the same function or variable.
This implies what you stated.
The problem with having to perform name lookup in cppfront is shared with https://github.com/hsutter/cppfront/issues/666#issuecomment-1722329609.
There, a wrong answer results in a compile-time error (due to an incomplete type) or a missed optimization.
Here, choosing the wrong metafunction is a no-go.
We just can't know what names have been made visible to lookup in imported Cpp1 code.
The following uses of my_metafunction had better always refer to the same overload set:
#include <a_cpp1_header.hpp>
my_namespace: namespace = {
    my_nested_namespace: namespace = {
        f: (t: cpp2::meta::type_declaration) = t.my_metafunction();
        my_class: @my_metafunction type = { }
    }
}
That said, it could be possible to require a metafunction to be constexpr and to actually evaluate it during constant evaluation to produce the updated type. The technique to implement that would be similar to the one presented in Interactive C++ in a Jupyter Notebook Using Modules for Incremental Compilation - Steven R. Brandt. But that is not this design (and I haven't explored such a design).
That would require parser.h to be constexpr-capable.
Running cpp2::meta::type_declaration::add_member during constant evaluation would be slow.
Increased build times would be a letdown, so I consider this path non-viable.
I would raise the question "Why do we need overload detection?"
The interface for calling a metafunction is already defined. It needs to have one argument, which is of type meta::type_declaration. Additional arguments are currently not supported. If we extend this, then the interface for all metafunctions would change.
So I currently do not see the use case for an overload detection in the library loader.
Comments to the document:
- test::print should be mangled as cpp2_metafunction_namespace_test_print or something like it.
- meta <my_metalib>; would force cppfront to load all metafunctions from the library my_metalib. Afterwards, metafunctions from that library could be applied. This also has the advantage that name clashes between metafunctions could be detected. If a library contains a metafunction with the name add_push and a different library contains a metafunction with the same name, then a nice error message can be generated. This concept would be quite similar to modules in Cpp1, which would make it simpler to integrate into existing build systems.
We want to support both of these function types:
(inout _: cpp2::meta::type_declaration)
(_: cpp2::meta::type_declaration)
The one with the in parameter is used by the built-in @print.
It would also be used by metafunctions that don't modify the type, like
- a metafunction that generates compile-time file output (e.g., to generate code in another language, such as a Java/Swift wrapper for a C++ object by writing my_interface: @java_interface type = { ... }.
-- https://github.com/hsutter/cppfront/discussions/650#discussioncomment-7170046
Ok, one could argue that the in case is just a special version of the inout case. Therefore, the inout case would be enough.
If both need to be supported, then it would result in a fixed set of function definitions. These could be represented by an enum and the entry name of the enum could be included in the function mangling. E.g. cpp2_metafunction_in_namespace_test_print. During the reading of the library the metainformation can be extracted from the name. It can then be checked when the metafunction is called.
One idea would be to include the declaration kind in the enum. This would allow better error messages. The enum fields could be: type_in, type_inout, function_in, function_inout, member_function_..., enum_..., namespace_...
Metafunction declaration could then include this:
print: @meta<metakind::function_in>
Maybe make it a flag enum and have the in, out as extra elements in the enum.
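As a rough sketch of the flag-enum idea, the C++ below combines a declaration kind with a parameter-passing flag and derives a library symbol name from it. The enumerator values, the helper names (has, symbol_name), and the "inout_" spelling are assumptions made for illustration; only the cpp2_metafunction_in_namespace_test_print shape comes from the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical flag enum: low bits select the declaration kind,
// high bits select how the single parameter is passed.
enum class metakind : std::uint8_t {
    type     = 1 << 0,
    function = 1 << 1,
    in       = 1 << 6,
    inout    = 1 << 7,
};

constexpr metakind operator|(metakind a, metakind b) {
    return static_cast<metakind>(static_cast<std::uint8_t>(a) | static_cast<std::uint8_t>(b));
}

constexpr bool has(metakind value, metakind flag) {
    return (static_cast<std::uint8_t>(value) & static_cast<std::uint8_t>(flag)) != 0;
}

// Builds a symbol name such as "cpp2_metafunction_in_namespace_test_print"
// from a kind and a qualified Cpp2 name such as "test::print".
std::string symbol_name(metakind kind, std::string const& qualified_name) {
    assert(!qualified_name.empty());
    std::string result = "cpp2_metafunction_";
    result += has(kind, metakind::in) ? "in_" : "inout_";  // "inout_" is an assumed spelling
    result += "namespace_";
    for (std::size_t i = 0; i < qualified_name.size(); ++i) {
        if (qualified_name.compare(i, 2, "::") == 0) {
            result += '_';  // replace each "::" with a single underscore
            ++i;            // skip the second ':'
        } else {
            result += qualified_name[i];
        }
    }
    return result;
}
```

A loader could then parse the same prefix back out of the symbol name to check the kind before calling the function.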
cpp2_metafunction_in_namespace_test_print
This is all we need.
When lowering the library symbol, include in somewhere in it if it has an in parameter.
When loading, if the symbol without in isn't found, retry with in.
This simulates overload resolution.
Fortunately, a metafunction is a non-templated function with a single argument of a fixed type.
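The retry-based lookup described above can be sketched as follows. To keep the example self-contained, a std::map stands in for the loaded library; a real loader would query the library itself (for example with dlsym or GetProcAddress). The names symbol_table, passing, found_metafunction, and find_metafunction are hypothetical, and the exact position of in within the symbol is an assumption.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a loaded library: maps exported symbol names to addresses.
using symbol_table = std::map<std::string, void*>;

enum class passing { in, inout };

struct found_metafunction {
    void*   address;  // nullptr when neither symbol exists
    passing style;    // which of the two function types was found
};

// Tries the symbol without "in" first, then retries with "in".
found_metafunction find_metafunction(symbol_table const& library, std::string const& name) {
    assert(!name.empty());
    auto it = library.find("cpp2_metafunction_" + name);
    if (it != library.end()) {
        return {it->second, passing::inout};
    }
    it = library.find("cpp2_metafunction_in_" + name);
    if (it != library.end()) {
        return {it->second, passing::in};
    }
    return {nullptr, passing::inout};
}
```

Because a metafunction has a single parameter of a fixed type, two probes are enough to cover both supported function types.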
The lack of name lookup is a big issue.
- cpp2::meta::type_definition.
- their_metafunction refer to the same symbol:
our_metafunction: (inout t: cpp2::meta::type_definition) t.their_metafunction();
my_class: @our_metafunction type = { }
my_struct: @their_metafunction type = { }
These aren't my concerns:
- (inout _: cpp2::meta::type_declaration).
- t.their_metafunction() could call a different overload than @their_metafunction:
their_metafunction: (inout _: cpp2::meta::type_declaration) = { }
my_namespace: namespace = {
    their_metafunction: (inout _) = { }
    our_metafunction: (inout t: cpp2::meta::type_definition) t.their_metafunction();
    my_class: @their_metafunction type = { }
}
This is just bad code. Although it's not ideal that we don't detect this.
These cases, which surprise users, are my concerns:
- A type metafunction with function type (inout _: type_declaration), where lookup for type_declaration finds cpp2::meta::type_declaration, isn't recognized by cppfront as a type metafunction.
- An @-use, such as @their_metafunction, not finding the name as it would anywhere else. their_metafunction could have been made visible via a using directive in #included Cpp1 code. (Note that the @-use could be in namespace ::ns1::ns2, the using directive be in ns1 in the Cpp1 header, and the declaration it refers to be in ::ns3::ns4).
Right now, Cpp2 source files are processed in isolation. As I stated in the opening comment, we have to do dependency scanning. This is a good opportunity to output extra semantic information to pass down to later invocations. This would work pretty much like how BMIs are passed down to later compilations of Cpp1.
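To make the using-directive scenario from the note above concrete, here is a small Cpp1 sketch. The namespace names follow the note; the function bodies and the helper names which and demo are invented for illustration. The Cpp1 compiler resolves the unqualified call, but cppfront, which never parses the header, has no way to know the name is visible.

```cpp
#include <cassert>
#include <string>

// Stand-in for an #included Cpp1 header that cppfront never parses.
namespace ns3 { namespace ns4 {
    inline std::string their_metafunction() { return "ns3::ns4::their_metafunction"; }
} }
namespace ns1 {
    using namespace ns3::ns4;  // using directive inside ns1
}

// Stand-in for code lowered from Cpp2 in namespace ::ns1::ns2.
namespace ns1 { namespace ns2 {
    inline std::string which() {
        // Unqualified lookup from here honors the using directive in ns1,
        // so the declaration in ns3::ns4 is found.
        return their_metafunction();
    }
} }

inline void demo() {
    assert(ns1::ns2::which() == "ns3::ns4::their_metafunction");
}
```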
For starters, it would serve us to keep a structure of introduced names for name lookup. This would mean that convenience Cpp1 using declarations and using directives would have to be moved to Cpp2 namespaces (in a new file or within the same one by also changing its extension).
This could also serve as a starting point for other features that aren't possible to implement right now.
In a better world,
all our dependencies would be C++ modules,
and we would have Cpp1 runtime reflection to query this information.
Then these requirements on Cpp1 using could be limited to the current TU.
These cases, which surprise users, are my concerns:
- A type metafunction with function type (inout _: type_declaration), where lookup for type_declaration finds cpp2::meta::type_declaration, isn't recognized by cppfront as a type metafunction.
I think this is a solvable problem today if I can loop over a DLL's symbols.
There were also concerns when cross-compiling with the Circle model of meta-programming.
Remember that in this design a metafunction is loaded from a library. A metafunction should be processed by a cross-compiled cppfront if it depends on platform specifics or build artifacts.
If cppfront can't be cross-compiled, then metafunctions can't be used. However, a metafunction that doesn't depend on platform specifics can be processed by cppfront on the host. But what about dependencies on build artifacts? If you don't mind duplicating work on the host, it's possible to use metafunctions.
The pattern that is common in many applications that load plugins as dynamic libraries is to have a common extern "C" initialization function entry point. This function will take as a parameter an interface that allows the plugin to define its functions.
For example, the interface might have an add_metafunction method that defines the function name, the pointer to the implementation, and the context where it would be used.
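A minimal sketch of that registration pattern, under stated assumptions: type_declaration is a stand-in for cpp2::meta::type_declaration, and metafunction_registry and cpp2_metafunction_plugin_init are hypothetical names. Only add_metafunction comes from the comment above, and the "context" argument is left out for brevity.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for cpp2::meta::type_declaration.
struct type_declaration {
    std::string name;
    int members = 0;
};

using metafunction = void (*)(type_declaration&);

// Interface the host hands to each plugin's entry point.
struct metafunction_registry {
    std::map<std::string, metafunction> functions;

    // Returns false on a name clash so the host can report it.
    bool add_metafunction(std::string const& name, metafunction function) {
        assert(function != nullptr);
        return functions.emplace(name, function).second;
    }
};

// --- Plugin side ---
static void add_push(type_declaration& t) { ++t.members; }

// The single well-known entry point the host looks up after loading the library.
extern "C" void cpp2_metafunction_plugin_init(metafunction_registry* registry) {
    registry->add_metafunction("add_push", &add_push);
}
```

With this shape the host only ever resolves one symbol per library, and a duplicate name across two libraries shows up as a failed add_metafunction call. Passing a C++ class across the library boundary assumes the host and the plugin share an ABI; a stricter design would use a table of C function pointers instead.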
IIUC, that inverts the logic so that plugins register themselves, right?
Yes, mostly. The application still needs to know which libraries to load, but this process is just reduced to system calls to load the library and find one "C" function with a known name.
A note about this:
- kinds of security concerns that led SG7 to discard std::embed (P1040).
My understanding is that std::embed is well on track for C++26 with paper P1967R12. It was design-approved for C++26 at the Feb 2023 meeting, and my understanding is that the only updates being requested are wording updates.
Design for program-defined metafunctions for cppfront
Introduction
This write-up presents a design to extend cppfront to evaluate program-defined metafunctions.
Conception
Support for metafunctions was first added by commit d8c1a50f22c1b171a50e87ccdb609fb05f41c021, "First checkin of partial meta function support, with interface meta type function". Its commit message also included the following sentence.
After a lot of thinking, the idea of a "Cpp2 interpreter" seemed backwards to what cppfront is. Cppfront takes Cpp2 and lowers it to Cpp1, just like Cfront takes Cpp1 and lowers it to C. Interpreting Cpp2 could then be taken to mean one of two things:
constexpr in C++11, and would probably evolve similarly.
Interpretation 1 means changing what cppfront fundamentally is. Interpretation 2 feels unsatisfactory. It is very constrained and without the power of the whole language at your disposal.
I thus realized that there is an alternative to interpreted Cpp2. That alternative is loading a metafunction compiled in a library during the execution of cppfront. This model doesn't change what cppfront is. Additionally, a metafunction is normal Cpp2 code, just like the implementations of built-in metafunctions.
Counterpoints
In this design, a metafunction is "normal Cpp2 code". In the Circle model of meta-programming, "normal Cpp1 code" can be executed at compile-time. This has raised concerns, quoted below, that are relevant to the present design. In our case, rather than compile-time, it's during metafunction evaluation.
Alternatives
Any alternative that requires recompiling cppfront or hard-coding metafunctions isn't viable at scale.
I also considered whether we could use Cpp1's constexpr and consteval. These don't serve us if we are to use an existing cppfront program. Consider the counterpoints. Given Cpp1's if consteval, a constexpr function can't be guaranteed to not use IO.
That said, it could be possible to require a metafunction to be constexpr and to actually evaluate it during constant evaluation to produce the updated type. The technique to implement that would be similar to the one presented in Interactive C++ in a Jupyter Notebook Using Modules for Incremental Compilation - Steven R. Brandt. But that is not this design (and I haven't explored such a design).
Counter-counterpoints
Maybe a metafunction can be required to be @pure (https://github.com/hsutter/cppfront/discussions/797#discussioncomment-7860363). Then, even though a metafunction is still normal Cpp2 code, it isn't as problematic. Although @pure still seems too restrictive.
Design
This is based on what I learned from studying the documentation of Boost.DLL.
We need to emit a metafunction as an extern "C" symbol. Loading a symbol by its mangled Cpp1 name is experimental and not as portable (https://www.boost.org/doc/libs/master/doc/html/boost_dll/mangled_import.html). When loading the symbol of a metafunction, we need to use the same emitted name. This means that we need a protocol for the symbol name and to "C namespace" it.
In its simplest form, we just need a function that, given the Cpp2 name of a metafunction (as @-used), returns a function object that evaluates the metafunction.
There is an implementation of this design at #907. Details on how this design was applied, as well as other implementation details, can be found there.
Evolution
Name lookup
Up until now, cppfront has been able to rely on the name lookup of lowered Cpp1 code. But this design introduces an evaluation point that happens outside the C++ abstract machine. It wants to look up a name that has already been compiled in Cpp1 and use it as named in Cpp2 code before the Cpp2 code has been lowered to Cpp1.
The current design doesn't consider name lookup. It expects a metafunction name to be @-used unqualified and to follow C "namespacing" conventions.
Dependency scanning
The current design only requires specifying a protocol for lowering and loading a metafunction. To author and consume a metafunction at scale, we also need dependency scanning, pretty much like Cpp1 modules.
Many of us use a build system to manage the complexity of building Cpp1 code. We would like to avoid having cppfront run on a Cpp2 source that hasn't changed when none of the libraries that provide the metafunctions it uses have changed either. Conversely, we want cppfront to rerun if one of those libraries has changed.
We can't know which metafunction a Cpp2 source uses without manually duplicating this information in the build system description. cppfront can't just emit the dependency information after the fact (like Cpp1 compilers on #included headers) because the libraries need to have been built before it starts evaluating the metafunction.
It has been suggested that cppfront could have a command line argument for compiling a metafunction library. That would obviate the need for a dependency scanner, but this inversion of the build logic has drawbacks.
There was an article that I can't find, I think linked from the LLVM Discourse, about how some other language's compiler (Go or Scala?) forked itself to build a module's sources in parallel. That ended up resulting in file system races in very rare cases. They rewrote their module compilation system to not fork itself and instead rely on their build system. That fixed the issues, and even (significantly? in some cases?) reduced compile times.
I think the general issue is attempting to do what should be done at a higher level. The higher level being that of the build system. The CMake support for Cpp1 modules already went in the direction of a dependency scanner (along with a long trail of papers for proper modules support). I think it'd be unwise to go in the other direction, which doesn't even seem to have build system support.
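As a rough illustration of what a dependency scanning step might collect before cppfront runs, the sketch below gathers the names that follow @ in a Cpp2 source text. It is deliberately naive: it is a textual scan with std::regex, not cppfront's parser, so it doesn't skip comments or string literals, and the function name scan_metafunction_uses is hypothetical. Mapping the collected names to the libraries that provide them is left to the build system.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>

// Collects the (possibly qualified) names that follow '@' in a Cpp2 source text.
std::set<std::string> scan_metafunction_uses(std::string const& source) {
    static std::regex const at_use(R"(@([A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)*))");
    std::set<std::string> names;
    for (std::sregex_iterator it(source.begin(), source.end(), at_use), end; it != end; ++it) {
        names.insert((*it)[1].str());  // capture group 1 is the name without '@'
    }
    return names;
}
```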