hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

Question: how to generate C++ headers from cppfront? #594

Open vladimir-kraus opened 1 year ago

vladimir-kraus commented 1 year ago

Please correct me if I am mistaken or let me know whether I am lagging behind the status-quo of cppfront design.

I believe that the success of any new "C++ successor" language will be determined not only by how easy and safe it is to write new code, but also by how easy it is to gradually translate existing code from C++ to cppfront. The simplest approach to transforming existing code would be to rewrite classes (typically with the declaration in .h and the implementation in .cpp) one by one from C++ to cppfront. This would exactly match the way of language adoption expressed by the smooth ramp-up line that Herb presented at the latest CPP conference.

I have read that cppfront is probably aiming to ditch the idea of C++ headers altogether and rely only on modules. I am no expert in modules (so maybe there is some magical solution...) but I think that ditching C++ headers completely may harm the gradual adoption of cppfront, simply because existing C++ code expects to include class headers. The majority of existing code is written without modules in mind.

So in my opinion, cppfront should have a way to generate headers alongside .cpp files. I do not know how this can be done now. I learned that there is something like .h2 files, so I experimented with this a bit. I took a very naive (and wrong) approach...

// file widget.h2

Widget: type = {
    x : int = 0;
    y : int = 0;
    sum : (this) -> int;  // I naively attempted something like a method declaration here. It does not compile. I know it is wrong.
}

// file widget.cpp2

#include "widget.h2"

// I am naively trying something like a method definition. It does not compile. I know it is wrong.
sum : (this : Widget) -> int = {
    return x + y;
}

// file main.cpp2

#include "widget.h2"

main: (args) = {
    w := Widget();
    std::cout << w.sum();
}

I know this is the wrong approach, but given that documentation does not exist, I did not discover any other solution that would do the same and work. I would expect cppfront to generate widget.h, widget.cpp and main.cpp, exactly what a programmer would write by hand in C++, only much safer and more concise when written in cppfront. The benefit would be that other existing C++ code could also include the generated widget.h if it needs it.

Another possible alternative to the .h2 files above would be to write just widget.cpp2, containing class declarations and implementations in one place, with some magical @... directive on the class that causes a .h file to be generated alongside the .cpp. The header would contain the class definition and method declarations, and the .cpp file would contain the method implementations.

However as I wrote above, maybe some other and better solution already exists in cppfront and I am not aware of it. In that case I would love to learn about how to solve the issue above.

Addendum:

Being a Qt framework fanboy myself, I would love to see adoption of cppfront within the Qt community too. By this I do not mean that the Qt framework itself would be rewritten in cppfront; that will probably not happen. But I would love to see Qt applications written in cppfront.

But the problem is that Qt has its very special ways... It heavily relies on the MOC compiler, which is basically a code generator that parses header files and, based on macros such as Q_OBJECT, generates additional code necessary for the framework to work. To allow interoperability between cppfront and Qt, the following 3 steps would need to take place:

1) .cpp2 (and .h2) files are processed by cppfront, which generates .cpp and .h files.
2) The MOC compiler processes all .h files in the project and, where necessary, generates additional .cpp files with some Qt "magic".
3) All .cpp and .h files (i.e. those generated by cppfront, those generated by MOC, and the handwritten ones) are compiled and linked together.

So in order for this to work, it is essential in step 1 to be able to generate somehow the header files so that they can be processed by MOC in step 2...

JohelEGP commented 1 year ago

I have read that cppfront is probably aiming to ditch the idea of C++ headers altogether and rely only on modules.

I don't think so. From the README:

  • double down on modern C++ (e.g., make C++20 modules and C++23 import std; the default);

    sum : (this) -> int;  // I naively attempted something like a method declaration here. It does not compile. I know it is wrong.

Cpp2 doesn't have separate declaration and definition.

MOC compiler

This is supposed to be taken care of by metafunctions, or reflection in general.

vladimir-kraus commented 1 year ago

I reacted to this thread https://github.com/hsutter/cppfront/issues/120 where I read about not having headers in cppfront and relying on modules. Maybe I misunderstood.

However, this is what concerns me a bit. I certainly understand the ultimate goal of having a new and much better language (and I believe that cppfront has such high potential). But I still think that practicality and ease of gradual adoption are what can make or break the language. And by gradual adoption, I mean not only programmers learning the language to use it in new projects, but also the ability to gradually, piece by piece, rework existing large codebases and elevate them to higher ground. This is what I believe was expressed by the smooth ramp-up line in Herb's presentation.

The codebase of almost any C++ project nowadays is basically a large heap of .h and .cpp files. Typically each class has one .h and one .cpp file. Gradual adoption of cppfront in such a project would mean taking one class at a time and rewriting it in cppfront, without changing any files other than the header and the .cpp related to this class. (AFAIK, Kotlin allows such a simple transition from Java, and that is exactly why Kotlin succeeded.) If cppfront does not allow generating C++ headers, I do not think this will be possible. Or is there any other, comparably easy way?

And even if generating C++ headers would not be the best practice or the one recommended way of working with cppfront, I think it should exist as a backdoor possibility just for migrating existing code. And yes, as a side effect, it would help win over the large Qt community too, because it would allow using existing MOC tools.

JohelEGP commented 1 year ago

Commit 347c1c26f5651d7706001c18e2bf139f4e74b5fa came after that. So maybe the situation has changed since then.

vladimir-kraus commented 1 year ago

Oh, I see. Thank you for pointing me to this commit. Splitting into .h and .hpp sounds like a viable solution to my concerns. But I am still doing something wrong here, and it does not work for me.

// file widget.h2

Widget: type = {
    x : int = 0;
    y : int = 0;

    sum : (this) -> int = {
      return x + y;
    }
}

Then I run ./cppfront widget.h2 (macOS 13.4, clang), and it does not create any widget.hpp as I would expect from the commit description. It creates only a widget.h file, which also contains the implementations. So including it in more than one translation unit leads to a multiple-definition link error. Below is the generated .h file; please note the definitions at the end.


#ifndef WIDGET_H__CPP2
#define WIDGET_H__CPP2

//=== Cpp2 type declarations ====================================================

#include "cpp2util.h"

#line 1 "widget.h2"
class Widget;

//=== Cpp2 type definitions and function declarations ===========================

#line 1 "widget.h2"
class Widget {
    private: int x {0}; 
    private: int y {0}; 

    public: [[nodiscard]] auto sum() const -> int;

    public: Widget() = default;
    public: Widget(Widget const&) = delete; /* No 'that' constructor, suppress copy */
    public: auto operator=(Widget const&) -> void = delete;

#line 8 "widget.h2"
};

//=== Cpp2 function definitions =================================================

#line 5 "widget.h2"
    [[nodiscard]] auto Widget::sum() const -> int{
      return x + y; 
    }
#endif

So is this my mistake? Am I doing it wrong? Or is this a cppfront bug?

JohelEGP commented 1 year ago

I think it's just that a later commit made it so there's only .h.

vladimir-kraus commented 1 year ago

I dived into the cppfront source code... and I found that to produce an .hpp with definitions I need to use the -pure-cpp2 switch. Well, this is probably a bit concerning again, because such a .h2 file cannot, for example, include other C++ (i.e. cpp1) headers: it would not compile with the -pure-cpp2 flag. This makes the gradual transition very cumbersome, because I could only start by translating, one by one, those classes which do not depend on other C++ classes. This is rather limiting...

vladimir-kraus commented 1 year ago

I have an idea... Wouldn't it be possible to introduce a compiler switch (i.e. not default behavior, but opt-in) that would cause cppfront to work like this: when processing xyz.cpp2 file, it would split declarations and definitions. Declarations would go to xyz.h and definitions to xyz.cpp? Of course the xyz.cpp would include xyz.h at the very top. This way we could have proper C++ headers and including them to any other (cpp2 or cpp1) code would not break one-definition-rule.

But this seems too simple for me to believe you have not already considered it. So I guess I have overlooked something important that makes this impossible...
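To make the proposal concrete, here is a rough sketch of the split for the Widget example, shown as one listing with comments marking where the file boundary would fall. It is hand-written Cpp1 modelled on the generated widget.h shown earlier in the thread, not actual cppfront output. The opt-in switch, the widget.h / widget.cpp file names for the split, and the WIDGET_H_SKETCH guard are assumptions made for illustration.

```cpp
// ---- widget.h (hypothetical generated header: declarations only) ----
#ifndef WIDGET_H_SKETCH
#define WIDGET_H_SKETCH

class Widget {
    private: int x {0};
    private: int y {0};

    // Declared here, defined in the source file below.
    public: [[nodiscard]] auto sum() const -> int;

    public: Widget() = default;
    public: Widget(Widget const&) = delete;
    public: auto operator=(Widget const&) -> void = delete;
};

#endif

// ---- widget.cpp (hypothetical generated source: definitions only) ----
// In a real split this file would begin with: #include "widget.h"

[[nodiscard]] auto Widget::sum() const -> int {
    return x + y;
}
```

With this layout, any number of Cpp1 or Cpp2 translation units could include the header, because the only non-inline function definition lives in exactly one .cpp file.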

JohelEGP commented 1 year ago

Ah, you're right about -pure-cpp2. If you don't use that flag, you have a single .h generated, which you can include from Cpp1 code.

vladimir-kraus commented 1 year ago

Well, yes and no.

Because it also contains definitions, you can include it from only one cpp1 file. Otherwise you break the one-definition rule and the application will fail to link. That is a very severe limitation on the usability of such a header.

JohelEGP commented 1 year ago

You're right. It seems that we're specifically violating

14 For any definable item D with definitions in multiple translation units,

(14.1) if D is a non-inline non-templated function or variable, or
(14.2) if the definitions in different translation units do not satisfy the following requirements,

the program is ill-formed; a diagnostic is required only if the definable item is attached to a named module and a prior definition is reachable at the point where a later definition occurs. Given such an item, for all definitions of D, or, if D is an unnamed enumeration, for all definitions of D that are reachable at any given program point, the following requirements shall be satisfied.

JohelEGP commented 1 year ago

See https://cpp2.godbolt.org/z/jeEobqWq9. It has this lib.h2:

#define GREET inline greet
lib: namespace = {
  GREET: () -> std::string_view = "Hello, World!\n";
}

And it works because the function in this header is inline. But if we remove the inline, this is the error:

other.cpp:(.text+0x0): multiple definition of `lib::greet()';

Could it be that functions in headers should be inline by default?
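For reference, here is a minimal hand-written Cpp1 sketch of what lib.h could contain if the function were emitted as inline. This is an assumption about the shape of the output, not what cppfront currently generates. Note that `inline` here is the keyword that permits identical definitions in several translation units; whether calls are actually expanded inline is still left to the optimizer.

```cpp
#include <string_view>

// Sketch of a header that is safe to include from several translation units.
namespace lib {
    // `inline` allows each including translation unit to carry an identical
    // definition of greet(); the program still behaves as if there were one.
    [[nodiscard]] inline auto greet() -> std::string_view {
        return "Hello, World!\n";
    }
}
```

Without the `inline` keyword, two .cpp files including this header would each define `lib::greet()`, which is the multiple-definition error quoted above.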

vladimir-kraus commented 1 year ago

But we do not want all functions in headers to be inlined, that would definitely harm application size and would cause other problems. I am afraid this is not the way to go...

I think there should be some other strategy for transition/migration of existing large C++ projects to cppfront.


Let me think out loud.

There are three stages in migrating an existing C++ project:

Stage 1 - Pure cpp1 project. It is basically a pile of .cpp and .h files, with declarations in headers and definitions in .cpp files. The .cpp files can #include the headers, and headers can also #include other headers. During the build, all .cpp files are compiled and then linked together.

Stage 2 - Mixed cpp1 and cpp2 project. Here the cpp2 (and h2) files are transpiled to cpp1. The result is then compiled and linked the same way as in stage 1. For the original cpp1 code to be able to use (#include and link) cpp2 code, we definitely need to be able to produce cpp1 headers from cpp2 files somehow. Modules probably cannot help us, because most existing projects do not use modules; they still include traditional headers.

Stage 3 - Pure cpp2 project. Maybe not all projects will reach this final stage. But a successful migration is one that leaves only the absolutely necessary minimum of cpp1 code, with the rest being in cpp2.


Stage 2 is where the migration actually happens. It has to be done gradually, in very simple and small steps. At each step, the application must remain fully buildable and functional. I believe that a strict requirement for this incremental migration is that you can change each single class in the project individually, by rewriting its .h and .cpp as a .cpp2 file, and use it WITHOUT TOUCHING any other code (I mean other than the header and .cpp where the class being migrated is defined and implemented). This "WITHOUT TOUCHING" requirement means that we must be able to produce the header so that it can be #included from other files. And of course we also need to be able to produce a .cpp file with the definitions. The definitions cannot be in the header, because that would break the one-definition rule.

How to achieve this? Let's now forget about h2 files because I think they cannot help us here. They are probably only useful in the final stage 3, in pure cpp2 code.

So let's assume we have one class in one cpp2 file. We need to be able to generate the .cpp with function definitions and the .h with declarations (side note: I am treating a class definition as a declaration here, since it may be repeated in every translation unit that includes it). We basically have two tools:

a) cppfront compiler switches/flags
b) some cpp2 in-code directives

Using these two we should be able to transform each .cpp2 file into one .h and one *.cpp, in a similar form to what a human programmer would write, only with cpp2 it will be much safer code.

The compiler switches should be able to instruct the compiler to produce both the header and the .cpp. They should also define how strict the compiler is when checking the code in a mixed cpp2 file. For example, when it encounters a non-conforming line, should it report an error, or should it just pass the line through to the generated .h or .cpp without any error, because it may very well be fully correct cpp1 code? Q_OBJECT macros in Qt classes are one example. Currently cppfront reports an error for such a line. But I should be able to tell the compiler to just pass it through to the cpp1 header without any questions. So there should be some cppfront switch (maybe called "liberal-mode" or something similar) which passes any line that it does not recognize as valid cpp2 code directly to the output file.

As the migration goes on and there is more cpp2 code and less cpp1 code in the mixed files, these "liberal" switches could be gradually turned off to enforce stricter rules. But this should happen on an individual basis, file by file.

Unlike compiler switches/flags which operate on per-file basis, in-code directives can work on per-line or per-block basis and inform the cppfront compiler how to treat individual pieces of code. For example whether they should be output to .h or .cpp.


Alright, this was just thinking out loud. I do not have any concrete idea or design for these flags or directives in mind. I just believe that what I wrote is a necessary (not sufficient) condition for possible gradual migration of any cpp1 project.

SebastianTroy commented 1 year ago

I think the important thing to remember is that this migration will happen with the full support of modules. Cppfront is not consumer-ready yet; it might be by the time modules become more commonplace. I get the impression that the migration to modules could also be a migration to cpp2, as headers will be done away with at the same time anyway (as far as my limited understanding goes).


vladimir-kraus commented 1 year ago

@SebastianTroy I am no expert in modules, so allow me to ask a question: is it possible to migrate a large codebase from header-based to module-based gradually, on a one-file-at-a-time basis? Or does it require migrating the whole codebase to modules as a prerequisite to migrating to cppfront?

SebastianTroy commented 1 year ago

Hard to be certain, given that they aren't really implemented yet, but certainly the intention is that #include is still supported alongside modules, even within the same file.


JohelEGP commented 1 year ago

I think there's a bug you can take advantage of. Commit 347c1c26f5651d7706001c18e2bf139f4e74b5fa says

  • consumption: any #include "something.h2" is emitted as an #include "something.h" in its original location relative to the Cpp1 source, and an #include "something.hpp" at the beginning of the Cpp2 definitions section

In -pure-cpp2 mode, the "#include "something.hpp"" isn't being emitted. So, right now, you can compile with Cppfront all your .cpp2 and .h2 source files, and in a single Cpp1 TU include all .hpp source files generated from .h2 source files.

So this works (https://cpp2.godbolt.org/z/o544qxfGj): lib.h2:

lib: namespace = {
greet: () -> std::string_view = "Hello, World!\n";
}

main.cpp2:

#include "lib.h2"
main: () -> int = { std::cout << lib::greet(); }

other.cpp:

#include "lib.h"
#include "lib.hpp"

Build commands:

cppfront -p lib.h2
cppfront -p main.cpp2
$CXX main.cpp other.cpp

Running the executable prints:

Hello, World!

Using -pure-cpp2 does mean that the Cpp2 source file can't consume Cpp1 libraries (other than the C++ standard's).

JohelEGP commented 1 year ago

I ran into this related issue while compiling cpp2util.h as a named module:

cpp2util.h:828:28: error: declaration of 'nonesuch' with internal linkage cannot be exported
  828 | constexpr static nonesuch_ nonesuch;
      |                            ^

I just made it inline instead of static. Same for https://github.com/hsutter/cppfront/blob/ecd37263f9b5a71f5beb18affc275d76ee537f9f/source/cppfront.cpp#L4981

Grinvase commented 1 year ago

@vladimir-kraus I ended up modifying cppfront.cpp to output everything before //=== Cpp2 function definitions into a new .h file and to add an #include of that header at the top of the generated .cpp file. Example: https://github.com/hsutter/cppfront/commit/7975ec76fe89b15ecdb8c372050764f665c98208
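The same idea can be sketched as a standalone post-processing step over the text of a generated file. This is not the code from the linked commit. The `SplitOutput` struct and `split_generated` function are hypothetical names, and the sketch assumes the generated text uses the `//=== Cpp2 function definitions` banner and a trailing `#endif` include guard as in the output shown earlier in the thread.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not part of cppfront.
struct SplitOutput {
    std::string header;  // everything before the banner, plus the closing #endif
    std::string source;  // an #include of the header, then the definitions
};

inline SplitOutput split_generated(std::string_view text, std::string_view header_name) {
    constexpr std::string_view banner = "//=== Cpp2 function definitions";
    constexpr std::string_view guard_end = "#endif";

    const std::string include_line = "#include \"" + std::string(header_name) + "\"\n";

    const auto pos = text.find(banner);
    if (pos == std::string_view::npos) {
        // No definitions section found: keep everything in the header.
        return {std::string(text), include_line};
    }

    std::string_view decls = text.substr(0, pos);
    std::string_view defs = text.substr(pos);

    // In the single-file output the include guard's #endif trails the
    // definitions, so cut it off here and re-add it to the header.
    // Assumption: the last "#endif" in the text is the include guard.
    bool had_guard = false;
    if (const auto end = defs.rfind(guard_end); end != std::string_view::npos) {
        defs = defs.substr(0, end);
        had_guard = true;
    }

    SplitOutput out;
    out.header = std::string(decls);
    if (had_guard) {
        out.header += "#endif\n";
    }
    out.source = include_line + "\n" + std::string(defs);
    return out;
}
```

A text-level split like this is fragile compared with changing the emitter itself, since it depends on the exact banner text, but it shows how little is needed to separate the declaration and definition phases.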

Results: Given test.cpp2 (copied from https://github.com/hsutter/cppfront/blob/main/regression-tests/pure2-hello.cpp2):

decorate: (inout s: std::string) = {
    s = "[" + s + "]";
}

Running the command cppfront test.cpp2 -split-header-file produces test.h and test.cpp.

test.h:

#ifndef TEST_CPP_CPP2
#define TEST_CPP_CPP2

//=== Cpp2 type declarations ====================================================

#include "cpp2util.h"

//=== Cpp2 type definitions and function declarations ===========================

auto decorate(std::string& s) -> void;

#endif

test.cpp:

#include "test.h"

//=== Cpp2 function definitions =================================================

auto decorate(std::string& s) -> void{
    s = "[" + s + "]";
}

JohelEGP commented 1 year ago

I found the solution. It's the opening comment's code, slightly modified.

widget.h2:

Widget: type = {
  x : int = 0;
  y : int = 0;
  sum : (this) -> int = {
    return x + y;
  }
}

widget.cpp2:

#include "widget.h2"
#include "widget.hpp"

main.cpp2:

#include "widget.h"

main: (args) = {
  w := Widget();
  std::cout << w.sum();
}

According to commit 347c1c26f5651d7706001c18e2bf139f4e74b5fa, a .h2 generates an .h with the declarations and an .hpp with the definitions (now respectively Phase 1 "Cpp2 type declarations" and Phase 2 "Cpp2 type definitions and function declarations", and Phase 3 "Cpp2 function definitions"). (This actually only happens with -pure-cpp2; otherwise you get it all in the .h. The commit also says that the .hpp is #included in the definitions, but it seems that never happens.)

If we compile widget.h2 with -pure-cpp2, it'll generate widget.h and widget.hpp. widget.cpp2 now just includes both generated headers to provide the implementation. In main.cpp2, we include widget.h to get only the interface, and link to the library with the implementation.

This is the CMake (https://cpp2.godbolt.org/z/TTo1qcq6c):

# See <https://github.com/hsutter/cppfront/issues/594>.
add_library(widget)
set(JEGP_CXX2_FLAGS "-p") # Pure to split implementation to `.hpp`.
jegp_cpp2_target_sources(widget PRIVATE "widget.h2")
set(JEGP_CXX2_FLAGS "") # Non-pure to accept `#include widget.hpp`.
jegp_cpp2_target_sources(widget PRIVATE "widget.cpp2")

add_executable(main)
set(JEGP_CXX2_FLAGS "")
jegp_cpp2_target_sources(main PRIVATE "main.cpp2")
target_include_directories(main PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}")
target_link_libraries(main PRIVATE widget) # Link to `widget` for implementation.

I've been warming up with libraries for https://github.com/hsutter/cppfront/discussions/797#discussioncomment-7451749.

SavenkovIgor commented 8 months ago

Could anyone provide an update on the current status of this issue? Are there any developments or plans concerning the generation of headers and ensuring interoperability with tools that depend on C++ headers?

From what I understand, it is currently not possible to have cppfront generate both header and source files without -p[ure-cpp2] mode and without resorting to external patches.