jzimmerman / langcc

langcc: A Next-Generation Compiler Compiler
https://langcc.io
Apache License 2.0
1.73k stars 58 forks source link

Cmake, no dependencies, windows support and more. #37

Closed danlapid closed 2 years ago

danlapid commented 2 years ago

Hey, I love this project and appreciate it deeply. I truly believe this project is a great way to build compiler frontends easily and extensively. However I believe it is crucial for this project to be easy to integrate into existing projects and use in any OS - both langcc and the created parsers. I've changed a few things:

Integration in other projects with Cmake:

include(FetchContent)

FetchContent_Declare(
  langcc
  GIT_REPOSITORY https://github.com/danlapid/langcc.git
  GIT_TAG        cmake
)
FetchContent_MakeAvailable(langcc)
add_custom_command(
    COMMAND $<TARGET_FILE:langcc> mylang.lang gen
    DEPENDS langcc mylang.lang
    OUTPUT gen/mylang__gen.cpp gen/mylang__gen.hpp
    COMMENT "Generating language code."
)
include_directories(
    PRIVATE   gen
    SYSTEM   ${langcc_SOURCE_DIR}/src
)
add_executable(MyCompiler main.cpp gen/mylang__gen.cpp)

No public api was broken and no changes were made to how to code actually operates. From all of my tests performance is improved for any given compiler, largely thanks to enabling IPO and replacing copies with const references. I realize this is a big diff and it may seem daunting, it started out as a personal project to improve the build system so I can use it and kinda grew from there. But I in my opinion this is a great change that will drive broader adoption.

A large part of the diff is formatting changes, I tried to find the most similar style to your current one to reduce the whitespace diff, if you could attach your .clang-format file I would gladly attach it to the repo and reformat all the files accordingly thus hopefully reducing the diff size.

jzimmerman commented 2 years ago

Hi -- Thanks for your interest in the project, and for this work! Unfortunately, there are a few problems, some procedural and some having to do with content.

First, I can't accept a PR of this size, especially not one that includes generated code. It would need to be split up into individual contributions for individual features, with each being easily able to be rebased onto main (no merge commits), and with any codegen commits easily reproducible.

Second, before you engage in the effort of splitting up the PR, you should be aware that I have a somewhat unusual style of writing C++ (and of organizing the project), which unfortunately would lead me to reject many of the proposed changes. There are some changes I would consider accepting; when in doubt, I would encourage you to propose each change in isolation before implementing it in a PR, as that may save a lot of effort in the future. Specifically, for the changes described herein:

Removed dependency on llvm and gperf, I appreciate the dump_stack feature but it doesn't do much on Release builds anyway as that should be the default for production usage, for development any debugger will break on exception which should be the best method for debugging, I believe the dependency on llvm and gperf will hurt adoption with little benefit.

Part of my approach in this project is to write in "C++ as it should be", not "C++ as it is". There is no good reason for C++ not to provide stack traces, even for release builds. This massively improves the experience of debugging crashes, especially in the wild. So I do not support disabling this functionality entirely. However, I would note that langcc does not actually have a hard dependency on llvm. If you build it and llvm is not detected, then it will still produce stack traces, albeit ones without source line information.

Also regarding tcmalloc I believe the performance is great even without it and most developers would rather not have it as a dependency, of course - thanks to Cmake if one chooses they want to build with tcmalloc they can easily do with with 1 Cmake config line.

tcmalloc makes a significant difference in the benchmarks as I measure them (and as they are reported in the accompanying publications), and it is important that the default configuration of the project reflect these figures.

Supports any compiler - tested on gcc, clang, msvc, clang-cl but any other compiler should be supported as the code was changed to be pure c++

I would have to look at the specific code changes to judge whether to accept this type of PR. However, one thing to note is that Windows and MSVC are definitely out of scope. It is simply too much upkeep burden to maintain Windows support indefinitely for a project at this stage.

Switched build system to Cmake - makefiles can still be generated from Cmake but the build system mostly used by c++ developers is Cmake and it supports cross-compilation much better than makefiles. Also as you can see below, it allows for seamless integration with other projects.

CMake is also out of scope (and has been addressed in previous GitHub issues). People have different opinions on build systems, but every build system comes with its caveats, and for langcc's purposes, ordinary Makefile is working just fine.

Created CI targets for Ubuntu,MacOS,Windows current and last versions to check for support

As discussed above, Windows is out of scope. I wouldn't be opposed to adding "last version" CI targets for Ubuntu and macOS, provided that this doesn't significantly increase the (sequential) time to pass CI.

Switch .sh scripts with .py scripts as it is the standard cross-platform scripting language

I don't see any particular reason to do this. Shell scripts are working fine. (And, as discussed above, Windows is out of scope.)

Switched testing system to gtest - it removes a lot of boilerplate code and allows for parallel testing easier and without cross platform problems, I also moved the tests to a different folder so it's easier see what is the actual source of the program.

I would have to evaluate this change in isolation, but would generally be inclined to reject it. It introduces another dependency, and moreover, the testing framework I have implemented works in a specific way that I expect, it's working fine as-is, and I am not sure that gtest would satisfy all of the desired properties.

Switched gnu attributes for c++ attributes to support any compiler

This is probably reasonable, although possibly unnecessary given that Windows is out of scope.

removed using namespace std - the using namespace std in langcc_util pollutes the namespace of any including application.

The "using namespace std" is guarded by namespace langcc, so it doesn't pollute the global namespace, only the langcc namespace. However, if you're willing to replace it by dozens of lines of "using std::X", "using std::Y", etc., that might be reasonable. My personal philosophical view is that the full standard library is ubiquitous enough that it should be bound in the global namespace, but I realize this is a minority view.

const reference - Added a ton of consts where relevant to improve const safety and a ton of references where relevant to reduce copying

I am generally avoiding broad use of many modern C++ features, such as const-correctness (because it makes code more burdensome to write) and move semantics (because it is error-prone, in addition to being burdensome). However, if you can point to specific places where introducing references significantly improves benchmark performance, I'd be open to this.

Switched path args to std::filesystem::path - again to improve cross platform support seeing as how the project requires c++17 and filesystem support anyways

I'd have to review this change in isolation; it might be reasonable.

Many fixes regarding clang-tidy errors which should reduce bug prone behavior

Again, would have to review each change in isolation.

Hopefully these comments make sense. I really appreciate your interest in the work, and your efforts in suggesting these changes. Unfortunately, langcc is an early project with limited manpower, and I need to confine the accepted PRs to those that are really necessary, and are streamlined for adoption.

Joe

danlapid commented 2 years ago

First I would like to note that I really appreciate your thorough review. I understand we have differing opinions on many of the above notes and I accept that.

I help maintain other widely used open source c++ code projects and above are a many things I have seen to be ubiquitous/standard among those. Of course this is your work and project and your prerogative to decline sweeping changes to it.

It is however disappointing seeing as how I truly believe these changes will be able to drive this incredible tool to a more production/commercial use case. Especially seeing as how the above is not suggestions but rather a completed working pull request.

If you believe at good majority of the above work might indeed be relevant I am not opposed to removing the changes to the generated code as requested leaving them to be regenerated by you (even though the ci generates all these files again and makes sure the generated code is equivalent to the committed code and I have no control over the ci machine, that is one of the additions I added to add security for this case exactly). As well as reformatting the code according to a clang format file so that the diff will be significantly smaller.

However for very specific parts of the above I don’t have much interest in splitting into many different RFCs and separate pull requests as I have already put much effort into this. Rather I would probably prefer to maintain my own fork with these changes as they are required for my relevant application of this work.

Thanks again for your time and consideration, I hope some of it will end up being useful to you.

jzimmerman commented 2 years ago

Thanks for your consideration -- hope langcc will be useful to you.