zerodaycode / Zork

Project manager and builder automation tool for modern C++ projects

Automatically resolve module dependencies #76

Open foip opened 1 year ago

foip commented 1 year ago

Feature Request: Automatically resolve module dependencies

This would be a drastic change of direction for the project, but I think it would be a worthwhile one, so I want to discuss it before trying to implement it.

What

At the moment the user has to declare all source modules and their dependencies inside zork.toml. It would be nice if Zork could take care of that.

Why

All the alternative C++ build systems use some sort of scripting language for build configuration. Zork takes a more declarative approach with TOML.

The problem with that is that we can never match the configurability of a full-blown scripting language. And I don't think we have to. Cargo, for example, also doesn't allow full configuration of every source file; it allows just enough. This is a tradeoff between configurability and usability, and usability is what makes Cargo so great. I think the usability route would be the right way for Zork.

Having to explicitly and correctly specify all source files and dependencies puts the burden on the user, who then potentially has to touch zork.toml every time they make code changes or refactor.

How could it be implemented

We would have to scan the source files for import statements after loading the configuration file, and could then fill in the dependencies properties of the project model.

We could put the dependencies in a subdirectory of the cache, so we only have to rescan a source file if its code changed. It could look something like this:

|-out
  |-dependencies
    |-math.deps
    |-math2.deps
    |-...

A deps file could just be a plain-text list of the dependencies:

// math2.deps
math
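
For illustration, here is a minimal (hypothetical) pair of module interface units that would produce the math2.deps file above; the single import in math2.cppm is what the scanner would record:

// math.cppm (hypothetical example)
export module math;
export int add(int a, int b) { return a + b; }

// math2.cppm
export module math2;
import math; // the scanner records this import, yielding the single "math" entry in math2.deps
export int add_twice(int a, int b) { return add(add(a, b), b); }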

Tradeoffs

As mentioned above, the tradeoff is having to scan each changed source file once per build. But I would argue that having Zork scan the source files and determine the dependencies is still way faster than the user having to update the configuration manually (and less error prone).

Taking it a step further

This might be taking it a bit too far, but here goes.

We could also take the full Cargo route, where the user only has to specify the entry point of the application (or entry points of the library). We simply scan the "entry" source file and only compile dependencies if needed.

But there is only one way I can think of to make this work: we would have to mandate that a module's source file location corresponds to its module name. For example, the module com.github.zork would need to have the path src/com/github/zork.cppm or src/com/github/zork/mod.cppm, or something like that. Module partitions could live in that directory too:

|-src
  |-com
    |-github
      |-zork
        |-mod.cppm
        |-some_partition.cppm
        |-some_implementation.cpp

If some other module depends on this module, we could simply compile the entire directory, where mod.cppm is the primary module interface, other .cppm files are module partitions, and .cpp files are implementation files. Of course, the extensions could be made configurable.
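
As a rough sketch of what those files could contain under this convention (the names and contents are illustrative only, not a concrete proposal):

// src/com/github/zork/mod.cppm -- primary module interface
export module com.github.zork;
export import :some_partition; // re-export the partition as part of the public API
export void run();

// src/com/github/zork/some_partition.cppm -- module partition
export module com.github.zork:some_partition;
export int helper(int x);

// src/com/github/zork/some_implementation.cpp -- module implementation unit
module com.github.zork;
void run() { helper(42); }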

You could again argue that this takes a lot of configurability away from the user, but if we look at the ecosystems of other languages, classical C++ build systems are the only ones without a standardised source directory structure.

Most of the configuration inside zork.toml is done once and then never changed (like name or authors). So the source file configuration is the largest burden on the user. With these changes that burden would disappear almost completely.

The cache strategy from above would also work with these more drastic changes, if we are concerned with performance.

Again, this is a very radical direction so I would like to hear your opinion.

TheRustifyer commented 1 year ago

I really like what I am reading, but I need some time to figure out the impact those changes will have on how users interact with the API.

Give me a couple of days to think about the tradeoffs and the future direction of the project. I will come back with an answer to this proposal.

TheRustifyer commented 1 year ago

I haven't forgotten about this, nor will I. But I still need more time to figure out the right direction.

Even so, I like how the proposed changes look, up until the "Taking it a step further" point. Forcing the user to match a package structure sounds like a Java thing, and classical C++ people (or even newer ones like me) don't really like that very much.

Also, scanning the dependencies won't be inefficient, but most of the hard work is already done by the compilers with implicit module lookup; take, for example, MSVC's approach.

Also, GCC automatically tracks every module dependency by itself, but there's the problem of the gcm.cache folder path... Potentially, Clang users would benefit the most, but I am not sure if it's worth the effort. Partitions are another matter... and module implementations as well. Ideally, we could keep offering the user the same features as until now (declare deps explicitly, or let Zork++ figure them out as in your proposal).

foip commented 1 year ago

Forcing the user to match a package structure sounds like a Java thing

Most of the build systems for the various languages I've worked with behave this way, though admittedly most of them are for JVM languages (Kotlin, Scala, Clojure, Groovy). C# namespaces are also usually reflected in their source paths. Even Rust modules follow this approach.

So for me the C++ ecosystem has always been the outlier that doesn't follow a similar approach (consistently).

I stand by my point that a low-configuration approach provides a lot of usability and - in my opinion - is something that would set Zork apart from other C++ build systems. But we may have to agree to disagree on that.

GCC automatically tracks every module dependency by itself

I'm not sure how sufficient the GCC and MSVC approaches are, because we still need to build the modules in the right order so the dependent modules can pick them up. Correct me if I'm wrong, but at the moment we don't consider that in our build process.

Scanning the project structure from a single entry point would allow us to build a dependency tree as we go, which would also solve that.
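
As a tiny hypothetical example of what that would look like: if main.cpp is the declared entry point and the scanner follows the imports it finds (reusing the math module from the earlier sketch), the build order falls out of the graph by itself:

// main.cpp -- the declared entry point
import math;
import ui;
int main() { /* ... */ }

// ui.cppm
export module ui;
import math; // ui itself depends on math
export void draw();

// derived graph: main.cpp -> {math, ui}, ui -> {math}, math -> {}
// so a valid build order is: math, then ui, then main.cpp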

TheRustifyer commented 1 year ago

Most of the build systems for the various languages I've worked with behave this way, though admittedly most of them are for JVM languages (Kotlin, Scala, Clojure, Groovy). C# namespaces are also usually reflected in their source paths. Even Rust modules follow this approach.

Sure, you have a point there. Even though the Rust approach is slightly different, more like the Python one (__init__.py and mod.rs), I see this as hard to translate into the C++ ecosystem. Not because of the technical implementation, but because of the foundations of the ecosystem itself. Typically, anything that isn't part of the standard isn't well received in the community, and I think you already know how the C++ community is...

Correct me if I'm wrong, but at the moment we don't consider that in our build process

No, you're totally right. Build order depends on the user's declarations. You can take a look at another side project we have in ZDC, ZERO, which I hope one day can serve as an example of how to use Zork++. It is more of a toy project, but maybe one day it will go somewhere (or not, I'm not worried about that), and it could still be used as a bigger, more real-world example of using Zork++. As you may notice in my fork, in that branch things are quickly getting out of control, and this is where your idea becomes more important and relevant.

Scanning the project structure from a single entry point would allow us to build a dependency tree as we go, which would also solve that.

Sure. But we may take an intermediate path. We could pre-parse everything first (I mean, while the project_model is built) and then figure out the dependency tree ourselves just by looking at the export module, export import, import and module statements. This shouldn't complicate things much in terms of effort, and there would be no need to force the user to match a folder structure (which I strongly believe doesn't fit the C++ dinosaur culture). Scanning the files for those declarations should be fast (Rust is really fast, and we would only rescan dependencies when a translation unit is modified), so I am really not worried about performance, and we could get the best of all the ideas.
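
Concretely, a pre-parse pass would only need to look at the handful of declarations that may appear before any other code in a module unit. A hypothetical interface unit (the physics and math.utils module names are made up) showing everything the scanner would care about:

// physics.cppm (hypothetical module)
module;                    // global module fragment (only preprocessor directives belong here)
#include <cstdint>
export module physics;     // this unit is the primary interface of module "physics"
import math;               // dependency on module "math"
export import math.utils;  // hypothetical dependency that is also re-exported
// everything below is ordinary code, which the scanner can skip
export std::int64_t step(std::int64_t dt);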

TheRustifyer commented 1 month ago

Now that the project is more mature, I feel it is a good time to tackle this issue.

The plan

  1. When the project_model is built, scan every translation unit up until the point where the first exported block of C++ code appears. Note that the standard dictates that all export, export import, import and module declarations must appear at the start of the module purview, so we just need to identify the case where the first export isn't the export module primary interface declaration itself; the exports after that point form the public C++ API of the module (see the sketch after this list).
  2. Fill the dependencies attribute with all the dependencies of the target module kind. Those should be all the import statements made in the module purview.
  3. Whenever our translation unit analyzer determines that we need to rebuild a translation unit, rebuild ALL of its dependencies as well. That will finally allow us to address the issue with Clang and its module cache, which most of the time makes the build fail due to a cache mismatch, forcing the user to clear the cache and rebuild everything from scratch, which is extremely tedious and wastes a lot of time.
  4. In line with what @foip proposed, we may require the user to declare the entry point of the Zork job. This can simply be one of the cfg attributes that we already have, like code_root. We can use the code root to detect every translation unit under that path and, from there, start building our dependency tree.
  5. Finally, as other build systems do, we may just require the user to attach to every target the files they want to link.
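
A minimal sketch of the stopping rule from step 1, with a hypothetical module: because imports must precede all other declarations in the module purview, the scanner can stop at the first export that is neither the module declaration nor an import.

// geometry.cppm (hypothetical)
export module geometry;       // first export: the primary interface declaration itself, keep scanning
import math;                  // recorded into the dependencies of "geometry"
export double area(double r); // first export that is not a module or import declaration:
                              // no further imports may legally appear, so the scanner stops here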

Internal

  1. Remove the dependencies attribute and any other attributes that should not be configurable from the user's side once the feature is ready
  2. Take precautions when matching the filename against the declared module name