Open foip opened 1 year ago
I really like what I am reading, but I need some time to figure out the impact those changes will have on how users interact with the API.
Give me a couple of days to think about the tradeoffs and the future direction of the project. I will come back with an answer to this proposal.
I haven't forgotten about this, and I won't. But I still need more time to figure out the right direction.
Even so, I like how the proposed changes look up until the "Taking it a step further" point. Forcing the user to match structures of packages sounds like a Java thing, and classical C++ people (or even "newer" ones like me) don't really like that much.
Also, scanning the dependencies won't be inefficient, but most of the hard work is already done by the compilers with implicit module lookup; take, for example, the approach of MSVC.
Also, GCC automatically tracks every module dependency by itself, but there's the problem of the gcm.cache folder path... Potentially, Clang users would benefit the most, but I am not sure if it's worth the effort. Partitions are another matter, and so are module implementations. Ideally, we could let the user keep the features available until now (declare deps explicitly, or let Zork++ figure them out as in your proposal).
Forcing the user to match structures of packages sounds like a Java thing
Most of the build systems for various languages I've worked with behave this way. Most of which are admittedly for JVM Languages (Kotlin, Scala, Clojure, Groovy). C# namespaces are also usually reflected in their source path. Even Rust modules follow this approach.
So for me the C++ ecosystem has always been the outlier that doesn't follow a similar approach (consistently).
I stand by my point that a low-configuration approach provides a lot of usability and, in my opinion, is something that would set Zork apart from other C++ build systems. But we might agree to disagree on that point.
GCC automatically tracks every module dependency by itself
I'm not sure how sufficient the GCC and MSVC approaches are, because we still need to build the modules in the right order so the dependent modules can pick them up. Correct me if I'm wrong, but at the moment we don't consider that in our build process.
Scanning the project structure from a single entry point would allow us to build a dependency tree as we go, which would also solve that.
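To make the build-order point concrete, here is a minimal sketch of how a dependency map could be turned into a compile order. The function name `build_order` and the `module -> imports` map shape are assumptions for illustration, not something Zork++ has today.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns module names ordered so that every module comes after the
/// modules it imports, or `None` if the imports form a cycle.
/// `deps` maps a module name to the names it imports (hypothetical shape).
fn build_order(deps: &BTreeMap<String, Vec<String>>) -> Option<Vec<String>> {
    // Collect every module mentioned, either as a key or as an import.
    let mut pending: BTreeSet<&str> = BTreeSet::new();
    for (module, imports) in deps {
        pending.insert(module.as_str());
        for import in imports {
            pending.insert(import.as_str());
        }
    }

    let mut done: BTreeSet<&str> = BTreeSet::new();
    let mut order: Vec<String> = Vec::new();
    while !pending.is_empty() {
        // A module is ready once all of its imports have been scheduled.
        let ready: Vec<&str> = pending
            .iter()
            .copied()
            .filter(|m| {
                deps.get(*m)
                    .map_or(true, |imports| imports.iter().all(|i| done.contains(i.as_str())))
            })
            .collect();
        if ready.is_empty() {
            return None; // nothing can make progress: circular imports
        }
        for module in ready {
            pending.remove(module);
            done.insert(module);
            order.push(module.to_string());
        }
    }
    Some(order)
}
```

Modules that become ready in the same round do not depend on each other, so in principle they could also be compiled in parallel.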
Most of the build systems for various languages I've worked with behave this way. Most of which are admittedly for JVM Languages (Kotlin, Scala, Clojure, Groovy). C# namespaces are also usually reflected in their source path. Even Rust modules follow this approach.
Sure, you have a point there. Even so, the Rust approach is slightly different, more like the Python one (__init__.py and mod.rs), but I find this hard to translate into the C++ ecosystem. Not because of the technical implementation, but because of the foundations. Typically, anything that isn't part of the standard isn't well received in the community, and I expect you already know how the C++ community is...
Correct me if I'm wrong, but at the moment we don't consider that in our build process
No, you're totally right. Build order depends on user declaration. You can take a look at another side project that we have in ZDC, ZERO, which we hope could one day serve as a usage example for Zork++. It is more of a toy project, and whether or not it ever gets far (I am not worried about that), it could still be used as a bigger, more real-world example of using Zork++. As you may notice in my fork, in that branch, things get out of control quickly, and this is where your idea becomes more relevant.
Scanning the project structure from a single entry point would allow us to build a dependency tree as we go, which would also solve that.
Sure. But we may take an intermediate path. We could pre-parse everything first (I mean, while the project_model is built), and then figure out the dependency tree ourselves by just looking at the export module, export import, import and module statements. This shouldn't complicate things much in terms of effort, and there would be no need to force the user to match a folder structure (which I strongly believe doesn't fit the C++ dinosaur culture).
Typically, scanning the files for those declarations would be fast (since Rust is really fast, and we would rescan dependencies only if the translation unit is modified), so I am really not worried about performance, and we could get the best of all the ideas.
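A rough sketch of what such a pre-parse could look like, assuming a plain line-based scan is good enough for a first version. `ModuleInfo` and `scan_declarations` are hypothetical names, not part of the current project model, and a real implementation would need to handle comments, multi-line declarations and preprocessor conditionals.

```rust
/// Module-related facts found in one translation unit.
/// Hypothetical type for illustration; not Zork++'s real project model.
#[derive(Debug, Default, PartialEq)]
struct ModuleInfo {
    /// Name from `export module X;` or `module X;`, if present.
    name: Option<String>,
    /// True when the unit is declared with `export module`.
    is_interface: bool,
    /// Names from `import X;` and `export import X;`.
    imports: Vec<String>,
}

/// Scans `source` line by line for module declarations.
/// Simplification: only declarations that fit on one line are recognised.
fn scan_declarations(source: &str) -> ModuleInfo {
    let mut info = ModuleInfo::default();
    for raw in source.lines() {
        // Declarations of interest end with ';'; skip everything else.
        let Some(decl) = raw.trim().strip_suffix(';') else { continue };
        let words: Vec<&str> = decl.split_whitespace().collect();
        match words.as_slice() {
            ["export", "module", name] => {
                info.name = Some(name.to_string());
                info.is_interface = true;
            }
            // `module :private;` opens the private fragment, not a module.
            ["module", name] if !name.starts_with(':') => {
                info.name = Some(name.to_string());
            }
            // Header units such as `import <vector>;` are not module dependencies.
            ["export", "import", name] | ["import", name]
                if !name.starts_with('<') && !name.starts_with('"') =>
            {
                info.imports.push(name.to_string());
            }
            _ => {}
        }
    }
    info
}
```

Partition imports such as `import :detail;` are recorded with their leading colon, so the caller would still have to qualify them with the enclosing module name.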
While the project_model is built, scan every translation unit up until the point where the first export (block of C++ code) appears. Note that the standard dictates that all the export, export-import, import and module declarations must appear within the module purview, so we just need to identify the case where the first export isn't the export module primary interface declaration; the other exports should then be the public C++ API of that module.
import statements made in the module purview
When the translation unit analyzer determines that we need to rebuild a translation unit, rebuild ALL the dependencies as well. That will allow us to finally address the issue with Clang and its module cache, which most of the time makes the build fail due to a cache mismatch, forcing the user to clear our user cache and rebuild everything from scratch, which is extremely tedious and costs the user a lot of time.
Zork job. This can simply be one of the cfg attributes that we already have, like code_root. We can use code_root to manually detect every translation unit under that path, and from there start building our dependency tree.
target the desired files that wants to link
dependencies and other attributes that should not be configurable from the user's side when the feature is ready
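The "rebuild ALL the dependencies as well" step could be sketched like this, reading "dependencies" as the units that import the changed one, directly or transitively. That reading, the name `units_to_rebuild` and the `module -> imports` map shape are assumptions for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Given `deps` (module -> modules it imports) and one changed module,
/// returns the changed module plus every module that imports it,
/// directly or through other modules.
fn units_to_rebuild(deps: &BTreeMap<String, Vec<String>>, changed: &str) -> BTreeSet<String> {
    let mut dirty: BTreeSet<String> = BTreeSet::new();
    let mut stack = vec![changed.to_string()];
    while let Some(current) = stack.pop() {
        // `insert` returns false if we already visited this module.
        if !dirty.insert(current.clone()) {
            continue;
        }
        // Every module importing `current` has to be rebuilt too.
        for (module, imports) in deps {
            if imports.iter().any(|i| *i == current) {
                stack.push(module.clone());
            }
        }
    }
    dirty
}
```

This scans the whole map for each visited module, which should be fine for small projects; a reverse index (module -> importers) would avoid the repeated scan.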
Feature Request: Automatically resolve module dependencies
This approach would be a drastic change to the direction of this project. But I think it would be a nice approach, so I want to discuss it before trying to implement it.
What
At the moment the user has to declare all source modules and their dependencies inside zork.toml. It would be nice if Zork could take care of that.
Why
All the alternative C++ build systems use some sort of scripting language for build configuration. Zork uses a more declarative way with TOML.
The problem with that is that we can never match the configurability of a full-blown scripting language. And I don't think we have to. Cargo, for example, also doesn't allow full configuration for all source files; it allows just enough. This is a tradeoff between configurability and usability, and usability is what makes Cargo so great. I think the usability route would be the right way for Zork.
Having to explicitly and correctly specify all source files and dependencies puts the burden onto the user. The user also has to change zork.toml potentially every time he makes code changes or refactorings.
How could it be implemented
We would have to scan the source files for import statements after loading the configuration file. Then we could fill the dependencies properties in the project model.
We could put the dependencies in a subdirectory of the cache, so we only have to scan the source file if the code changed. It could look something like that:
Where a deps file could just be a plain text list of the dependencies
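As a minimal sketch of the caching idea, assuming one dependency name per line in a deps file and a timestamp comparison to decide when to rescan. The function names are hypothetical, and the cache layout on disk is deliberately left out.

```rust
use std::time::{Duration, SystemTime};

/// Parses a plain-text deps file: one module name per line, blank lines ignored.
fn parse_deps(contents: &str) -> Vec<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Serialises the dependencies back into the same one-per-line format.
fn write_deps(deps: &[String]) -> String {
    deps.join("\n")
}

/// A source file needs a rescan when it has no cached deps file yet,
/// or when it was modified after the deps file was written.
fn needs_rescan(source_modified: SystemTime, deps_modified: Option<SystemTime>) -> bool {
    match deps_modified {
        None => true,
        Some(cached) => source_modified > cached,
    }
}
```

The timestamps would typically come from `std::fs::metadata(path)?.modified()?`; comparing a content hash instead would be more robust against clock changes, at the cost of reading each file.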
Tradeoffs
As mentioned above, the tradeoff is having to scan each changed source file once per build. But I would argue that having Zork scan the source files and determine the dependencies is still way faster than the user having to change the configuration manually (and less error-prone for the user).
Taking it a step further
This might be taking it a bit too far, but here goes.
We could also take the full Cargo route, where the user only has to specify the entry point of the application (or entry points of the library). We simply scan the "entry" source file and only compile dependencies if needed.
But there is only one way I could think of to make this work. We would have to mandate that the source file location of modules corresponds to the module name. For example the package com.github.zork would need to have the path src/com/github/zork.cppm or src/com/github/zork/mod.cppm or something like that. Module partitions could live in that directory too:
If some other module depends on this module we could simply compile the entire directory, where mod.cppm is the main module file, other cppm files are module partitions and .cpp files are implementation files. Of course you could make the extensions configurable.
You could again argue that this would take a lot of configurability from the user, but if we look around in ecosystems of other languages as well, we find that classical C++ build systems are the only ones where you have no standardised source directory structure.
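To illustrate the naming convention, here is a small sketch of how a module name could be mapped to its candidate paths and how files in a module directory could be classified. The names `candidate_paths`, `classify` and `UnitKind` are made up for this example, and the `mod.cppm` convention is only the one suggested in this proposal.

```rust
use std::path::{Path, PathBuf};

/// Role of a file inside a module directory under the proposed convention.
#[derive(Debug, PartialEq)]
enum UnitKind {
    PrimaryInterface, // mod.cppm
    Partition,        // any other .cppm file
    Implementation,   // .cpp files
    Other,
}

/// Maps a dotted module name to the two locations suggested above:
/// `<root>/a/b/c.<ext>` and `<root>/a/b/c/mod.<ext>`.
fn candidate_paths(src_root: &Path, module: &str, ext: &str) -> [PathBuf; 2] {
    let mut dir = src_root.to_path_buf();
    for part in module.split('.') {
        dir.push(part);
    }
    let flat = dir.with_extension(ext);
    let nested = dir.join(format!("mod.{ext}"));
    [flat, nested]
}

/// Classifies a file by name; the extensions are hard-coded here,
/// but they could come from configuration instead.
fn classify(file_name: &str) -> UnitKind {
    let path = Path::new(file_name);
    let stem = path.file_stem().and_then(|s| s.to_str());
    let ext = path.extension().and_then(|e| e.to_str());
    match (stem, ext) {
        (Some("mod"), Some("cppm")) => UnitKind::PrimaryInterface,
        (_, Some("cppm")) => UnitKind::Partition,
        (_, Some("cpp")) => UnitKind::Implementation,
        _ => UnitKind::Other,
    }
}
```

A resolver would then check which of the two candidate paths exists and, for the nested form, classify every file in that directory before compiling it.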
Most of the configuration inside zork.toml is done once and then never changed (like name or authors). So the source file configuration is the largest burden on the user. With these changes that burden would disappear almost completely.
The cache strategy from above would also work with these more drastic changes, if we are concerned with performance.
Again, this is a very radical direction so I would like to hear your opinion.