trentforkert / cmake

Experimental CMake with D support
Other
16 stars 2 forks source link

Dependency resolution doesn't work correctly #2

Open trentforkert opened 10 years ago

trentforkert commented 10 years ago

Currently, dependency resolution is implemented through add_d_target/examine_d_source.

cmDependsD also exists, but is left stubbed out.

Infodump:

The solution here is for me to port what I can of examine_d_source to cmDependsD, and go from there for VS.

trentforkert commented 10 years ago

As of bbaa6b1f82d6c55154a95d1995007e408c31103e dependency resolution/tracking is implemented for all Makefile-based generators.

Ninja, Visual Studio and XCode will require more work.

mathstuf commented 10 years ago

It looks like GDC's -M flags don't work :( . I'm getting empty depends lists in CMake, so something needs to be done for it.

trentforkert commented 10 years ago

Works for me using latest GDC binary release and Makefile generator.

If you're using (your fork of) Ninja, d1715a83995190989b921070a50e510d069fd12c should fix it.

trentforkert commented 10 years ago

AFAICT, Visual D handles dependency tracking on its own. It might be nice for CMake to also know about the dependency info, but if VS/VD is handling it, I think that's good enough.

trentforkert commented 10 years ago

I just confirmed that Ninja's .ninja_deps is being populated with D things.

Ninja: mathstuf/ninja@763c6c65d501e03b6efa9577406ed66b6ce4341a CMake: 6733f3dbf72a9cba73a91957b80f38ab83e9f552

The dep files (and all other features I tested) work across several versions of DMD, LDC, and GDC.

One problem is gdc-4.6 (D v2.055) that is shipped in Ubuntu Raring. That gdc is old enough to not have the -fmake-deps flag, causing the build to fail. The only other gdc I've tested so far is the latest binary release of 4.8 (D 2.064.2), so I'm not sure at what point between those two the flag was added.

I would just change the flag to -fdeps and be done with it, but it looks like @mathstuf's patches aren't going to be merged into upstream Ninja.

So really, instrumentation will have to be written to convert from DMD-style deps to Make-style deps. I've already done this for the Makefile generators in cmDependsD and cmLocalUnixMakefile3 (it isn't difficult).

My current ideas:

Currently I prefer 1, but 2 could be better depending on the definition of ???. 3 is a method of last resort.

I'll add this to my todo list, but I have other things I want to get to first.

mathstuf commented 10 years ago

Look at cmcldeps. That is probably what we need for Ninja. It'd be nice if make would then use that mechanism. Without looking, I think it takes some arguments which describes the dependency info (main dep, output file, intermediate depfile, and the final depfile) and everything else is the compiler command. It executes the compiler then transforms the intermediate depfile into the make-format depfile. Or at least that's another solution if that's all wrong :) .

trentforkert commented 10 years ago

This sort of approach might be for the best. One problem I encountered when doing Makefile generator support is that DMD (and the other D compilers) do full dependency analysis. That is, when running

dmd -deps=deps.txt -c -o- a.d

deps.txt will contain the dependency info for a, and all modules a imports, down to and including the standard library and runtime.

So, if you have a chain of modules a, b, c, and d, each one importing the next, generating the deps for each module individually ends up with duplicated work. The way I ended up working around this for cmDependsD was to rig things such that DMD would be called with -deps= once per target, then each object would scan that file for information relevant to it.

However, if I understand cmcldeps correctly, it generates the dependency file and object in the same compiler call, which is probably a little cleaner.

As a bonus, that would also allow me to move CMAKE_D_NO_OUTPUT_FLAG into UseD where it belongs, and possibly fold CMAKE_D_DEPS_FILE_FLAG into CMAKE_DEPFILE_FLAGS_D.

The only thing I don't like is that this would add an executable to CMake.

mathstuf commented 10 years ago

One thing to do could be to trace the public/private imports and not print private imports as dependencies. Scanning per-target is probably faster, but I don't think ninja is set up for that :-( . I still think ninja should have the feature because this kind of stuff is just going to get solved umpteen times since I doubt this particular solution (and associated code) would get shared with anything other than CMake.

mathstuf commented 10 years ago

So, if you have a chain of modules a, b, c, and d, each one importing the next, generating the deps for each module individually ends up with duplicated work.

I have a bug against LDC to have it ignore system dependencies in the depfile output. It should probably move to DMD, but it isn't going to help anyone today anyways.

The way I ended up working around this for cmDependsD was to rig things such that DMD would be called with -deps= once per target, then each object would scan that file for information relevant to it.

The way that the current scanners work, they don't use the compiler and instead do their own parsing which is why they'd be (much) faster and is why they are used. I don't think adding a compiler call which causes the compiler to do all the regular work as well is actually a performance improvement unless there's a "only do dependency scanning" mode.

The only thing I don't like is that this would add an executable to CMake.

One solution here is to use the git model and store the internal binaries somewhere other than /usr/bin.

trentforkert commented 10 years ago

Thinking aloud again:

If I were to implement a lexer for D we would lose:

We would gain:

However, D is a complex language, and even writing a(n) (accurate) lexer for it is an undertaking. This is the one of the main reasons I went for the compiler-provided-deps approach in the first place.

As I understand it, cmDependsC does a line-by-line regex for #includes, with the understanding that it is not entirely accurate, but good enough.

Hypothetically, a similar approach could be taken for D. My current thought is:

Find lines matching (^import[ \n\r\t]+)|([ \n\r\t]+import[ \n\r\t]+), and then have a minimal lexer take over until the semicolon.

This would find imports that were actually compiled out, and miss imports from templates and mixins. It would also leave the matter of converting module names to source file names and finding the files, but cmDependsC has to do that, too.

I could also allow an open paren in the regex, and do basic text import dependency tracking, so long as ctfe isn't used in getting the name of the import()ed file.

I could then package this up into, say, cmDdeps.cxx, and have it do two output modes depending on the args: one where it also forwards to the D compiler a la cmcldeps for use by Ninja, and one where it simply handles writing to depend.make for use by Make.

I suppose I could do the same, but still defer to the compiler and parse the depfile, but this occurred to me and I figured I should get the idea out there.

trentforkert commented 10 years ago

Further thinking aloud:

I've now written a simple test that uses PhilippeSigaud's Pegged (with a local fix for 64-bit) to produce a minimal D parser that only handles a single import declaration at a time (ignoring any comments it hits). On top of that parser, I wrote a D program that scans a source file for occurrences of the word "import", and tries to parse an import statement from there on, printing what it found on success.

One downside to my current implementation is that it will parse /* import std.algorithm; */ or similar without recognizing that the import statement is inside a comment. So, I might need to parse a fuller (or complete?) grammar for D.

Given a source file "everything.d", which imports every module in phobos and druntime, I get the following timings:

Time (s) Command Comment
0.461 gdc -c -fdeps=deps everything.d D 2.064.2
0.444 dmd -c -deps=deps everything.d D 2.065.0
0.021 ./parsetest everything.d Built with gdc -frelease -O3 ...
5.799 gdc -I . -c -fdeps=deps parsetest.d D 2.064.2
1.969 dmd -I. -c -deps=deps parsetest.d D 2.065.0
0.018 ./parsetest parsetest.d Built with gdc -frelease -O3 ...

Oof. Surpressing object file output doesn't change those times significantly either like I originally thought it would. I suppose it makes sense, because of the way deps is produced, but for some reason I had thought it was faster.

Granted, parsetest isn't finding the source files that correspond to the import names yet, which might increase the time involved a bit, but I don't expect that to contribute significantly to the times.

In searching to see if anyone had done the legwork on making a D lexer/parser in flex/bison (to match what cmDependsJava and cmDependsFortran do), I found a thread in which someone had started on one, to which @WalterBright suggested to use the lexer and parser from DMD's frontend sources directly. I don't mind this idea, but I'm pretty sure those files will be converted to D if/when DDMD is made official, making their use in CMake beyond that point more difficult:

For now, I'll try my hand at improving parsetest to handle parsing complete D sources, ignoring everything but the imports, and see how that goes. If nothing else it gives me an excuse to play around with Pegged some more.

mathstuf commented 10 years ago

On Fri, May 09, 2014 at 12:25:08 -0700, trentforkert wrote:

One downside to my current implementation is that it will parse /* import std.algorithm; */ or similar without recognizing that the import statement is inside a comment. So, I might need to parse a fuller (or complete?) grammar for D.

The way that the current dependency scanners search for is that if they can't find the file, they assume it will either be available (due to dependencies in which case it will show up in future builds) or they are

ifdef'd out. Basically, it assumes the build is at least sensible in

the first place.

In searching to see if anyone had done the legwork on making a D lexer/parser in flex/bison (to match what cmDependsJava and cmDependsFortran do), I found a thread in which someone had started on one, to which @WalterBright suggested to use the lexer and parser from DMD's frontend sources directly. I don't mind this idea, but I'm pretty sure those files will be converted to D if/when DDMD is made official, making their use in CMake beyond that point more difficult:

Well, ideally the D syntax wouldn't change significantly... You "just" need the version flags, include paths (implicit and explicit), comment and block parsing, and an import statement callback. Other than that, the parser should't care what is happening in the code.

I think the hardest parts are probably related to structures like:

version(Windows):

/* rest of file */

and

version(Windows)
    /* single statement */

The version with curly braces should be "easy". Also ctfe will likely cause problems. Hopefully no one is using computed imports rather than version blocks...

  • I doubt Kitware wants D sources in their codebase

It would make supporting the number of platforms currently supported tenuous at best. Does DMD support running on IRIX or AIX (CMake still has those as target platforms)?

  • Keeping the C++ versions of lexer and parser (and their dependencies) in sync with the D versions would likely be no small task, depending on how the language changes in the future. Granted, a custom rolled lexer/parser would have the same problem keeping in sync with such changes.

Well, CMake already has lex/yacc code, but I don't touch those (other than my recent help at removing the one which does the CMakeLists.txt parsing).

For now, I'll try my hand at improving parsetest to handle parsing complete D sources, ignoring everything but the imports, and see how that goes. If nothing else it gives me an excuse to play around with Pegged some more.

:)

--Ben

trentforkert commented 10 years ago

Fresh commit (ac1b5392372075231655f9769f84e828a793bfc9) implements DDeps, a dependency resolution parser for D, and the necessary code to have the Makefile and Ninja based generators use it.

Things it handles:

Caveats/assumptions:

Other thoughts:

Glad to finally be working on this again ^_^

mathstuf commented 10 years ago

It'd probably be worth trying to get all of this rebased on top of master. There have been a number of changes there that need to be accounted for in the branch (as a whole). I'll also have to dig out some patches I needed on my Windows machine to get it to build (and it worked with GDC and Ninja :D ).

trentforkert commented 10 years ago

I just created a new default branch d_support3, which is freshly rebased on master.

I've got it successfully building and generating projects on Linux, and a successful cross-compile targeting Windows (but I haven't tried actually using that build yet).

mathstuf commented 10 years ago

Not sure how easy it'd be for you, but could you get Derelict any my bulletml library built and try mu-cade? I'm getting this build rule:

command = "/home/boeckb/code/other/group-tools/group-bld/cmake/build/bin/ddeps" -o $DEP_FILE "GNU;linux;Posix;X86_64;LittleEndian;D_LP64;D_HardFloat;D_Version2" $out $in -- /home/boeckb/misc/root/gdc/bin/gdc  $DEFINES $FLAGS -I/home/boeckb/code/other/group-games/group-abagames/mu-cade/src/src/abagames/util -o $out -c $in

which has the wrong -I (looking at FLAGS for the build rules, they're right, so the one in rules.ninja shouldn't be necessary). In fact, the util directory is never added to the include list, so I don't know where it's coming from. I'm seeing this output when building:

[1-0->12/13@1.9] Building D object src/abagames/mcd/CMakeFiles/mu-cade.dir/boot.d.o
DDeps Warning: /home/boeckb/code/other/group-games/group-abagames/mu-cade/src/src/abagames/mcd/boot.d could not be found by the name "abagames/mcd/boot" in any of the following directories:
Ensure your import directories are correct,and that the module name matches the filename.
/home/boeckb/code/other/group-games/group-abagames/mu-cade/src/src/abagames/mcd/boot.d will be not be scanned for dependencies.
trentforkert commented 10 years ago

Not sure how easy it'd be for you, but could you get Derelict any my bulletml library built and try mu-cade? I'm getting this build rule:

I'll see about getting everything built.

command = "/home/boeckb/code/other/group-tools/group-bld/cmake/build/bin/ddeps" -o $DEP_FILE "GNU;linux;Posix;X86_64;LittleEndian;D_LP64;D_HardFloat;D_Version2" $out $in -- /home/boeckb/misc/root/gdc/bin/gdc  $DEFINES $FLAGS -I/home/boeckb/code/other/group-games/group-abagames/mu-cade/src/src/abagames/util -o $out -c $in

which has the wrong -I (looking at FLAGS for the build rules, they're right, so the one in rules.ninja shouldn't be necessary).

Are you running the latest from d_support3? To me, that looks like it is adding the current source dir, which shouldn't happen anymore.

In fact, the util directory is never added to the include list, so I don't know where it's coming from. I'm seeing this output when building:

[1-0->12/13@1.9] Building D object src/abagames/mcd/CMakeFiles/mu-cade.dir/boot.d.o
DDeps Warning: /home/boeckb/code/other/group-games/group-abagames/mu-cade/src/src/abagames/mcd/boot.d could not be found by the name "abagames/mcd/boot" in any of the following directories:
Ensure your import directories are correct,and that the module name matches the filename.
/home/boeckb/code/other/group-games/group-abagames/mu-cade/src/src/abagames/mcd/boot.d will be not be scanned for dependencies.

So, somehow, ddeps is unaware that you called include_subdirectories("${CMAKE_CURRENT_SOURCE_DIR}") in src/CMakeLists.txt...

trentforkert commented 10 years ago

Hmm. It builds fine for me. No ddeps warnings, ddeps' files look fine, as do Ninja's. No extra/wrong/missing includes anywhere I can see.

mathstuf commented 10 years ago

I may not have the most up-to-date code. Will try tomorrow.

snosov1 commented 9 years ago

Hi!

Do you think it makes sense to provide an option for the user to resort to -deps-based solution for resolving dependencies?

The reason I'm asking is, your own tool has some limitations that lead to false negatives.

For example, I don't know if you've heard of vibe.d, but here's thing. Vibe.d is some kind of a framework, that helps you to create web server applications. One of its features is to render html pages from text templates with embedded D code. It relies on text importing for that feature. E.g. you can find the code, like, mixin("static import "~MODULE~";"); in the render method. And since those kinds of dependencies are not recognized by your tool, when you edit a template file, the files that are sourcing it won't be re-compiled.

Your tool introduces a trade-off by speeding up things while sacrificing accuracy. For some applications it could be a reasonable choice. But it could be unacceptable (or close to it) for other ones.

mathstuf commented 9 years ago

FWIW, there's a similar problem in C with #include <MACRO> which CMake might get wrong.

snosov1 commented 9 years ago

Good to know.

But in C it's difficult to find sensible use-cases for this functionality. In D, on the contrary, it's not that hard (consider the example above) and also the "-deps" implementation is already in the compiler, so it's pretty trivial to do that.

trentforkert commented 9 years ago

I think the reduction in accuracy is perfectly acceptable for the speed increase it allows. Even with a false negative, the code will still build fine. The only downside is that if you modify the missed import, the containing module will not (necessarily) be rebuilt.

However, looking at the example from vibe, render will import the module that instantiates and calls render. Because templates are instantiated where they are called, this does not give vibe.web.web a dependency on MODULE, instead giving MODULE a dependency on MODULE. So, despite missing this import, ddeps still gets the right answer. Of course, ddeps can miss some imports that aren't redundant (inside string mixins or import(<str>) where <str> is not a string literal), but these can usually be refactored easily to allow ddeps to catch the imports.

The main problem with dmd -deps is that it will recursively scan every imported file when scanning a module, no matter what. It does this again for every object, since CMake does incremental compilation. This results in build times that are far too slow. DDeps caches results, giving a significant speed boost on top of the speed boost it gets by not instantiating templates, resolving mixins, or evaluating CTFE.

While compiling will still do this recursive scan to produce the actual object, also scanning dependencies with the compiler would double the compilation time. I don't think the slight increase in accuracy would ever be worth doubling the compilation time. I didn't want to implement a D lexer/parser, circumstance basically required it.

snosov1 commented 9 years ago

Fair enough. Thanks!