seiferff / distcc

Automatically exported from code.google.com/p/distcc
GNU General Public License v2.0
0 stars 0 forks source link

pump mode performance problem for certain Boost headers #16

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
For certain Boost header files, which do "#include BOOST_PP_ITERATE()",
pump mode reports warnings, and eventually falls back to local
preprocessing.  There are two problems:

 (1) performance; when this case is hit, overall build speed can be as bad
as 5X worse than non-pump mode

 (2) the diagnostics that are given just state the translation unit, but
give no indication of which header file(s) or which #include statement(s)
are problematic.

Original issue reported on code.google.com by fer...@google.com on 4 Aug 2008 at 9:40

GoogleCodeExporter commented 8 years ago
It is my understanding that the problem is that pump mode needs to implement a 
complete C/C++ 
preprocessor here to do its job.  I have not looked at the details here, but I 
suppose that distcc features a 
preprocessor written in Python that is not fast enough to work on deeply-nested 
macros such as those of 
Boost.Pp.

So I guess what is needed is a portable, fast, modifiable preprocessor.

It turns out that in my team two students developed a C preprocessor, revcpp 
(it was designed to be 
reversible, with an uncpp).  I have no idea how much work it would be to adjust 
their base code to make it 
suitable for distcc, but I suspect it may be very moderate, in which case it 
would be a valuable addition to 
distcc.

Before contacting them, I would like to make sure that I understood the problem 
correctly, and that my 
proposal would be solution.

Original comment by akim.demaille@gmail.com on 13 Jan 2009 at 2:53

GoogleCodeExporter commented 8 years ago
> It is my understanding that the problem is that pump mode needs to implement 
a complete C/C++ 
> preprocessor here to do its job.

Not exactly.
Pump mode uses a "lifted" version of the C/C++ preprocessor to compute a set 
semantics:
from the set of possible values for macros, pump determines the set of possible 
header files that could be 
included.  That's a fairly non-trivial change from the standard preprocessor 
semantics.
For example, whereas standard cpp evaluates only one branch of each "#if", pump 
mode evaluates both 
branches.

The performance problems come when the sets involved get too large; the 
performance of the algorithm is 
exponential time in the worst case.

Original comment by fergus.h...@gmail.com on 13 Jan 2009 at 4:12

GoogleCodeExporter commented 8 years ago
[Fifth]

[Fourth attempt]

[Third attempt to push my message, as I have yet again

       Server Error

The server encountered an error and could not complete your request.
If the problem persists, please report your problem and mention
this error message and the query that caused it.]

[Typing this message another time, since, again, my submission
ended in a server failure, and going back gives an empty
message...]

The preprocessor I was referring to has similar constraints.  It
is meant for source to source transformations, so it has to
process all the possible combination of #if.  This creates a set
of file on which your perform your s2s transformations, and then
de-preprocess the set of files to recombine them into a single
un-preprocessed C/C++ file.

Of course there are issues with combinatorial explosion, and some
ideas were delvelopped to address it.

Maybe there are ideas, or even code, that can be reused?

Quentin and Benoît will know better than me.  Now cc'd.

Original comment by akim.demaille@gmail.com on 23 Jan 2009 at 2:54

GoogleCodeExporter commented 8 years ago
For some reason, I just can't include the following people in CC, and I get 
that server error :(

tsunanet at gmail
hocquet at gostai com

Original comment by akim.demaille@gmail.com on 23 Jan 2009 at 2:57

GoogleCodeExporter commented 8 years ago
Thanks for reporting that issue with the "server error".
I have managed to reproduce it and have filed a bug report with the Google team 
that
works on project hosting for code.google.com.

The "Cc" field only allows you to Cc people who are members of the project.
I don't know why that is done. It may be done as a spam minimization measure, 
but it
seems like a fairly heavy-handed approach.  Anyway, clearly the implementation 
of
this restriction is buggy, as you have discovered.  Hopefully we can get the bug
fixed fairly soon now that we know how to reproduce it.

Anyway, for the purpose of discussing this with tsunanet and hocquet, it's 
probably
better to send an email directly and just include the link to the distcc issue
<http://code.google.com/p/distcc/issues/detail?id=16>.

Original comment by fergus.h...@gmail.com on 23 Jan 2009 at 7:30

GoogleCodeExporter commented 8 years ago
Hi Fergus,
As Akim said, revcpp does compute a set semantics.  Sadly revcpp isn't yet 
usable for
production use, it doesn't implement everything (for instance __LINE__ and 
__FILE__
aren't properly expanded, we don't handle #define FILE "foo" and #include FILE, 
and
many more things are missing) and it's targeted at C++ only (CPP for C is a bit
different and some work is needed to make revcpp compliant with the C 
standard).  I
think most of the work is done but before using it some more work would be 
needed.

echo '#ifdef A
In A...
#ifdef A1
and in A1
#else
and no A1
#endif
that was A
#else
nothing defined!
#endif' | ./revcpp      
>>> ((defined(A1)) && (defined(A))) <<<
/*# File "(stdin)" "((defined(A1)) && (defined(A)))" #*/
In /*# MacroExpansion "A" "0" #*//*# MacroExpansionEnd "A" "0" #*/...
and in /*# MacroExpansion "A1" "2" #*//*# MacroExpansionEnd "A1" "2" #*/
that was /*# MacroExpansion "A" "1" #*//*# MacroExpansionEnd "A" "1" #*/
>>> ((!defined(A1)) && (defined(A))) <<<
/*# File "(stdin)" "((!defined(A1)) && (defined(A)))" #*/
In /*# MacroExpansion "A" "0" #*//*# MacroExpansionEnd "A" "0" #*/...
and no A1
that was /*# MacroExpansion "A" "1" #*//*# MacroExpansionEnd "A" "1" #*/
>>> !defined(A) <<<
/*# File "(stdin)" "!defined(A)" #*/
nothing defined!

If you wanna chat with me about it, feel free to buzz me or book a VC meeting 
with
me: tsuna@

-- 
Benoit Sigoure
SRE @ Dublin

Original comment by tsuna...@gmail.com on 1 Feb 2009 at 6:40

GoogleCodeExporter commented 8 years ago
Supporting computed includes (#define FILE "foo" and #include FILE) would 
definitely be needed; that's the 
problematic case which triggers the performance problem in distcc's pump mode. 
That's the only case where we 
need to actually evaluate macros.

Original comment by fergus.h...@gmail.com on 1 Feb 2009 at 8:28

GoogleCodeExporter commented 8 years ago
I wrote:
> Thanks for reporting that issue with the "server error".
> I have managed to reproduce it and have filed a bug report
> with the Google team that works on project hosting for code.google.com.

That bug is now fixed (although it may take some time before the fix is 
deployed).
Once the fix is deployed, you should no longer get a server error when adding
invalid email addresses as owner or cc; now there is a proper error message on
the page.

Thanks again for reporting that!

Original comment by fergus.h...@gmail.com on 12 Feb 2009 at 7:29

GoogleCodeExporter commented 8 years ago
Could the dependency tracking be delegated to 'g++ -M'?

That spits out the dependent header files. g++ can figure out the included 
headers
quite fast.

Original comment by bdhoff...@windstream.net on 24 Mar 2010 at 9:34

GoogleCodeExporter commented 8 years ago
No, dependency tracking can't be delegated to "g++ -M".
See comment 2 in this bug 
(<http://code.google.com/p/distcc/issues/detail?id=16#c2>).

Original comment by fergus.h...@gmail.com on 24 Mar 2010 at 9:52

GoogleCodeExporter commented 8 years ago
It appears that _CalculateIncludeClosureExceptSystem drops all header files in a
system directory, yet FindNode still explores all children of header files in a
system directory.

Assuming that system header files only include other system header files, could
FindNode stop looking for child nodes of system header files?  This would save 
having
to process boost headers.

Specifically is this patch valid?

Sorry if this is a stupid question, but I am still trying to lean the codebase.

Original comment by bdhoff...@windstream.net on 26 Mar 2010 at 4:04

Attachments:

GoogleCodeExporter commented 8 years ago
It's not a stupid question.

It will take some time for me to analyze your patch...

Original comment by fergus.h...@gmail.com on 26 Mar 2010 at 4:44

GoogleCodeExporter commented 8 years ago
After testing a bit, it turns out my assumption of system headers always 
including
files in system directories was not quite correct.

gtk headers (inside /usr/include) will include a a config header residing in
/usr/lib/<some library>/include

Since FindNode doesn't process the gtk headers, it never finds out about the 
config
header.  That config header is not in the compiler_defaults default paths thus
doesn't get included automatically in the temp directory tree.  Remote 
compilation
then fails.

If I change _SystemSearchdirsGCC to add '/usr' to the search dirs, then the tmp
directory gets a symlink to /usr thus letting me compile.

I am not sure how portable this change is, but it fixes the issue on my cluster.

Original comment by bdhoff...@windstream.net on 26 Mar 2010 at 7:31

Attachments:

GoogleCodeExporter commented 8 years ago
This patch fixed my problems too!

Original comment by ncha...@gmail.com on 24 May 2010 at 6:40

GoogleCodeExporter commented 8 years ago
I'm bumping this issue.  We ran into similar problems with Boost and 
bdhoffman's patch seems to be fixing our problem perfectly.  I would love for 
this patch to be formally reviewed and make its way upstream to the next 
release.

Original comment by edmundh...@gmail.com on 29 Aug 2012 at 8:40

GoogleCodeExporter commented 8 years ago
Hmm, the patch seems a bit hacky.  I am concerned that it may cause problems 
for other users.

Why does the compilation fail remotely without this patch?
Does the remote machine not include the gtk config header file?
Exactly how do the gtk header files include the gtk config header file?

At very least, if we're going to adopt a patch like this, the code should 
contain comments which explain why /usr is being added in to the search path.

Original comment by fergus.h...@gmail.com on 3 Sep 2012 at 1:40

GoogleCodeExporter commented 8 years ago
Two different companies I've been at have run into this same boost issue, and 
it seems to greatly limit the scalability since pump mode basically doesn't 
work. The patch appeared to work quite well for both code bases. Is there any 
other testing I can do so this patch gets merged in?

Original comment by plaztiks...@gmail.com on 27 Sep 2012 at 6:55

GoogleCodeExporter commented 8 years ago
For those who have not been follownig the mailing list, you might have missed 
my announcement: https://lists.samba.org/archive/distcc/2011q1/004166.html (and 
followups)

The code is still available here: 
https://code.launchpad.net/~arankine/distcc/issue16

I believe this is more correct than the above patch, as it fixes some 
fundamental parsing issues with the include_server.

I and my team have used it successfully with the boost headers to do 
distributed pump mode builds. Recently I haven't been building in this manner 
so I haven't had the impetus to package it up for inclusion in the main distcc 
distribution.

Original comment by arank...@gmail.com on 27 Sep 2012 at 7:09

GoogleCodeExporter commented 8 years ago
FWIW, Boost contains a full, *standards compliant* (very important for the PP 
metaprogramming code in question!) implementation of the C++ preprocessor: 
http://www.boost.org/doc/libs/1_55_0/libs/wave/

Original comment by dave@boostpro.com on 8 Feb 2014 at 4:47

GoogleCodeExporter commented 8 years ago
[deleted comment]
GoogleCodeExporter commented 8 years ago
http://bazaar.launchpad.net/~arankine/distcc/issue16/revision/326
Looks Good To Me.

http://bazaar.launchpad.net/~arankine/distcc/issue16/revision/327
LGTM

http://bazaar.launchpad.net/~arankine/distcc/issue16/revision/328
LGTM
(At least nothing obviously wrong, anyway.  This one is getty pretty hairy, 
though.  Need to look up what the standard says about macro arguments and 
whitespace...
I also don't really understand why this CL deleted some of the tests added in 
the previous CLs.)

http://bazaar.launchpad.net/~arankine/distcc/issue16/revision/329
I am not yet convinced that this one is safe.
I think it may be possible to construct an example for which this would 
underestimate the include closure,
causing distcc to not send all the necessary header files.

I wonder whether it would be better to enable this potentially unsafe 
optimization
only if a particular flag is passed?

http://bazaar.launchpad.net/~arankine/distcc/issue16/revision/330
http://bazaar.launchpad.net/~arankine/distcc/issue16/revision/331
http://bazaar.launchpad.net/~arankine/distcc/issue16/revision/333
http://bazaar.launchpad.net/~arankine/distcc/issue16/revision/335
These patches are a bit hacky, but I guess as a pragmatic measure, they may be 
OK.
I wonder whether it would be better to enable the OVERRIDE_MACROS table
only if a particular flag (e.g. --boost) is passed.

http://bazaar.launchpad.net/~arankine/distcc/issue16/revision/332
LGTM, I think.

http://bazaar.launchpad.net/~arankine/distcc/issue16/revision/334
LGTM

Original comment by fergus.h...@gmail.com on 21 Feb 2014 at 2:14

GoogleCodeExporter commented 8 years ago
arankine, thanks for the patches.  Nice code and good tests.  I have added you 
to the list of people with svn commit permission on the distcc project.  Please 
feel free to commit the patches that I LGTM'd.  For the ones that I didn't LGTM 
yet, it might be just a matter of explaining in more detail why the patch is 
OK, or at worst modifying the patch to make the change conditional on a flag.

Regarding dave's comment #19, having a C preprocessor implementation is 
unfortunately not directly helpful;
see my comment #2 <https://code.google.com/p/distcc/issues/detail?id=16#c2>
for an explanation of why not.

Original comment by fergus.h...@gmail.com on 21 Feb 2014 at 2:29

GoogleCodeExporter commented 8 years ago
Wouldn't it be a good idea to make directories which are expected to be common 
on all systems configurable? We have boost on all machines under "/opt/..." and 
as I understood, adding the path to patch fix1 would repair the current problem.

Original comment by weidenri...@gmx.de on 16 Jul 2014 at 1:47