skvadrik / re2c

Lexer generator for C, C++, Go and Rust.
https://re2c.org
Other
1.09k stars 169 forks source link

PHP's date/time parser with re2c 0.16 & Apple LLVM version 7.3.0 (clang-703.0.29) causes massive slow down #154

Closed derickr closed 2 years ago

derickr commented 8 years ago

The PHP 7.1 Release Manager ( @dshafik ) emailed (http://news.php.net/php.internals/94582) about a recent commit to PHP's date/time parser (https://github.com/php/php-src/commit/7759d6b0db0437195442ea2d43e1a49001ef23a7). On OSX with the above mentioned LLVM version, he said that the date tests were timing out.

I had forgotten to use -b, and fixed that through http://news.php.net/php.cvs/93173. But that did not solve the problem. The only thing that worked was going back to re2c 0.15.3 (or 0.13.5). This only seems to have been a problem for his Mac OSX builds, as several of us could not reproduce this on Linux with GCC.

I don't precisely know what went wrong, or how, as I don't have access to Mac OSX, but I thought I'd file this issue to track possible future cases of this, and hopefully find out why this happened.

skvadrik commented 8 years ago

Hi Derick!

In short: the issue is not with Mac OSX and not with re2c, it's with clang: re2c-0.16 generates control-flow graph that happens to confuse that exact version of clang. Not long ago I reported a similar bug in gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67783).

This behaviour is not specific to control-flow graphs generated by re2c-0.16. I can reproduce insanely long compile times on lexers generated with re2c-0.15.3, but not re2c-0.16 (about 10 minutes on my Intel Core i3) on Linux with clang-3.6 (optimization level -O2). Going back to old re2c versions in the hope of stabilizing re2c-generated code won't solve the issue in the long term, as new compiler versions could introduce new bugs (in the algorithmic complexity sense of the word "bug"), both clang and gcc.

The exact source code I used in my experiments is PHP date/time parser as of 2015-02-11 (from re2c test suite). re2c-0.16 actually generates better code for it than re2c-0.15.3 (smaller due to DFA minimization, takes less time to compile on my box with both clang and gcc).

The best way out (but not a quick hack to fix PHP build times) would be to add re2c-generated code to gcc and clang test suites. I think compiler writers should be interested in real-world examples of complex control-flow graphs.

It would be interesting to take a closer look at why clang is confused by re2c-generated CFG and probably report a bug.

derickr commented 8 years ago

Thanks for that extensive analysis.

However, my issue is about run times of the generated .c code, and not build/compile times.

skvadrik commented 8 years ago

Hmm. That is very strange. And interesting.

I cannot reproduce it with clang-3.6 or clang-3.8 (not sure how these versions map to OSX versions).

In order to measure run time of the generated code, I use '--skeleton' option (to get self-contained program from PHP test). So my benchmark is obviously imprecise and different from the original. In my case re2-0.16 and re2c-0.15.3 show a bit different results (re2c-0.16 is generates smaller code that runs faster, as expected).

If you could provide enough details to reconstruct the original benchmark, I could try to isolate clang version that generates slow code.

derickr commented 8 years ago

I don't remember which exact test case it was. Will need to defer to @dshafik for that.

ryandesign commented 3 years ago

PHP ended up working around the problem in July 2016 by regenerating with re2c 0.15.3.

But now this problem has bitten PHP again in August 2020, because they again regenerated with a newer re2c, this time using 2.0.3: https://bugs.php.net/80376

The clang bug did evidently get fixed at some point. Apple LLVM version 8.0.0 (clang-800.0.42.1) is still affected but Apple LLVM version 9.0.0 (clang-900.0.39.2) is not. But users of old systems are not able to update to newer versions of Xcode that contain the fixed compiler.

ryandesign commented 3 years ago

After installing and testing several versions of Apple Clang, I've found that the broken versions are Apple LLVM version 7.3.0 (clang-703.0.29) (which came with Xcode 7.3) through Apple LLVM version 8.0.0 (clang-800.0.42.1) (which came with Xcode 8.1/8.2/8.2.1) inclusive.

Apple LLVM version 7.0.2 (clang-700.1.81) (which came with Xcode 7.2.1) and earlier work fine, and Apple LLVM version 8.1.0 (clang-802.0.38) (which came with Xcode 8.3) and later work fine.

skvadrik commented 3 years ago

Does it help to drop re2c -g option for the problematic LLVM version?

ryandesign commented 3 years ago

The -g option isn't being used. See https://github.com/derickr/timelib/blob/master/Makefile#L34-L38

skvadrik commented 3 years ago

Ok, then -b option. Dropping it does not affect the underlying DFA or minimization, but it makes the generated code simpler.

skvadrik commented 3 years ago

@ryandesign If that would help, I can add an option that turns off minimization (then you could set it conditionally if a problematic LLVM version is used).

ryandesign commented 3 years ago

I haven't tested dropping the -b option yet.

php ships with generated parsers, so adding options to re2c that change how the generation works wouldn't help php, would it? It's not known at the time that the parsers are generated what C compiler will be used.

The datetime parsers in php 7.4.14 and 8.0.1 were generated with re2c 0.15.3 again so they work even with the affected versions of clang. php 7.3 no longer receives non-security updates so its final versions will forever contain the datetime parsers generated by re2c 2.0.3 and won't be compilable with the affected clang versions.

skvadrik commented 3 years ago

php ships with generated parsers, so adding options to re2c that change how the generation works wouldn't help php, would it?

Right, in that case it wouldn't help. But I thought that PHP might have some logic to regenerate parsers if re2c is available, or else to use the autogenerated files.

trofi commented 2 years ago

Let's close as it's a compiler issue in outdated compiler version.