Open sergeyklay opened 4 years ago
I store the execution time of the each generation in separated *.log
file in the following form:
<re2c executable> <re2c options> (<time>)
So here are some results (some tests never finished):
$ sort -n <(for f in $(ls *_*.log); do cat $f | grep -o -E '\([1-9]+.*\)' | tr -d '()' ; done) | column
1.0123708 1.2061186 1.398478 1.7313844 3.545979 6.2762644 19.1019572
1.0175466 1.2140428 1.3999757 1.7506811 3.6755543 6.320666 19.8422225
1.0207795 1.2177153 1.4003677 1.8222493 3.7104511 6.3240929 19.8937025
1.0279499 1.2271273 1.4299672 1.8373235 3.9307811 6.461851 20.5925374
1.0347854 1.2375833 1.4381695 1.871309 4.1462893 6.5952994 20.6457992
1.0506582 1.2478265 1.4416772 2.1121065 4.2338955 6.605284 20.8574713
1.0710235 1.2577638 1.4619963 2.1282506 4.9962177 6.8190486 21.0589518
1.092179 1.2784787 1.4866995 2.1299754 5.11254 7.4957975 21.3088674
1.0995379 1.2806526 1.5073277 2.2660121 5.2844923 7.6238291 21.6054515
1.1423733 1.2822474 1.5266376 2.3043965 5.3296963 7.642483 28.5312305
1.156332 1.2883214 1.5412814 2.3535283 5.4675259 7.8018155 132.8422803
1.162659 1.2978251 1.5423705 2.4174622 5.4926532 9.3340122 265.9861263
1.1639108 1.3222313 1.5683961 2.4445399 5.5172051 9.52987 491.276471
1.1711366 1.3363735 1.568921 2.7039793 5.5980776 9.7681643 492.4069617
1.1730053 1.3391551 1.5843521 2.7298563 5.6159203 9.9542416 3828.0381905
1.1743305 1.3530512 1.6159905 2.7907898 5.6598511 10.7178018 3838.0232887
1.1799193 1.3593102 1.6267786 3.1749009 5.9794209 13.0525341
1.1804136 1.3800682 1.6795056 3.2259422 6.0503126 13.2396031
1.1934251 1.3883636 1.7041505 3.3845808 6.1115797 15.0390006
1.2053042 1.3914672 1.7158329 3.5315845 6.2747898 19.0428656
This is very interesting! A few questions off the top of my head:
--skeleton
option? What happens if you drop that option?--skeleton
is to blame, are tests generating output as they run? There should be *.keys
and *.input
files.One more in that direction: skeleton-specific output differs from normal output in that it is binary (fopen
is called with "wb"
).
Are these tests slow only with
--skeleton
option? What happens if you drop that option?
As I sad before I'm currently developing skeleton validation support. Regular tests are not ready, so I cannot measure programmatically. Calling the following command by hand:
$Executable = "C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe"
$Options = "-ib -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
$IO = "bug1708378.re -o bug1708378.c"
$Command = "$Executable $IO $Options"
Remove-Item bug1708378.c* -Force
Measure-Command { Invoke-Expression $Command | Out-Null }
generates output for 13 seconds:
Days : 0
Hours : 0
Minutes : 0
Seconds : 12
Milliseconds : 905
Ticks : 129053813
TotalDays : 0.00014936783912037
TotalHours : 0.00358482813888889
TotalMinutes : 0.215089688333333
TotalSeconds : 12.9053813
TotalMilliseconds : 12905.3813
File sizes are (in bytes):
PS C:\src\re2c\test_201023160030> ls bug1708378.*
Directory: C:\src\re2c\test_201023160030
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 10/24/2020 12:08 PM 566983 bug1708378.c
-a--- 10/24/2020 12:08 PM 28026493 bug1708378.c.line216.input
-a--- 10/24/2020 12:08 PM 3475398 bug1708378.c.line216.keys
-a--- 10/17/2020 12:23 PM 14989 bug1708378.re
Note, in this case, this is a manual command run, without any wrappers, just call re2c from the terminal.
Without --skeleton
option generation time is notable better:
- $Options = "-ib -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
+ $Options = "-ib -W --no-version --no-generation-date -Werror-undefined-control-flow"
Only 2 seconds (in 6 times faster):
Days : 0
Hours : 0
Minutes : 0
Seconds : 2
Milliseconds : 808
Ticks : 28083572
TotalDays : 3.25041342592593E-05
TotalHours : 0.000780099222222222
TotalMinutes : 0.0468059533333333
TotalSeconds : 2.8083572
TotalMilliseconds : 2808.3572
File sizes are (in bytes):
PS C:\src\re2c\test_201023160030> ls bug1708378.*
Directory: C:\src\re2c\test_201023160030
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 10/24/2020 12:09 PM 365074 bug1708378.c
-a--- 10/17/2020 12:23 PM 14989 bug1708378.re
By the way, despite a 6x run time improvement, 2 seconds for bug1708378.re
is still not the best time, isn't? Just to get ahead of your question, I want to say that all tests are performed using the following V-NAND SSD: Samsung SSD 970 EVO Plus NVMe M.2
Ok, so the files are generated properly and have reasonable size.
2 seconds for bug1708378.re is still not the best time, isn't?
Correct, this is way too much time for this tests. re2c bug1708378.re -o bug1708378.c
takes under 45 milliseconds on my Linux machine. And with skeleton option re2c bug1708378.re -S -o bug1708378.c
takes about 600 milliseconds, which is also fast.
tests are performed using the following V-NAND SSD
I don't think that it has to do with the hardware. Judging from the tests, I think the slowdown is caused by the way re2c is writing the output. In case of skeleton tests, re2c makes many fwrite
calls (each appending a small bit to the output file) and relies on the system libc to make it buffered, so that many small fwrite
operations are joined into large and less frequent write
system calls (or the Windows analogue of write
). In case of non-skeleton output it's probably not the system calls, but the mixture of stdio
and iostream
operations (a different libc/lbstdc++ implementation on Windows).
This is only a guess, but it is the first thing that I'd check.
re2c needs a --time-profile
option. :)
Yet another generation I managed to measure:
$Executable = "C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe"
$Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
$IO = "c\submatch\http_rfc7230.re -o c\submatch\http_rfc7230.c"
$Command = "$Executable $IO $Options"
Remove-Item http_rfc7230_skeleton.c* -Force
Measure-Command { Invoke-Expression $Command | Out-Null }
Only 2 seconds:
Days : 0
Hours : 0
Minutes : 0
Seconds : 2
Milliseconds : 588
Ticks : 25884700
TotalDays : 2.99591435185185E-05
TotalHours : 0.000719019444444444
TotalMinutes : 0.0431411666666667
TotalSeconds : 2.58847
TotalMilliseconds : 2588.47
The same but with skeleton:
- $Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
+ $Options = "-i -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
It lasted forever:
Days : 0
Hours : 1
Minutes : 5
Seconds : 1
Milliseconds : 833
Ticks : 39018332165
TotalDays : 0.0451601066724537
TotalHours : 1.08384256013889
TotalMinutes : 65.0305536083333
TotalSeconds : 3901.8332165
TotalMilliseconds : 3901833.2165
File sizes are (in bytes):
PS C:\src\re2c> ls .\test_201024170134\c\submatch\http_rfc7230*
Directory: C:\src\re2c\test_201024170134\c\submatch
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 10/24/2020 6:31 PM 313786 http_rfc7230.c
-a--- 10/24/2020 6:31 PM 27890920 http_rfc7230.c.line282.input
-a--- 10/24/2020 6:31 PM 3769927 http_rfc7230.c.line282.keys
-a--- 10/17/2020 12:23 PM 9831 http_rfc7230.re
Interesting, this one has approximately the same size of the largest skeleton file http_rfc7230.c.line282.input
(27 MB), but it took over an hour compared to bug1708378.c.line216.input
that took 13 seconds. My output-generation hypothesis doesn't fit here.
If you have spare cycles, can you make an experiment and try https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re instead of http_rfc7230.re
? It is the same regular expression stripped of all tags. It may give us insight whether tags have something to do with the slowdown.
But it may be easier to use Windows profiling tools (I've heard of ETW).
It is noteworthy that the generation in Windows, using Mingw, is much faster than using Visual Studio.
Without --skeleton
option:
$Executable = "C:\src\re2c\cmake-build-debug-mingw\re2c.exe"
$Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
$IO = "http_rfc7230.re -o http_rfc7230_mingw.c"
$Command = "$Executable $IO $Options"
Measure-Command { Invoke-Expression $Command | Out-Null }
it takes 588 milliseconds (4.4x faster than using Visual Studio):
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 588
Ticks : 5889669
TotalDays : 6.81674652777778E-06
TotalHours : 0.000163601916666667
TotalMinutes : 0.009816115
TotalSeconds : 0.5889669
TotalMilliseconds : 588.9669
File sizes are (in bytes):
PS C:\src\re2c\test_manual> ls *
Directory: C:\src\re2c\test_manual
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 10/25/2020 12:07 PM 198597 http_rfc7230_mingw.c
-a--- 10/17/2020 12:23 PM 9831 http_rfc7230.re
with --skeleton
option
- $Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
+ $Options = "-i -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
- $IO = "http_rfc7230.re -o http_rfc7230_mingw.c"
+ $IO = "http_rfc7230.re -o http_rfc7230_mingw_skeleton.c"
takes about 187 seconds (20.8x faster than using Visual Studio):
Days : 0
Hours : 0
Minutes : 3
Seconds : 7
Milliseconds : 597
Ticks : 1875970618
TotalDays : 0.00217126228935185
TotalHours : 0.0521102949444444
TotalMinutes : 3.12661769666667
TotalSeconds : 187.5970618
TotalMilliseconds : 187597.0618
File sizes are (in bytes):
PS C:\src\re2c\test_manual> ls *
Directory: C:\src\re2c\test_manual
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 10/25/2020 12:19 PM 313776 http_rfc7230_mingw_skeleton.c
-a--- 10/25/2020 12:19 PM 27890920 http_rfc7230_mingw_skeleton.c.line282.input
-a--- 10/25/2020 12:19 PM 3769927 http_rfc7230_mingw_skeleton.c.line282.keys
-a--- 10/17/2020 12:23 PM 9831 http_rfc7230.re
re2c was built with g++ v8.1.0 and the following CMake options:
-DCMAKE_BUILD_TYPE=Debug
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
Interesting, this one has approximately the same size of the largest skeleton file
http_rfc7230.c.line282.input
(27 MB), but it took over an hour compared tobug1708378.c.line216.input
that took 13 seconds. My output-generation hypothesis doesn't fit here.If you have spare cycles, can you make an experiment and try https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re instead of
http_rfc7230.re
? It is the same regular expression stripped of all tags. It may give us insight whether tags have something to do with the slowdown.But it may be easier to use Windows profiling tools (I've heard of ETW).
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 191.7389 ms | 1323.9391 ms |
Size http_rfc7230_notags.c |
147943 B | 230776 B |
Size http_rfc7230_notags.c.line162.input |
0 B | 28730721 B |
Size http_rfc7230_notags.c.line162.keys |
0 B | 3607848 B |
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 1226.1267 ms | 8138.2939 ms |
Size http_rfc7230_notags.c |
147943 B | 230774 B |
Size http_rfc7230_notags.c.line162.input |
0 B | 28730721 B |
Size http_rfc7230_notags.c.line162.keys |
0 B | 3607848 B |
@skvadrik Does this measurement show tags impact for slowdown?
@sergeyklay Without tags it is definitely faster (with skeleton: Mingw ~140x, VS ~470x; without skeleton: Mingw ~3x, VS: ~2x). This is aligned with the behaviour on Linux: without tags it takes 100ms, and with tags 21s (~210x). Perf tells me that 98% of time is spent in re2c::write_keys<unsigned char>
, in allocating and deleting vectors. I'll have a look at it.
re2c was built with g++ v8.1.0 and the following CMake options:
-DCMAKE_BUILD_TYPE=Debug -DRE2C_BUILD_RE2GO=on -DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw -G "CodeBlocks - MinGW Makefiles"
So this is a non-optimized build. Perhaps this explains the slowdown, as well as the difference between VS and Mingw (at zero levels of optimizations they both generate inefficient code, but with different level of inefficiency). How does an optimized build behave?
How does an optimized build behave?
http_rfc7230_notags.re
https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 146.4076 ms | 431.4192 ms |
Size http_rfc7230_notags.c |
147943 B | 230746 B |
Size http_rfc7230_notags.c.line162.input |
0 B | 28730721 B |
Size http_rfc7230_notags.c.line162.keys |
0 B | 3607848 B |
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 1064.8614 ms | 8330.2958 ms |
Size http_rfc7230_notags.c |
147943 B | 230746 B |
Size http_rfc7230_notags.c.line162.input |
0 B | 28730721 B |
Size http_rfc7230_notags.c.line162.keys |
0 B | 3607848 B |
http_rfc7230.re
https://github.com/skvadrik/re2c/blob/master/examples/c/submatch/http_rfc7230.re
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 210.038 ms | 49896.8644 ms |
Size http_rfc7230.c |
198597 B | 313746 B |
Size http_rfc7230.c.line162.input |
0 B | 27890920 B |
Size http_rfc7230.c.line162.keys |
0 B | 3769927 B |
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 2451.7808 ms | 3821611.5262 ms |
Size http_rfc7230.c |
198597 B | 313746 B |
Size http_rfc7230.c.line162.input |
0 B | 27890920 B |
Size http_rfc7230.c.line162.keys |
0 B | 3769927 B |
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64
Ok, so there is not much difference with optimizations, and it doesn't bridge the gap between VS and Mingw.
I've had a closer look at why http_rfc7230.re
is so slow. It is caused by a naive way of computing tag values for skeleton. Since skeleton has to work on non-optimized TDFA, it has many tag variables (N
). It is normal before renaming, most of the variables are unused and they are reduced by the optimization phase, but the whole point of skeleton is to test the optimized code on data produced for non-optimized code. Skeleton keys are generated by simulating TDFA execution on a set of M
strings, applying tag operations at each step. The intermediate results are stored in a large matrix of size NxM
(over 1000000 elements in total), where elements of the matrix are vectors (some tags have a history, ie they need all their values not just the last one). Unsurprisingly the allocation and matrix population takes time. Much of it is unused, but that doesn't help.
I managed to get about 4x speedup (21s -> 5s) by reordering some loops and reducing matrix size to Nx1
, but for the real fix we'll need a more efficient way of storing intermediate results (perhaps a trie for the tags that have history).
As to why it is particularly slow on VS, I suppose it can be caused by a different allocator.
Hmm.. Very interesting! I hope we can get the best result for VS on this field. Feel free to ping me for any test. Thank you for the research.
GitHub auto-closed this because of commit message text, but I didn't intend to.
I made a few improvements to skeleton tests: https://github.com/skvadrik/re2c/commit/e8cdf1ded52df215045f2e437e11b27285707e73 and https://github.com/skvadrik/re2c/commit/80f364ee3a9dad93e897363425f5a28d4087eef6. On my Linux machine this results in ~2-3x speedup on run_tests.sh --skeleton
and more than 20x speedup for http_rfc7230.re
in particular.
Wow! I'll check today and give you a feedback. Thank you!
I made a few improvements to skeleton tests: e8cdf1d and 80f364e.
There are benchmark results:
http_rfc7230_notags.re
https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 180.0535 ms | 618.603 ms |
Size http_rfc7230_notags.c |
147943 B | 230746 B |
Size http_rfc7230_notags.c.line162.input |
0 B | 28730721 B |
Size http_rfc7230_notags.c.line162.keys |
0 B | 3607848 B |
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 1042.8478 ms | 8132.859 ms |
Size http_rfc7230_notags.c |
147943 B | 230746 B |
Size http_rfc7230_notags.c.line162.input |
0 B | 28730721 B |
Size http_rfc7230_notags.c.line162.keys |
0 B | 3607848 B |
http_rfc7230.re
https://github.com/skvadrik/re2c/blob/master/examples/c/submatch/http_rfc7230.re
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 226.9931 ms | 3098.5436 ms |
Size http_rfc7230.c |
198597 B | 313746 B |
Size http_rfc7230.c.line162.input |
0 B | 27890920 B |
Size http_rfc7230.c.line162.keys |
0 B | 3769927 B |
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 2457.6249 ms | 27062.125 ms |
Size http_rfc7230.c |
198597 B | 313746 B |
Size http_rfc7230.c.line162.input |
0 B | 27890920 B |
Size http_rfc7230.c.line162.keys |
0 B | 3769927 B |
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64
Before your changes re2c compiled with Visual Studio was 458 times slower than re2c compiled with Mingw.
After your changes re2c compiled with Visual Studio became 8.7 times slower than re2c compiled with Mingw.
Good, then it makes things a bit faster! I pushed one more patch (https://github.com/skvadrik/re2c/commit/4a47dedad25cc1fd9c2cffeb7d1cc677464bed9e), but it makes no noticeable difference on my system.
How slow is the full test run now?
Good, then it makes things a bit faster! I pushed one more patch (4a47ded), but it makes no noticeable difference on my system.
How slow is the full test run now?
I'll test it ASAP and give you a feedback. Good job! Thank you!
I was able to do a minimal load testing. The timings were improved a lot and in general the tests run significant faster. However, some tests are still not fast enough to use for CI pipelines. Below I provide the slowest code generation with re2c compiled using 4a47ded commit.
For Mingw I used the following command line:
C:\src\re2c\cmake-build-debug-mingw\re2c.exe `
bug128.re -o bug128.c -W --no-version --no-generation-date -Werror-undefined-control-flow
The same command was used for a skeleton validation (with --skeleton
).
For Visual Studio I used the following command line:
C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe `
bug128.re -o bug128.c -W --no-version --no-generation-date -Werror-undefined-control-flow
The same command was used for a skeleton validation (with --skeleton
).
The measurements below were taken using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re
There are benchmark results:
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 353.0869 ms | 9863.7646 ms |
Size of bug128.c |
16270 B | 29694 B |
Size of bug128.c.line8.input |
- | 324560902 B |
Size of bug128.c.line8.keys |
- | 62416974 |
without --skeleton |
with --skeleton |
|
---|---|---|
Generation Time | 5973.2428 ms | 221058.2635 ms |
Size of bug128.c |
16270 B | - |
Size of bug128.c.line8.input |
- | 324560902 B |
Size of bug128.c.line8.keys |
- | 62416974 B |
with --skeleton
:
re2c compiled with Visual Studio is about 22.4 times slower than re2c compiled with Mingw.
without --skeleton
:
re2c compiled with Visual Studio is about 16.9 times slower than re2c compiled with Mingw.
Please note, bug128.c
file was not created for the following scenario:
--skeleton
option was usedI have no idea why it wasn't generated. And I have not seen any warning or error messages.
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64
In accordance with https://github.com/skvadrik/re2c/issues/331#issuecomment-715376037 here are slowest tests (all tests were finished):
$ sort -n <(for f in $(ls *_*.log); do cat $f | grep -o -E '\([1-9]+.*\)' | tr -d '()' ; done) | column
1.000551 1.049238 1.0985912 1.1610301 1.3881291 2.4166578 7.7322819
1.0026765 1.0521585 1.1069511 1.1654575 1.4155576 3.4287334 8.705518
1.0068045 1.0574971 1.1199378 1.1672974 1.4557006 3.7058299 9.1262585
1.006806 1.0678113 1.1217741 1.1721492 1.4865933 5.1395657 11.3622493
1.0095393 1.0693213 1.1249263 1.1745822 1.5288221 5.1630022 11.7143255
1.0138449 1.0774119 1.1285558 1.1829852 1.5455525 5.2170471 12.1913201
1.0157868 1.0786276 1.131142 1.2105879 1.5553722 5.2170917 28.6310193
1.0172572 1.0815964 1.1355961 1.2214821 1.5648215 5.2423219 35.6424253
1.0175864 1.0824848 1.1384727 1.2247057 1.5660845 5.2487696 242.034956
1.0217653 1.0858299 1.1394903 1.2426674 1.5718653 5.2702669
1.0258584 1.0898011 1.1481312 1.2909222 1.596262 5.8009955
1.0300305 1.0910497 1.1509711 1.3091419 1.6746581 6.5150299
1.0337268 1.0913256 1.1549397 1.3220114 1.9889176 6.5898417
1.0411128 1.0932286 1.155237 1.3313342 2.1939683 6.7280103
1.0451031 1.0933827 1.1594367 1.3709589 2.3234237 7.6762901
bug128.re
is difficult, it produces a large DFA that gets much smaller after minimization (it is the purpose of this test to ensure that re2c can minimize it and in reasonable time). I pushed https://github.com/skvadrik/re2c/commit/f24f3c78d51d688acd0d3a790bab8ea8a372bc8b that should speed it up a bit (perhaps by 1/3), but I suspect it's not enough. Skeleton construction have to go before minimization (so that it can find errors in it), and so it has to work on a large DFA.
For the remaining slow tests, can you post test names in addition to timings (maybe just a few slow tests)?
Let me check this tomorrow. I'll provide the Top 10 slowest tests with timings. Thanks for your efforts to make re2c faster on Windows!
Here are the test timings:
For Mingw I used the following command line:
C:\src\re2c\.build_mingw\re2c.exe `
bug128.re -o bug128.c `
-W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow
For Visual Studio I used the following command line:
C:\src\re2c\.build_msvc\Debug\re2c.exe `
bug128.re -o bug128.c `
-W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow
Below I provide the code generation results with re2c compiled using 70c45d13 commit. The no-skeleton timings have changed within the margin of error, so I'm not providing them here. The timing were taken using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re
Note that the generation time was reduced twice even for Mingw. Good job!
Here are the slowest code-generations (except bug128.re
since it was provided above). re2c was compiled using Visual Studio.
First column is the time in seconds.
16.4251298 c\submatch\http_rfc7230.re
14.3443707 c\submatch\uri_rfc3986.re
12.5504493 encodings\code_points_8.re
11.9489916 tags\bench_http_rfc7230.re
11.7245659 bug1708378.re
11.3771996 tags\bench_uri_rfc3986.re
9.7639883 encodings\code_points_u.re
9.3953288 encodings\code_points_w.re
9.3702854 encodings\utf8_names.re
9.3198274 encodings\code_points_x.re
7.8328027 posix_captures\skeleton_01.re
7.3967537 encodings\class3.re
7.3233439 debug\closure_stats_gor1.re
6.809638 debug\closure_stats_gtop.re
6.1107775 encodings\code_points.re
6.0995565 encodings\code_points_e.re
4.9683203 real_world\php20150211_parse_date.re
4.8522657 encodings\unicode_blocks_u_encoding_policy_fail.re
4.7366564 real_world\parse_date_1_78.re
4.534634 encodings\unicode_blocks_u_encoding_policy_ignore.re
4.0589073 bug57_original.re
3.8969453 real_world\parse_date_db.re
3.5421027 real_world\parse_date_b.re
3.5134215 real_world\parse_date_g.re
3.4023533 real_world\parse_date_s.re
3.3955802 real_world\parse_date.re
3.3522276 encodings\unicode_blocks_u_encoding_policy_substitute.re
3.083979 posix_captures\gor3.re
2.4607906 encodings\unicode_blocks_8_encoding_policy_substitute.re
2.3087235 encodings\class1.re
2.1166286 encodings\class2.re
1.9528867 dot\php20150211_zend_language_scanner.re
1.9294394 encodings\unicode_blocks_8_encoding_policy_ignore.re
1.9075661 encodings\unicode_blocks_8_encoding_policy_fail.re
1.7798628 encodings\unicode_blocks_x_encoding_policy_fail.re
1.7585333 encodings\class4.re
1.7427518 real_world\cpp.re
1.6056584 encodings\unicode_blocks_x_encoding_policy_ignore.re
1.5418737 encodings\unicode_blocks_x_encoding_policy_substitute.re
Thanks for the timings @sergeyklay!
I realized during my local experiments with bug128.re
that 5s out of total 6s time taken by the test on my Linux system is the time taken by ext4 on truncating the output file: if the output file remains from the previous run, there is a 5s slowdown compared to the case when the output file does not exist. I'm talking about the *.input
an *.keys
files. @trofi debugged it further and provided a fix: https://github.com/skvadrik/re2c/commit/8090a1bd3013db24ef6bcdeb9729d9b9222bd03d.
This shouldn't have any impact on your experiments, as ./run_test.sh --skeleton
runs in a clean directory, and also the filesystem on Windows is likely different. But it confused my local measurements (when the file got occasionally removed I observed a very fast run, which I could not reproduce on subsequent runs).
Anyway, I'm glad this issue helped so many things to be found that could be improved!
I noticed .input
and .keys
file are written using relatively small (~1k, ~3K) buffers when passing those to kernel.
I wonder if penalty per write on your system is somehow higher. Can you give the below patch a go? If it happens to help it will be interesting to see how speed varies for different buffer sizes.
--- a/src/skeleton/generate_data.cc
+++ b/src/skeleton/generate_data.cc
@@ -493,11 +493,13 @@ void emit_data(Skeleton &skel)
error("cannot open file: %s", input_name.c_str());
exit(1);
}
+ setvbuf(input, NULL, _IOFBF, 128 * 1024);
FILE *keys = fopen (keys_name.c_str(), "wb");
if (!keys) {
error("cannot open file: %s", keys_name.c_str());
exit(1);
}
+ setvbuf(keys, NULL, _IOFBF, 128 * 1024);
cover_t cover(input, keys, skel.nodes_count);
generate_paths(skel, cover);
I wonder if penalty per write on your system is somehow higher. Can you give the below patch a go?
I applied the patch as shown above and here are the results of five runs of the test:
I used the following command line to run code-generation:
C:\src\re2c\.build_msvc\Debug\re2c.exe `
bug128.re -o bug128.c `
-W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow
re2c was compiled in Visual Studio 2019 x64, the timings are in milliseconds. And to summarize, the difference is either not noticeable or is within the margin of error
Sigh!
On Linux @trofi 's patch also doesn't make any difference (the inverse of it, using _IONBF
, makes the test at least 10x slower, which means the buffering is on by default and the buffer is large). I hoped that the slowness on Windows could be explained by unbuffered output, but apparently this is not so. I have two more patches to try, together they make about 1.5x speedup on Linux on bug128.re
(the cumulative time of skeleton tests doesn't change noticeably). I will send them a bit later.
Two small improvements that I mentioned above: https://github.com/skvadrik/re2c/commit/efc92f64ce1329d5eb067f362386405e74a9a760 and https://github.com/skvadrik/re2c/commit/50ac133cc0ebb907a1eb8a3d3455ec67b8c2e1de.
Thank you. Let me check them tomorrow. I'll provide the test results ASAP.
Here are the test timings:
For Mingw I used the following command line:
C:\src\re2c\.build_mingw\re2c.exe `
bug128.re -o bug128.c `
-W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow
For Visual Studio I used the following command line:
C:\src\re2c\.build_msvc\Debug\re2c.exe `
bug128.re -o bug128.c `
-W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow
Below I provide the skeleton generation results with re2c compiled using 50ac133c commit. The timings were obtained using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re
Mingw x64: 1661.5629 ms It's about 2.5x speed up than https://github.com/skvadrik/re2c/issues/331#issuecomment-721036640. Yay!
Visual Studio 2019 x64: infinitely At least I waited several hours and the work was not completed.
Note, generation time was reduced 2.5x for Mingw. However, it seems that for Visual Studio the latest changes made re2c almost completely useless. I'm sorry.
Thanks @sergeyklay! These are both debug builds, right? Can you try release builds as well? Can you also check as of commit https://github.com/skvadrik/re2c/commit/efc92f64ce1329d5eb067f362386405e74a9a760 to see which of the two changes raised the time? I have no clue why they can make Visual Studio build so much slower. Perhaps it has something to do with STL and debug checks.
Thanks @sergeyklay! These are both debug builds, right?
Yes, they are.
Can you try release builds as well? Can you also check as of commit efc92f6 to see which of the two changes raised the time?
I'll try release build as well as commit efc92f6 and share the metrics
Below I provide the skeleton generation results with re2c compiled using efc92f6 commit. The timings were obtained using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re
Please note, bug128.c
file was not created for the Debug configuration (only .keys
and .input
). This is not related to commit efc92f6 because I mention this on https://github.com/skvadrik/re2c/issues/331#issuecomment-719919263 for commit 4a47ded
Tried to install vc-2019 in a VM and I think -DCMAKE_CXX_FLAGS=/O2
has no effect on .vcxproj
files: unoptimized (Debug) build is still used (according to cmake --build . --verbose
output).
My runtime is ~104 seconds (https://github.com/skvadrik/re2c/commit/8090a1bd3013db24ef6bcdeb9729d9b9222bd03d)
@trofi Please note, for muti-config build types, e.g. Xcode or Visual Studio, you have to use --config
option:
$ cmake --build . --config Release --verbose
Thanks @trofi, so all this time we've been looking at unoptimized debug Visual Studio build (/Od
) vs optimized debug Mingw build (-O2 -g
). I wonder if there is such thing as "optimized debug" for Visual Studio at all, maybe we should just use /O2
with -DRE2C_DEBUG=on
(although the latter will be useless if assert
does nothing). What we really need for CI is an optimized build with re2c-specific debug assertions enabled, but I'm afraid it's either no debug assertions at all, or full STL debug (the analogue of GLIBCXX_DEBUG
on linux) and no optimizations. Skeleton tests are too heavyweight to run on unoptimized build.
I also wonder why commit https://github.com/skvadrik/re2c/commit/50ac133cc0ebb907a1eb8a3d3455ec67b8c2e1de raises the time for bug128.re
from 100s to infinite (very large) for the unoptimized Visual Studio build. It is understandable that it can be very slow, but it would be good to know why it makes such a difference.
Finally, there is a mystery with bug128.c
that does not get created (@trofi tells me it is created at https://github.com/skvadrik/re2c/commit/8090a1bd3013db24ef6bcdeb9729d9b9222bd03d in his experiment).
/O2
gets ignored by cmake because it's incompatible with default /RTC1
option (found the error in CMakeError.log
): https://stackoverflow.com/questions/37007939/command-line-error-d8016-o2-and-rtc1-command-line-options-are-incompatibl
@sergeyklay Maybe what we need for debug Visual Studio build is:
cmake --build . --config Release -DCMAKE_CXX_FLAGS="-UNDEBUG -DRE2C_DEBUG"
And for release build simply:
cmake --build . --config Release
@sergeyklay Maybe what we need for debug Visual Studio build is:
cmake --build . --config Release -DCMAKE_CXX_FLAGS="-UNDEBUG -DRE2C_DEBUG"
And for release build simply:
cmake --build . --config Release
This sounds interesting. I'll need some time to investigate the solution. Thank you for the tip.
Another issues that bother me:
In some cases, the *.c
file is not created. The bug128.c
case is not the only one
As I said I'm currently developing a native PowerShell wrapper for running re2c tests on Windows. So, about 50% of tests or even more are failed (possible related to 1.). This is unexpected failure
/O2
gets ignored by cmake because it's incompatible with default/RTC1
option (found the error inCMakeError.log
): https://stackoverflow.com/questions/37007939/command-line-error-d8016-o2-and-rtc1-command-line-options-are-incompatibl
Right. Fiddling with CMAKE_CXX_FLAGS directly is not a modern CMake whatsoever. We should always use check_cxx_compiler_flag. And this why I'm not a big fan to set CMAKE_CXX_FLAGS in workflow configuration file. This rework was in my future plans.
In some cases, the *.c file is not created
Does this also happen in optimized Visual Studio build?
Unlike *.keys
and *.input
files, with the output file re2c tries to create it atomically: is writes the generated output to a temporary file (https://github.com/skvadrik/re2c/blob/master/src/util/temp_file.cc#L46-L67) and then renames it to the output file.
I investigated why Debug build takes infinite time. Apparently it enters an infinite loop in destruction of std::vector<range_t>
. There is a function std::_Container_base12::_Orphan_all()
that loops over a list and sets some entries to zero (I don't know what it's doing in a destructor of a vector, but the block of code containing the loop is guarded by #if _ITERATOR_DEBUG_LEVEL == 2
, so it must be some debug checks). The loop should terminate when the next entry is NULL
, but that never happens: the list contains a 3-cycle, and so the program loops over those 3 value infinitely (I left it overnight).
Here is a screenshot of debug session that shows the infinite loop:
This can be either a bug in Visual Studio, or some STL edge case that is beyond my C++-fu (although the code that uses this std::vector<range_t>
is quite simple and it's not doing any dirty memory tricks as far as I know). I don't think I can fix this aside from using different data structures for skeleton (e.g. replace vectors and wrap-iterators with circular lists).
What I still don't know is why some particular commit adds or removes this behavior. I see no connection, and it can be very complex, so I won't investigate this any further.
it looks like it's detaching std::vector< >::iterator
instances from the source std::vector< >
, but that's just a guess.
@skvadrik Good job!
This commit https://github.com/skvadrik/re2c/commit/b8b107a3004ec01fc2dcd780fa8e979d7229131c should workaround the infinite loop in Visual Studio Debug build. I don't think that tweaking the code to workaround such bugs is a good strategy, but this particular change makes sense in general.
Let me check this. I'll provide result ASAP.
Below I provide the skeleton generation results with re2c compiled using commit 745f6e39 The timings were obtained using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re
All necessary files were generated (including *.c
one). This is the best result so far. Congratulations!
about 50% of tests or even more are failed
As I can see now everything is OK! This is what PowerShell tests wrapper says:
-----------------
All: 1591
Ran: 1552
Passed: 1347
Soft errors: 205
Hard errors: 0
Total time: 240.175295
-----------------
PASSED
\o/ Great!
Total time 240.175295
is 4 minutes, and it's the time required to run skeleton tests in Release Visual Studio build, right? It seems fast enough for CI. What is the total time for Visual Studio Debug build and for Mingw build?
It might be worth reporting the Visual Studio problem with the infinite loop in Debug build.
Hello,
I'm currently developing a native PowerShell wrapper for running re2c tests on Windows without resorting to Mingw, Cygwin and so on. I managed to write multi-threaded tests runner using PowerShell only (right now for skeleton tests only).
I run tests on my 20 cores Xeon in parallel and everything is going well except for some tests. They are super slow. After a dig into, I realized that these tests are extremely slow due to the generation process. Below I provide measurements of some generations:
6.8190486 seconds
9.3340122 seconds
7.642483 seconds
13.0525341 seconds:
265.9861263 seconds: 🎉
Actually this is not a complete list. Some tests never finished. According to my quick look, slow tests are about 9%. The rest of the tests are as fast as, for example, in macOs or Linux. re2c using built with the following configuration
Manual run shows the same degradation:
I will publish additional comments during my research.