skvadrik / re2c

Lexer generator for C, C++, Go and Rust.
https://re2c.org
Other
1.07k stars 169 forks source link

Some skeleton tests are extremely slow on Windows #331

Open sergeyklay opened 3 years ago

sergeyklay commented 3 years ago

Hello,

I'm currently developing a native PowerShell wrapper for running re2c tests on Windows without resorting to Mingw, Cygwin and so on. I managed to write multi-threaded tests runner using PowerShell only (right now for skeleton tests only).

I run tests on my 20 cores Xeon in parallel and everything is going well except for some tests. They are super slow. After a dig into, I realized that these tests are extremely slow due to the generation process. Below I provide measurements of some generations:

6.8190486 seconds

& C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe  `
    "debug\closure_stats_gtop.re" `
    -o "debug\closure_stats_gtop.c" `
    -i `
    --posix-captures `
    --posix-closure gor1 `
    --dump-closure-stats `
    --fixed-tags toplevel `
    -W `
    --no-version `
    --no-generation-date `
    --skeleton `
    -Werror-undefined-control-flow

9.3340122 seconds

C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe  `
    "encodings\class3.re" `
    -o "encodings\class3.c" `
    -i8 `
    -W `
    --no-version `
    --no-generation-date `
    --skeleton `
    -Werror-undefined-control-flow

7.642483 seconds

C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe  `
    "debug\closure_stats_gor1.re" `
    -o "debug\closure_stats_gor1.c" `
    -i `
    --posix-captures `
    --posix-closure gtop `
    --dump-closure-stats 
    --fixed-tags toplevel `
    -W `
    --no-version `
    --no-generation-date `
    --skeleton `
    -Werror-undefined-control-flow

13.0525341 seconds:

& C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe  `
    "bug1708378.re" `
    -o "bug1708378.c" `
    -ib `
    -W `
    --no-version `
    --no-generation-date `
    --skeleton `
    -Werror-undefined-control-flow

265.9861263 seconds: 🎉

& C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe  `
    "bug128.re" `
    -o "bug128.c"  `
    -W `
    --no-version `
    --no-generation-date `
    --skeleton `
    -Werror-undefined-control-flow

Actually this is not a complete list. Some tests never finished. According to my quick look, slow tests are about 9%. The rest of the tests are as fast as, for example, in macOs or Linux. re2c using built with the following configuration

Manual run shows the same degradation:

PS C:\src\re2c\test_201023160030> Measure-Command { & C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe `
>> bug1708378.re `
>> -o bug1708378.c `
>> -ib `
>> -W `
>> --no-version `
>> --no-generation-date `
>> --skeleton `
>> -Werror-undefined-control-flow }

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 12
Milliseconds      : 923
Ticks             : 129239968
TotalDays         : 0.000149583296296296
TotalHours        : 0.00358999911111111
TotalMinutes      : 0.215399946666667
TotalSeconds      : 12.9239968
TotalMilliseconds : 12923.9968

I will publish additional comments during my research.

sergeyklay commented 3 years ago

I store the execution time of the each generation in separated *.log file in the following form:

<re2c executable> <re2c options> (<time>)

So here are some results (some tests never finished):

$ sort -n <(for f in $(ls *_*.log); do cat $f | grep -o -E '\([1-9]+.*\)' | tr -d '()' ; done) | column

1.0123708       1.2061186       1.398478        1.7313844       3.545979        6.2762644       19.1019572
1.0175466       1.2140428       1.3999757       1.7506811       3.6755543       6.320666        19.8422225
1.0207795       1.2177153       1.4003677       1.8222493       3.7104511       6.3240929       19.8937025
1.0279499       1.2271273       1.4299672       1.8373235       3.9307811       6.461851        20.5925374
1.0347854       1.2375833       1.4381695       1.871309        4.1462893       6.5952994       20.6457992
1.0506582       1.2478265       1.4416772       2.1121065       4.2338955       6.605284        20.8574713
1.0710235       1.2577638       1.4619963       2.1282506       4.9962177       6.8190486       21.0589518
1.092179        1.2784787       1.4866995       2.1299754       5.11254         7.4957975       21.3088674
1.0995379       1.2806526       1.5073277       2.2660121       5.2844923       7.6238291       21.6054515
1.1423733       1.2822474       1.5266376       2.3043965       5.3296963       7.642483        28.5312305
1.156332        1.2883214       1.5412814       2.3535283       5.4675259       7.8018155       132.8422803
1.162659        1.2978251       1.5423705       2.4174622       5.4926532       9.3340122       265.9861263
1.1639108       1.3222313       1.5683961       2.4445399       5.5172051       9.52987         491.276471
1.1711366       1.3363735       1.568921        2.7039793       5.5980776       9.7681643       492.4069617
1.1730053       1.3391551       1.5843521       2.7298563       5.6159203       9.9542416       3828.0381905
1.1743305       1.3530512       1.6159905       2.7907898       5.6598511       10.7178018      3838.0232887
1.1799193       1.3593102       1.6267786       3.1749009       5.9794209       13.0525341
1.1804136       1.3800682       1.6795056       3.2259422       6.0503126       13.2396031
1.1934251       1.3883636       1.7041505       3.3845808       6.1115797       15.0390006
1.2053042       1.3914672       1.7158329       3.5315845       6.2747898       19.0428656
skvadrik commented 3 years ago

This is very interesting! A few questions off the top of my head:

skvadrik commented 3 years ago

One more in that direction: skeleton-specific output differs from normal output in that it is binary (fopen is called with "wb").

sergeyklay commented 3 years ago

Are these tests slow only with --skeleton option? What happens if you drop that option?

As I sad before I'm currently developing skeleton validation support. Regular tests are not ready, so I cannot measure programmatically. Calling the following command by hand:

$Executable = "C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe"
$Options = "-ib -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
$IO = "bug1708378.re -o bug1708378.c"

$Command = "$Executable $IO $Options"

Remove-Item bug1708378.c* -Force
Measure-Command { Invoke-Expression $Command | Out-Null }

generates output for 13 seconds:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 12
Milliseconds      : 905
Ticks             : 129053813
TotalDays         : 0.00014936783912037
TotalHours        : 0.00358482813888889
TotalMinutes      : 0.215089688333333
TotalSeconds      : 12.9053813
TotalMilliseconds : 12905.3813

File sizes are (in bytes):

PS C:\src\re2c\test_201023160030> ls bug1708378.*

    Directory: C:\src\re2c\test_201023160030

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          10/24/2020 12:08 PM         566983 bug1708378.c
-a---          10/24/2020 12:08 PM       28026493 bug1708378.c.line216.input
-a---          10/24/2020 12:08 PM        3475398 bug1708378.c.line216.keys
-a---          10/17/2020 12:23 PM          14989 bug1708378.re

Note, in this case, this is a manual command run, without any wrappers, just call re2c from the terminal.

Without --skeleton option generation time is notable better:

- $Options = "-ib -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"
+ $Options = "-ib -W --no-version --no-generation-date -Werror-undefined-control-flow"

Only 2 seconds (in 6 times faster):

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 2
Milliseconds      : 808
Ticks             : 28083572
TotalDays         : 3.25041342592593E-05
TotalHours        : 0.000780099222222222
TotalMinutes      : 0.0468059533333333
TotalSeconds      : 2.8083572
TotalMilliseconds : 2808.3572

File sizes are (in bytes):

PS C:\src\re2c\test_201023160030> ls bug1708378.*

    Directory: C:\src\re2c\test_201023160030

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          10/24/2020 12:09 PM         365074 bug1708378.c
-a---          10/17/2020 12:23 PM          14989 bug1708378.re

By the way, despite a 6x run time improvement, 2 seconds for bug1708378.re is still not the best time, isn't? Just to get ahead of your question, I want to say that all tests are performed using the following V-NAND SSD: Samsung SSD 970 EVO Plus NVMe M.2

skvadrik commented 3 years ago

Ok, so the files are generated properly and have reasonable size.

2 seconds for bug1708378.re is still not the best time, isn't?

Correct, this is way too much time for this tests. re2c bug1708378.re -o bug1708378.c takes under 45 milliseconds on my Linux machine. And with skeleton option re2c bug1708378.re -S -o bug1708378.c takes about 600 milliseconds, which is also fast.

tests are performed using the following V-NAND SSD

I don't think that it has to do with the hardware. Judging from the tests, I think the slowdown is caused by the way re2c is writing the output. In case of skeleton tests, re2c makes many fwrite calls (each appending a small bit to the output file) and relies on the system libc to make it buffered, so that many small fwrite operations are joined into large and less frequent write system calls (or the Windows analogue of write). In case of non-skeleton output it's probably not the system calls, but the mixture of stdio and iostream operations (a different libc/lbstdc++ implementation on Windows).

This is only a guess, but it is the first thing that I'd check.

re2c needs a --time-profile option. :)

sergeyklay commented 3 years ago

Yet another generation I managed to measure:

$Executable = "C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe"
$Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
$IO = "c\submatch\http_rfc7230.re -o c\submatch\http_rfc7230.c"

$Command = "$Executable $IO $Options"

Remove-Item http_rfc7230_skeleton.c* -Force
Measure-Command { Invoke-Expression $Command | Out-Null }

Only 2 seconds:

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 2
Milliseconds      : 588
Ticks             : 25884700
TotalDays         : 2.99591435185185E-05
TotalHours        : 0.000719019444444444
TotalMinutes      : 0.0431411666666667
TotalSeconds      : 2.58847
TotalMilliseconds : 2588.47

The same but with skeleton:

- $Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
+ $Options = "-i -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"

It lasted forever:

Days              : 0
Hours             : 1
Minutes           : 5
Seconds           : 1
Milliseconds      : 833
Ticks             : 39018332165
TotalDays         : 0.0451601066724537
TotalHours        : 1.08384256013889
TotalMinutes      : 65.0305536083333
TotalSeconds      : 3901.8332165
TotalMilliseconds : 3901833.2165

File sizes are (in bytes):

PS C:\src\re2c> ls .\test_201024170134\c\submatch\http_rfc7230*

    Directory: C:\src\re2c\test_201024170134\c\submatch

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----

-a---          10/24/2020  6:31 PM         313786 http_rfc7230.c
-a---          10/24/2020  6:31 PM       27890920 http_rfc7230.c.line282.input
-a---          10/24/2020  6:31 PM        3769927 http_rfc7230.c.line282.keys
-a---          10/17/2020 12:23 PM           9831 http_rfc7230.re
skvadrik commented 3 years ago

Interesting, this one has approximately the same size of the largest skeleton file http_rfc7230.c.line282.input (27 MB), but it took over an hour compared to bug1708378.c.line216.input that took 13 seconds. My output-generation hypothesis doesn't fit here.

If you have spare cycles, can you make an experiment and try https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re instead of http_rfc7230.re? It is the same regular expression stripped of all tags. It may give us insight whether tags have something to do with the slowdown.

But it may be easier to use Windows profiling tools (I've heard of ETW).

sergeyklay commented 3 years ago

It is noteworthy that the generation in Windows, using Mingw, is much faster than using Visual Studio.

Without --skeleton option:

$Executable = "C:\src\re2c\cmake-build-debug-mingw\re2c.exe"
$Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
$IO = "http_rfc7230.re -o http_rfc7230_mingw.c"

$Command = "$Executable $IO $Options"

Measure-Command { Invoke-Expression $Command | Out-Null }

it takes 588 milliseconds (4.4x faster than using Visual Studio):

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 588
Ticks             : 5889669
TotalDays         : 6.81674652777778E-06
TotalHours        : 0.000163601916666667
TotalMinutes      : 0.009816115
TotalSeconds      : 0.5889669
TotalMilliseconds : 588.9669

File sizes are (in bytes):

PS C:\src\re2c\test_manual> ls *

    Directory: C:\src\re2c\test_manual

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          10/25/2020 12:07 PM         198597 http_rfc7230_mingw.c
-a---          10/17/2020 12:23 PM           9831 http_rfc7230.re

with --skeleton option

- $Options = "-i -W --no-version --no-generation-date -Werror-undefined-control-flow"
+ $Options = "-i -W --no-version --no-generation-date --skeleton -Werror-undefined-control-flow"

- $IO = "http_rfc7230.re -o http_rfc7230_mingw.c"
+ $IO = "http_rfc7230.re -o http_rfc7230_mingw_skeleton.c"

takes about 187 seconds (20.8x faster than using Visual Studio):

Days              : 0
Hours             : 0
Minutes           : 3
Seconds           : 7
Milliseconds      : 597
Ticks             : 1875970618
TotalDays         : 0.00217126228935185
TotalHours        : 0.0521102949444444
TotalMinutes      : 3.12661769666667
TotalSeconds      : 187.5970618
TotalMilliseconds : 187597.0618

File sizes are (in bytes):

PS C:\src\re2c\test_manual> ls *

    Directory: C:\src\re2c\test_manual

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          10/25/2020 12:19 PM         313776 http_rfc7230_mingw_skeleton.c
-a---          10/25/2020 12:19 PM       27890920 http_rfc7230_mingw_skeleton.c.line282.input
-a---          10/25/2020 12:19 PM        3769927 http_rfc7230_mingw_skeleton.c.line282.keys
-a---          10/17/2020 12:23 PM           9831 http_rfc7230.re

re2c was built with g++ v8.1.0 and the following CMake options:

-DCMAKE_BUILD_TYPE=Debug
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
sergeyklay commented 3 years ago

Interesting, this one has approximately the same size of the largest skeleton file http_rfc7230.c.line282.input (27 MB), but it took over an hour compared to bug1708378.c.line216.input that took 13 seconds. My output-generation hypothesis doesn't fit here.

If you have spare cycles, can you make an experiment and try https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re instead of http_rfc7230.re? It is the same regular expression stripped of all tags. It may give us insight whether tags have something to do with the slowdown.

But it may be easier to use Windows profiling tools (I've heard of ETW).

Mingw x64

without --skeleton with --skeleton
Generation Time 191.7389 ms 1323.9391 ms
Size http_rfc7230_notags.c 147943 B 230776 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B

Visual Studio 2019 x64

without --skeleton with --skeleton
Generation Time 1226.1267 ms 8138.2939 ms
Size http_rfc7230_notags.c 147943 B 230774 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B
sergeyklay commented 3 years ago

@skvadrik Does this measurement show tags impact for slowdown?

skvadrik commented 3 years ago

@sergeyklay Without tags it is definitely faster (with skeleton: Mingw ~140x, VS ~470x; without skeleton: Mingw ~3x, VS: ~2x). This is aligned with the behaviour on Linux: without tags it takes 100ms, and with tags 21s (~210x). Perf tells me that 98% of time is spent in re2c::write_keys<unsigned char>, in allocating and deleting vectors. I'll have a look at it.

re2c was built with g++ v8.1.0 and the following CMake options:

-DCMAKE_BUILD_TYPE=Debug
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"

So this is a non-optimized build. Perhaps this explains the slowdown, as well as the difference between VS and Mingw (at zero levels of optimizations they both generate inefficient code, but with different level of inefficiency). How does an optimized build behave?

sergeyklay commented 3 years ago

How does an optimized build behave?

http_rfc7230_notags.re

https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re

Mingw x64
without --skeleton with --skeleton
Generation Time 146.4076 ms 431.4192 ms
Size http_rfc7230_notags.c 147943 B 230746 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B
Visual Studio 2019 x64
without --skeleton with --skeleton
Generation Time 1064.8614 ms 8330.2958 ms
Size http_rfc7230_notags.c 147943 B 230746 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B

http_rfc7230.re

https://github.com/skvadrik/re2c/blob/master/examples/c/submatch/http_rfc7230.re

Mingw x64
without --skeleton with --skeleton
Generation Time 210.038 ms 49896.8644 ms
Size http_rfc7230.c 198597 B 313746 B
Size http_rfc7230.c.line162.input 0 B 27890920 B
Size http_rfc7230.c.line162.keys 0 B 3769927 B
Visual Studio 2019 x64
without --skeleton with --skeleton
Generation Time 2451.7808 ms 3821611.5262 ms
Size http_rfc7230.c 198597 B 313746 B
Size http_rfc7230.c.line162.input 0 B 27890920 B
Size http_rfc7230.c.line162.keys 0 B 3769927 B

CMake flags

Mingw x64
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
Visual Studio 2019 x64
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64
skvadrik commented 3 years ago

Ok, so there is not much difference with optimizations, and it doesn't bridge the gap between VS and Mingw.

I've had a closer look at why http_rfc7230.re is so slow. It is caused by a naive way of computing tag values for skeleton. Since skeleton has to work on non-optimized TDFA, it has many tag variables (N). It is normal before renaming, most of the variables are unused and they are reduced by the optimization phase, but the whole point of skeleton is to test the optimized code on data produced for non-optimized code. Skeleton keys are generated by simulating TDFA execution on a set of M strings, applying tag operations at each step. The intermediate results are stored in a large matrix of size NxM (over 1000000 elements in total), where elements of the matrix are vectors (some tags have a history, ie they need all their values not just the last one). Unsurprisingly the allocation and matrix population takes time. Much of it is unused, but that doesn't help.

I managed to get about 4x speedup (21s -> 5s) by reordering some loops and reducing matrix size to Nx1, but for the real fix we'll need a more efficient way of storing intermediate results (perhaps a trie for the tags that have history).

As to why it is particularly slow on VS, I suppose it can be caused by a different allocator.

sergeyklay commented 3 years ago

Hmm.. Very interesting! I hope we can get the best result for VS on this field. Feel free to ping me for any test. Thank you for the research.

skvadrik commented 3 years ago

GitHub auto-closed this because of commit message text, but I didn't intend to.

I made a few improvements to skeleton tests: https://github.com/skvadrik/re2c/commit/e8cdf1ded52df215045f2e437e11b27285707e73 and https://github.com/skvadrik/re2c/commit/80f364ee3a9dad93e897363425f5a28d4087eef6. On my Linux machine this results in ~2-3x speedup on run_tests.sh --skeleton and more than 20x speedup for http_rfc7230.re in particular.

sergeyklay commented 3 years ago

Wow! I'll check today and give you a feedback. Thank you!

sergeyklay commented 3 years ago

I made a few improvements to skeleton tests: e8cdf1d and 80f364e.

There are benchmark results:

http_rfc7230_notags.re

https://github.com/skvadrik/re2c/blob/master/benchmarks/http/rfc7230/http_rfc7230_notags.re

Mingw x64
without --skeleton with --skeleton
Generation Time 180.0535 ms 618.603 ms
Size http_rfc7230_notags.c 147943 B 230746 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B
Visual Studio 2019 x64
without --skeleton with --skeleton
Generation Time 1042.8478 ms 8132.859 ms
Size http_rfc7230_notags.c 147943 B 230746 B
Size http_rfc7230_notags.c.line162.input 0 B 28730721 B
Size http_rfc7230_notags.c.line162.keys 0 B 3607848 B

http_rfc7230.re

https://github.com/skvadrik/re2c/blob/master/examples/c/submatch/http_rfc7230.re

Mingw x64
without --skeleton with --skeleton
Generation Time 226.9931 ms 3098.5436 ms
Size http_rfc7230.c 198597 B 313746 B
Size http_rfc7230.c.line162.input 0 B 27890920 B
Size http_rfc7230.c.line162.keys 0 B 3769927 B
Visual Studio 2019 x64
without --skeleton with --skeleton
Generation Time 2457.6249 ms 27062.125 ms
Size http_rfc7230.c 198597 B 313746 B
Size http_rfc7230.c.line162.input 0 B 27890920 B
Size http_rfc7230.c.line162.keys 0 B 3769927 B

CMake flags

Mingw x64
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
Visual Studio 2019 x64
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64

Result

Before your changes re2c compiled with Visual Studio was 458 times slower than re2c compiled with Mingw.

After your changes re2c compiled with Visual Studio became 8.7 times slower than re2c compiled with Mingw.

skvadrik commented 3 years ago

Good, then it makes things a bit faster! I pushed one more patch (https://github.com/skvadrik/re2c/commit/4a47dedad25cc1fd9c2cffeb7d1cc677464bed9e), but it makes no noticeable difference on my system.

How slow is the full test run now?

sergeyklay commented 3 years ago

Good, then it makes things a bit faster! I pushed one more patch (4a47ded), but it makes no noticeable difference on my system.

How slow is the full test run now?

I'll test it ASAP and give you a feedback. Good job! Thank you!

sergeyklay commented 3 years ago

I was able to do a minimal load testing. The timings were improved a lot and in general the tests run significant faster. However, some tests are still not fast enough to use for CI pipelines. Below I provide the slowest code generation with re2c compiled using 4a47ded commit.

For Mingw I used the following command line:

C:\src\re2c\cmake-build-debug-mingw\re2c.exe `
    bug128.re -o bug128.c -W --no-version --no-generation-date -Werror-undefined-control-flow

The same command was used for a skeleton validation (with --skeleton).

For Visual Studio I used the following command line:

C:\src\re2c\cmake-build-debug-visual-studio-x64\Debug\re2c.exe `
    bug128.re -o bug128.c -W --no-version --no-generation-date -Werror-undefined-control-flow

The same command was used for a skeleton validation (with --skeleton).

The measurements below were taken using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re

There are benchmark results:

Mingw x64
without --skeleton with --skeleton
Generation Time 353.0869 ms 9863.7646 ms
Size of bug128.c 16270 B 29694 B
Size of bug128.c.line8.input - 324560902 B
Size of bug128.c.line8.keys - 62416974
Visual Studio 2019 x64
without --skeleton with --skeleton
Generation Time 5973.2428 ms 221058.2635 ms
Size of bug128.c 16270 B -
Size of bug128.c.line8.input - 324560902 B
Size of bug128.c.line8.keys - 62416974 B
Result

with --skeleton: re2c compiled with Visual Studio is about 22.4 times slower than re2c compiled with Mingw.

without --skeleton: re2c compiled with Visual Studio is about 16.9 times slower than re2c compiled with Mingw.

Please note, bug128.c file was not created for the following scenario:

I have no idea why it wasn't generated. And I have not seen any warning or error messages.

CMake flags

Mingw x64
-DCMAKE_BUILD_TYPE=Debug
-DCMAKE_C_FLAGS=-O2
-DCMAKE_CXX_FLAGS=-O2
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX=C:\src\re2c\install-mingw
-G "CodeBlocks - MinGW Makefiles"
Visual Studio 2019 x64
-DRE2C_BUILD_RE2GO=on
-DCMAKE_INSTALL_PREFIX="C:\src\re2c\install-msvc"
-DCMAKE_C_FLAGS=/O2
-DCMAKE_CXX_FLAGS=/O2
-G "Visual Studio 16 2019"
-A x64
sergeyklay commented 3 years ago

In accordance with https://github.com/skvadrik/re2c/issues/331#issuecomment-715376037 here are slowest tests (all tests were finished):

$ sort -n <(for f in $(ls *_*.log); do cat $f | grep -o -E '\([1-9]+.*\)' | tr -d '()' ; done) | column

1.000551        1.049238        1.0985912       1.1610301       1.3881291       2.4166578       7.7322819
1.0026765       1.0521585       1.1069511       1.1654575       1.4155576       3.4287334       8.705518
1.0068045       1.0574971       1.1199378       1.1672974       1.4557006       3.7058299       9.1262585
1.006806        1.0678113       1.1217741       1.1721492       1.4865933       5.1395657       11.3622493
1.0095393       1.0693213       1.1249263       1.1745822       1.5288221       5.1630022       11.7143255
1.0138449       1.0774119       1.1285558       1.1829852       1.5455525       5.2170471       12.1913201
1.0157868       1.0786276       1.131142        1.2105879       1.5553722       5.2170917       28.6310193
1.0172572       1.0815964       1.1355961       1.2214821       1.5648215       5.2423219       35.6424253
1.0175864       1.0824848       1.1384727       1.2247057       1.5660845       5.2487696       242.034956
1.0217653       1.0858299       1.1394903       1.2426674       1.5718653       5.2702669
1.0258584       1.0898011       1.1481312       1.2909222       1.596262        5.8009955
1.0300305       1.0910497       1.1509711       1.3091419       1.6746581       6.5150299
1.0337268       1.0913256       1.1549397       1.3220114       1.9889176       6.5898417
1.0411128       1.0932286       1.155237        1.3313342       2.1939683       6.7280103
1.0451031       1.0933827       1.1594367       1.3709589       2.3234237       7.6762901
skvadrik commented 3 years ago

bug128.re is difficult, it produces a large DFA that gets much smaller after minimization (it is the purpose of this test to ensure that re2c can minimize it and in reasonable time). I pushed https://github.com/skvadrik/re2c/commit/f24f3c78d51d688acd0d3a790bab8ea8a372bc8b that should speed it up a bit (perhaps by 1/3), but I suspect it's not enough. Skeleton construction have to go before minimization (so that it can find errors in it), and so it has to work on a large DFA.

For the remaining slow tests, can you post test names in addition to timings (maybe just a few slow tests)?

sergeyklay commented 3 years ago

Let me check this tomorrow. I'll provide the Top 10 slowest tests with timings. Thanks for your efforts to make re2c faster on Windows!

sergeyklay commented 3 years ago

Here are the test timings:

For Mingw I used the following command line:

C:\src\re2c\.build_mingw\re2c.exe `
    bug128.re -o bug128.c `
    -W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow

For Visual Studio I used the following command line:

C:\src\re2c\.build_msvc\Debug\re2c.exe `
    bug128.re -o bug128.c `
    -W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow

Below I provide the code generation results with re2c compiled using 70c45d13 commit. The no-skeleton timings have changed within the margin of error, so I'm not providing them here. The timing were taken using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re

Note that the generation time was reduced twice even for Mingw. Good job!

Here are the slowest code-generations (except bug128.re since it was provided above). re2c was compiled using Visual Studio. First column is the time in seconds.

 16.4251298  c\submatch\http_rfc7230.re
 14.3443707  c\submatch\uri_rfc3986.re
 12.5504493  encodings\code_points_8.re
 11.9489916  tags\bench_http_rfc7230.re
 11.7245659  bug1708378.re
 11.3771996  tags\bench_uri_rfc3986.re
 9.7639883   encodings\code_points_u.re
 9.3953288   encodings\code_points_w.re
 9.3702854   encodings\utf8_names.re
 9.3198274   encodings\code_points_x.re
 7.8328027   posix_captures\skeleton_01.re
 7.3967537   encodings\class3.re
 7.3233439   debug\closure_stats_gor1.re
 6.809638    debug\closure_stats_gtop.re
 6.1107775   encodings\code_points.re
 6.0995565   encodings\code_points_e.re
 4.9683203   real_world\php20150211_parse_date.re
 4.8522657   encodings\unicode_blocks_u_encoding_policy_fail.re
 4.7366564   real_world\parse_date_1_78.re
 4.534634    encodings\unicode_blocks_u_encoding_policy_ignore.re
 4.0589073   bug57_original.re
 3.8969453   real_world\parse_date_db.re
 3.5421027   real_world\parse_date_b.re
 3.5134215   real_world\parse_date_g.re
 3.4023533   real_world\parse_date_s.re
 3.3955802   real_world\parse_date.re
 3.3522276   encodings\unicode_blocks_u_encoding_policy_substitute.re
 3.083979    posix_captures\gor3.re
 2.4607906   encodings\unicode_blocks_8_encoding_policy_substitute.re
 2.3087235   encodings\class1.re
 2.1166286   encodings\class2.re
 1.9528867   dot\php20150211_zend_language_scanner.re
 1.9294394   encodings\unicode_blocks_8_encoding_policy_ignore.re
 1.9075661   encodings\unicode_blocks_8_encoding_policy_fail.re
 1.7798628   encodings\unicode_blocks_x_encoding_policy_fail.re
 1.7585333   encodings\class4.re
 1.7427518   real_world\cpp.re
 1.6056584   encodings\unicode_blocks_x_encoding_policy_ignore.re
 1.5418737   encodings\unicode_blocks_x_encoding_policy_substitute.re
skvadrik commented 3 years ago

Thanks for the timings @sergeyklay!

I realized during my local experiments with bug128.re that 5s out of total 6s time taken by the test on my Linux system is the time taken by ext4 on truncating the output file: if the output file remains from the previous run, there is a 5s slowdown compared to the case when the output file does not exist. I'm talking about the *.input an *.keys files. @trofi debugged it further and provided a fix: https://github.com/skvadrik/re2c/commit/8090a1bd3013db24ef6bcdeb9729d9b9222bd03d.

This shouldn't have any impact on your experiments, as ./run_test.sh --skeleton runs in a clean directory, and also the filesystem on Windows is likely different. But it confused my local measurements (when the file got occasionally removed I observed a very fast run, which I could not reproduce on subsequent runs).

sergeyklay commented 3 years ago

Anyway, I'm glad this issue helped so many things to be found that could be improved!

trofi commented 3 years ago

I noticed .input and .keys file are written using relatively small (~1k, ~3K) buffers when passing those to kernel.

I wonder if penalty per write on your system is somehow higher. Can you give the below patch a go? If it happens to help it will be interesting to see how speed varies for different buffer sizes.

--- a/src/skeleton/generate_data.cc
+++ b/src/skeleton/generate_data.cc
@@ -493,11 +493,13 @@ void emit_data(Skeleton &skel)
         error("cannot open file: %s", input_name.c_str());
         exit(1);
     }
+    setvbuf(input, NULL, _IOFBF, 128 * 1024);
     FILE *keys = fopen (keys_name.c_str(), "wb");
     if (!keys) {
         error("cannot open file: %s", keys_name.c_str());
         exit(1);
     }
+    setvbuf(keys, NULL, _IOFBF, 128 * 1024);

     cover_t cover(input, keys, skel.nodes_count);
     generate_paths(skel, cover);
sergeyklay commented 3 years ago

I wonder if penalty per write on your system is somehow higher. Can you give the below patch a go?

I applied the patch as shown above and here are the results of five runs of the test:

I used the following command line to run code-generation:

C:\src\re2c\.build_msvc\Debug\re2c.exe `
    bug128.re -o bug128.c `
    -W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow

re2c was compiled in Visual Studio 2019 x64, the timings are in milliseconds. And to summarize, the difference is either not noticeable or is within the margin of error

skvadrik commented 3 years ago

Sigh!

On Linux @trofi 's patch also doesn't make any difference (the inverse of it, using _IONBF, makes the test at least 10x slower, which means the buffering is on by default and the buffer is large). I hoped that the slowness on Windows could be explained by unbuffered output, but apparently this is not so. I have two more patches to try, together they make about 1.5x speedup on Linux on bug128.re (the cumulative time of skeleton tests doesn't change noticeably). I will send them a bit later.

skvadrik commented 3 years ago

Two small improvements that I mentioned above: https://github.com/skvadrik/re2c/commit/efc92f64ce1329d5eb067f362386405e74a9a760 and https://github.com/skvadrik/re2c/commit/50ac133cc0ebb907a1eb8a3d3455ec67b8c2e1de.

sergeyklay commented 3 years ago

Thank you. Let me check them tomorrow. I'll provide the test results ASAP.

sergeyklay commented 3 years ago

Here are the test timings:

For Mingw I used the following command line:

C:\src\re2c\.build_mingw\re2c.exe `
    bug128.re -o bug128.c `
    -W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow

For Visual Studio I used the following command line:

C:\src\re2c\.build_msvc\Debug\re2c.exe `
    bug128.re -o bug128.c `
    -W --no-version --skeleton --no-generation-date -Werror-undefined-control-flow

Below I provide the skeleton generation results with re2c compiled using 50ac133c commit. The timings were obtained using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re

Note, generation time was reduced 2.5x for Mingw. However, it seems that for Visual Studio the latest changes made re2c almost completely useless. I'm sorry.

skvadrik commented 3 years ago

Thanks @sergeyklay! These are both debug builds, right? Can you try release builds as well? Can you also check as of commit https://github.com/skvadrik/re2c/commit/efc92f64ce1329d5eb067f362386405e74a9a760 to see which of the two changes raised the time? I have no clue why they can make Visual Studio build so much slower. Perhaps it has something to do with STL and debug checks.

sergeyklay commented 3 years ago

Thanks @sergeyklay! These are both debug builds, right?

Yes, they are.

Can you try release builds as well? Can you also check as of commit efc92f6 to see which of the two changes raised the time?

I'll try release build as well as commit efc92f6 and share the metrics

sergeyklay commented 3 years ago

Below I provide the skeleton generation results with re2c compiled using efc92f6 commit. The timings were obtained using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re

Please note, bug128.c file was not created for the Debug configuration (only .keys and .input). This is not related to commit efc92f6 because I mention this on https://github.com/skvadrik/re2c/issues/331#issuecomment-719919263 for commit 4a47ded

trofi commented 3 years ago

Tried to install vc-2019 in a VM and I think -DCMAKE_CXX_FLAGS=/O2 has no effect on .vcxproj files: unoptimized (Debug) build is still used (according to cmake --build . --verbose output).

My runtime is ~104 seconds (https://github.com/skvadrik/re2c/commit/8090a1bd3013db24ef6bcdeb9729d9b9222bd03d)

sergeyklay commented 3 years ago

@trofi Please note, for muti-config build types, e.g. Xcode or Visual Studio, you have to use --config option:

$ cmake --build . --config Release --verbose
skvadrik commented 3 years ago

Thanks @trofi, so all this time we've been looking at unoptimized debug Visual Studio build (/Od) vs optimized debug Mingw build (-O2 -g). I wonder if there is such thing as "optimized debug" for Visual Studio at all, maybe we should just use /O2 with -DRE2C_DEBUG=on (although the latter will be useless if assert does nothing). What we really need for CI is an optimized build with re2c-specific debug assertions enabled, but I'm afraid it's either no debug assertions at all, or full STL debug (the analogue of GLIBCXX_DEBUG on linux) and no optimizations. Skeleton tests are too heavyweight to run on unoptimized build.

I also wonder why commit https://github.com/skvadrik/re2c/commit/50ac133cc0ebb907a1eb8a3d3455ec67b8c2e1de raises the time for bug128.re from 100s to infinite (very large) for the unoptimized Visual Studio build. It is understandable that it can be very slow, but it would be good to know why it makes such a difference.

Finally, there is a mystery with bug128.c that does not get created (@trofi tells me it is created at https://github.com/skvadrik/re2c/commit/8090a1bd3013db24ef6bcdeb9729d9b9222bd03d in his experiment).

trofi commented 3 years ago

/O2 gets ignored by cmake because it's incompatible with default /RTC1 option (found the error in CMakeError.log): https://stackoverflow.com/questions/37007939/command-line-error-d8016-o2-and-rtc1-command-line-options-are-incompatibl

skvadrik commented 3 years ago

@sergeyklay Maybe what we need for debug Visual Studio build is:

cmake --build . --config Release -DCMAKE_CXX_FLAGS="-UNDEBUG -DRE2C_DEBUG"

And for release build simply:

cmake --build . --config Release
sergeyklay commented 3 years ago

@sergeyklay Maybe what we need for debug Visual Studio build is:

cmake --build . --config Release -DCMAKE_CXX_FLAGS="-UNDEBUG -DRE2C_DEBUG"

And for release build simply:

cmake --build . --config Release

This sounds interesting. I'll need some time to investigate the solution. Thank you for the tip.

Another issues that bother me:

  1. In some cases, the *.c file is not created. The bug128.c case is not the only one

  2. As I said I'm currently developing a native PowerShell wrapper for running re2c tests on Windows. So, about 50% of tests or even more are failed (possible related to 1.). This is unexpected failure

sergeyklay commented 3 years ago

/O2 gets ignored by cmake because it's incompatible with default /RTC1 option (found the error in CMakeError.log): https://stackoverflow.com/questions/37007939/command-line-error-d8016-o2-and-rtc1-command-line-options-are-incompatibl

Right. Fiddling with CMAKE_CXX_FLAGS directly is not a modern CMake whatsoever. We should always use check_cxx_compiler_flag. And this why I'm not a big fan to set CMAKE_CXX_FLAGS in workflow configuration file. This rework was in my future plans.

skvadrik commented 3 years ago

In some cases, the *.c file is not created

Does this also happen in optimized Visual Studio build?

Unlike *.keys and *.input files, with the output file re2c tries to create it atomically: is writes the generated output to a temporary file (https://github.com/skvadrik/re2c/blob/master/src/util/temp_file.cc#L46-L67) and then renames it to the output file.

skvadrik commented 3 years ago

I investigated why Debug build takes infinite time. Apparently it enters an infinite loop in destruction of std::vector<range_t>. There is a function std::_Container_base12::_Orphan_all() that loops over a list and sets some entries to zero (I don't know what it's doing in a destructor of a vector, but the block of code containing the loop is guarded by #if _ITERATOR_DEBUG_LEVEL == 2, so it must be some debug checks). The loop should terminate when the next entry is NULL, but that never happens: the list contains a 3-cycle, and so the program loops over those 3 value infinitely (I left it overnight).

Here is a screenshot of debug session that shows the infinite loop: screenshot-20201108@112229

This can be either a bug in Visual Studio, or some STL edge case that is beyond my C++-fu (although the code that uses this std::vector<range_t> is quite simple and it's not doing any dirty memory tricks as far as I know). I don't think I can fix this aside from using different data structures for skeleton (e.g. replace vectors and wrap-iterators with circular lists).

What I still don't know is why some particular commit adds or removes this behavior. I see no connection, and it can be very complex, so I won't investigate this any further.

codylico commented 3 years ago

it looks like it's detaching std::vector< >::iterator instances from the source std::vector< >, but that's just a guess.

sergeyklay commented 3 years ago

@skvadrik Good job!

skvadrik commented 3 years ago

This commit https://github.com/skvadrik/re2c/commit/b8b107a3004ec01fc2dcd780fa8e979d7229131c should workaround the infinite loop in Visual Studio Debug build. I don't think that tweaking the code to workaround such bugs is a good strategy, but this particular change makes sense in general.

sergeyklay commented 3 years ago

Let me check this. I'll provide result ASAP.

sergeyklay commented 3 years ago

Below I provide the skeleton generation results with re2c compiled using commit 745f6e39 The timings were obtained using the following file: https://github.com/skvadrik/re2c/blob/master/test/bug128.re

All necessary files were generated (including *.c one). This is the best result so far. Congratulations!

about 50% of tests or even more are failed

As I can see now everything is OK! This is what PowerShell tests wrapper says:

-----------------
All:         1591
Ran:         1552
Passed:      1347
Soft errors: 205
Hard errors: 0
Total time:  240.175295
-----------------
PASSED
skvadrik commented 3 years ago

\o/ Great!

Total time 240.175295 is 4 minutes, and it's the time required to run skeleton tests in Release Visual Studio build, right? It seems fast enough for CI. What is the total time for Visual Studio Debug build and for Mingw build?

It might be worth reporting the Visual Studio problem with the infinite loop in Debug build.