zeromq / azmq

C++ language binding library integrating ZeroMQ with Boost Asio
Boost Software License 1.0

azmq: replace boost.regex with std #181

Closed timblechmann closed 3 years ago

timblechmann commented 3 years ago

we can drop the dependency on the boost.regex library by using std::regex and friends.

furthermore we can cache the compiled regex instead of building it every time
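A minimal sketch of what caching the compiled regex could look like, assuming a function-local static. The helper name `endpoint_port` and the pattern are hypothetical and only illustrate the idea; they are not taken from azmq's sources.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not azmq's actual code): extracts the port from an
// endpoint such as "tcp://127.0.0.1:5555". Returns -1 if the endpoint does
// not match the illustrative pattern below.
int endpoint_port(const std::string& endpoint) {
    // Function-local static: the pattern is compiled once, on the first
    // call (initialization is thread-safe since C++11), and reused after.
    static const std::regex tcp_endpoint(R"(^tcp://.*:(\d{1,5})$)");

    std::smatch match;
    if (!std::regex_match(endpoint, match, tcp_endpoint)) {
        return -1;
    }
    // At most five digits were captured, so std::stoi cannot overflow.
    return std::stoi(match[1].str());
}
```

Calling `endpoint_port("tcp://127.0.0.1:5555")` yields 5555, while an endpoint like `"ipc:///tmp/sock"` yields -1. Only the first call pays for constructing the regex.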

aboseley commented 3 years ago

This library is intrinsically coupled to boost, so what are the benefits of switching from boost to std::regex? Std::regex is known for poor performance, though this shouldn't be an issue in a bind call, which is rarely a 'fast-path' operation.

timblechmann commented 3 years ago

what are the benefits of switching from boost to std::regex?

main benefit is binary size. shared libs of boost.regex are hundreds of KB, whereas std::regex is already contained in the runtime library.

Std::regex is known for poor performance

out of curiosity: do you have any reference for this?

aboseley commented 3 years ago

I've seen std::regex mentioned in a few blog posts as an example of something that needs improvement but can't be changed because doing so would break the ABI, which is a real shame. I don't think the boost library 'suffers' from the ABI compatibility constraint, so it can fix things like this.

One reference is this, but it's light on detail wrt regex: https://cor3ntin.github.io/posts/abi/

I ran a quick benchmark this morning for fun: https://github.com/aboseley/benchmark-regex

BM_MatchWithBoost       3786 ns         3785 ns       184163
BM_MatchWithStd        49727 ns        49704 ns        14020

timblechmann commented 3 years ago

out of curiosity: which compiler/stl implementation do you use?

with appleclang-11/libcxx, std::regex seems to be a little slower, but it's nowhere near as extreme as what you're seeing

BM_MatchWithBoost       4793 ns         4791 ns       108551
BM_MatchWithStd         5671 ns         5670 ns       117849

aboseley commented 3 years ago

g++ (gcc) 11.1.0 on linux

timblechmann commented 3 years ago

at the risk of comparing apples with oranges, i've tweaked your benchmark program to cache the compiled regular expressions (by making them static) ... after all, we don't need to re-compile the regex every time we want to match a string.

in this case std is actually faster than boost (gcc-10/linux)

BM_MatchWithBoost       99.3 ns         99.3 ns      6879716
BM_MatchWithStd         88.8 ns         88.8 ns      7841490

so it seems that constructing the regex fsm from the string is faster in boost, but evaluating the fsm is faster in the stl
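A rough sketch of the two variants being compared, assuming a plain `std::chrono` timing loop rather than the benchmark harness used in the linked repository. The pattern, the function names and the `ns_per_call` helper are all hypothetical, chosen only for illustration.

```cpp
#include <chrono>
#include <regex>
#include <string>

// Hypothetical pattern, for illustration only.
static const char* const kPattern = R"(^tcp://.*:(\d+)$)";

// Builds the regex on every call: pays the construction cost each time.
bool match_rebuild(const std::string& s) {
    const std::regex re(kPattern);
    return std::regex_match(s, re);
}

// Builds the regex once (static) and only evaluates it on later calls.
bool match_cached(const std::string& s) {
    static const std::regex re(kPattern);
    return std::regex_match(s, re);
}

// Crude timing helper: average wall-clock nanoseconds per call of f.
template <typename F>
double ns_per_call(F f, int iterations) {
    using clock = std::chrono::steady_clock;
    const std::string input = "tcp://127.0.0.1:5555";
    volatile bool sink = false;  // keeps the calls from being optimized away
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        sink = f(input);
    }
    const auto stop = clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}
```

Comparing `ns_per_call(match_rebuild, 10000)` with `ns_per_call(match_cached, 10000)` separates construction cost from evaluation cost. The absolute numbers depend heavily on the compiler and standard library, as the results in this thread show.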

aboseley commented 3 years ago

lies, statistics and benchmarks.

switching to libc++ and clang

------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_MatchWithBoost       2902 ns         2900 ns       240455
BM_MatchWithStd          997 ns          996 ns       69198

Honestly, the speed of this code isn't much of an issue. I can't imagine the speed of a bind call ever being a problem.

aboseley commented 3 years ago

If nothing else, this PR reduces the binary size a bit further.

aboseley commented 3 years ago

Thank you for the PR