redboltz / mqtt_cpp

Boost Software License 1.0

Faster UTF8 validation #705

Open jonesmz opened 3 years ago

jonesmz commented 3 years ago

Just making the project aware of this faster algorithm. https://lemire.me/blog/2020/10/20/ridiculously-fast-unicode-utf-8-validation/

Possible ways to take advantage of this are to provide some kind of hook that lets user code supply its own UTF-8 validation, or a compile-time option to specify a UTF-8 validation function as a dependency.
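To make the compile-time option concrete, here is a rough sketch of what such a hook might look like. Everything in it is hypothetical: the `example` namespace, the `EXAMPLE_UTF8_VALIDATE` macro, and both function names are illustrations, not existing mqtt_cpp options or APIs. It assumes C++17 for `std::string_view`, and the scalar fallback only checks UTF-8 well-formedness (RFC 3629), not any extra restrictions MQTT places on strings.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical hook macro (not an mqtt_cpp option). A build could pass e.g.
// -DEXAMPLE_UTF8_VALIDATE=my_simd_validate to swap in a faster routine with
// the signature bool(std::string_view).
#ifndef EXAMPLE_UTF8_VALIDATE
#define EXAMPLE_UTF8_VALIDATE ::example::validate_utf8_scalar
#endif

namespace example {

// Portable scalar fallback: returns true if s is well-formed UTF-8.
inline bool validate_utf8_scalar(std::string_view s) noexcept {
    std::size_t i = 0;
    std::size_t const n = s.size();
    while (i < n) {
        unsigned char const c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }  // ASCII fast path

        std::size_t len = 0;
        unsigned long cp = 0;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i < len) return false;  // sequence truncated at end of input

        for (std::size_t k = 1; k < len; ++k) {
            unsigned char const cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        // Reject overlong encodings, out-of-range values, and surrogates.
        if (len == 2 && cp < 0x80) return false;
        if (len == 3 && cp < 0x800) return false;
        if (len == 4 && cp < 0x10000) return false;
        if (cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;

        i += len;
    }
    return true;
}

// Single call site the library would use; the macro decides which
// implementation actually runs.
inline bool validate_utf8(std::string_view s) noexcept {
    return EXAMPLE_UTF8_VALIDATE(s);
}

}  // namespace example
```

With a design like this, the library keeps a dependency-free default, and users who care about validation throughput can point the macro at a SIMD implementation of their choice without the library having to ship or maintain one.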

kleunen commented 3 years ago

It seems to use vectorized (SIMD) instructions. I would say it goes a bit too far to include this kind of optimization. UTF-8 validation overhead is only a tiny percentage of the whole workload, so I am not sure optimizing it would give you any noticeable performance gain. I wonder why Boost.Locale does not provide a validation function that selects an optimized version based on the CPU architecture.