serpapi / nokolexbor

High-performance HTML5 parser for Ruby based on Lexbor, with support for both CSS selectors and XPath.

Ship pre-compiled `lexbor` binaries #5

Closed (ilyazub closed this issue 1 year ago)

ilyazub commented 1 year ago

Nokogiri ships with pre-compiled binaries. Although lexbor compiles fairly quickly (< 25 seconds), it would be even more convenient if cmake weren't required and gem installation were faster.

Here are rake tasks that are used to pre-compile libxml2 in Nokogiri: https://github.com/sparklemotion/nokogiri/blob/88b3e2feefb51ab79b7586b1147a8b3c2d5abcda/rakelib/extensions.rake#L322-L341
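For context, a gem that ships pre-compiled binaries usually needs a small loader that prefers the bundled shared library and falls back to a locally compiled one. Below is a minimal sketch of that pattern, not Nokolexbor's or Nokogiri's actual loader. The helper names `extension_candidates` and `load_extension` and the `lib/<gem>/<ruby minor version>/` layout are assumptions made for illustration.

```ruby
# Hypothetical helper: lists the require paths to try, most specific first.
# Assumed layout: a pre-compiled gem stores one shared library per Ruby minor
# version, e.g. lib/nokolexbor/3.2/nokolexbor.so, while a source gem compiles
# a single library into lib/nokolexbor/nokolexbor.so.
def extension_candidates(gem_name, ruby_version = RUBY_VERSION)
  minor = ruby_version[/\A\d+\.\d+/]
  ["#{gem_name}/#{minor}/#{gem_name}", "#{gem_name}/#{gem_name}"]
end

# Tries each candidate with the given loader (Kernel#require by default) and
# returns the first path that loads. If none loads, the last LoadError is
# re-raised so the caller sees why.
def load_extension(gem_name, loader: ->(path) { require path })
  last_error = nil
  extension_candidates(gem_name).each do |path|
    begin
      loader.call(path)
      return path
    rescue LoadError => e
      last_error = e
    end
  end
  raise last_error
end
```

The `loader:` parameter exists only so the fallback order can be exercised without a real native extension on disk.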

Thanks for your work, @zyc9012 :+1:

zyc9012 commented 1 year ago

Yeah, I have a plan for it.

One concern: if we ship pre-compiled binaries, will we lose the opportunity to optimize for the latest CPU features (such as 256-bit AVX), since we would have to target older CPUs?

ilyazub commented 1 year ago

Does lexbor or nokolexbor use SIMD instructions? How much does Nokolexbor's performance differ when compiled with and without the latest CPU features?

In any case, we can self-host CI runners with modern CPUs to compile lexbor and libxml2. (That sounds like over-engineering, though.)

zyc9012 commented 1 year ago

I tried compiling with `-O3 -march=native`, but it made no difference. 😞
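A comparison like this can be reproduced with a small timing harness run once against each build of the extension. The sketch below is an illustration, not the benchmark used in this thread. `best_time` is a hypothetical helper, and the workload is a stand-in so the script runs without the gem installed.

```ruby
require "benchmark"

# Hypothetical harness: runs the block `iterations` times and returns the
# best (minimum) wall-clock time in seconds. The minimum is usually less
# sensitive to background noise than the mean.
def best_time(iterations: 5)
  Array.new(iterations) { Benchmark.realtime { yield } }.min
end

# Stand-in workload (assumption): in a real comparison this block would
# instead parse a large HTML document with the gem under test, once per
# build (default flags vs. -O3 -march=native).
html = "<div><p>hello</p></div>" * 10_000
elapsed = best_time { html.scan(/<p>/).size }
puts format("best of 5 runs: %.4f s", elapsed)
```

Running the same script against both builds and comparing the reported times shows whether the extra compiler flags matter for a given workload.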

zyc9012 commented 1 year ago

@ilyazub It's done.

See https://github.com/serpapi/nokolexbor/actions/runs/3810516328 and https://rubygems.org/gems/nokolexbor/versions

ilyazub commented 1 year ago

@zyc9012 Thank you!