serpapi / nokolexbor

High-performance HTML5 parser for Ruby based on Lexbor, with support for both CSS selectors and XPath.

Compilation errors with 0.5.2 #12

Closed erickguan closed 5 months ago

erickguan commented 5 months ago

Hi,

Thanks for building the library. When I tried to build Bridgetown to fix issue https://github.com/bridgetownrb/bridgetown/issues/852, Bundler tried to install and compile nokolexbor.

My compiler is Xcode's clang on macOS 14.4.1 (arm).

Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

I skimmed through the code and the compiler complained about a few type signature problems.

I understand that clang can be stricter about implicit type conversions. Making the type conversions explicit would help macOS users.
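To illustrate what "explicit type conversions" could look like for the qualifier and pointer-sign warnings in the log, here is a minimal sketch. It does not use lexbor or the Ruby C API: `byte_t`, `get_name`, `count_bytes`, and `attr_name_length` are hypothetical stand-ins that only mirror the shapes involved (a getter returning `const unsigned char *`, and a consumer taking `const char *`).

```c
#include <stddef.h>

/* Hypothetical stand-in for an unsigned byte type such as the one in the log. */
typedef unsigned char byte_t;

/* Hypothetical getter that hands back read-only bytes plus their length. */
static const byte_t *get_name(size_t *len)
{
    static const byte_t name[] = "href";
    *len = sizeof(name) - 1;   /* exclude the trailing NUL */
    return name;
}

/* Hypothetical consumer that expects plain 'const char *'. */
static size_t count_bytes(const char *s, size_t len)
{
    size_t n = 0;
    while (n < len && s[n] != '\0') {
        n++;
    }
    return n;
}

static size_t attr_name_length(void)
{
    size_t len = 0;
    /* Keep 'const' on the local so no qualifier is discarded. */
    const byte_t *name = get_name(&len);
    /* Spell out the unsigned char -> char pointer conversion. */
    return count_bytes((const char *)name, len);
}
```

Declaring the local as `const` addresses the discarded-qualifier case, and the cast at the call site addresses the pointer-sign case, assuming the bytes really are text that the consumer may read as `char`.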

Details:

Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

    current directory: /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/bin/ruby extconf.rb
checking for whether -DLEXBOR_STATIC is accepted as CFLAGS... yes
checking for whether -DLIBXML_STATIC is accepted as CFLAGS... yes
checking for gmake... no
checking for make... yes
checking for cmake... yes
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

-- Project name: lexbor
-- Build without Threads
-- Lexbor version: 2.1.0
-- The C compiler identification is AppleClang 15.0.0.15000309
-- The CXX compiler identification is AppleClang 15.0.0.15000309
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Append module: core (1.5.0)
-- Append module: css (0.3.0)
-- Append module: dom (1.4.0)
-- Append module: html (2.2.0)
-- Append module: ns (1.2.0)
-- Append module: selectors (0.1.0)
-- Append module: tag (1.2.0)
-- Append module: utils (0.3.0)
-- CFLAGS:  -O2 -Wall -pedantic -pipe -std=c99 -fPIC
-- CXXFLAGS:  -O2
-- Feature ASAN: disable
-- Feature Fuzzer: enabled
-- Configuring done (0.4s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/vendor/lexbor/build
-- /usr/bin/make install
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.

-- The C compiler identification is AppleClang 15.0.0.15000309
-- The CXX compiler identification is AppleClang 15.0.0.15000309
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Performing Test HAVE_ATTRIBUTE_DESTRUCTOR
-- Performing Test HAVE_ATTRIBUTE_DESTRUCTOR - Success
-- Looking for include file inttypes.h
-- Looking for include file inttypes.h - found
-- Looking for rand_r
-- Looking for rand_r - found
-- Looking for include file stdint.h
-- Looking for include file stdint.h - found
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Configuring done (0.7s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor/build
checking for -llexbor_static... yes
checking for lexbor/html/html.h... yes
creating Makefile

current directory: /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor
make DESTDIR\= sitearchdir\=./.gem.20240420-75863-8t3ezg sitelibdir\=./.gem.20240420-75863-8t3ezg clean

current directory: /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor
make DESTDIR\= sitearchdir\=./.gem.20240420-75863-8t3ezg sitelibdir\=./.gem.20240420-75863-8t3ezg
compiling nl_attribute.c
nl_attribute.c:62:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers
[-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *name = lxb_dom_attr_qualified_name(attr, &len);
              ^      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_attribute.c:64:10: warning: passing 'lxb_char_t *' (aka 'unsigned char *') to parameter of type 'const char *' converts between pointers to integer types where one is of the unique
plain 'char' type and the other is not [-Wpointer-sign]
  return rb_utf8_str_new(name, len);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/include/ruby-3.2.0/ruby/internal/intern/string.h:1553:25: note: expanded from macro 'rb_utf8_str_new'
      rb_utf8_str_new) ((str), (len)))
                        ^~~~~
nl_attribute.c:102:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers
[-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *value = lxb_dom_attr_value(attr, &len);
              ^       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_attribute.c:104:10: warning: passing 'lxb_char_t *' (aka 'unsigned char *') to parameter of type 'const char *' converts between pointers to integer types where one is of the unique
plain 'char' type and the other is not [-Wpointer-sign]
  return rb_utf8_str_new(value, len);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/include/ruby-3.2.0/ruby/internal/intern/string.h:1553:25: note: expanded from macro 'rb_utf8_str_new'
      rb_utf8_str_new) ((str), (len)))
                        ^~~~~
nl_attribute.c:144:28: warning: incompatible pointer types passing 'lxb_dom_element_t *' (aka 'struct lxb_dom_element *') to parameter of type 'lxb_dom_node_t *' (aka 'struct lxb_dom_node
*') [-Wincompatible-pointer-types]
  return nl_rb_node_create(attr->owner, nl_rb_document_get(self));
                           ^~~~~~~~~~~
./nokolexbor.h:26:41: note: passing argument to parameter 'node' here
VALUE nl_rb_node_create(lxb_dom_node_t *node, VALUE rb_document);
                                        ^
nl_attribute.c:161:28: warning: incompatible pointer types passing 'lxb_dom_attr_t *' (aka 'struct lxb_dom_attr *') to parameter of type 'lxb_dom_node_t *' (aka 'struct lxb_dom_node *')
[-Wincompatible-pointer-types]
  return nl_rb_node_create(attr->prev, nl_rb_document_get(self));
                           ^~~~~~~~~~
./nokolexbor.h:26:41: note: passing argument to parameter 'node' here
VALUE nl_rb_node_create(lxb_dom_node_t *node, VALUE rb_document);
                                        ^
nl_attribute.c:178:28: warning: incompatible pointer types passing 'lxb_dom_attr_t *' (aka 'struct lxb_dom_attr *') to parameter of type 'lxb_dom_node_t *' (aka 'struct lxb_dom_node *')
[-Wincompatible-pointer-types]
  return nl_rb_node_create(attr->next, nl_rb_document_get(self));
                           ^~~~~~~~~~
./nokolexbor.h:26:41: note: passing argument to parameter 'node' here
VALUE nl_rb_node_create(lxb_dom_node_t *node, VALUE rb_document);
                                        ^
nl_attribute.c:188:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers
[-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *attr_value = lxb_dom_attr_value(attr, &len);
              ^            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_attribute.c:192:40: warning: pointer type mismatch ('const char *' and 'lxb_char_t *' (aka 'unsigned char *')) [-Wpointer-type-mismatch]
                    attr_value == NULL ? "" : attr_value);
                                       ^ ~~   ~~~~~~~~~~
nl_attribute.c:192:21: warning: format specifies type 'char *' but the argument has type 'void *' [-Wformat]
                    attr_value == NULL ? "" : attr_value);
                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10 warnings generated.
compiling nl_cdata.c
compiling nl_comment.c
compiling nl_document.c
nl_document.c:23:9: error: incompatible function pointer types initializing 'RUBY_DATA_FUNC' (aka 'void (*)(void *)') with an expression of type 'void (lxb_html_document_t *)' (aka 'void
(struct lxb_html_document *)') [-Wincompatible-function-pointer-types]
        free_nl_document,
        ^~~~~~~~~~~~~~~~
nl_document.c:107:45: warning: incompatible pointer types passing 'lxb_dom_document_t *' (aka 'struct lxb_dom_document *') to parameter of type 'lxb_html_document_t *' (aka 'struct
lxb_html_document *') [-Wincompatible-pointer-types]
  lxb_char_t *str = lxb_html_document_title(nl_rb_document_unwrap(self), &len);
                                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor/../../vendor/lexbor/dist/include/lexbor/html/interfaces/document.h:100:46:
note: passing argument to parameter 'document' here
lxb_html_document_title(lxb_html_document_t *document, size_t *len);
                                             ^
nl_document.c:107:15: warning: initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'const lxb_char_t *' (aka 'const unsigned char *') discards qualifiers
[-Wincompatible-pointer-types-discards-qualifiers]
  lxb_char_t *str = lxb_html_document_title(nl_rb_document_unwrap(self), &len);
              ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_document.c:108:44: warning: passing 'lxb_char_t *' (aka 'unsigned char *') to parameter of type 'const char *' converts between pointers to integer types where one is of the unique
plain 'char' type and the other is not [-Wpointer-sign]
  return str == NULL ? rb_str_new("", 0) : rb_utf8_str_new(str, len);
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/include/ruby-3.2.0/ruby/internal/intern/string.h:1553:25: note: expanded from macro 'rb_utf8_str_new'
      rb_utf8_str_new) ((str), (len)))
                        ^~~~~
nl_document.c:129:49: warning: incompatible pointer types passing 'lxb_dom_document_t *' (aka 'struct lxb_dom_document *') to parameter of type 'lxb_html_document_t *' (aka 'struct
lxb_html_document *') [-Wincompatible-pointer-types]
  lxb_char_t *str = lxb_html_document_title_set(nl_rb_document_unwrap(self), (const lxb_char_t *)c_title, len);
                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2/ext/nokolexbor/../../vendor/lexbor/dist/include/lexbor/html/interfaces/document.h:103:50:
note: passing argument to parameter 'document' here
lxb_html_document_title_set(lxb_html_document_t *document,
                                                 ^
nl_document.c:129:15: error: incompatible integer to pointer conversion initializing 'lxb_char_t *' (aka 'unsigned char *') with an expression of type 'lxb_status_t' (aka 'unsigned int')
[-Wint-conversion]
  lxb_char_t *str = lxb_html_document_title_set(nl_rb_document_unwrap(self), (const lxb_char_t *)c_title, len);
              ^     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nl_document.c:129:15: warning: unused variable 'str' [-Wunused-variable]
5 warnings and 2 errors generated.
make: *** [nl_document.o] Error 1

make failed, exit code 2

Gem files will remain installed in /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/nokolexbor-0.5.2 for inspection.
Results logged to /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/extensions/arm64-darwin-23/3.2.0/nokolexbor-0.5.2/gem_make.out

  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:119:in `run'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:53:in `block in make'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:45:in `each'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:45:in `make'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/ext_conf_builder.rb:42:in `build'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:187:in `build_extension'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:221:in `block in build_extensions'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:218:in `each'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/ext/builder.rb:218:in `build_extensions'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/3.2.0/rubygems/installer.rb:846:in `build_extensions'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/rubygems_gem_installer.rb:76:in `build_extensions'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/rubygems_gem_installer.rb:28:in `install'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/source/rubygems.rb:205:in `install'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/installer/gem_installer.rb:54:in `install'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/installer/gem_installer.rb:16:in `install_from_spec'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/installer/parallel_installer.rb:132:in `do_install'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/installer/parallel_installer.rb:123:in `block in worker_pool'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/worker.rb:62:in `apply_func'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/worker.rb:57:in `block in process_queue'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/worker.rb:54:in `loop'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/worker.rb:54:in `process_queue'
  /Users/erickg/.local/share/mise/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bundler-2.5.3/lib/bundler/worker.rb:90:in `block (2 levels) in create_threads'

zyc9012 commented 5 months ago

Hi @erickguan, thanks for reporting the issue. Nokolexbor ships with prebuilt binaries. Bundler should have picked one of those, so I wonder why it chose to compile from source.

Meanwhile, I'll look into the compiler errors.
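For context, the two hard errors in the log are a free function whose parameter type does not match `void (*)(void *)`, and an integer status code assigned to a pointer variable. The sketch below shows one common way to resolve each pattern. It is not the fix that was pushed to the repository, and every name in it (`document_t`, `status_t`, `free_func_t`, `free_document`, `set_title`, `demo`) is a hypothetical stand-in rather than a nokolexbor, lexbor, or Ruby identifier.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the types involved. */
typedef struct document { int freed; } document_t;
typedef unsigned int status_t;
typedef void (*free_func_t)(void *);   /* same shape as 'void (*)(void *)' */

enum { STATUS_OK = 0, STATUS_ERROR = 1 };

/* Accept 'void *' so the function matches free_func_t exactly,
 * then convert to the concrete type inside the body. */
static void free_document(void *data)
{
    document_t *doc = data;   /* void * converts implicitly to T * in C */
    doc->freed = 1;           /* stands in for the real cleanup work */
}

/* Hypothetical setter that reports success through a status code. */
static status_t set_title(document_t *doc, const char *title)
{
    return (doc != NULL && title != NULL) ? STATUS_OK : STATUS_ERROR;
}

static int demo(void)
{
    document_t doc = { 0 };
    free_func_t release = free_document;          /* types match, no cast */
    status_t status = set_title(&doc, "Hello");   /* integer holds the status */
    release(&doc);
    return status == STATUS_OK && doc.freed == 1;
}
```

Changing the callback's own signature is usually safer than casting the function pointer, because calling a function through a pointer of an incompatible type is undefined behavior in C.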

erickguan commented 5 months ago

macOS ARM devices usually mark multiple platforms. I'm not 100% sure, but ARM instruction sets might differ.

Here is what Bridgetown declares:

https://github.com/bridgetownrb/bridgetown/blob/main/Gemfile.lock#L226-L230

Happy to help testing the compilation and configuration.

zyc9012 commented 5 months ago

@erickguan I pushed a fix. Can you clone this repo, run bundle install, and then run bundle exec rake compile to see whether there are any errors?

erickguan commented 5 months ago

@zyc9012 Everything works. I don't see any compilation errors.

zyc9012 commented 5 months ago

Released 0.5.4