ruby / zlib

Ruby interface for the zlib compression/decompression library

Investigate failure on truffleruby-head + macOS + XCode 14.2 #75

Closed eregon closed 7 months ago

eregon commented 8 months ago

From https://github.com/ruby/zlib/pull/73#issuecomment-1890427228

Run bundle exec rake compile test
mkdir -p lib
mkdir -p tmp/x86_64-darwin20/zlib/3.2.2
/Users/runner/.rubies/truffleruby-head/bin/ruby -I. ../../../../ext/zlib/extconf.rb
cd tmp/x86_64-darwin20/zlib/3.2.2
checking for deflateReset(NULL) in -lz... yes
checking for crc32_combine() in zlib.h... yes
checking for adler32_combine() in zlib.h... yes
checking for z_crc_t in zlib.h... yes
checking for z_size_t in zlib.h... yes
checking for crc32_z() in zlib.h... yes
checking for adler32_z() in zlib.h... yes
creating Makefile
cd -
cd tmp/x86_64-darwin20/zlib/3.2.2
/usr/bin/make
compiling ../../../../ext/zlib/zlib.c
linking shared-object zlib.bundle
ld: warning: -undefined dynamic_lookup may not work with chained fixups
cd -
/usr/bin/make install sitearchdir=../../../../lib sitelibdir=../../../../lib target_prefix=
mkdir -p tmp/x86_64-darwin20/stage/lib
/usr/bin/install -c -m 0755 zlib.bundle ../../../../lib
cp tmp/x86_64-darwin20/zlib/3.2.2/zlib.bundle tmp/x86_64-darwin20/stage/lib/zlib.bundle
<internal:core> core/kernel.rb:234:in `gem_original_require': dlopen(/Users/runner/work/zlib/zlib/lib/zlib.bundle, 0x0009): symbol not found in flat namespace (_rb_econv_check_error) (RuntimeError)
    from <internal:/Users/runner/.rubies/truffleruby-head/lib/mri/rubygems/core_ext/kernel_require.rb>:37:in `require'
    from /Users/runner/work/zlib/zlib/test/zlib/test_zlib.rb:10:in `<top (required)>'
    from <internal:core> core/kernel.rb:234:in `gem_original_require'
    from <internal:/Users/runner/.rubies/truffleruby-head/lib/mri/rubygems/core_ext/kernel_require.rb>:37:in `require'
    from /Users/runner/work/zlib/zlib/vendor/bundle/truffleruby/3.2.2.9/gems/rake-13.1.0/lib/rake/rake_test_loader.rb:21:in `block in <main>'
    from /Users/runner/work/zlib/zlib/vendor/bundle/truffleruby/3.2.2.9/gems/rake-13.1.0/lib/rake/rake_test_loader.rb:6:in `select'
    from /Users/runner/work/zlib/zlib/vendor/bundle/truffleruby/3.2.2.9/gems/rake-13.1.0/lib/rake/rake_test_loader.rb:6:in `<main>'
rake aborted!
Command failed with status (1)
/Users/runner/work/zlib/zlib/rakefile:10:in `block in <top (required)>'
/Users/runner/work/zlib/zlib/vendor/bundle/truffleruby/3.2.2.9/gems/rake-13.1.0/exe/rake:27:in `<top (required)>'
<internal:core> core/kernel.rb:383:in `load'
<internal:core> core/kernel.rb:383:in `load'
<internal:core> core/kernel.rb:383:in `load'
/Users/runner/.rubies/truffleruby-head/bin/bundle:44:in `<main>'
Tasks: TOP => test_internal
(See full trace by running task with --trace)

eregon commented 8 months ago

Hmm, moving it to another function does not seem to help: https://github.com/ruby/zlib/actions/runs/7512241958/job/20452716267?pr=75 Although maybe the issue is that clang inlines it, so the move has no effect.

Looking at the dlopen man pages for RTLD_LAZY, there are some differences which could be what I guessed above, but I'm not sure. macOS: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/dlopen.3.html

     RTLD_LAZY   Each external function reference is bound the first time the
                 function is called.

Though that does sound like it should only resolve the symbol when the external function is called, not when the caller function is called, so my guess is probably wrong.

Linux (man dlopen):

       RTLD_LAZY
              Perform lazy binding. Resolve symbols only as the code that references them is executed. If the symbol is
              never referenced, then it is never resolved. (Lazy binding is performed only for function references;
              references to variables are always immediately bound when the shared object is loaded.) Since glibc 2.1.1,
              this flag is overridden by the effect of the LD_BIND_NOW environment variable.

My next guess is that the tests do something different on macOS and trigger that dummy encoding path (which uses rb_econv_check_error), while they don't on Linux.

eregon commented 8 months ago

https://github.com/ruby/zlib/actions/runs/7512241958/job/20452716267?pr=75#step:4:32 So it fails on the require 'zlib' line. That's weird, it's as if it was using RTLD_NOW instead of RTLD_LAZY. I'll keep investigating next week.

eregon commented 8 months ago

dlopen(/Users/runner/work/zlib/zlib/lib/zlib.bundle, 0x0009): symbol not found in flat namespace (_rb_econv_check_error)

So I should check what the 0x0009 means in terms of flags.

eregon commented 8 months ago

If https://opensource.apple.com/source/dyld/dyld-239.3/include/dlfcn.h.auto.html is correct (but is it?), then 9 is 8 (RTLD_GLOBAL) + 1 (RTLD_LAZY), which are the expected flags we set in TruffleRuby. But then it behaves as non-lazy, which is weird. Maybe Init_zlib is already eagerly loading these symbols because it uses rb_gzreader_getc and that would inline everything? It doesn't seem likely though. Weird indeed.
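
As a quick check of that arithmetic, here is a minimal Ruby sketch that decodes a dlopen mode value into flag names. It assumes the four values listed in the macOS dlfcn.h linked above (RTLD_LAZY 0x1, RTLD_NOW 0x2, RTLD_LOCAL 0x4, RTLD_GLOBAL 0x8) and ignores any other bits. The constant and the `decode_dlopen_mode` helper are hypothetical names used only for illustration; they are not part of TruffleRuby or zlib.

```ruby
# Flag values as listed in the macOS dlfcn.h linked above (assumed correct).
# Other bits defined by that header are deliberately left out of this sketch.
MACOS_RTLD_FLAGS = {
  0x1 => "RTLD_LAZY",
  0x2 => "RTLD_NOW",
  0x4 => "RTLD_LOCAL",
  0x8 => "RTLD_GLOBAL",
}.freeze

# Hypothetical helper: returns the names of the known flags set in `mode`.
def decode_dlopen_mode(mode)
  MACOS_RTLD_FLAGS.select { |bit, _name| (mode & bit) != 0 }.values
end

# The mode printed in the dlopen error message above.
puts decode_dlopen_mode(0x0009).join(" | ")
```

Under those assumptions, 0x0009 decodes to RTLD_LAZY and RTLD_GLOBAL with RTLD_NOW unset, so the flags themselves do not explain the eager symbol resolution.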

eregon commented 8 months ago

ld: warning: -undefined dynamic_lookup may not work with chained fixups Maybe that's the issue, and it causes symbols to not be resolved lazily?

eregon commented 8 months ago

ld: warning: -undefined dynamic_lookup may not work with chained fixups Maybe that's the issue, and it causes symbols to not be resolved lazily?

Yeah, that seems to be it.
It works fine on macOS 11/XCode 13.2: https://github.com/ruby/zlib/actions/runs/7521661799/job/20472764087?pr=75
And it fails on macOS 12/XCode 14.2: https://github.com/ruby/zlib/actions/runs/7521608329/job/20472641124?pr=75
It works fine on macOS 13/XCode 15.1: https://github.com/ruby/zlib/actions/runs/7532280338/job/20502568502?pr=75

So it seems to be the same issue that CRuby had for XCode 14 in:

Also it might be fixed in XCode 14.3: https://github.com/python/cpython/issues/97524#issuecomment-1458855301 But the macos-latest/macos-12 image uses 14.2 :/

eregon commented 8 months ago

For now let's use macos-13 so we don't have the problematic XCode 14.2: https://github.com/ruby/zlib/pull/76

eregon commented 7 months ago

macos-12 (same as macos-latest) + MACOSX_DEPLOYMENT_TARGET=11.0 works as well: https://github.com/eregon/zlib/actions/runs/7543545007/job/20534822113 That seems a better fix in general than having to test on macos != 12. We should only do it for XCode 14.2 though.
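
A rough Ruby sketch of what "only for XCode 14.2" could look like, for example near the top of a Rakefile so the variable is inherited by the compile subprocesses. This is an assumption about one possible approach, not what the repository does: the `xcode_14_2?` helper is a hypothetical name, and the sketch assumes `xcodebuild -version` prints a first line of the form `Xcode 14.2`.

```ruby
# Hypothetical helper: true when the given `xcodebuild -version` output
# reports exactly Xcode 14.2 on its first line.
def xcode_14_2?(version_output)
  version_output.to_s.match?(/\AXcode 14\.2(\s|\z)/)
end

# Only relevant on macOS; elsewhere this block is skipped entirely.
if RUBY_PLATFORM.include?("darwin")
  # Empty string if xcodebuild is missing or fails.
  output = `xcodebuild -version 2>/dev/null`
  # Keep any value the user already set; otherwise apply the workaround.
  ENV["MACOSX_DEPLOYMENT_TARGET"] ||= "11.0" if xcode_14_2?(output)
end
```

The same condition could also be expressed directly in the CI workflow instead of in Ruby; the point is just to scope the deployment target override to the affected toolchain.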