libsdl-org / SDL_image

Image decoding for many popular formats for Simple Directmedia Layer.

jpeg-xl tests have been failing on macOS #450

Open sezero opened 5 months ago

sezero commented 5 months ago

For some time, our jpeg-xl tests have been failing on macOS, as can be seen in the CI logs.

The CI logs say that brew is installing version 0.10.2 of libjxl. None of the other runners use v0.10.x at the moment, so is it possible that SDL_image has an issue with libjxl-0.10.x?

madebr commented 5 months ago

The CI failure mode has also changed since the GitHub macOS runners switched to arm64. Before, there were errors about surface mismatches. Now, IMG_Init(IMG_INIT_JXL) fails because it cannot find libjxl.0.10.dylib. There is also an issue with libwebpdemux.2.dylib.

sezero commented 5 months ago

> Now, IMG_Init(IMG_INIT_JXL) fails because it cannot find libjxl.0.10.dylib. There is also an issue with libwebpdemux.2.dylib

Ouch. How is that happening? Something wrong with brew (e.g. not adding to dyld cache or something)?

madebr commented 5 months ago

> Ouch. How is that happening? Something wrong with brew (e.g. not adding to dyld cache or something)?

I don't know how macOS dyld works, but it seems that no homebrew library can be loaded at all. None of the libraries provided by homebrew can be found: libavif, libjxl, and libwebp. (I removed installation of libjpeg, libpng, and libtiff.)

The macOS job now uploads CMake logs, which hint that homebrew installs to /opt/homebrew (e.g. the CMake cache contains libjxl_LIBRARY:FILEPATH=/opt/homebrew/lib/libjxl.dylib). But the IMG_Init error message is:

Initialization should succeed (Failed loading libjxl.0.10.dylib: dlopen(libjxl.0.10.dylib, 0x0006): tried: 'libjxl.0.10.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OSlibjxl.0.10.dylib' (no such file), '/Users/runner/work/SDL_image/SDL_image/build/libjxl.0.10.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Users/runner/work/SDL_image/SDL_image/build/libjxl.0.10.dylib' (no such file), '/var/folders/3m/p59k4qdj0f17st0gn2cmj3640000gn/T/setupsdl/66c57facf1111a4ec08d3d1abdf3c87f3062ffbea8d9eb4c9c412e9b5e7f59b7/package/lib/libjxl.0.10.dylib' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/var/folders/3m/p59k4qdj0f17st0gn2cmj3640000gn/T/setupsdl/66c57facf1111a4ec08d3d1abdf3c87f3062ffbea8d9eb4c9c412e9b5e7f59b7/package/lib/libjxl.0.10.dylib' (no such file), '/usr/lib/libjxl.0.10.dylib' (no such file, not in dyld cache), 'libjxl.0.10.dylib' (no such file))

There is no /opt/homebrew/lib path in the error message.
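The mismatch can be checked mechanically. The following is only a sketch: it uses the CMake cache entry quoted above and an abbreviated copy of the dlopen error (several of the tried paths are left out for brevity), and the helper names `cmake_library_dir` and `dyld_tried_dirs` are made up for this illustration. It assumes every entry after "tried:" has the shape `'path' (reason)`.

```python
import posixpath
import re

# Strings copied from the comment above; the error is abbreviated and keeps
# only a few of the paths dyld reported.
CACHE_LINE = "libjxl_LIBRARY:FILEPATH=/opt/homebrew/lib/libjxl.dylib"
DLOPEN_ERROR = (
    "Failed loading libjxl.0.10.dylib: dlopen(libjxl.0.10.dylib, 0x0006): tried: "
    "'libjxl.0.10.dylib' (no such file), "
    "'/Users/runner/work/SDL_image/SDL_image/build/libjxl.0.10.dylib' (no such file), "
    "'/usr/lib/libjxl.0.10.dylib' (no such file, not in dyld cache), "
    "'libjxl.0.10.dylib' (no such file)"
)


def cmake_library_dir(cache_line):
    """Return the directory part of a NAME:TYPE=VALUE CMake cache entry."""
    _, _, value = cache_line.partition("=")
    return posixpath.dirname(value)


def dyld_tried_dirs(error_text):
    """Return the directories listed after 'tried:', in order, without repeats."""
    _, _, tried = error_text.partition("tried: ")
    dirs = []
    for path in re.findall(r"'([^']+)' \([^)]*\)", tried):
        directory = posixpath.dirname(path)  # empty string for a bare file name
        if directory not in dirs:
            dirs.append(directory)
    return dirs


lib_dir = cmake_library_dir(CACHE_LINE)
tried = dyld_tried_dirs(DLOPEN_ERROR)
print("CMake found the library in:", lib_dir)
print("dyld searched:", tried)
print("CMake directory searched by dyld:", lib_dir in tried)
```

The empty string in the result stands for the bare file name that dyld tried without any directory.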

sezero commented 5 months ago

Looks like macOS >= 13 has the issue. Changing the runner to macos-12, we get the old surface mismatch.

> There is no /opt/homebrew/lib path in the error message.

I guess the path is on the macOS library search path, because cmake can find the libraries, yes? Is it possible that macos-13 (arm64?) somehow quarantines those libs and dlopen fails because of it?

madebr commented 5 months ago

Looks like CMake hardcodes /opt/homebrew. This is also the default homebrew installation path. This homebrew discussion is related. The shellenv suggestion might work for our purposes.

sezero commented 5 months ago

Yes, looks like we'll need to add the homebrew lib directory to the dyld path somehow in our workflows.
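One way this could look, as a rough sketch rather than the actual workflow change: build the environment for the test process with the homebrew lib directory prepended to DYLD_LIBRARY_PATH. The helper name `with_dyld_library_path` is made up for this illustration, and /opt/homebrew/lib is the directory taken from the CMake cache entry mentioned earlier in this thread. Whether the tests behave correctly with this variable set is a separate question.

```python
import os


def with_dyld_library_path(env, lib_dir):
    """Return a copy of env with lib_dir prepended to DYLD_LIBRARY_PATH."""
    new_env = dict(env)  # copy, so the caller's mapping is left untouched
    existing = new_env.get("DYLD_LIBRARY_PATH", "")
    new_env["DYLD_LIBRARY_PATH"] = f"{lib_dir}:{existing}" if existing else lib_dir
    return new_env


# Assumption: /opt/homebrew/lib is where homebrew put the libraries on this runner.
test_env = with_dyld_library_path(os.environ, "/opt/homebrew/lib")
print(test_env["DYLD_LIBRARY_PATH"])
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` when launching the test binary, which keeps the change scoped to that one process instead of the whole job.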

madebr commented 5 months ago

When I add the homebrew library path to DYLD_LIBRARY_PATH, the test fails with a BUS error. https://github.com/madebr/SDL_image/actions/runs/8927177439/job/24519945832#step:13:152

It fails during the BMP test, after successfully completing the avif test. Or at least that is how it appears; the logs might be incomplete. A search for "bus error DYLD_LIBRARY_PATH" suggests that these errors are not uncommon.

Adding /opt/homebrew/lib to SDL_image's rpath won't fix the issue: it must be added to SDL3 (dlopen happens there)

sezero commented 5 months ago

> When I add the homebrew library path to DYLD_LIBRARY_PATH, the test fails with a BUS error. https://github.com/madebr/SDL_image/actions/runs/8927177439/job/24519945832#step:13:152
>
> It fails during the BMP test, after successfully completing the avif test. Or at least, that is what it appears like because the logs might be incomplete. When doing a search for "bus error DYLD_LIBRARY_PATH", it looks like these errors are not uncommon.

Misaligned stack or something? I wonder whether it happens in SDL2 too.

sezero commented 4 months ago

Looks like msys started installing libjxl 0.10.2 and our tests started failing there too: https://github.com/libsdl-org/SDL_image/actions/runs/9005679096/job/24741542312

sezero commented 2 months ago

Can we not raise this issue in the libjxl bug tracker somehow?