Open seanm opened 3 years ago
Is this related to these pull requests?
https://github.com/glennrp/libpng/pull/359 https://github.com/glennrp/libpng/pull/354
Any updates on this? I would like to build universal binary for Mac Intel and Silicon my make command is like this ${CMAKE} -DCMAKE_OSX_ARCHITECTURES="x86_64;arm64" ../ -DCMAKE_INSTALL_PREFIX=./install
The problem that was originally reported is now fixed, but macOS universal binaries still cannot be produced.
I'm also running into the issue of not being able to create universal binaries. I was looking into the code and the issue seems to be that all the flags for hardware-specific options are set as compiler flags, which of course means that a single compile-command cannot correctly make a binary for both architectures.
I wonder if there is a specific reason to do it this way? If we instead make a header that determines the architecture (and then sets the necessary defines), we could instead include that. This would mean that when invoking clang for multiple architectures those defines can be different and then it should work fine. We could check for things like __arm__
and __arm64__
.
Is this a left-over from the autotools time? Would a PR to change this have a chance of being accepted?
@omartijn Hello, did you work it around anyhow?
@omartijn Hello, did you work it around anyhow?
No, I was waiting for a reply from a maintainer. I don't want to do the work unless it's going to be used. I'm using libpng from vcpkg
, where I have a workaround to build universal libraries by performing two separate builds and merging them. This, unfortunately, also didn't get merged upstream due to endless code churn (they kept asking me to refactor things after another PR changed things) and it ended up costing too much time.
Ran into this today as well.
cmake .. "-DCMAKE_OSX_ARCHITECTURES=arm64;x86_64"
cmake --build .
but the build fails in the neon code. The 1st error in a long list is:
/Users/justinzaun/Development/3rdparty-repos/libpng/arm/filter_neon.S:41:9: error: unknown directive
.fpu neon
I can build for either architecture but not a universal build for both. Also, I'm on a arm Mac trying to create a universal build.
This is my crapy little hack to make a universal binary.
#! /bin/sh
rm -rf build-x86
rm -rf build-arm
rm -rf build
mkdir build-x86
mkdir build-arm
mkdir build
cd build-x86
cmake .. "-DCMAKE_OSX_ARCHITECTURES=x86_64"
cmake --build .
cd ../build-arm
cmake .. "-DCMAKE_OSX_ARCHITECTURES=arm64"
cmake --build .
cd ../build
cmake .. "-DCMAKE_OSX_ARCHITECTURES=x86_64"
cmake --build .
lipo -create ../build-arm/libpng16.a ../build-x86/libpng16.a -output ./libpng16.a
lipo -create ../build-arm/libpng16.16.41.git.dylib ../build-x86/libpng16.16.41.git.dylib -output ./libpng16.16.41.git.dylib
cd ..
lipo -info ./build/libpng16.a
lipo -info ./build/libpng16.16.41.git.dylib
I'm seeing this when building a universal binary on a m1 mac as well. Is there any known workarounds? Currently just building 2 builds.
If anyone has any insight on where to take a look at the problem with it I would be happy to help.
Either disable the hardware options completely or don't try to build libpng yourself - rely on the Apple provided functionality.
Is there any known workarounds?
My initial bug report literally contains a workaround.
Is there any known workarounds?
My initial bug report literally contains a workaround.
This is a possible xv-backdoor response. @seanm responded within a few minutes of my response and responded off-topic to that response; the @seanm response is discursive and is inviting discursive responses; @seanm is the OP and should have closed this as @ctruta observed that, "The problem that was originally reported is now fixed".
Context (off-immediate-topic): https://www.theregister.com/2024/04/01/xz_backdoor_open_source/
My apologies @seanm , that was brusque and inappropriate particularly as I did exactly the same thing myself.
Cross compilation is not environment specific and not architecture specific; it needs to be supported by the cmake builds and, so far as I can see, cmake has completely adequate mechanisms for doing this. So far as I know at this point cross compilation is completely supported; not just for ARM64 but also the various other instruction sets.
Given that assumption I strongly recommend that you or @ctruta close this issue, it is clearly a bug, and ignore the other issues that were raised. They need to be raised in separate issues. We should all know not to morph bug reports into separate issues; it's done often enough but it's wrong.
Any single "issue" needs to be about one identifiable thing so that the thing in question can be tested and regressed. Cross compilation is one thing and it is not system-specific, multilib and extensions are separate issues that are also not system-specific and so on including system-specific things like support for specific IDEs.
Either disable the hardware options completely or don't try to build libpng yourself - rely on the Apple provided functionality.
I hit this issue recently, but still had the same compiler errors as the OP when setting -DPNG_HARDWARE_OPTIMIZATIONS=OFF
. I eventually discovered that, though CMakeLists.txt
does use TARGET_ARCH
now, it only sets the optimization flags based on the first target architecture. That is:
# build fails
cmake -S . -B build -DCMAKE_OSX_ARCHITECTURES:STRING="x86_64;arm64" -DPNG_HARDWARE_OPTIMIZATIONS=OFF
cmake --build build/
# build succeeds
cmake -S . -B build -DCMAKE_OSX_ARCHITECTURES:STRING="arm64;x86_64" -DPNG_HARDWARE_OPTIMIZATIONS=OFF
cmake --build build/
Notably, building with the optimizations enabled fails for both architecture orderings. With x86_64;arm64
, I get the same missing symbol errors as the OP. With arm64;x86_64
(or adding OR APPLE
as the OP suggests), I get errors like the log at the end of this post. This is despite everything compiling fine when a single architecture is provided.
I don't know enough about the libpng code base to offer solutions to these issues, so I post this here so that it might help someone else in the future. For optimized libraries, it seems an approach like @zaun's solution makes the most sense as a workaround.
-DCMAKE_OSX_ARCHITECTURES:STRING="arm64;x86_64"
FAILED: CMakeFiles/png_shared.dir/arm/filter_neon_intrinsics.c.o
/Library/Developer/CommandLineTools/usr/bin/cc -DPNG_ARM_NEON_OPT=2 -I/Users/foo/libpng/build -I/Users/foo/libpng -isystem /Users/foo/test_prefix/include -O3 -DNDEBUG -arch arm64 -arch x86_64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk -fPIC -MD -MT CMakeFiles/png_shared.dir/arm/filter_neon_intrinsics.c.o -MF CMakeFiles/png_shared.dir/arm/filter_neon_intrinsics.c.o.d -o CMakeFiles/png_shared.dir/arm/filter_neon_intrinsics.c.o -c /Users/foo/libpng/arm/filter_neon_intrinsics.c
In file included from /Users/foo/libpng/arm/filter_neon_intrinsics.c:24:
/Library/Developer/CommandLineTools/usr/lib/clang/15.0.0/include/arm_neon.h:28:2: error: "NEON intrinsics not available with the soft-float ABI. Please use -mfloat-abi=softfp or -mfloat-abi=hard"
#error "NEON intrinsics not available with the soft-float ABI. Please use -mfloat-abi=softfp or -mfloat-abi=hard"
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:62:17: error: expected ';' after expression
uint8x16_t qrp, qpp;
^
;
/Users/foo/libpng/arm/filter_neon_intrinsics.c:62:7: error: use of undeclared identifier 'uint8x16_t'
uint8x16_t qrp, qpp;
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:62:18: error: use of undeclared identifier 'qrp'; did you mean 'rp'?
uint8x16_t qrp, qpp;
^~~
rp
/Users/foo/libpng/arm/filter_neon_intrinsics.c:54:14: note: 'rp' declared here
png_bytep rp = row;
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:62:23: error: use of undeclared identifier 'qpp'; did you mean 'pp'?
uint8x16_t qrp, qpp;
^~~
pp
/Users/foo/libpng/arm/filter_neon_intrinsics.c:56:20: note: 'pp' declared here
png_const_bytep pp = prev_row;
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:64:7: error: use of undeclared identifier 'qrp'; did you mean 'rp'?
qrp = vld1q_u8(rp);
^~~
rp
/Users/foo/libpng/arm/filter_neon_intrinsics.c:54:14: note: 'rp' declared here
png_bytep rp = row;
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:64:13: error: call to undeclared function 'vld1q_u8'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
qrp = vld1q_u8(rp);
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:64:11: error: incompatible integer to pointer conversion assigning to 'png_bytep' (aka 'unsigned char *') from 'int' [-Wint-conversion]
qrp = vld1q_u8(rp);
^ ~~~~~~~~~~~~
/Users/foo/libpng/arm/filter_neon_intrinsics.c:65:7: error: use of undeclared identifier 'qpp'; did you mean 'pp'?
qpp = vld1q_u8(pp);
^~~
pp
/Users/foo/libpng/arm/filter_neon_intrinsics.c:56:20: note: 'pp' declared here
png_const_bytep pp = prev_row;
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:65:11: error: incompatible integer to pointer conversion assigning to 'png_const_bytep' (aka 'const unsigned char *') from 'int' [-Wint-conversion]
qpp = vld1q_u8(pp);
^ ~~~~~~~~~~~~
/Users/foo/libpng/arm/filter_neon_intrinsics.c:66:7: error: use of undeclared identifier 'qrp'; did you mean 'rp'?
qrp = vaddq_u8(qrp, qpp);
^~~
rp
/Users/foo/libpng/arm/filter_neon_intrinsics.c:54:14: note: 'rp' declared here
png_bytep rp = row;
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:66:13: error: call to undeclared function 'vaddq_u8'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
qrp = vaddq_u8(qrp, qpp);
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:66:22: error: use of undeclared identifier 'qrp'; did you mean 'rp'?
qrp = vaddq_u8(qrp, qpp);
^~~
rp
/Users/foo/libpng/arm/filter_neon_intrinsics.c:54:14: note: 'rp' declared here
png_bytep rp = row;
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:66:27: error: use of undeclared identifier 'qpp'; did you mean 'pp'?
qrp = vaddq_u8(qrp, qpp);
^~~
pp
/Users/foo/libpng/arm/filter_neon_intrinsics.c:56:20: note: 'pp' declared here
png_const_bytep pp = prev_row;
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:67:7: error: call to undeclared function 'vst1q_u8'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
vst1q_u8(rp, qrp);
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:67:20: error: use of undeclared identifier 'qrp'; did you mean 'rp'?
vst1q_u8(rp, qrp);
^~~
rp
/Users/foo/libpng/arm/filter_neon_intrinsics.c:54:14: note: 'rp' declared here
png_bytep rp = row;
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:78:14: error: expected ';' after expression
uint8x16_t vtmp = vld1q_u8(rp);
^
;
/Users/foo/libpng/arm/filter_neon_intrinsics.c:78:4: error: use of undeclared identifier 'uint8x16_t'
uint8x16_t vtmp = vld1q_u8(rp);
^
/Users/foo/libpng/arm/filter_neon_intrinsics.c:78:15: error: use of undeclared identifier 'vtmp'
uint8x16_t vtmp = vld1q_u8(rp);
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Either disable the hardware options completely or don't try to build libpng yourself - rely on the Apple provided functionality.
I hit this issue recently, but still had the same compiler errors as the OP when setting
-DPNG_HARDWARE_OPTIMIZATIONS=OFF
. I eventually discovered that, thoughCMakeLists.txt
does useTARGET_ARCH
now, it only sets the optimization flags based on the first target architecture. That is:
The whole thing is a bug in CMakeLists.txt
.
If PNG_ARM_NEON_OPT
is undefined the code in pngpriv.h
selects the correct option based on the target architecture. The intent is that all the hardware specific stuff gets compiled but all of it except that which fits the target gets compiled to nothing. This was done deliberately to support the Apple Universal Binary (multilib) stuff.
In other words CMakeLists.txt
should not set PNG_ARM_NEON_OPT
by default, exactly the same as configure
. Doing the setting by default breaks Apple builds, as reported; the fix is the doctor's fix, "Don't do that!"
Until it's fixed you can use configure
; that should work and give you the correct hardware optimizations for the target or you can attempt the same fix in cmake
with:
CFLAGS=-UPNG_ARM_NEON_OPT cmake path-to-source
That should work because the -U, which cmake puts in C_FLAGS
, should appear on the command line after the -D which cmake puts in C_DEFINES
. But it's cmake
and I barely understand it.
It is apparently also possible to edit the cmake
cache either by hand or using cmake-gui
, then you can change C_DEFINES
to remove the erroneous -D before running make. I suggest trying that first but really, really, configure
is the only build system which was set up to support Apple so I strongly recommend using that.
So to summarize a possible CMake fix, if PNG_HARDWARE_OPTIMIZATIONS=ON
, all we need to do is add the arm .c
sources to libpng and leave PNG_ARM_NEON_OPT
undefined. If PNG_HARDWARE_OPTIMIZATIONS=OFF
, we explicitly set PNG_ARM_NEON_OPT=0
.
Just opened a draft PR to this effect. Could optionally restore the removed PNG_ARM_NEON
option.
So to summarize a possible CMake fix, i
No. The optimizations are on by default, so "PNG_HARDWARE_OPTIMIZATIONS=ON" should do nothing and all the things that depend on it being "ON" should happen by default. I.e. all the hardware specific files for all architectures need to be compiled in regardless.
The only useful setting for PNG_HARDWARE_OPTIMIZATION is OFF because that does change the behavior from the default and the only thing it should be doing is setting PNG_machine_OPT=0.
If this change causes something to fail then the specific failing hardware stuff should be removed from the build until the maintainer of that hardware fixes it!
Doing this consistently should fix all the obvious bugs in CMakeLists.txt by simply removing the buggy lines! For example the checks against the architecture which use "^" are apparently bogus.
Thanks. I want to make sure I get this right. So
If PNG_HARDWARE_OPTIMIZATION=ON
, then add all the extra architecture specific sources currently gated behind arch checks (i.e. remove the arch checks), otherwise, set PNG_machine_OPT=0
(where machine
is a specific arch) and DON'T add any extra sources.
My question about PNG_ARM_NEON
thing was that it looks like some attempt to allow the user to selectively enable optimizations by architecture. Given the above, that seems only useful when building for multiple arches at the same time, and even then questionable. Then it's safe to ditch that ability in favor of the above?
If
PNG_HARDWARE_OPTIMIZATION=ON
It's more that the files are all compiled in unless PNG_HARDWARE_OPTIMIZATION!=ON
but since it defaults to 'ON' it doesn't matter much.
Note that it appears the intel SSE optimizations have been disabled by default even if the compile time checks pass with configure
but are currently on by default with cmake
. This looks like another bug; the Intel code was broken before but when it was fixed the defaulting in pngpriv.h was not updated. I fixed it (removed the check on PNG_INTEL_SSE
at line 231 of pngpriv.h
) and "make check
" passes.
It doesn't do any harm to add the extra sources when ...OPT=0
is set but that's currently the way it works with configure
- configure
checks for the machine specific hardware option being "!= no".
For Neon arm/foo.* contain two implementations, one assembler (which only works for 32-bit and possibly only some CPUs) and the intrinsics. There's another bug about that; the only case the assembler is required is for older compilers and compilers which don't support the intrinsics (which may be GNU specific.) In several CPU cases there is support for a run-time check or an API check. This stuff is mostly broken; e.g. the ARM API check relied on Linux specific stuff and even then it didn't work on Android!
It should be sufficient in all cases except Loongarch to just build the code relying on the compiler #define checks. Loongarch won't work in a multilib because there are no compiler flags (maybe there are in bespoke Chinese compilers, but not in the US).
It seems to me that most of the stuff in CMakeLists.txt
is fluff. In configure
there are separate --enable-arm-neon (etc) options which people might be using to turn some hardware specific implementations off or maybe even on. In CMakeLists.txt
the only option is PNG_HARDWARE_OPTIMIZATIONS, all the other configure
command line options have been replaced by hardwired variables of the general form of the Makefile.am
variables set by configure
.
I'm guessing cmake
allows these variables to be set interactively somehow (cmake-gui
maybe?) However CMakeLists.txt
itself just uses the erroneous check on TARGET_ARCH.
Mysteriously the 32-bit neon support is hard-wired off. It looks half-implemented.
With the exception of the Loongarch support (which I left as-is) I simply deleted all the TARGET_ARCH MATCHES code from CMakeLists.txt
except for the "set
" which sets the various libpng_arch_sources variables. In the "else
" clause I kept just add_definitions
. A build shows all the CMakeFiles/png_shared.dir/*/*.o
outputs have zero size (well, 80 bytes text with GCC) except for /intel/
(with my fix for pngpriv.h)
I think you need the pngpriv.h patch to test this; with that patch you should see a build with real code in both /intel/
and /arm/
(nothing in the other options given your TARGET_ARCH). This is assuming your targets have SSE and NEON between them.
I'll post the patch here; it's on a different machine :-)
Here's the patch:
It seems to me that most of the stuff in
CMakeLists.txt
is fluff. Inconfigure
there are separate --enable-arm-neon (etc) options which people might be using to turn some hardware specific implementations off or maybe even on. InCMakeLists.txt
the only option is PNG_HARDWARE_OPTIMIZATIONS, all the otherconfigure
command line options have been replaced by hardwired variables of the general form of theMakefile.am
variables set byconfigure
.
Actually, this begs a question: do you want the CMake configuration to more or less match the configure
interface in terms of options or do you want it to be its own thing as it currently seems to be? If the former, I can use that as a target when dealing with the issue of these current optimization options, and then we can consider maybe a CMake overhaul in the future (I have some experience with this type of thing).
I'm guessing
cmake
allows these variables to be set interactively somehow (cmake-gui
maybe?)
ccmake
usually (requires curses).
Mysteriously the 32-bit neon support is hard-wired off. It looks half-implemented.
Can I reference the configure
behavior to implement this correctly?
With the exception of the Loongarch support (which I left as-is) I simply deleted all the TARGET_ARCH MATCHES code from
CMakeLists.txt
except for the "set
" which sets the various libpng_arch_sources variables. In the "else
" clause I kept justadd_definitions
. A build shows all theCMakeFiles/png_shared.dir/*/*.o
outputs have zero size (well, 80 bytes text with GCC) except for/intel/
(with my fix for pngpriv.h)I think you need the pngpriv.h patch to test this; with that patch you should see a build with real code in both
/intel/
and/arm/
(nothing in the other options given your TARGET_ARCH). This is assuming your targets have SSE and NEON between them.
Thanks! I'll review all of this next week.
Actually, this begs a question: do you want the CMake configuration to more or less match the
configure
interface in terms of options or do you want it to be its own thing as it currently seems to be?
What I want is for both configure
and cmake
to offer similar or even identical user-facing options and to produce exactly the same compiled libpng for a given set of options.
I don't know if Cosmin wants this; certainly what he implemented does not do this.
I regard the configure
--enable-cmake
(currently they are #defines in cmake
which is, I guess, sort-of ok.) However Glenn (the previous maintainer) refused point-black to remove them in 1.6. He didn't explicitly justify this but I believe it was because he didn't want system builders to have to change the build options.
configure
--enable-hardware-optimizations is the replacement.
I have a set of changes to both configure.ac
and CMakeLists.txt
which do what I want without actually disabling the --enable-
Mysteriously the 32-bit neon support is hard-wired off. It looks half-implemented.
Can I reference the
configure
behavior to implement this correctly?
I say "yes" with regard to the behavior with no options I'm pretty sure it's correct in a non-multilib build (i.e. a single CPU target). I tested 32-bit extensively a short time ago while looking at the solution to remove filter_neon.S
With regard to --enable-hardware-optimizations I say yes as well and cmake
already seems to have that setting as a cmake
option
The bottom line is that given a similar set of command line options both cmake
and configure
should produce identical compiler command lines.
Unfortunately configure
adds _-DHAVE_CONFIGH which in turn adds the C99 'restrict' keyword where supported. This should be replaced by a check on the newly defined __has_c_attribute
(which is, apparently, defined in GCC for all -std versions.) This is a separate issue and, indeed, the restrict stuff doesn't appear on the actual API; it's internal-only.
I can generate a PR for the configure
AMD64 default-hardware-off if you would like. I thought some more about it and I suspect I might have put those comments in, possibly because of licensing issues; using contrib
code inside the library isn't possible unless it is unambiguously licensed under the libpng license. Otherwise the code would taint the library (to use Linus's term.)
In fact this is the time to start libpng 1.8 on the basis that this is a problem which needs fixing and the fix is likely to be too scary for the Cosmin, given what he did to filter_neon.S
Numbers are cheap and a minimal step up to 1.8 with just the removal of deprecated functionality and fixes to the build systems allow the latter to include significant simplifications which have been pending for 10 years now.
There's a lot of project history here that I don't completely understand, but it seems like this change should be focused just on removing the optimization gates in CMake.
With the exception of the Loongarch support (which I left as-is) I simply deleted all the TARGET_ARCH MATCHES code from
CMakeLists.txt
except for the "set
" which sets the various libpng_arch_sources variables. In the "else
" clause I kept justadd_definitions
. A build shows all theCMakeFiles/png_shared.dir/*/*.o
outputs have zero size (well, 80 bytes text with GCC) except for/intel/
(with my fix for pngpriv.h)
This is essentially what I'd been thinking and ended up doing. Like you, I left loongarch as is, though that could be simplified even further so that inclusion is entirely dependent on COMPILER_SUPPORTS_LSX
.
There's a lot of project history here that I don't completely understand, but it seems like this change should be focused just on removing the optimization gates in CMake.
There's this issue and then there is the PR. I'm in favor of the PR being separated into multiple independent PRs (like the pngpriv.h issue is a separate issue) as this might mean @ctruta would take it in 1.6. Unfortunately that hasn't happened in the past; old, confusing and counter-productive behavior has been retained.
It would be good to know if configure can actually be used to produce a Universal Binary since it is, apparently, doing the correct stuff when called without any of the command line options.
There's this issue and then there is the PR. I'm in favor of the PR being separated into multiple independent PRs (like the pngpriv.h issue is a separate issue) as this might mean @ctruta would take it in 1.6. Unfortunately that hasn't happened in the past; old, confusing and counter-productive behavior has been retained.
(Emphasis mine)
About that: see my most recent note https://github.com/pnggroup/libpng/pull/576#issuecomment-2332223335 about my idea to surprise the world with a brand new libpng-1.8.0.
Unfortunately that hasn't happened in the past; old, confusing and counter-productive behavior has been retained.
Fixed, not: #583 massively simplifies things but I keep finding TODOs I wrote that also need doing :-)
Trying to build current git master (a37d4836519517bdce6cb9d956092321eca3e73b) as a Universal Binary (or even just arm64 only) on a Intel Mac fails to link:
result:
This is because of https://github.com/glennrp/libpng/blob/a37d4836519517bdce6cb9d956092321eca3e73b/CMakeLists.txt#L71 which queries the compiling machine's CPU type. This doesn't work for Universal Binaries, which are basically cross compilation, because the CPU doing the building doesn't match the CPU being built for.
A working hack is to add in
OR APPLE
so that those arm-only .c files are added to the list of files to be compiled. Compiling them on Intel seems harmless because their contents are guarded with#if PNG_ARM_NEON_OPT > 0