grimbough / Rhdf5lib

Distribution of the HDF5 library in an R package
https://bioconductor.org/packages/Rhdf5lib/

Linker error (Mac with M3 processor) #58

Open timothy-barry opened 11 months ago

timothy-barry commented 11 months ago

Hello,

Thanks for this helpful package. I've found it very useful over the past couple years.

I recently got a Mac with an M3 processor (previously, I was using a Mac with an i5 processor). I am having some difficulty compiling my package (ondisc), which depends on and links to Rhdf5lib.

Some background: I downloaded and installed R-4.3.2-arm64.pkg (i.e., the version of R compatible with Mac M processors). Next, I downloaded and installed the Rhdf5lib macOS Binary for arm64 (the architecture used by Mac M processors) from here. I am able to call the functions contained within Rhdf5lib without issue. For example, calling Rhdf5lib::getHdf5Version() returns "1.10.7", and calling Rhdf5lib::pkgconfig() returns

"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib/libhdf5_cpp.a" "/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib/libhdf5.a" -L"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib" -lcrypto -lcurl -lsz -laec -lz -ldl -lm 

The directory /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib/ indeed exists and contains libhdf5_cpp.a, which seems to be a good starting point.

However, when I try to install my package ondisc --- which depends upon and links to Rhdf5lib --- I get the following error.

clang++ -arch arm64 -std=gnu++17 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -L/Library/Frameworks/R.framework/Resources/lib -L/opt/R/arm64/lib -o ondisc.so RcppExports.o h5_import_data_functs.o h5_initialize_functs.o h5_load_row.o h5_simple_read.o shared_functs.o utility_functs.o /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib/libhdf5_cpp.a /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib/libhdf5.a -L/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib -lcrypto -lcurl -lsz -laec -lz -ldl -lm -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation

ld: library 'crypto' not found
clang: error: linker command failed with exit code 1 (use -v to see invocation)

It looks like the linker is attempting to link against libcrypto (which is part of the OpenSSL toolkit). My installation of libcrypto does not appear to be on the search path, causing the error.
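A quick way to test that diagnosis is to look for the library in the directories passed with -L in the failing command. The sketch below assumes a hypothetical helper, lib_in_dirs, that is not part of R or Rhdf5lib. It only covers the explicit -L directories; the real linker also searches default system locations that are not modelled here.

```shell
# Hypothetical helper (not part of R or Rhdf5lib): print the first file named
# lib<NAME>.{a,dylib,tbd,so} found in the given directories, or fail.
# Usage: lib_in_dirs NAME DIR [DIR...]
lib_in_dirs() {
    name=$1
    shift
    for dir in "$@"; do
        for ext in a dylib tbd so; do
            if [ -e "$dir/lib$name.$ext" ]; then
                echo "$dir/lib$name.$ext"
                return 0
            fi
        done
    done
    return 1
}

# The three -L directories from the failing link command in this report.
lib_in_dirs crypto \
    /Library/Frameworks/R.framework/Resources/lib \
    /opt/R/arm64/lib \
    /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib \
    || echo "libcrypto not found in the -L directories"
```

If the last line prints the "not found" message, the -lcrypto flag has nothing to resolve against in those directories, which would be consistent with the ld error above.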

As a final piece of information, when I downloaded and installed R-4.3.2-x86_64.pkg (i.e., the version of R compatible with Intel Macs), I was able to install ondisc without issue. I was happy to see this. However, I would strongly prefer not to use the Intel version of R, as it is much slower than the M processor version of R for another package that I maintain.

Any help would be greatly appreciated. Might a solution be to bundle the other C++ libraries (including libcrypto) inside the Rhdf5lib package to avert this sort of problem?

PeteHaitch commented 11 months ago

Not certain, but I think following the advice from https://mac.r-project.org/bin/ will install the appropriate openssl binary, which provides libcrypto. I suggest reading the link before copy+pasting the below code:

source("https://mac.R-project.org/bin/install.R")
install.libs("openssl")

timothy-barry commented 11 months ago

Hi Pete,

Thanks, this seems to have worked. Great solution!

I wonder if the libraries that rhdf5lib links to (e.g., libcrypto) are supposed to be downloaded into /opt/R/arm64/ when one downloads R? If not, might it be helpful for @grimbough to make a note about installing these dependencies separately?

grimbough commented 11 months ago

Thanks for the reports. I'm surprised this hasn't come up before, but maybe there aren't too many people writing packages that link to Rhdf5lib, and the few that do have fortunately had the appropriate libraries in the search path by happenstance.

My first instinct is to say that you just don't need the -lcrypto -lcurl statements in your linking line provided by Rhdf5lib::pkgconfig(), unless you're using the S3 reading functionality. However I suspect (but don't know) that you'll then just get a different error complaining that there are "unresolved symbols" in the libhdf5.a. That probably depends on whether the binary shipped by Bioconductor links to static versions of libcrypto or not.

I vaguely recall trying to force this static linking a few years ago, but don't think it went well. That was also before we had an arm64 builder, where I'm sure the setup is different anyway.

Maybe @jwokaty or @hpages have some thoughts on this?

If not, I can certainly document the need to have those extra libraries installed, but it will be a pain to point that out to all your users. The configure script is normally sufficient to detect whether those libraries are installed and set things appropriately for that system, but this doesn't work in the case where you are installing a binary built on a system with more capabilities than the machine you're installing it on.

timothy-barry commented 11 months ago

Hi Mike,

Thanks for the explanation. I have a few (probably very simple) questions. (Note that I've read and am familiar with this document and the manual.)

  1. Doesn't one have to link to Rhdf5lib to use the HDF5 C++ API contained within Rhdf5lib in one's package? If so, isn't more or less everyone using Rhdf5lib also linking to Rhdf5lib? (I am talking about package developers here rather than regular users.)
  2. How would I exclude -lcrypto -lcurl from the link command? Would this entail updating the Makevars file for my package? Or would this entail passing a custom argument to configure.args within the BiocManager::install() function?
  3. Is there a way for me to build the Apple Silicon version of Rhdf5lib from source? Or do I need to download and install the precompiled binary?

At some point (once my understanding of the situation is better) I can try to come up with a set of instructions for installing Rhdf5lib for Apple Silicon users. This set of instructions potentially could be useful for users of the Rhdf5lib package as well as users of the packages that I maintain.

hpages commented 11 months ago

@grimbough It's not clear to me why the output of Rhdf5lib::pkgconfig() contains -lcrypto in the case of @timothy-barry if he didn't have the openssl library.

My understanding is that this library is optional in order to install Rhdf5lib from source. If one doesn't have it, Rhdf5lib's configure script will just display:

...
checking for openssl/evp.h... no
checking for openssl/hmac.h... no
checking for openssl/sha.h... no
S3_VFD=--enable-ros3-vfd=no
configure: creating ./config.status
config.status: creating src/Makevars
...

and it will keep going. You can actually see this on our Intel Mac builder lconway here (note that this is a temporary situation that will hopefully be remedied soon).

Then, the output of Rhdf5lib::pkgconfig() will NOT contain -lcrypto. On lconway:

> Rhdf5lib::pkgconfig()
"/Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library/Rhdf5lib/lib/libhdf5_cpp.a" "/Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library/Rhdf5lib/lib/libhdf5.a" -L"/Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library/Rhdf5lib/lib" -lsz -laec -lz -ldl -lm

Now, if one does have the openssl library, then Rhdf5lib's configure script should display:

...
checking for openssl/evp.h... yes
checking for openssl/hmac.h... yes
checking for openssl/sha.h... yes
checking for curl_global_init in -lcurl... yes
checking for EVP_sha256 in -lcrypto... yes
S3_VFD=--enable-ros3-vfd
configure: creating ./config.status
config.status: creating src/Makevars
...

as you can see on our arm64 Mac builder kjohnson1 here.

And in this case the output of Rhdf5lib::pkgconfig() should contain -lcrypto. On kjohnson1:

> Rhdf5lib::pkgconfig()
"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib/libhdf5_cpp.a" "/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib/libhdf5.a" -L"/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib" -lcrypto -lcurl -lsz -laec -lz -ldl -lm

So what's puzzling me is that @timothy-barry had -lcrypto in Rhdf5lib::pkgconfig(). So I assume that Rhdf5lib's configure was able to find the libcrypto library and perform a small compilation/linking test against it (according to the checking for EVP_sha256 in -lcrypto... yes line). Just my assumption, we didn't see the output of the Rhdf5lib source installation on this machine. But then, for some reason, when he tried to install his own package, ld was not able to find the libcrypto library, even though it seems that it was able to do this earlier.

Could it be that you already had openssl installed via Homebrew on your machine @timothy-barry? Unfortunately, Homebrew installations tend to cause all sorts of problems for R and R packages. Or that you changed something on your machine between the time you installed Rhdf5lib from source and the time you tried to install your own package? Oh wait, now that I'm typing this, I'm thinking... maybe you installed the Rhdf5lib binary that we make? This is most likely what happened. In this case the output of Rhdf5lib::pkgconfig() reflects what was found on kjohnson1 and not what's on your machine.
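For a source install, the S3_VFD line in the configure output quoted above records which of the two situations applied on that machine. The sketch below assumes a hypothetical helper, s3_vfd_status, and assumes the configure output has been captured as text. It does not apply to a binary install, where configure ran on the builder rather than locally.

```shell
# Hypothetical helper: read configure output on stdin and report whether the
# S3 (ros3) virtual file driver was enabled, judging by the S3_VFD= line.
s3_vfd_status() {
    line=$(grep '^S3_VFD=' | tail -n 1)
    case $line in
        '')   echo unknown ;;    # no S3_VFD line in the input
        *=no) echo disabled ;;   # S3_VFD=--enable-ros3-vfd=no
        *)    echo enabled ;;    # S3_VFD=--enable-ros3-vfd
    esac
}

# Lines copied from the two configure excerpts in this thread.
printf '%s\n' 'checking for openssl/sha.h... no' \
    'S3_VFD=--enable-ros3-vfd=no' | s3_vfd_status     # prints "disabled"
printf '%s\n' 'checking for EVP_sha256 in -lcrypto... yes' \
    'S3_VFD=--enable-ros3-vfd' | s3_vfd_status        # prints "enabled"
```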

H.

PeteHaitch commented 11 months ago

Is there a way for me to build the Apple Silicon version of Rhdf5lib from source? Or do I need to download and install the precompiled binary?

BiocManager::install("Rhdf5lib", type = "source") will install from source rather than installing the precompiled binary.

grimbough commented 11 months ago

Hi Mike,

Thanks for the explanation. I have a few (probably very simple) questions. (Note that I've read and am familiar with this document and the manual.)

Great that you've read the documentation, I'm glad it's useful to someone! Happy to answer questions.

  1. Doesn't one have to link to Rhdf5lib to use the HDF5 C++ API contained within Rhdf5lib in one's package? If so, isn't more or less everyone using Rhdf5lib also linking to Rhdf5lib? (I am talking about package developers here rather than regular users.)

Yes, but there aren't actually that many packages that link against Rhdf5lib. If you look at the landing page there are 14. However, 2 of those are mine, and several of the others share the same maintainers, so the pool of people who are developing packages that link against Rhdf5lib AND are using arm64 Macs for the development work AND don't have openssl in the search path is probably very small. Possibly you're the first!?

I try to use GitHub actions and the Bioconductor Build System to test my packages on Mac OSX, but there's no arm64 GHA runners yet and the BBS has the openssl libraries available, so I've never triggered this issue with my own packages.

  2. How would I exclude -lcrypto -lcurl from the link command? Would this entail updating the Makevars file for my package? Or would this entail passing a custom argument to configure.args within the BiocManager::install() function?

At the moment this would take some wrangling on your part. I'd suggest using something like sed to remove the -lcrypto -lcurl part from the output of Rhdf5lib::pkgconfig() in your Makevars. However it's probably easiest just to start by hard coding the modified output into your Makevars and see if it compiles locally. There's no point developing a neat solution if this doesn't actually fix the problem. I'm also happy to add an argument to pkgconfig() if this turns out to be a working solution. I guess to really test it you'll need to remove the libcrypto you installed earlier.
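The sed idea can be tried on the command line before touching Makevars. The sketch below assumes a hypothetical function name, strip_s3_flags, and a shortened placeholder path (RHDF5LIB_DIR) in place of the full library path. It only shows the text transformation, not how to wire it into a package build, and as noted above the result may still fail to link.

```shell
# Placeholder path (an assumption) standing in for the long Rhdf5lib lib directory.
RHDF5LIB_DIR=/path/to/Rhdf5lib/lib

# Same shape as the Rhdf5lib::pkgconfig() output shown in this report.
flags="\"$RHDF5LIB_DIR/libhdf5_cpp.a\" \"$RHDF5LIB_DIR/libhdf5.a\" -L\"$RHDF5LIB_DIR\" -lcrypto -lcurl -lsz -laec -lz -ldl -lm"

# Hypothetical filter: remove -lcrypto and -lcurl when each appears as a whole
# flag, i.e. preceded by a space and followed by a space or the end of the line.
strip_s3_flags() {
    sed -E -e 's/ -lcrypto( |$)/\1/' -e 's/ -lcurl( |$)/\1/'
}

printf '%s\n' "$flags" | strip_s3_flags
```

The output keeps the two .a paths and the -L directory and ends with -lsz -laec -lz -ldl -lm, the same shape as the pkgconfig() output reported on lconway.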

  3. Is there a way for me to build the Apple Silicon version of Rhdf5lib from source? Or do I need to download and install the precompiled binary?

As @PeteHaitch says, try adding the type = "source" argument e.g.

BiocManager::install("Rhdf5lib", type = "source")

At some point (once my understanding of the situation is better) I can try to come up with a set of instructions for installing Rhdf5lib for Apple Silicon users. This set of instructions potentially could be useful for users of the Rhdf5lib package as well as users of the packages that I maintain.

grimbough commented 11 months ago

Hi @hpages

Oh wait, now that I'm typing this, I'm thinking... maybe you installed the Rhdf5lib binary that we make? This is most likely what happened. In this case the output of Rhdf5lib::pkgconfig() reflects what was found on kjohnson1 and not what's on your machine.

Yes, @timothy-barry said "I downloaded and installed the Rhdf5lib macOS Binary for arm64", so I think it's exactly that.

It must be quite a rare situation for someone to install the binary Rhdf5lib, not have those libraries available themselves, and then build another package from source that links against it. Most users would either install everything from source so it's system specific compilation as you explained, or everything binary and then they never hit the linking error because there's no compiling done. Those in the second case might suffer if they actually try to use one of the functions that uses openssl, but I don't get the impression the S3 reading is widely used and that's the only place it's needed.

I think if the builder only provided static libraries for Rhdf5lib to link against, then the appropriate code would be included in the binary. However I vaguely remember trying to do this several years ago, and it seems clear it was not successful since we're still in this situation. Perhaps we should try again?

hpages commented 11 months ago

I think if the builder only provided static libraries for Rhdf5lib to link against, then the appropriate code would be included in the binary. However I vaguely remember trying to do this several years ago, and it seems clear it was not successful since we're still in this situation. Perhaps we should try again?

Well, actually, now that you mention it, the Mac builders are supposed to only provide static libraries. And, as a matter of fact, all the binaries provided by Simon Urbanek at https://mac.r-project.org/bin/ only contain static libraries. For example:

kjohnson1:Downloads biocbuild$ tar ztf openssl-1.1.1t-darwin.20-arm64.tar.xz | grep dylib  # no output

kjohnson1:Downloads biocbuild$ tar ztf openssl-1.1.1t-darwin.20-arm64.tar.xz | grep '\.a'
-rw-r--r--  0 root   admin 3814832 Apr 15  2023 opt/R/arm64/lib/libcrypto.a
-rw-r--r--  0 root   admin  793688 Apr 15  2023 opt/R/arm64/lib/libssl.a

A central idea to the business of building and distributing Mac binaries for CRAN and Bioconductor packages is that these binaries should work on a pristine Mac where those optional system libraries are not necessarily available.

So I went on kjohnson1 and took a look at the linking situation for Rhdf5lib and rhdf5:

kjohnson1:library biocbuild$ pwd
/Library/Frameworks/R.framework/Resources/library

kjohnson1:library biocbuild$ otool -L Rhdf5lib/libs/Rhdf5lib.so 
Rhdf5lib/libs/Rhdf5lib.so:
    Rhdf5lib.so (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
    /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libR.dylib (compatibility version 4.3.0, current version 4.3.2)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1775.118.101)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)

kjohnson1:library biocbuild$ otool -L rhdf5/libs/rhdf5.so 
rhdf5/libs/rhdf5.so:
    rhdf5.so (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)
    /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libR.dylib (compatibility version 4.3.0, current version 4.3.2)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1775.118.101)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 905.6.0)

They're only linked to system libraries that are guaranteed to be on the user machine. In particular, they're not linked to libcrypto.dylib. So all seems fine.
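The same check can be scripted against saved otool -L output. The sketch below assumes a hypothetical filter, non_system_deps, and assumes that /usr/lib, /System/Library and the R framework are the locations treated as always present. The Homebrew line in the sample input is invented for illustration and does not come from kjohnson1.

```shell
# Hypothetical filter: read `otool -L` output on stdin and print every
# dependency that lies outside the locations assumed to exist on any Mac with R.
non_system_deps() {
    # Dependency lines are indented; the first field is the install name.
    # The library's own id (ending in .so) is dropped as well.
    awk '/^[ \t]/ { print $1 }' |
        grep -v -e '^/usr/lib/' -e '^/System/Library/' \
                -e '^/Library/Frameworks/R\.framework/' -e '\.so$' || true
    # grep exits non-zero when nothing is left, which is the good case here.
}

# Lines from the rhdf5.so listing above, plus one invented Homebrew entry.
non_system_deps <<'EOF'
rhdf5/libs/rhdf5.so:
    rhdf5.so (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
    /opt/homebrew/opt/openssl@1.1/lib/libcrypto.1.1.dylib (compatibility version 1.1.0, current version 1.1.0)
EOF
```

With this sample only the invented Homebrew path is printed. Fed the real listings above, the filter prints nothing.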

But this brings other questions.

  1. Why is the linking command for rhdf5 successful on kjohnson1, despite containing -lcrypto? This command is:

    clang++ -arch arm64 -std=gnu++17 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -L/Library/Frameworks/R.framework/Resources/lib -L/opt/R/arm64/lib -o rhdf5.so H5.o H5A.o H5D.o H5E.o H5F.o H5G.o H5I.o H5L.o H5O.o H5P.o H5R.o H5S.o H5S_extras.o H5T.o H5T_extras.o H5Z.o H5constants.o HandleList.o HandleListWrap.o bit64conversion.o external_filters.o h5dump.o h5ls.o h5testLock.o h5writeDataFrame.o printdatatype.o utils.o wrap.o /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib/libhdf5.a -L/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib -lcrypto -lcurl -lsz -laec -lz -ldl -lm -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation

    The answer is that, unfortunately, kjohnson1 seems to have a homebrewed openssl in addition to Simon Urbanek's openssl binary. The homebrewed openssl provides libcrypto.dylib:

    kjohnson1:~ biocbuild$ ls /opt/homebrew/Cellar/openssl\@1.1/1.1.1t/lib/
    engines-1.1     libcrypto.a     libssl.1.1.dylib    libssl.dylib
    libcrypto.1.1.dylib libcrypto.dylib     libssl.a        pkgconfig

    This dynamic libcrypto is apparently found by Rhdf5lib's configure script and by the linking command for rhdf5.

    @jwokaty We should try to get rid of the homebrewed openssl on the Mac builders. I have a feeling that it landed there as a dependency of other Homebrew installations, in which case it won't be straightforward to get rid of it. If we can't, then we'll need to find a way to "hide" it from the linker. I'll open an issue in the BBS repo for that.

  2. How is it that otool -L rhdf5/libs/rhdf5.so does not report libcrypto.dylib despite the linking command for rhdf5.so containing -lcrypto? Could it be that after resolving symbols, the linker decided to not link rhdf5.so to libcrypto.dylib because rhdf5.so actually does not need any symbol from libcrypto.dylib?

H.

grimbough commented 11 months ago

Thanks for the extra details @hpages

Just to get this straight in my head, am I right in thinking that kjohnson1 had both the static openssl from Simon Urbanek and the dylib Homebrew version installed?

If that's the case, could the reason that otool -L rhdf5/libs/rhdf5.so does not report libcrypto.dylib be because it actually successfully linked against the static version?

I get the following if I use nm on the rhdf5.so built and installed on my linux machine:

-> % nm -l rhdf5/libs/rhdf5.so | grep EVP
                 U EVP_sha256@OPENSSL_3.0.0

There the function EVP_sha256 comes from libcrypto and is linked dynamically on my machine - as expected.

Unfortunately I don't have the tools at hand to test the same for the rhdf5.so created by the Mac builder. I get a "file format not recognised" if I try that.

hpages commented 11 months ago

Am I right in thinking that kjohnson1 had both the static openssl from Simon Urbanek and the dylib Homebrew version installed?

Yep. Also note that the Homebrew version provides both static and dynamic libraries, as mentioned previously.

However yesterday @jwokaty removed the homebrew version (see https://github.com/Bioconductor/BBS/issues/378). Thanks Jen! So we'll see how things go on the next report for kjohnson1 (it should get updated later today).

could the reason that otool -L rhdf5/libs/rhdf5.so does not report libcrypto.dylib be because it actually successfully linked against the static version?

As you can see above, the linking command for rhdf5 contains no path to libcrypto.a, only -lcrypto. But it also contains /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/Rhdf5lib/lib/libhdf5.a so it probably gets all the symbols it needs from there:

kjohnson1:library biocbuild$ nm Rhdf5lib/lib/libhdf5.a | grep EVP_sha256
                 U _EVP_sha256

Bingo!

I get the following if I use nm on the rhdf5.so built and installed on my linux machine:

On kjohnson1:

kjohnson1:library biocbuild$ nm rhdf5/libs/rhdf5.so | grep EVP_sha256
000000000033035c T _EVP_sha256

kjohnson1:library biocbuild$ otool -L rhdf5/libs/rhdf5.so
rhdf5/libs/rhdf5.so:
    rhdf5.so (compatibility version 0.0.0, current version 0.0.0)
    /usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)
    /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libR.dylib (compatibility version 4.3.0, current version 4.3.2)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1775.118.101)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 905.6.0)

¯\_(ツ)_/¯

There the function EVP_sha256 comes from libcrypto and is linked dynamically on my machine - as expected.

Yes, I get that on Linux too. But since the linking command on Linux also contains libhdf5.a then I'm not sure that -lcrypto is needed there either.
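The nm results quoted in this thread differ in the symbol-type column. In nm output, U marks a symbol the file references but does not define, and T marks a symbol defined in the file's text section. The sketch below assumes a hypothetical helper, symbol_status, that reads saved nm output and reports which case applies. It accepts both the macOS spelling with a leading underscore and the Linux spelling with an @version suffix.

```shell
# Hypothetical helper: read `nm` output on stdin and report whether SYMBOL is
# defined in the file, merely referenced (undefined), or not mentioned at all.
# Usage: nm FILE | symbol_status SYMBOL
symbol_status() {
    awk -v sym="$1" '
        NF >= 2 {
            name = $NF
            sub(/@.*/, "", name)     # drop a GNU symbol-version suffix
            if (name == sym || name == ("_" sym)) {
                found = 1
                # The field before the name is the nm symbol type.
                if ($(NF - 1) == "U") print "undefined"
                else                  print "defined"
            }
        }
        END { if (!found) print "absent" }
    '
}

# The three nm lines quoted in this thread.
printf '%s\n' '                 U EVP_sha256@OPENSSL_3.0.0' | symbol_status EVP_sha256   # undefined
printf '%s\n' '                 U _EVP_sha256' | symbol_status EVP_sha256                # undefined
printf '%s\n' '000000000033035c T _EVP_sha256' | symbol_status EVP_sha256                # defined
```

Read this way, the Linux rhdf5.so and the Mac libhdf5.a both leave EVP_sha256 to be resolved elsewhere, while the Mac rhdf5.so carries its own definition.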

Unfortunately I don't have the tools at hand to test the same for the rhdf5.so created by the Mac builder. I get a "file format not recognised" if I try that.

Linux tools won't recognize binaries created on Mac.

Is there a way you could have access to a Mac? I'm not sure it's really feasible to troubleshoot all this stuff and come up with a clean linking command for Rhdf5lib + rhdf5 that does the right thing on Linux and Mac without actually having access to both platforms. There's just too much going on.

hpages commented 10 months ago

TLDR: -lcrypto in the output of Rhdf5lib::pkgconfig() is not needed because rhdf5.so gets statically linked to libhdf5.a (from Rhdf5lib) which is already statically linked to libcrypto.