grimbough / rhdf5

Package providing an interface between HDF5 and R
http://bioconductor.org/packages/rhdf5
59 stars 22 forks source link

Implicit libcurl -> openssl dependency causing libcrypto.so.1.1 missing error? #131

Closed dlaehnemann closed 9 months ago

dlaehnemann commented 9 months ago

We are currently seeing the following error message on a user's conda installation of the r-sleuth bioconda recipe, when they try to load sleuth with library(sleuth):

Error: package or namespace load failed for ‘sleuth’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/admin/RNASEQ/.snakemake/conda/2012934a83704804c4910fcb53b0e67d_/lib/R/library/rhdf5/libs/rhdf5.so':
  libcrypto.so.1.1: cannot open shared object file: No such file or directory
Execution halted

As this depends on the bioconductor-rhdf5 bioconda package and the error message seems to come from the rhdf5.so, I started looking around a bit. From what I understand, this kind of issue seems to come up whenever openssl dependencies are not clearly defined. So I am wondering whether openssl might be an undeclared implicit dependency via an undeclared libcurl dependency. Namely, does this commit introducing libcurl4-openssl-dev installation in the GitHub Actions tests imply that libcurl actually needs to be present on the system? This line remains in the testing code base. However, I couldn't find any respective references to either curl, libcurl, openssl or libcrypto in the code base, so wouldn't know where this dependency originates. But when I look at the mentioned rhdf5.so file from the error message (in a working version of that environment on my computer, not in the broken one of the mentioned user), I at least see mentions of curl functions:

❯ nm -D rhdf5.so | grep curl
                 U curl_easy_cleanup
                 U curl_easy_init
                 U curl_easy_perform
                 U curl_easy_setopt
                 U curl_global_cleanup
                 U curl_global_init
                 U curl_slist_append
                 U curl_slist_free_all
00000000002662f0 T curlwritecallback

If this is an actual dependency, I think it should be declared in the bioconda recipe meta.yaml and maybe also mentioned somewhere in the README.md?

grimbough commented 9 months ago

Thanks for the report. First off I'll caveat everything below by saying that I know very little about bioconda or how the package bioconductor tools. My primary platform for distributing this package is the Bioconductor build system, and I know a lot more about how that works and what's available there.


The linking to libcurl is intended to come from Rhdf5lib where libcurl is an optional dependency. The configure script looks for the existance of the curl and openssh libraries, and then chooses whether to enable those features in HDF5 depending on that result. It's only a soft dependency as having the the functionality that provides is nice, but not necessary for the core features of HDF5.

The required link flags can be access via a call to Rhdf5lib::pkgconfig(), which rhdf5 does during compilation to make sure it's linking against the libraries in Rhdf5lib. Below is the output on my laptop, and I presume it's something similar on the bioconda build system.

> Rhdf5lib::pkgconfig()
"/mnt/data/R-lib/4.3-bioc_3.17/Rhdf5lib/lib/libhdf5_cpp.a" "/mnt/data/R-lib/4.3-bioc_3.17/Rhdf5lib/lib/libhdf5.a" -lcrypto -lcurl -lsz -laec -lz -ldl -lm

If the bioconda builders have curl and openssl available to R in a way that the configure script can access, then this is probably why you're seeing these symbols in the rhdf5.so distributed via bioconda. Based on this, I wonder if it would make more sense for the libcurl to actually be placed in the recipie for bioconductor-rhdf5lib?

dlaehnemann commented 9 months ago

Many thanks for the explanation!

This actually makes the underlying problem much clearer to me: I am assuming that the package version used by the mentioned user was built at a time, when the GitHub Actions runners were still using Ubuntu 20.04 for building the recipes, and the Ubuntu 20.04 runner contains openssl1.1 and libcurl, so the package implicitly picked this up and linked against the runner's system library. At the same time, as this is not listed as a dependency, the resulting package doesn't contain the library, so on any newer system with openssl3, this will fail.

And I also agree, that putting the libcurl dependency into the bioconductor-rhdf5lib is the correct solution, then. This enables the useful extra functionality, and hopefully avoids accidentally linking against a system library (although I am not 100% sure, how things like Rhdf5lib::pkgconfig() will figure out the library path and if that plays nicely with how conda build does things -- I'll try to come up with a way to check this...).

grimbough commented 9 months ago

Rhdf5lib tries to determine things like the library path, compilers and flags etc based on the output of calls to R CMD config. This is primarily to ensure that the package libraries are built with the same settings as R itself was. Assuming the output from those calls for the version of R distibuted via conda is consistent with the conda installed libraries (and I'm pretty sure it is) then I don't think it will be an issue to make sure the correct versions of the packages are found.

As a general statement it's common to encounter issues when people are running R installed via conda, but try to compile packages from source rather than using the conda binaries, but IIRC some of those issues were resolved by making sure as many settings as possible are obtained by R CMD config rather then using system defaults.

Feel free to report back here if you're still experiencing problems. It's also definitely true that the various configure script run into issues with edge cases like this and need to be refined.

dlaehnemann commented 9 months ago

OK, that sounds reassuring. Many thanks for the swift help and explanations!

dlaehnemann commented 9 months ago

One more thing while I have your attention: :smile:

Do you happen to follow semantic versioning with the rhdf5 family of bioconductor packages that you maintain? Because we are thinking about how strictly to pin version numbers for bioconductor packages used by other packages and with the heterogeneity of version numbering on bioconductor in general, the default would be to pin both major and minor version. But if you can confirm that you (try to) make sure that breaking changes only occur when the major version is bumped, we could also just pin the major version.

grimbough commented 9 months ago

No, I don't follow semantic versioning and it may be that a breaking change is introduced between minor version increases. I see from the other issue that you've already read the Bioconductor versioning blurb, and realised that there are some idiosyncrasies and vagaries for developers there.

Since Bioconductor has a two active branch model (RELEASE_X_Y and devel) it's actually pretty common for breaking changes to be introduced in the devel branch. It's not expected that packages in devel should work all the time, and it's acceptable for them to change API etc. Those two packages will have version number X.Y.0 and X.Y+1.0 at the start of each six-monthly release cycle, and the Y component is incremented each release regardless of whether the package has changed at all.

Recommended practice is for any breaking change to be deprecated i.e. the old code continues working but with a warning to the end user, and then made defunct over two release cycles. So in practice any breaking change should have somewhere between 6 and 12 months of deprecation exposure before it is completed.

That said, no breaking changes should be introduced into the RELEASE_X_Y branch during a release cycle. Only bug fixes, each of which should be accompanied by a patch version bump. If you don't make the patch bump the changes don't actually propagate through Bioconductors build system and reach end users, so that paradigm is pretty well adhered to. The Bioconductor git repository also has hooks that actively prevent you from bumping the minor version number yourself, so it's hard (almost impossible) to do that outside of Bioconductors' release schedule.

If I was trying to pin version numbers I think I would aim to fix on the minor number for all packages across a specific Bioconductor release. It is expected that they will all be compatible with each other. I don't know how feasible that is in your infrastructure, but from those recipe files it looks like bioconda already has some concept of the Bioconductor release numbering etc. so maybe it's actually fairly simple. Things might get messier with CRAN dependencies added to the mix, but you might have to rely on package authors putting that in the DESCRIPTION files since it probably fundamentally affects the package.

dlaehnemann commented 9 months ago

Thanks again for even more useful insights. I cross-posted your thoughts in the respective bioconda tracking issue.

dlaehnemann commented 9 months ago

I'll close this, as I think we found the culprit for the bioconda recipe setup. Again, many thanks!