Closed warrenmcg closed 5 years ago
Thanks for the report. I assumed R would require zlib, and so it would be available, but my knowledge of bioconda is pretty small. I'll checking with that issue if I need more details.
I've also created a rhdf5 branch (https://github.com/grimbough/rhdf5/tree/H5Z) to try and provide a more helpful error message if it happens in the future. It seems a shame HDF5 doesn't report the problem more specificially itself, but there appears to be functionality for listing and checking filters.
Great, thanks for checking in and for your work in improving rhdf5!
I'm pinging @johanneskoester, just so he's kept in the loop about further rhdf5 development. He may have some insight about why zlib might not be available with bioconda.
I'm not sure if the only users who have reported an error are installing R through bioconda, or if any are doing the native install and installing rhdf5 through bioconductor. I personally have never reproduced the error from my end (using a native-R/bioconductor build).
In any case, I think having a helpful error message will make it easier for sleuth to provide an informative error message for our users.
With bioconda, we ensure that packages are built in an environment that has nothing available. This way, software usually complains when a dependency is missing and we can add it to the package (as a dependency in the metadata). If now, like here, the software does not complain, such things can happen. That's why I appreciate a lot that this behavior is now being changed.
This is e.g. the code of the rhdf5lib recipe:
{% set version = "1.4.2" %}
{% set name = "Rhdf5lib" %}
{% set bioc = "3.8" %}
package:
name: 'bioconductor-{{ name|lower }}'
version: '{{ version }}'
source:
url:
- 'https://bioconductor.org/packages/{{ bioc }}/bioc/src/contrib/{{ name }}_{{ version }}.tar.gz'
- 'https://bioarchive.galaxyproject.org/{{ name }}_{{ version }}.tar.gz'
- 'https://depot.galaxyproject.org/software/bioconductor-{{ name|lower }}/bioconductor-{{ name|lower }}_{{ version }}_src_all.tar.gz'
md5: 05dbaf6e2c8cebfe41e4e5a98f8a75ed
patches:
- configure.patch
build:
number: 3
rpaths:
- lib/R/lib/
- lib/
requirements:
host:
- r-base
- zlib
run:
- r-base
- zlib
build:
- {{ compiler('c') }}
- {{ compiler('fortran') }}
- automake
- make
test:
commands:
- '$R -e "library(''{{ name }}'')"'
about:
home: 'https://bioconductor.org/packages/{{ bioc }}/bioc/html/{{ name }}.html'
license: Artistic-2.0
summary: 'Provides C and C++ hdf5 libraries.'
As you can see, all dependencies are listed explicitly, which allows conda to take care of installing them along with the rhdf5lib package.
Thanks for the extra info. So from a bioconda team perspective, it would be sufficient for the configure script to check for a valid zlib and give a useful error message if one can't be found? I assume that for now the issue won't be encountered because your recipe now includes zlib as requirement?
Yes to both questions :-).
As we are talking about it. For successful compilation, we also had to patch configure like this.
The zlib branch now contains an updated configure script that should check for the presence of the zlib library and fail early if it can't be found. This seems to work on Linux, OSX, and Solaris, but then so did the old version. There's also some modification to the flags for compiling SZIP, which will break your patch, so I'm trying to get a bioconda build working too.
However, I'm struggling to figure out how to test the bioconda implementation. I've been trying to follow the instructions at https://bioconda.github.io/contribute-a-recipe.html and everything seems to work ok, but since we know it will happily build the package without locating zlib or szip this doesn't seem sufficient.
Ideally I'd be able to view the output of R CMD INSTALL --build .
, which I tried in the docker container produced during the recipe test, but things like the gcc toolchain aren't exported and it fails immediately. If you have any pointers on what I should run in the docker container prior to trying to build the R package it would be much appreciated.
I've also updated rhdf5 to try and help diagnose this more easily for other filters. There's now an internal function that will check when reading a dataset and output this if a filter is missing:
> rhdf5:::h5checkFilters(did)
Unable to read dataset. Not all required filters available.
Missing filters: blosc
The recipe for bioconductor-rhdf5 that we have on bioconda-recipes, contains a test case that would fail if something goes wrong. So, you could build both packages, and if the second works as well, you are fine. Also, you can create a pull request, then, all that is done automatically for you in the CI.
This still misses the bioconda installed zlib when I try to install it with R installed via conda. zlib is there, but the configure script just checks /usr/local and bails out.
The bioconda package for rhdf5lib should be correctly linked. But manual building via R on top of a conda installation will likely not see the dependencies (because they are not in the default system wide paths). When building the conda package, we have to explicitly tell the compiler where to look: https://github.com/bioconda/bioconda-recipes/tree/master/recipes/bioconductor-rhdf5lib/build.sh In addition, a patch is needed for the configure script: https://github.com/bioconda/bioconda-recipes/blob/master/recipes/bioconductor-rhdf5lib/configure.patch
@johanneskoester thanks so much, somehow I missed that there was a bioconda package for rhdf5lib.
Rule of thumb: there is a Bioconda package for everything :-).
Hi @grimbough,
I just wanted to bring your attention to an issue thread on sleuth's repository (pachterlab/sleuth#120). The root cause of this seems to originate with an edge case involving Rhdf5lib.
Users were getting a cryptic error message:
Because of some excellent bug-hunting and troubleshooting by @johanneskoester, we found the problem. Here is his explanation:
Sleuth reads kallisto-created HDF5 files, and kallisto uses gz compression to keep the files small, thus zlib compression is required. @johanneskoester did some excellent work creating a bioconda recipe (bioconda/bioconda-recipes#12780) to include a test case and make sure zlib compression is available.
I just wanted to bring this to your attention so that a similar check can be included in the bioconductor release of these packages.