Open pimentel opened 7 years ago
@warrenmcg here is the pull request on bioconda. It now includes a proper test case for this issue. Unfortunately, it does not yet work: bioconda/bioconda-recipes#12780.
Hi guys. I am happy to tell you that the issue is now fixed when using the latest bioconda packages of rhdf5 and rhdf5lib. It was indeed a combination of missing zlib support and problems when making the included szip library portable. For the future, we have protected ourselves against such problems by adding a test to the bioconductor-rhdf5 package that ensures kallisto compatibility.
Great work @johanneskoester! I wonder what this means for the rhdf5 and rhdf5lib packages when downloading them directly from bioconductor? Do they have this issue? Was this only an issue if kallisto/rhdf5/rhdf5lib were all built using bioconda?
So, this issue was more likely to appear when packaging it in a portable way. However, one issue that certainly occurs also when installing directly is that, if zlib headers are not found, rhdf5lib will silently compile without zlib compression support. Then, upon using it, you get these not very descriptive error messages posted here whenever reading a dataset with zlib compressed tables.
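One rough way to probe whether a given installation can handle gzip (zlib) compressed datasets at all is a write/read round trip. This is only a sketch: the helper name gzip_roundtrip_ok is made up for illustration, it assumes the rhdf5 package is installed, and passing it does not prove that every kallisto file will load.

```r
# Returns TRUE if a gzip-compressed dataset can be written and read back,
# FALSE if any step fails, and NA if rhdf5 is not installed.
# gzip_roundtrip_ok is a hypothetical helper name, not part of rhdf5.
gzip_roundtrip_ok <- function() {
  if (!requireNamespace("rhdf5", quietly = TRUE)) return(NA)
  f <- tempfile(fileext = ".h5")
  on.exit(unlink(f), add = TRUE)
  x <- as.numeric(1:100)
  tryCatch({
    # both creation helpers return TRUE on success
    stopifnot(isTRUE(rhdf5::h5createFile(f)))
    # level = 6 requests gzip compression for the dataset
    stopifnot(isTRUE(rhdf5::h5createDataset(f, "x", dims = length(x), level = 6)))
    rhdf5::h5write(x, f, "x")
    isTRUE(all.equal(as.numeric(rhdf5::h5read(f, "x")), x))
  }, error = function(e) FALSE)
}

print(gzip_roundtrip_ok())
```

A FALSE here would point towards the build rather than towards the kallisto output files.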
I am having the same problem with rhdf5 and rhdf5lib packages when downloading them directly from bioconductor!
Hi, I am working with output from previous kallisto runs (which worked) on other machines. Now I get the sleuth error when trying to load them. My session info under R 3.4.3 is below. I have tried a new installation with R 3.5 but can't install rhdf5lib for some reason.
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /home/jl/anaconda2/lib/R/lib/libRblas.so
LAPACK: /home/jl/anaconda2/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rmarkdown_1.11 knitr_1.21 rhdf5_2.20.0 tximportData_1.6.0 tidyr_0.8.2
[6] dplyr_0.8.0.1 dbplyr_1.3.0 RSQLite_2.1.1 cowplot_0.9.4 ggplot2_3.1.0
[11] sleuth_0.30.0 RevoUtils_10.0.8 RevoUtilsMath_10.0.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 pillar_1.3.1 compiler_3.4.3 plyr_1.8.4 tools_3.4.3
[6] zlibbioc_1.24.0 digest_0.6.18 bit_1.1-14 evaluate_0.13 memoise_1.1.0
[11] tibble_2.0.1 gtable_0.2.0 pkgconfig_2.0.2 rlang_0.3.1 DBI_1.0.0
[16] rstudioapi_0.9.0 parallel_3.4.3 yaml_2.2.0 xfun_0.4 withr_2.1.2
[21] bit64_0.9-7 grid_3.4.3 tidyselect_0.2.5 glue_1.3.0 data.table_1.12.0
[26] R6_2.4.0 purrr_0.3.0 blob_1.1.1 magrittr_1.5 htmltools_0.3.6
[31] scales_1.0.0 assertthat_0.2.0 colorspace_1.4-0 stringi_1.3.1 lazyeval_0.2.1
[36] munsell_0.5.0 crayon_1.3.4
gcc --version
gcc (crosstool-NG fa8859cb) 7.2.0
Copyright (C) 2017 Free Software Foundation, Inc.
@egenomics @marcora consider using the bioconda packages. Besides more reproducible analyses and easier management, they do not suffer from this problem anymore.
They do actually!
@marcora: are we talking bioconductor or bioconda? You mentioned installing from bioconductor. However, Johannes seems to have fixed this issue with the bioconda recipe for installing rhdf5, which is different than the standard way of installing rhdf5 through bioconductor. If you are still having an issue after installing the bioconda recipe, can you confirm?
Within R, I use BiocManager::install() to install R packages... not conda. Is there a way to fix this issue when installing/compiling packages from within R directly? If not, I will try to manage my R environment via conda, but it is not optimal when some packages or package versions are not available in conda repos.
Installing from bioconda failed as well. In the end I managed to reinstall everything under R 3.5 and it is working now...
Only the very latest versions of the bioconda packages are fixed, and only for the latest R. If you use an older R, for example, you will hit the bug again because conda fetches older versions of the packages. You need bioconductor-rhdf5 >=2.26.2 and bioconductor-rhdf5lib >=1.4.2, together with r-base >=3.5.1. That combination should work. If not, please post the output of conda list for the respective environment. Also note that the channel order defined at https://bioconda.github.io has to be used. Otherwise, conda may pick packages from the commercial R or default channels, which might not yet contain our fixes.
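The version requirements can also be checked from within R. This is a minimal sketch: the helper names meets_minimum and installed_versions are invented for illustration, and the package names are the R-side names (rhdf5, Rhdf5lib) rather than the conda package names.

```r
# Minimum versions quoted in this thread, keyed by R package name.
required <- c(rhdf5 = "2.26.2", Rhdf5lib = "1.4.2")

# Compare named vectors of installed vs. required version strings.
# Returns a named logical vector: TRUE where installed >= required,
# NA where the package does not appear in `installed`.
meets_minimum <- function(installed, required) {
  vapply(names(required), function(pkg) {
    if (!pkg %in% names(installed)) return(NA)
    package_version(installed[[pkg]]) >= package_version(required[[pkg]])
  }, logical(1))
}

# Look up the versions of whichever of `pkgs` are installed,
# without loading them.
installed_versions <- function(pkgs) {
  found <- pkgs[vapply(pkgs, function(p) nzchar(system.file(package = p)),
                       logical(1))]
  vapply(found, function(p) as.character(packageVersion(p)), character(1))
}

print(getRversion() >= "3.5.1")
print(meets_minimum(installed_versions(names(required)), required))
```

An NA in the output means the package was not found in the current library paths, which is itself worth reporting along with the conda list output.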
Hi @johanneskoester, thank you again for all of your hard work to troubleshoot this issue!
Do you have an update on whether the bioconductor installations of rhdf5 and rhdf5lib are being fixed to address this issue? Has the package developer been informed of this issue? The typical R user will not be working with bioconda, and it would be great to have this working on all fronts.
edit: I went ahead and opened an issue on the Rhdf5lib repo.
I've thought about this some more, and my current suspicion is that this is a file locking issue and some other process is preventing rhdf5 from opening the .h5 file. If you're seeing this error please try running the following in R and then re-run the command that threw the error:
Sys.setenv(HDF5_USE_FILE_LOCKING = "FALSE")
If this works please report back.
As for why I'm not convinced this is a missing ZLIB issue, it's mostly due to the fact the error reported here occurs when opening the HDF5 file e.g.
Error in H5Fopen(file, "H5F_ACC_RDONLY") :
HDF5. File accessability. Unable to open file.
However, a failure where ZLIB was required but not found would only occur when trying to read a dataset, opening the file should be fine regardless of filter availability, and the error would look something like:
Error in H5Dread(... :
HDF5. Dataset. Read failed.
Here's a little example demonstrating that you get this error if a second process tries to open an HDF5 file that is already open:
## load rhdf5, which provides H5Fopen() and H5Fclose()
library(rhdf5)
## download the example abundance file
h5_file <- tempfile(pattern = "abundance", fileext = ".h5")
download.file('https://raw.githubusercontent.com/pachterlab/sleuth/master/tests/testthat/small_test_data/kallisto/abundance.h5',
              destfile = h5_file, mode = "wb")
## open a file handle and view
fid1 <- H5Fopen( h5_file )
fid1
#HDF5 FILE
# name /
# filename
#
# name otype dclass dim
#0 aux H5I_GROUP
#1 bootstrap H5I_GROUP
#2 est_counts H5I_DATASET FLOAT 15
## launch Rscript to run a new process accessing the same file
## this will fail
system2("Rscript",
args = paste0("-e 'fid2 <- rhdf5::H5Fopen(\"", h5_file, "\"); fid2'"))
#Error in rhdf5::H5Fopen("/tmp/RtmpYJppmS/abundance403f6b7e2c5f.h5") :
# HDF5. File accessibilty. Unable to open file.
#Execution halted
## close the file handle in this process and try again
H5Fclose(fid1)
system2("Rscript",
args = paste0("-e 'fid2 <- rhdf5::H5Fopen(\"", h5_file, "\"); fid2'"))
#HDF5 FILE
# name /
# filename
#
# name otype dclass dim
#0 aux H5I_GROUP
#1 bootstrap H5I_GROUP
#2 est_counts H5I_DATASET FLOAT 15
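To tell apart the two failure modes described above (failing at H5Fopen versus failing when reading a dataset), something like the following could be run on a single file. It is a sketch only: diagnose_h5 is a hypothetical helper, it assumes rhdf5 is installed, and it assumes the file has a top-level est_counts dataset like the test file listed above.

```r
# Classify where reading a kallisto abundance.h5 goes wrong.
# diagnose_h5 is a hypothetical helper name, not part of rhdf5 or sleuth.
diagnose_h5 <- function(path, dataset = "est_counts") {
  if (!file.exists(path)) return("file_missing")
  fid <- tryCatch(rhdf5::H5Fopen(path, flags = "H5F_ACC_RDONLY"),
                  error = function(e) NULL)
  # open failures: locking, corruption, permissions, ...
  if (is.null(fid)) return("open_failed")
  on.exit(rhdf5::H5Fclose(fid), add = TRUE)
  ok <- tryCatch({ rhdf5::h5read(fid, dataset); TRUE },
                 error = function(e) FALSE)
  # read failures: e.g. a compression filter that is not available
  if (ok) "ok" else "read_failed"
}

# a path that does not exist is reported before rhdf5 is touched
print(diagnose_h5(file.path(tempdir(), "no_such_file.h5")))
```

Reporting which of the four labels you get, together with sessionInfo(), should narrow things down considerably.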
Hi,
I am experiencing the same problem. I tried Sys.setenv(HDF5_USE_FILE_LOCKING = "FALSE"), but it does not work.
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocInstaller_1.32.1 rhdf5_2.26.2 raster_2.8-19 gdalUtils_2.0.1.14
[5] rgdal_1.4-3 sp_1.3-1 ncdf4_1.16.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 lattice_0.20-38 codetools_0.2-16 foreach_1.4.4 R.methodsS3_1.7.1
[6] grid_3.5.3 R.oo_1.22.0 R.utils_2.8.0 Rhdf5lib_1.4.3 iterators_1.0.10
[11] tools_3.5.3 xfun_0.6 yaml_2.2.0 compiler_3.5.3 BiocManager_1.30.4
[16] knitr_1.22
Operating system: macOS Mojave
gcc version: Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1 Apple LLVM version 10.0.1 (clang-1001.0.46.4) Target: x86_64-apple-darwin18.2.0 Thread model: posix InstalledDir: /Library/Developer/CommandLineTools/usr/bin
By the way, ‘kallisto’ is not available (for R version 3.5.3) and ‘sleuth’ is not available (for R version 3.5.3)
I encounter this error regularly when running kallisto/sleuth on a large number of samples. It is persistent across Linux distros and Windows (WSL), with containerized and non-containerized packages. In my experience, every unreadable-HDF5 error has turned out to be caused by silent corruption of the h5 file when kallisto generated it.
The error rate is far lower for local/interactive runs than for Linux server jobs. The time to pseudoalign in Linux server jobs is wildly increased as well, despite more cores and RAM being assigned. I can only speculate that there is some minuscule basal rate of silent h5 corruption by kallisto that is amplified in some scenarios (e.g. server jobs). When pseudoaligning many samples, the expected number of corrupt h5 files may simply reach one or more.
In agreement with this thread, simply checking the h5 files and regenerating those which have been corrupted works every time. Thanks @brucemoran for the means to conduct the file check.
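For anyone who wants to script that check, here is one possible sketch. It is not necessarily the check referred to above: find_bad_h5 is a hypothetical helper, and the default reader assumes rhdf5 is installed. Listing the file contents only shows that the file opens and its structure can be walked, so corruption inside a dataset might still need an actual read to surface.

```r
# Return the kallisto output directories whose abundance.h5 is missing
# or cannot be read. `reader` is any function that throws an error on a
# bad file; the default lists the file contents with rhdf5::h5ls().
# find_bad_h5 is a hypothetical helper name.
find_bad_h5 <- function(kal_dirs, reader = function(p) rhdf5::h5ls(p)) {
  h5_paths <- file.path(kal_dirs, "abundance.h5")
  readable <- vapply(h5_paths, function(p) {
    file.exists(p) &&
      tryCatch({ reader(p); TRUE }, error = function(e) FALSE)
  }, logical(1))
  kal_dirs[!readable]
}
```

The returned directories are the samples to re-run through kallisto before calling sleuth_prep again.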
Dear all, I am using a specific pipeline which removes all .h5 files. I have only the abundance.tsv. Does this mean I cannot run sleuth? Thanks in advance for any help.
@oggismetto:
Dear all, I am using a specific pipeline which removes all .h5 files. I have only the abundance.tsv. Does this mean I cannot run sleuth?
AFAICT, read_kallisto falls back to reading the TSV if no HDF5 file is found. So you should be able to run sleuth, but with the limitation that kallisto's bootstraps (via the -b flag) are only stored in the HDF5 file, so you'd lose that extra information on the estimated technical noise in your differential expression analyses.
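To see what a given kallisto output directory actually contains before handing it to sleuth, a quick base R check could look like this. It is only a sketch: inspect_kallisto_dir is a hypothetical helper, it does not call sleuth or reproduce what read_kallisto does internally, and it assumes the usual abundance.h5 and abundance.tsv file names.

```r
# Report which kallisto outputs exist in a directory and, if the TSV is
# present, load it as a data frame with base R.
# inspect_kallisto_dir is a hypothetical helper name.
inspect_kallisto_dir <- function(kal_dir) {
  h5  <- file.path(kal_dir, "abundance.h5")
  tsv <- file.path(kal_dir, "abundance.tsv")
  list(
    has_h5    = file.exists(h5),   # bootstraps are only stored here
    has_tsv   = file.exists(tsv),
    abundance = if (file.exists(tsv)) {
      read.delim(tsv, stringsAsFactors = FALSE)
    } else {
      NULL
    }
  )
}
```

If has_h5 is FALSE and has_tsv is TRUE, the point estimates are available but no bootstrap information is.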
Hi, I am having an issue with salmon files converted by wasabi in sleuth. Error: File h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b" @warrenmcg: Please help me with this issue.
@pragathisneha: What operating system are you on? The wasabi README mentions a Windows issue with bootstrap information from salmon. Regardless, AFAICT, sleuth has done all it (or its developers) can do for you: it tells you what information is missing in your input data. Since you get these files from a different tool, you are probably better off asking for help at the wasabi and/or salmon support channels.
Hello,
I am having the same problem:
so <- sleuth_prep(s2c, ~ condition)
reading in kallisto results
..Error in H5Fopen(file, "H5F_ACC_RDONLY") :
HDF5. File accessability. Unable to open file.
sessionInfo()
R version 4.2.1 (2022-06-23) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Monterey 12.0.1
Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale: [1] pt_PT.UTF-8/pt_PT.UTF-8/pt_PT.UTF-8/C/pt_PT.UTF-8/pt_PT.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocManager_1.30.18 rhdf5filters_1.9.0 Rhdf5lib_1.18.2 httr_1.4.4 rhdf5_2.40.0
[6] sleuth_0.30.0 Matrix_1.5-0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 lattice_0.20-45 prettyunits_1.1.1 ps_1.7.1 rprojroot_2.0.3
[6] assertthat_0.2.1 digest_0.6.29 utf8_1.2.2 mime_0.12 R6_2.5.1
[11] ggplot2_3.3.6 pillar_1.8.1 rlang_1.0.5 curl_4.3.2 lazyeval_0.2.2
[16] rstudioapi_0.14 data.table_1.14.2 miniUI_0.1.1.1 callr_3.7.2 urlchecker_1.0.1
[21] devtools_2.4.4 stringr_1.4.1 htmlwidgets_1.5.4 munsell_0.5.0 shiny_1.7.2
[26] compiler_4.2.1 httpuv_1.6.6 pkgconfig_2.0.3 pkgbuild_1.3.1 htmltools_0.5.3
[31] tidyselect_1.1.2 tibble_3.1.8 fansi_1.0.3 withr_2.5.0 crayon_1.5.1
[36] dplyr_1.0.10 later_1.3.0 grid_4.2.1 xtable_1.8-4 gtable_0.3.1
[41] lifecycle_1.0.2 DBI_1.1.3 magrittr_2.0.3 scales_1.2.1 cli_3.4.0
[46] stringi_1.7.8 cachem_1.0.6 fs_1.5.2 promises_1.2.0.1 remotes_2.4.2
[51] ellipsis_0.3.2 generics_0.1.3 vctrs_0.4.1 tools_4.2.1 glue_1.6.2
[56] purrr_0.3.4 processx_3.7.0 pkgload_1.3.0 parallel_4.2.1 fastmap_1.1.0
[61] colorspace_2.0-3 sessioninfo_1.2.2 memoise_2.0.1 profvis_0.3.7 usethis_2.1.6
macOS Monterey Versão 12.0.1
Kallisto version HDF5 FILES 1.12.2
rhdf5::h5version() This is Bioconductor rhdf5 2.40.0 linking to C-library HDF5 1.10.7 and rhdf5filters 1.9.0
I already tried to look for a more recent rhdf5 package that supports HDF5 FILES 1.12.2 with no success.
How did you solve this issue?
Hi,
I am having the same issue to this day. I was wondering if anyone has reached a consensus on the best way to tackle it?
I am getting the following message:
so <- sleuth_prep(kal_dirs_fixed, extra_bootstrap_summary = TRUE)
reading in kallisto results
dropping unused factor levels
..........................................
normalizing est_counts
58048 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, flags = flags, fapl = fapl, native = native) :
HDF5. File accessibility. Unable to open file.
In addition: Warning message:
In check_num_cores(num_cores) :
It appears that you are running Sleuth from within Rstudio. Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1. If you wish to take advantage of multiple cores, please consider running sleuth from the command line.
sessionInfo() R version 4.3.0 (2023-04-21 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale: [1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8 LC_MONETARY=English_United Kingdom.utf8 [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.utf8
time zone: Europe/London tzcode source: internal
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] BiocManager_1.30.21 Rhdf5lib_1.22.0 cowplot_1.1.1 sleuth_0.30.1 rhdf5_2.44.0
loaded via a namespace (and not attached):
[1] tximport_1.28.0 KEGGREST_1.40.0 gtable_0.3.3 ggplot2_3.4.2 Biobase_2.60.0 rhdf5filters_1.12.1
[7] vctrs_0.6.3 tools_4.3.0 bitops_1.0-7 generics_0.1.3 parallel_4.3.0 stats4_4.3.0
[13] curl_5.0.1 tibble_3.2.1 fansi_1.0.4 AnnotationDbi_1.62.1 RSQLite_2.3.1 blob_1.2.4
[19] pkgconfig_2.0.3 data.table_1.14.8 dbplyr_2.3.2 S4Vectors_0.38.1 lifecycle_1.0.3 GenomeInfoDbData_1.2.10
[25] compiler_4.3.0 stringr_1.5.0 Biostrings_2.68.1 progress_1.2.2 munsell_0.5.0 GenomeInfoDb_1.36.0
[31] RCurl_1.98-1.12 lazyeval_0.2.2 tidyr_1.3.0 pillar_1.9.0 crayon_1.5.2 cachem_1.0.8
[37] tidyselect_1.2.0 digest_0.6.31 stringi_1.7.12 purrr_1.0.1 dplyr_1.1.2 biomaRt_2.56.1
[43] fastmap_1.1.1 grid_4.3.0 colorspace_2.1-0 cli_3.6.1 magrittr_2.0.3 XML_3.99-0.14
[49] utf8_1.2.3 withr_2.5.0 prettyunits_1.1.1 filelock_1.0.2 scales_1.2.1 rappdirs_0.3.3
[55] bit64_4.0.5 XVector_0.40.0 httr_1.4.6 bit_4.0.5 png_0.1-8 hms_1.1.3
[61] memoise_2.0.1 IRanges_2.34.0 BiocFileCache_2.8.0 rlang_1.1.1 glue_1.6.2 DBI_1.1.3
[67] xml2_1.3.4 BiocGenerics_0.46.0 rstudioapi_0.14 R6_2.5.1 zlibbioc_1.46.0
Please let me know if you are able to help solve this
Best, Mariana
Some users have reported having issues reading the H5 files.
Here is the error:
I would like to track this down so if you are having this issue please respond with the following:
gcc --version
And any other information you think might be informative.
Thanks,
Harold