Open pimentel opened 7 years ago
Hi Harold,
I am getting this error as well.
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.5 (unknown)
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] bindrcpp_0.1 sleuth_0.29.0 dplyr_0.7.0 ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 tidyr_0.6.3 assertthat_0.2.0 grid_3.3.0 plyr_1.8.4 R6_2.2.1 gtable_0.2.0 magrittr_1.5 scales_0.4.1 zlibbioc_1.16.0
[11] rlang_0.1.1 lazyeval_0.2.0 data.table_1.10.4 tools_3.3.0 glue_1.0.0 munsell_0.4.3 parallel_3.3.0 rhdf5_2.14.0 pkgconfig_2.0.1 colorspace_1.3-2
[21] bindr_0.1 tibble_1.3.3
Operating system: macOS Sierra version 10.12.5
RStudio version: 1.0.136
gcc version: Apple LLVM version 8.1.0 (clang-802.0.42)
Hope this helps!
Rachel
Hi Harold and Rachel,
Same problem here.
so <- sleuth_prep(s2c, full_model = full_design)
reading in kallisto results
dropping unused factor levels
........................................................................
normalizing est_counts
72457 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, "H5F_ACC_RDONLY") :
  HDF5. File accessability. Unable to open file.
#########################################################
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages: [1] splines stats graphics grDevices utils datasets methods base
other attached packages: [1] sleuth_0.29.0 dplyr_0.5.0 ggplot2_2.2.1 BiocInstaller_1.24.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.11 magrittr_1.5 zlibbioc_1.20.0 devtools_1.13.2
[5] munsell_0.4.3 colorspace_1.3-2 R6_2.2.1 rlang_0.1.1
[9] httr_1.2.1 plyr_1.8.4 tools_3.3.2 parallel_3.3.2
[13] grid_3.3.2 rhdf5_2.18.0 data.table_1.10.4 gtable_0.2.0
[17] DBI_0.6-1 git2r_0.18.0 withr_1.0.2 lazyeval_0.2.0
[21] digest_0.6.12 assertthat_0.2.0 tibble_1.3.3 tidyr_0.6.3
[25] curl_2.6 memoise_1.1.0 scales_0.4.1
#############################################################
I tried another drive using setwd(), without success. I also tried
options(max.print=10000000)
also without success.
Regards,
Jose
Hi,
I run the same script on a linux machine. This time I got errors/warnings, including:
1: In read_kallisto(path, read_bootstrap = TRUE, max_bootstrap = max_bootstrap) : You specified to read bootstraps, but we won't do so for plaintext
Indeed I have run kallisto with the --plain-text option
Now I am re-running kallisto without the option, and we will see what happens.
Perhaps the R versions of sleuth on Mac and Windows are not reporting the errors/warnings above.
Regards,
Jose
Hi,
I re-ran kallisto without the --plain-text option.
Now the .h5 files were created in the expected subdirectories; they were not there before.
when running the command
so <- sleuth_prep(s2c, full_model = full_design)
on a Windows machine I now get
reading in kallisto results
dropping unused factor levels
........................................................................
normalizing est_counts
72457 targets passed the filter
normalizing tpm
merging in metadata
summarizing bootstraps
Error in parallel::mclapply(x, y, mc.cores = num_cores) :
  'mc.cores' > 1 is not supported on Windows
###################################################
Looking at the documentation on sleuth_prep at
https://pachterlab.github.io/sleuth/docs/sleuth_prep.html
SUGGESTION 1: I cannot find an option to limit the run to a single core, so I suggest either adding such a switch to sleuth_prep, or having a new version of the function detect the environment and decide how many CPUs to use.
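For what it's worth, a minimal sketch of such environment detection in base R (an illustration only, not sleuth's actual implementation; mclapply cannot fork on Windows, so a single core is the safe fallback there):

```r
# Choose a safe default core count: mclapply's fork-based parallelism is
# unavailable on Windows, so fall back to a single core there; elsewhere,
# use however many physical cores the machine reports.
safe_num_cores <- function() {
  if (.Platform$OS.type == "windows") {
    1L
  } else {
    max(1L, parallel::detectCores(logical = FALSE), na.rm = TRUE)
  }
}

cat("using", safe_num_cores(), "core(s)\n")
```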
SUGGESTION 2:
On
https://pachterlab.github.io/kallisto/manual
The text
Optional arguments:
--bias                        Perform sequence based bias correction
-b, --bootstrap-samples=INT   Number of bootstrap samples (default: 0)
    --seed=INT                Seed for the bootstrap sampling (default: 42)
    --plaintext               Output plaintext instead of HDF5
    --fusion                  Search for fusions for Pizzly
could be changed to (ADDED TEXT IN BOLD).
Optional arguments:
--bias                        Perform sequence based bias correction
-b, --bootstrap-samples=INT   Number of bootstrap samples (default: 0)
    --seed=INT                Seed for the bootstrap sampling (default: 42)
    --plaintext               Output plaintext instead of HDF5 (NOT COMPATIBLE WITH SLEUTH)
    --fusion                  Search for fusions for Pizzly
On the other hand, running the same script on a Linux machine after rerunning kallisto, I got no error messages! Bingo!
so <- sleuth_prep(s2c, full_model = full_design)
reading in kallisto results
........................................................................
normalizing est_counts
72457 targets passed the filter
normalizing tpm
merging in metadata
normalizing bootstrap samples
summarizing bootstraps
Regards,
Jose
@jmcribeiro, it appears the documentation on the website is not up-to-date with the current version (it is generated separately), so it doesn't include the new options. If you go into R and run ?sleuth_prep, you'll see the most up-to-date documentation.
The option you want for sleuth_prep is num_cores. So:
so <- sleuth_prep(s2c, full_model = full_design, num_cores = 1)
Hi Warren,
Thanks for your comment; your recommendation worked!
Please also see my recommendation above to make sure sleuth on Windows flags the --plain-text problem as well, to avoid other users getting lost.
Regards,
Jose
Hello!
Those are two great suggestions.
For the Windows issue, we can set a quick patch to warn users that Windows does not support mclapply and switch num_cores to 1. Moving forward, we can explore switching to the future package, which would allow Windows users to use multiple cores too.
For the text files issue, I wonder if this is the reason most people are having issues? I think it would make sense for sleuth_prep to check for abundance.tsv files if abundance.h5 is absent, and use the appropriate read method.
What do you think @pimentel of these two options?
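A rough sketch of that fallback idea (a hypothetical helper, not sleuth's actual internals; the "reader" field stands in for dispatching to the HDF5 or plaintext read path):

```r
# Prefer abundance.h5, but fall back to abundance.tsv when the HDF5 file
# is absent, reporting which read method should be used.
pick_abundance_file <- function(sample_dir) {
  h5  <- file.path(sample_dir, "abundance.h5")
  tsv <- file.path(sample_dir, "abundance.tsv")
  if (file.exists(h5)) {
    list(path = h5, reader = "h5")
  } else if (file.exists(tsv)) {
    # note: plaintext output carries no bootstraps
    list(path = tsv, reader = "tsv")
  } else {
    stop("no abundance.h5 or abundance.tsv found in ", sample_dir)
  }
}
```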
I have rerun kallisto and removed the --plain-text flag which removed the h5 error. However, now I get this error:
.Error in process_bootstrap(i, samp_name, kal_path, num_transcripts, est_counts_sf[[i]], : File h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".File ../analysis/data/kallisto/Deer_R1_S22/abundance.h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".
Any help is greatly appreciated,
Rachel
Hi @rachelzoeb,
What was your full kallisto command, and what version of kallisto did you use? It seems that you did not use the -b option when running kallisto, which is a requirement to take full advantage of sleuth.
If you did use the -b option and still got this error, maybe something is wrong with how sleuth is interacting with your particular version of kallisto.
If you are using the latest version of kallisto, then it would be helpful if you gave your OS and version of gcc (use gcc --version) as Harold suggested above, and emailed your abundance.h5 file to him or posted it here for me and other users to look at to help you out.
@warrenmcg thanks so much for fielding these questions.
Regarding the Windows patch: that sounds like a great idea.
Unfortunately the bootstraps are not available via plaintext at all. This is because HDF5 provides nice compression that is a bit of a pain to get otherwise. Initially, the plaintext abundance.tsv was only intended for quick sanity checks. However, we have been discussing changing the format to remove the dependency on HDF5, which has proven to be an issue for some time now...
Hi,
I ran kallisto quant with --bootstrap-samples=100 --threads=16, and 4 out of 8 of my h5 files gave the cannot-open error. kallisto ran on a Linux server, and I then downloaded the h5 files to my local machine (macOS) to run sleuth in R. Do you think there might have been an error during the file transfer? Also, I checked the file sizes of the failing h5 files: for 3 of the 4, the h5 file is smaller than the tsv file. Not sure if that is related. Thanks in advance for any help!
sessionInfo() in R -
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rhdf5_2.20.0 BiocInstaller_1.26.0 bindrcpp_0.2 synapseClient_1.15-0 sleuth_0.29.0
[6] dplyr_0.7.2 ggplot2_2.2.1 biomaRt_2.32.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 compiler_3.4.1 plyr_1.8.4 bindr_0.1 zlibbioc_1.22.0
[6] bitops_1.0-6 tools_3.4.1 digest_0.6.12 bit_1.1-12 RSQLite_2.0
[11] memoise_1.1.0 tibble_1.3.4 gtable_0.2.0 pkgconfig_2.0.1 rlang_0.1.2
[16] DBI_0.7 parallel_3.4.1 stringr_1.2.0 S4Vectors_0.14.3 IRanges_2.10.2
[21] stats4_3.4.1 bit64_0.9-7 grid_3.4.1 glue_1.1.1 Biobase_2.36.2
[26] data.table_1.10.4 R6_2.2.2 AnnotationDbi_1.38.2 XML_3.98-1.9 tidyr_0.7.0
[31] reshape2_1.4.2 blob_1.1.0 magrittr_1.5 matrixStats_0.52.2 scales_0.5.0
[36] BiocGenerics_0.22.0 assertthat_0.2.0 colorspace_1.3-2 stringi_1.1.5 RCurl_1.95-4.8
[41] lazyeval_0.2.0 munsell_0.4.3 rjson_0.2.15
>
OS - kallisto was run on 3.2.0-29-generic GNU/Linux; sleuth was run in R on macOS Sierra version 10.12.5
gcc -
$ gcc --version
Apple LLVM version 8.1.0 (clang-802.0.42)
I am currently having this issue.
I have a data frame built as it is in the walkthrough, and it looks like this:
sample condition path
1: P1 ns expression/P1
2: P2 ns expression/P2
3: P3 s expression/P3
4: P4 s expression/P4
5: P5 ns expression/P5
6: P6 ns expression/P6
7: P7 s expression/P7
8: P8 s expression/P8
9: P9 ns expression/P9
10: P10 ns expression/P10
11: P11 s expression/P11
12: P12 s expression/P12
>sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 sleuth_0.29.0 dplyr_0.7.4 ggplot2_2.2.1 edgeR_3.16.5
[6] biomaRt_2.30.0 limma_3.30.13 data.table_1.10.4-3
loaded via a namespace (and not attached):
[1] locfit_1.5-9.1 tidyselect_0.2.2 purrr_0.2.4 lattice_0.20-34
[5] rhdf5_2.18.0 colorspace_1.3-2 htmltools_0.3.6 stats4_3.3.1
[9] viridisLite_0.2.0 yaml_2.1.14 base64enc_0.1-3 XML_3.98-1.7
[13] plotly_4.7.1 rlang_0.1.2 glue_1.1.1 DBI_0.6-1
[17] BiocGenerics_0.20.0 bindr_0.1 plyr_1.8.4 stringr_1.2.0
[21] zlibbioc_1.20.0 munsell_0.4.3 gtable_0.2.0 htmlwidgets_0.9
[25] memoise_1.1.0 evaluate_0.10 Biobase_2.34.0 knitr_1.15.1
[29] IRanges_2.8.2 parallel_3.3.1 AnnotationDbi_1.36.2 Rcpp_0.12.13
[33] scales_0.5.0 backports_1.1.0 S4Vectors_0.12.2 jsonlite_1.5
[37] digest_0.6.12 stringi_1.1.5 grid_3.3.1 rprojroot_1.2
[41] tools_3.3.1 bitops_1.0-6 magrittr_1.5 lazyeval_0.2.0
[45] RCurl_1.95-4.8 tibble_1.3.4 RSQLite_1.1-2 tidyr_0.7.2
[49] pkgconfig_2.0.1 assertthat_0.1 rmarkdown_1.6 httr_1.2.1
[53] R6_2.2.2
OS is CENTOS, 2.6.32-696.10.2.el6.x86_64
bash-4.1$ gcc --version
gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
However, I don't know why it is trying to read H5 files. In the expression directories I only have tsv files (I ran kallisto with --plain-text output).
Lastly, the error is:
It appears that you are running Sleuth from within Rstudio.
Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
If you wish to take advantage of multiple cores, please consider running sleuth from the command line.
reading in kallisto results
dropping unused factor levels
............
normalizing est_counts
59202 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, "H5F_ACC_RDONLY") :
HDF5. File accessability. Unable to open file.
Hi, I had the same problem and solved it.
The issue was due to the file structure I was using; clearly this may not be the issue for everybody.
When setting up the kr_dirs data frame as per the instructions (https://pachterlab.github.io/sleuth_walkthroughs/trapnell/analysis.html), the program assumes that each sample is found in its own directory, which contains both abundance.tsv and abundance.h5 with the file names unedited. When I arranged the files like this, the error was not tripped.
Hope that helps.
Hello
I am also having the same error message with one of my files (I have 46, and it only seems to be kicking up this one, which I have re-generated by re-running kallisto).
Error in process_bootstrap(i, samp_name, kal_path, num_transcripts, est_counts_sf[[i]], : File h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".File ../quant/WTCHG_412393_006/abundance.h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".
However, I used 100 bootstraps when I ran kallisto, and when I look at the run info file also produced by kallisto, it confirms this for this sample.
{
  "n_targets": 60054,
  "n_bootstraps": 100,
  "n_processed": 15681904,
  "kallisto_version": "0.43.1",
  "index_version": 10,
  "start_time": "Wed Nov 22 12:14:52 2017",
  "call": "kallisto quant -i transcripts.idx -o quant/WTCHG_403319_006 -b 100 ../../data/reads/WTCHG_403319_006_1.fastq.gz ../../data/reads/WTCHG_403319_006_2.fastq.gz"
}
My sleuth prep command is this: so <- sleuth_prep(sample_to_condition, target_mapping = ttg, aggregation_column = 'gene_id', extra_bootstrap_summary = TRUE, num_cores=1)
Any help appreciated! I used kallisto v0.43.1 on our uni Linux server, and am running sleuth (latest version) on my MacBook.
Sarah
@sarahharvey88, that is odd. Could you send the problematic h5 file so I can reproduce the error on my side? Email me at:
warren-mcgee at fsm.northwestern.edu (replace at with @ and remove spaces)
@miguelroboso, as has been mentioned previously, the plain text files do not include the bootstraps. You should rerun kallisto without the --plaintext option. The error you are seeing is because there is a line within sleuth_prep that expects an h5 file to be present.
Pinging @pimentel: the offending line causing Miguel's user-unfriendly error is this one. The current version expects an H5 file to be present, so should we be more explicit about that requirement in sleuth_prep?
Also getting this error. NB: samples were run using Nextflow and executed by PBS/Torque. When I rerun the offending samples 'interactively' they all work. Not ideal though...
Kallisto command:
kallisto quant \
-l ${params.fragment_len} \
-s ${params.fragment_sd} \
-b ${params.bootstrap} \
-i ${index} \
-t ${task.cpus} \
-o ./ \
${reads1} ${reads2}
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /apps/software/R/3.4.0/lib64/R/lib/libRblas.so
LAPACK: /apps/software/R/3.4.0/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_IE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_IE.UTF-8 LC_COLLATE=en_IE.UTF-8
[5] LC_MONETARY=en_IE.UTF-8 LC_MESSAGES=en_IE.UTF-8
[7] LC_PAPER=en_IE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] bindrcpp_0.2 rhdf5_2.20.0 biomaRt_2.32.1 sleuth_0.29.0 dplyr_0.7.4
[6] ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.13 compiler_3.4.0 plyr_1.8.4
[4] bindr_0.1 zlibbioc_1.22.0 bitops_1.0-6
[7] digest_0.6.12 bit_1.1-12 RSQLite_2.0
[10] memoise_1.1.0 tibble_1.3.4 gtable_0.2.0
[13] pkgconfig_2.0.1 rlang_0.1.2 DBI_0.7
[16] parallel_3.4.0 IRanges_2.10.5 S4Vectors_0.14.7
[19] stats4_3.4.0 bit64_0.9-7 grid_3.4.0
[22] glue_1.1.1 data.table_1.10.4-2 Biobase_2.36.2
[25] R6_2.2.2 AnnotationDbi_1.38.2 XML_3.98-1.9
[28] blob_1.1.0 magrittr_1.5 scales_0.5.0
[31] BiocGenerics_0.22.1 assertthat_0.2.0 colorspace_1.3-2
[34] RCurl_1.95-4.8 lazyeval_0.2.0 munsell_0.4.3
cat /etc/*-release | head -n1
CentOS Linux release 7.3.1611 (Core)
gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
NB: to find offending h5 files, you can use h5ls(<path/to/abundance.h5>). From this it seems that dim(h5ls(<path/to/abundance.h5>))[1] should be 115. So using something like the below will show those samples that fail.
apply(s2c, 1, function(f) {
  dh5 <- try(dim(h5ls(paste0(f[3], "/abundance.h5")))[1])
  if (dh5 != 115) { dh5 <- "ERROR" }
  return(paste0(f[3], " -> ", dh5))
})
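A dependency-free complement to this check (my own sketch, not from sleuth): verify that each abundance.h5 begins with the 8-byte HDF5 signature, which also catches truncated or corrupted files without needing rhdf5 at all.

```r
# Return TRUE only if the file starts with the HDF5 signature
# (\x89 'H' 'D' 'F' \r \n \x1a \n); corrupt or plaintext files fail this.
is_hdf5 <- function(path) {
  if (!file.exists(path) || file.size(path) < 8) return(FALSE)
  sig <- readBin(path, what = "raw", n = 8L)
  identical(sig, as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a)))
}
```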
Hello,
I am experiencing a similar problem.
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 sleuth_0.29.0 dplyr_0.7.4 ggplot2_2.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 rstudioapi_0.7 bindr_0.1 magrittr_1.5
[5] zlibbioc_1.24.0 devtools_1.13.4 munsell_0.4.3 colorspace_1.3-2
[9] R6_2.2.2 rlang_0.1.6 plyr_1.8.4 tools_3.4.3
[13] parallel_3.4.3 grid_3.4.3 rhdf5_2.22.0 data.table_1.10.4-3
[17] gtable_0.2.0 utf8_1.1.3 cli_1.0.0 withr_2.1.1
[21] lazyeval_0.2.1 assertthat_0.2.0 digest_0.6.14 tibble_1.4.2
[25] crayon_1.3.4 memoise_1.1.0 glue_1.2.0 compiler_3.4.3
[29] pillar_1.1.0 scales_0.5.0 pkgconfig_2.0.1
Operating System:
Linux ubuntu 4.13.0-32-generic x86_64 GNU/Linux
GCC version:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
The R instance is being run within a virtual machine hosted by a Windows OS, but I am not sure if that tells you anything or not.
I get a slightly different H5-related error message:
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, : HDF5. Dataset. Read failed.
Like Bruce's experience above, it only happens for some of my files, and if I re-run kallisto interactively for these files (instead of from a shell script), the resulting files can be read using sleuth with no issues.
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)
Matrix products: default
BLAS: /usr/analysis/src/R/R-3.4.3/lib/libRblas.so
LAPACK: /usr/analysis/src/R/R-3.4.3/lib/libRlapack.so
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 sleuth_0.29.0 dplyr_0.7.4
[4] ggplot2_2.2.1 BiocInstaller_1.28.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 bindr_0.1 magrittr_1.5
[4] zlibbioc_1.24.0 tidyselect_0.2.3 munsell_0.4.3
[7] colorspace_1.3-2 R6_2.2.2 rlang_0.1.6
[10] stringr_1.2.0 plyr_1.8.4 tools_3.4.3
[13] parallel_3.4.3 grid_3.4.3 rhdf5_2.22.0
[16] data.table_1.10.4-3 gtable_0.2.0 lazyeval_0.2.1
[19] assertthat_0.2.0 tibble_1.4.2 reshape2_1.4.3
[22] purrr_0.2.4 tidyr_0.8.0 glue_1.2.0
[25] stringi_1.1.6 compiler_3.4.3 pillar_1.1.0
[28] scales_0.5.0 pkgconfig_2.0.1
Red Hat Enterprise Linux Server release 6.9 (Santiago)
gcc --version
gcc (GCC) 4.7.4
Hello!
I am still receiving this error message:
reading in kallisto results
dropping unused factor levels
....................................................................................................................................................Error in H5Fopen(file, "H5F_ACC_RDONLY") :
HDF5. File accessability. Unable to open file.
In addition: Warning message:
In check_num_cores(num_cores) :
It appears that you are running Sleuth from within Rstudio.
Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
If you wish to take advantage of multiple cores, please consider running sleuth from the command line.
I am using kallisto 0.44.0. I ran the initial kallisto script using this command:
kallisto quant -i transcripts.idx -o output -b 100 READ1.fastq READ2.fastq
I then tried to run the sleuth_prep command in a couple of ways and got the same error both times.
so <- sleuth_prep(s2c, extra_bootstrap_summary = TRUE)
and
> mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
+ dataset = "hsapiens_gene_ensembl",
+ host = 'ensembl.org')
> t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id",
+ "external_gene_name"), mart = mart)
> t2g <- dplyr::rename(t2g, target_id = ensembl_transcript_id,
+ ens_gene = ensembl_gene_id, ext_gene = external_gene_name)
> so <- sleuth_prep(s2c, target_mapping = t2g)
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.30.0 hexbin_1.27.1 sleuth_0.29.0 ggplot2_2.2.1 data.table_1.11.2 BiocInstaller_1.24.0
[7] bindrcpp_0.2.2 dplyr_0.7.5
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 git2r_0.21.0 plyr_1.8.4 bindr_0.1.1 bitops_1.0-6 tools_3.3.0
[7] zlibbioc_1.20.0 bit_1.1-13 digest_0.6.15 lattice_0.20-35 RSQLite_2.1.1 memoise_1.1.0
[13] tibble_1.4.2 gtable_0.2.0 rhdf5_2.18.0 pkgconfig_2.0.1 rlang_0.2.0 DBI_1.0.0
[19] curl_3.2 yaml_2.1.19 parallel_3.3.0 withr_2.1.2 httr_1.3.1 knitr_1.20
[25] IRanges_2.8.2 S4Vectors_0.12.2 devtools_1.13.5 bit64_0.9-7 stats4_3.3.0 grid_3.3.0
[31] tidyselect_0.2.4 Biobase_2.34.0 glue_1.2.0 R6_2.2.2 AnnotationDbi_1.36.2 XML_3.98-1.11
[37] blob_1.1.1 tidyr_0.8.1 purrr_0.2.4 magrittr_1.5 BiocGenerics_0.20.0 scales_0.5.0
[43] assertthat_0.2.0 colorspace_1.3-2 RCurl_1.95-4.10 lazyeval_0.2.1 munsell_0.4.3
>
$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin14.5.0
Thread model: posix
I checked each one of my abundance.h5 files (384 total), and none of them seem to be the obvious offender. Is there anything obvious I missed that is preventing my analysis?
Thank you!
@cajames2, a few questions:
1) What version of sleuth are you running? Version 0.29.0 could be the current master version or the devel version, and it will help to know what you're working with.
2) Did you run the suggested code from brucemoran above? Did that identify any samples with an unexpected dimension?
3) If the answer is 'no', what is the RAM available on your computer? It is possible that 384 samples (which is a lot) is too much for your system to handle at once, and the cryptic error message is indicating that your machine ran out of RAM and swap memory. I have worked with a dataset of 600 samples, and that still uses 60-80 GB of RAM on a machine with 128 GB. If you're working off of a laptop, that is likely the issue.
4) If RAM is not the problem and none of your kallisto files are corrupted, then we'll have to explore exactly what happened. There is probably a way for us to run the "reading in kallisto files" step of sleuth_prep while still keeping track of which file we're reading.
@lydiarck: sorry for the delayed response. It seems like in your situation, something is failing with kallisto or with your script. Depending on how exactly you're running the script, you might also be running into a memory issue that is causing certain kallisto runs to fail. Did you see anything suspicious with the log messages, or with the auxiliary files accompanying the corrupted runs?
@warrenmcg: Thanks for your quick reply. I am using 'sleuth' version 0.29.0. When I run the code suggested by brucemoran, every one of my .h5 files returns an error. This makes me think there may have been an issue with the initial kallisto run. However, I spot-checked some of the abundance.tsv files and they are populated, so in practice the kallisto run worked as expected.
An example:
> apply(s2c,1,function(f){ dh5 <- try(dim(h5ls(paste0(f[3],"/abundance.h5")))[1]); if(dh5!=115){ dh5<-"ERROR" }; return(paste0(f[3]," -> ",dh5)) })
Error in try(dim(h5ls(paste0(f[3], "/abundance.h5")))[1]) :
could not find function "h5ls"
[1]"../output/Plate1A01 -> ERROR"
But the abundance.tsv file for this sample does show the transcript IDs that aligned to my data set for that sample.
For what it's worth, when I ran kallisto on the .fastq.gz file of my entire data set, my computer could not handle it. To get around that, I unzipped the file, demultiplexed all my samples, and wrote a loop so that kallisto would run on each sample individually. It took about 8 hours but seemed to work fine. Do you think that maybe this was the issue? If not, I'm inclined to think my computer might not have sufficient RAM to handle this data set.
Thanks for all your help.
@cajames2, the problem is not with your files, but with the rhdf5 package and the h5ls function.
I would make sure these lines work:
library(rhdf5)
?h5ls
If they don't work, that's the problem. Once those lines work, try repeating the suggested code above.
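As a side note (my own suggestion, not something from sleuth): when wrapping a check in try() as in the snippet above, it is safer to test for a try-error explicitly, so that a setup failure such as a missing h5ls function is not reported as a corrupted file.

```r
# A try() failure returns a "try-error" object; comparing that object to a
# number coerces it silently, so check inherits() before interpreting the
# result. stop() here simulates the failing check.
check_result <- try(stop("h5ls unavailable"), silent = TRUE)
if (inherits(check_result, "try-error")) {
  message("the check itself failed: ", attr(check_result, "condition")$message)
}
```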
In the meantime, my suspicion is that your computer can't handle the dataset on its own with the available RAM. This will be especially true if you're handling 384 samples while also sending data out to multiple cores. Because of how R does forking, a full copy of all data currently in the R workspace will be sent to each worker, and so RAM can balloon quite a lot if you have a lot of data already present. Unfortunately, not much we can do about that...
To confirm that RAM is the issue, I would pull the activity monitor up and watch your RAM usage while the sleuth run is going. You could try processing the bootstraps using just one core -- it will take a while, but it may have a better chance of succeeding.
Also experiencing the original error:
reading in kallisto results
dropping unused factor levels
.Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) :
HDF5. File accessibilty. Unable to open file.
I believe the hdf5 files are corrupted and this has nothing to do with sleuth but here is the requested info.
This happens with gcc 4.4.7:
gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
and 7.3.1
gcc --version
gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE
Using sleuth 0.30.0
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.10 (Santiago)
Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] sleuth_0.30.0 DT_0.4 limma_3.36.5
[4] Biobase_2.40.0 BiocGenerics_0.26.0 biomaRt_2.36.1
[7] ggplot2_3.0.0 XCIR_0.1.25 PSUmisc_0.0.11
[10] data.table_1.11.8
Here is the kallisto 0.44.0 command used to generate the hdf5 files:
kallisto quant -t 20 -i kal_idx samp_1.fastq samp_2.fastq -o samp_out -b 100
Now, sorting the runs by the size of their abundance.h5 file and running sleuth_prep file by file, then inspecting the files (e.g. with less): all successful files have an "<89>HDF" tag at the top, while all the tested error files don't. So my take is that sleuth is fine and the hdf5 files are simply corrupted. This is supported by running kallisto's h5dump:
kallisto h5dump samp/abundance.h5 -output-dir="./"
HDF5-DIAG: Error detected in HDF5 (1.8.15-patch1) thread 0:
#000: H5F.c line 604 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
On my end, I'm thinking this may be the batch system killing jobs, which would explain the lack of errors reported by kallisto. Looking at some of the scripts in this issue, I suspect some other users may be in the same situation.
Hi @SRenan,
If h5dump is not working, then I think your diagnosis that this is related to your batch system is correct. You can confirm this if you are able to successfully run kallisto interactively on one of the problematic samples. If kallisto fails interactively as well, please submit an issue to kallisto here with the details of your set-up and the error.
If it turns out to be an issue with your batch system, consult the IT team at your institution to see what you can do to monitor your batch jobs. It may be as simple as appending &> log_file.txt ("redirect all shell output to 'log_file.txt'") to the end of your kallisto command (see here), or something else depending on your cluster and your script. The most common reason for batch jobs getting killed is miscalculating your RAM and core needs when submitting a job, so they will also be able to troubleshoot with you to see whether those need to be adjusted, or whether something else is happening, so that this problem is prevented in the future.
@SRenan, if this is happening with h5dump in kallisto, then this is an issue with the HDF5 library. If you want to fix it, you need to find the version of HDF5 that kallisto is linking to with ldd kallisto; this library should match the version of HDF5 used to create the files. I would recommend downloading the kallisto binary, since that has a working hdf5 library statically compiled.
Hi guys,
I just tried to load the file tests/testthat/small_test_data/kallisto/abundance.h5 from this repo with
read_kallisto_h5("testthat/small_test_data/kallisto/abundance.h5")
What I get is:
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
HDF5. Dataset. Read failed.
Error in if (num_bootstrap > 0) { : argument is of length zero
sessionInfo():
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora 28 (Workstation Edition)
Matrix products: default
BLAS: /home/johannes/.local/opt/miniconda3/envs/sleuth/lib/R/lib/libRblas.so
LAPACK: /home/johannes/.local/opt/miniconda3/envs/sleuth/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rhdf5_2.24.0 bindrcpp_0.2.2 sleuth_0.29.0 dplyr_0.7.6 ggplot2_3.1.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 withr_2.1.2 crayon_1.3.4 assertthat_0.2.0
[5] grid_3.4.1 plyr_1.8.4 R6_2.2.2 gtable_0.2.0
[9] magrittr_1.5 scales_1.0.0 pillar_1.3.0 rlang_0.3.0.1
[13] lazyeval_0.2.1 data.table_1.11.4 Rhdf5lib_1.2.1 tools_3.4.1
[17] glue_1.3.0 purrr_0.2.5 munsell_0.5.0 compiler_3.4.1
[21] pkgconfig_2.0.2 colorspace_1.3-2 tidyselect_0.2.4 bindr_0.1.1
[25] tibble_1.4.2
This was run with sleuth 0.29 from bioconda:
bioconductor-rhdf5 2.24.0 r341hfc679d8_0 bioconda
bioconductor-rhdf5lib 1.2.1 r341h470a237_0 bioconda
bzip2 1.0.6 h470a237_2 conda-forge
ca-certificates 2018.10.15 ha4d7672_0 conda-forge
cairo 1.14.12 he6fea26_5 conda-forge
curl 7.62.0 h74213dd_0 conda-forge
fontconfig 2.13.1 h65d0f4c_0 conda-forge
freetype 2.9.1 h6debe1e_4 conda-forge
gettext 0.19.8.1 h5e8e0c9_1 conda-forge
glib 2.55.0 h464dc38_2 conda-forge
graphite2 1.3.12 hfc679d8_1 conda-forge
gsl 2.1 2 conda-forge
harfbuzz 1.9.0 h08d66d9_0 conda-forge
icu 58.2 hfc679d8_0 conda-forge
jpeg 9c h470a237_1 conda-forge
krb5 1.16.2 hbb41f41_0 conda-forge
libcurl 7.62.0 hbdb9355_0 conda-forge
libedit 3.1.20170329 0 conda-forge
libffi 3.2.1 hfc679d8_5 conda-forge
libgcc 7.2.0 h69d50b8_2 conda-forge
libgcc-ng 7.2.0 hdf63c60_3 conda-forge
libgfortran 3.0.0 1 conda-forge
libiconv 1.15 h470a237_3 conda-forge
libpng 1.6.34 ha92aebf_2 conda-forge
libssh2 1.8.0 h5b517e9_3 conda-forge
libstdcxx-ng 7.2.0 hdf63c60_3 conda-forge
libtiff 4.0.9 he6b73bb_2 conda-forge
libuuid 2.32.1 h470a237_2 conda-forge
libxcb 1.13 h470a237_2 conda-forge
libxml2 2.9.8 h422b904_5 conda-forge
ncurses 5.9 10 conda-forge
openssl 1.0.2p h470a237_1 conda-forge
pango 1.40.14 he752989_2 conda-forge
pcre 8.39 0 conda-forge
pixman 0.34.0 h470a237_3 conda-forge
pthread-stubs 0.4 h470a237_1 conda-forge
r-assertthat 0.2.0 r341h6115d3f_1 conda-forge
r-base 3.4.1 4 conda-forge
r-bh 1.66.0_1 r341_1001 conda-forge
r-bindr 0.1.1 r341h6115d3f_1 conda-forge
r-bindrcpp 0.2.2 r341h9d2a408_1 conda-forge
r-cli 1.0.0 r341h6115d3f_1 conda-forge
r-colorspace 1.3_2 r341hc070d10_2 conda-forge
r-crayon 1.3.4 r341h6115d3f_1 conda-forge
r-data.table 1.11.4 r341hc070d10_2 conda-forge
r-digest 0.6.18 r341hc070d10_0 conda-forge
r-dplyr 0.7.6 r341h9d2a408_1 conda-forge
r-fansi 0.3.0 r341hc070d10_0 conda-forge
r-ggplot2 3.1.0 r341h6115d3f_0 conda-forge
r-glue 1.3.0 r341h470a237_2 conda-forge
r-gtable 0.2.0 r341h6115d3f_1 conda-forge
r-htmltools 0.3.6 r341hfc679d8_2 conda-forge
r-httpuv 1.4.5 r341hfc679d8_1 conda-forge
r-jsonlite 1.5 r341hc070d10_2 conda-forge
r-labeling 0.3 r341h6115d3f_1 conda-forge
r-later 0.7.3 r341h9d2a408_0 conda-forge
r-lattice 0.20_35 r341hc070d10_0 conda-forge
r-lazyeval 0.2.1 r341hc070d10_2 conda-forge
r-magrittr 1.5 r341h6115d3f_1 conda-forge
r-mass 7.3_50 r341hc070d10_2 conda-forge
r-matrix 1.2_14 r341hc070d10_2 conda-forge
r-matrixstats 0.54.0 r341hc070d10_0 conda-forge
r-mgcv 1.8_24 r341hc070d10_2 conda-forge
r-mime 0.5 r341hc070d10_2 conda-forge
r-munsell 0.5.0 r341h6115d3f_1 conda-forge
r-nlme 3.1_137 r341h364d78e_0 conda-forge
r-pillar 1.3.0 r341h6115d3f_0 conda-forge
r-pkgconfig 2.0.2 r341h6115d3f_1 conda-forge
r-plogr 0.2.0 r341h6115d3f_1 conda-forge
r-plyr 1.8.4 r341h9d2a408_2 conda-forge
r-praise 1.0.0 r341h6115d3f_1 conda-forge
r-promises 1.0.1 r341h9d2a408_0 conda-forge
r-purrr 0.2.5 r341hc070d10_1 conda-forge
r-r6 2.2.2 r341h6115d3f_1 conda-forge
r-rcolorbrewer 1.1_2 r341h6115d3f_1 conda-forge
r-rcpp 1.0.0 r341h9d2a408_0 conda-forge
r-reshape2 1.4.3 r341h9d2a408_2 conda-forge
r-rlang 0.3.0.1 r341h470a237_0 conda-forge
r-scales 1.0.0 r341h9d2a408_1 conda-forge
r-shiny 1.2.0 r341_0 conda-forge
r-sleuth 0.29.0 r3.4.1_0 bioconda
r-sourcetools 0.1.7 r341hfc679d8_0 conda-forge
r-stringi 1.2.4 r341h9d2a408_1 conda-forge
r-stringr 1.3.1 r341h6115d3f_1 conda-forge
r-testthat 2.0.1 r341h9d2a408_0 conda-forge
r-tibble 1.4.2 r341hc070d10_2 conda-forge
r-tidyr 0.8.1 r341h9d2a408_2 conda-forge
r-tidyselect 0.2.4 r341h9d2a408_2 conda-forge
r-utf8 1.1.4 r341hc070d10_0 conda-forge
r-viridislite 0.3.0 r341h6115d3f_1 conda-forge
r-withr 2.1.2 r341h6115d3f_0 conda-forge
r-xtable 1.8_3 r341_1000 conda-forge
readline 7.0 0 conda-forge
tk 8.6.9 ha92aebf_0 conda-forge
xorg-kbproto 1.0.7 h470a237_2 conda-forge
xorg-libice 1.0.9 h470a237_4 conda-forge
xorg-libsm 1.2.3 h8c8a85c_0 conda-forge
xorg-libx11 1.6.6 h470a237_0 conda-forge
xorg-libxau 1.0.8 h470a237_6 conda-forge
xorg-libxdmcp 1.1.2 h470a237_7 conda-forge
xorg-libxext 1.3.3 h470a237_4 conda-forge
xorg-libxrender 0.9.10 h470a237_2 conda-forge
xorg-renderproto 0.11.1 h470a237_2 conda-forge
xorg-xextproto 7.3.0 h470a237_2 conda-forge
xorg-xproto 7.0.31 h470a237_7 conda-forge
xz 5.2.4 h470a237_1 conda-forge
zlib 1.2.11 h470a237_3 conda-forge
When using rhdf5 directly, I get the same error. When using the system hdfview, everything is fine. Hence, I think the problem is with the rhdf5 package from Bioconda. The question is whether it is the specific version of something in the build.
Ok, it turns out that when I create an HDF5 file with rhdf5 itself, I can read it back afterwards. Maybe there is an incompatibility between the different versions/systems on which the files were created?
Hi @johanneskoester,
If you load rhdf5 in R on your machine, what is the output if you do rhdf5::h5version()? I know that the test abundance.h5 file was built with HDF5 version 1.8.11. You can confirm this yourself by running the following on the command line:
h5cc -showconfig testthat/small_test_data/kallisto/abundance.h5 | head -6 | tail -1
If there is a mismatch in the versions (especially if the kallisto version is more recent than the rhdf5 version), this is the likely cause of the error.
Thanks for responding! So, the output is:
rhdf5::h5version()
This is Bioconductor rhdf5 2.24.0 linking to C-library HDF5 1.8.19
Whereas abundance.h5 seems indeed to be version 1.8.20 (not 1.8.11 as you thought):
h5cc -showconfig testthat/small_test_data/kallisto/abundance.h5 | head -6 | tail -1
HDF5 Version: 1.8.20
Very disappointing to see such incompatibilities in something as mature as HDF5...
Even an incompatibility across a patch release!? So maybe I should ensure that kallisto and rhdf5 point to the very same HDF5 version in Bioconda.
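For reference, the version field in that h5cc -showconfig output can be extracted with standard text tools (a sketch; the config line below is copied from the output above):

```shell
# The 'HDF5 Version:' line from 'h5cc -showconfig'; the last
# whitespace-separated field is the dotted version number.
config_line='HDF5 Version: 1.8.20'
version=$(echo "$config_line" | awk '{print $NF}')
echo "$version"
```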
Hi @johanneskoester,
I forgot that the test dataset is made when the package is installed, so it will depend on the version of kallisto you have (when I added the test dataset to the repo, I used an older version of kallisto).
The HDF5 team try to maintain a guarantee of backward compatibility (later versions of the library can read any file made with earlier versions), but it is difficult to maintain forward compatibility (an older version of the library reading files made with newer versions). We have seen this issue before (#175), and since we have now seen it independently from two users, I am wondering if, at the very least, we could help users understand the nature of this error. I'm going to open a feature request for kallisto to address this.
Kallisto is bundled with a particular version of HDF5, so it would be hard to change that without building from source. In the meantime, according to the thread on #175, the maintainer of rhdf5 said that version 2.24 can be installed against an arbitrary HDF5 library, so that might be your best bet; the HDF5 version used by rhdf5 would have to be at least as new as the one used by kallisto.
Finally, for sleuth, we could add error-detection code to check whether the HDF5 library versions used are incompatible (i.e. the case where the rhdf5 HDF5 version is older than the kallisto one).
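A minimal sketch of such a check, assuming the writer (kallisto) and reader (rhdf5) HDF5 versions are already known as dotted strings; sort -V (GNU version sort) orders them numerically:

```shell
#!/usr/bin/env bash
# Flag the problematic case: the file was written by a newer HDF5 than
# the one the reader links. The version strings are examples from this
# thread, not values probed from a real installation.
writer="1.8.20"   # HDF5 that wrote abundance.h5 (kallisto side)
reader="1.8.19"   # HDF5 linked by rhdf5 (from rhdf5::h5version())
highest=$(printf '%s\n%s\n' "$writer" "$reader" | sort -V | tail -n1)
if [ "$writer" != "$reader" ] && [ "$highest" = "$writer" ]; then
  echo "warning: file written by HDF5 $writer but reader links HDF5 $reader"
fi
```

This only predicts the forward-compatibility failure mode; it would not catch the zlib issue discussed later in this thread.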
Another possibility would be to build kallisto on bioconda with hdf5 <=1.8.19. I'll think about that. I think using the hdf5-agnostic branch of rhdf5 is premature for now, as it still seems experimental.
Hi @johanneskoester,
My mistake about prematurely suggesting the hdf5-agnostic branch of rhdf5. I can't speak to the HDF5 library versions used by different versions of kallisto. However, I have done some digging on which HDF5 library versions are being used for different versions of rhdf5. Here's what I have found:
rhdf5 version | Bioconductor release | R version | HDF5 library version | HDF5 release date | note |
---|---|---|---|---|---|
<2.24.0 | ≤3.6 | ≤3.4 | 1.8.7 | May 2011 | up to this version, rhdf5 used its own internal library |
2.24.0 | 3.7 | 3.5 | 1.8.19 | June 2017 | this version and above depend on Rhdf5lib for the HDF5 library |
≥2.26.0 | ≥3.8 | ≥3.5 | ≥1.10.2 | March 2018 | all versions beyond this point use 1.10.x versions of HDF5 |
Version 2.26.0 is the current stable release for rhdf5 on bioconductor.
There were major changes to HDF5 from 1.8.x to 1.10.x, so there are obvious incompatibilities when reading 1.10 files with a 1.8 reader. However, it seems that this HDF5 read error occurs whenever the HDF5 version that wrote the kallisto files is newer than the HDF5 version used by rhdf5. Any decision about enforcing HDF5 versions in kallisto will depend on whether kallisto uses any features introduced by newer versions when writing to file. If not, it would make sense to enforce a low version of HDF5 (1.8.7 appears to have been the version used by rhdf5 as far back as 2011) to maximize compatibility with any reasonable version of rhdf5.
As an update, I did some hunting through the kallisto source code, and cross-referenced possible HDF5-related functions and objects in H5Writer.h/.cpp and h5utils.h/.cpp with the release notes from the HDF group (link).
I did not identify any C++ functions used by kallisto that weren't available in HDF5 version 1.8.7, and none of those functions have undergone major announced changes since.
Hi everyone,
I'm running into the same error when reading the H5 files:
> so <- sleuth_prep(s2c, full_model=d, num_cores=1)
reading in kallisto results
dropping unused factor levels
.....................................................................
normalizing est_counts
10010 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) :
  HDF5. File accessibilty. Unable to open file.
This is my sessionInfo():
> sessionInfo() R version 3.5.1 (2018-07-02) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.4
Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] rhdf5_2.26.0 bindrcpp_0.2.2 sleuth_0.30.0 edgeR_3.24.1 limma_3.38.3
loaded via a namespace (and not attached): [1] Rcpp_1.0.0 rstudioapi_0.8 bindr_0.1.1 magrittr_1.5 [5] tidyselect_0.2.5 munsell_0.5.0 colorspace_1.3-2 lattice_0.20-38 [9] R6_2.3.0 rlang_0.3.0.1 plyr_1.8.4 dplyr_0.7.8 [13] tools_3.5.1 parallel_3.5.1 grid_3.5.1 data.table_1.11.8 [17] gtable_0.2.0 lazyeval_0.2.1 assertthat_0.2.0 tibble_1.4.2 [21] crayon_1.3.4 BiocManager_1.30.4 tidyr_0.8.2 Rhdf5lib_1.4.2 [25] purrr_0.2.5 ggplot2_3.1.0 glue_1.3.0 compiler_3.5.1 [29] pillar_1.3.0 scales_1.0.0 locfit_1.5-9.1 pkgconfig_2.0.2
This is my version of gcc:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 9.1.0 (clang-902.0.39.1) Target: x86_64-apple-darwin17.5.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
This is my version of rhdf5:
> rhdf5::h5version() This is Bioconductor rhdf5 2.26.0 linking to C-library HDF5 1.10.3
I'm using RStudio on OS X.
I checked my H5 files from kallisto and they are all populated and have the same dimensions. Do you have any suggestions?
Thank you.
Hi @shirlicohen,
What is the output of h5cc -showconfig [[abundance.h5 file]] | head -6 | tail -1?
The output is HDF5 Version: 1.10.1
Hi,
I am also having the same issue:
so <- sleuth_prep(s2c, ~ condition, target_mapping = t2g, num_cores=1)
reading in kallisto results
dropping unused factor levels
.Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) :
  HDF5. File accessibilty. Unable to open file.
This is my sessioninfo:
sessionInfo() R version 3.5.1 (2018-07-02) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS High Sierra 10.13.4
Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale: [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rhdf5_2.24.0 BiocInstaller_1.30.0 biomaRt_2.36.1 dplyr_0.7.8 sleuth_0.30.0 usethis_1.4.0
[7] devtools_2.0.1
loaded via a namespace (and not attached):
[1] progress_1.2.0 tidyselect_0.2.5 remotes_2.0.2 purrr_0.2.5 colorspace_1.3-2 stats4_3.5.1
[7] base64enc_0.1-3 blob_1.1.1 XML_3.98-1.16 rlang_0.3.0.1 pkgbuild_1.0.2 pillar_1.3.0
[13] glue_1.3.0 withr_2.1.2 DBI_1.0.0 BiocGenerics_0.26.0 bit64_0.9-7 sessioninfo_1.1.1
[19] bindrcpp_0.2.2 bindr_0.1.1 plyr_1.8.4 stringr_1.3.1 munsell_0.5.0 gtable_0.2.0
[25] memoise_1.1.0 Biobase_2.40.0 IRanges_2.14.12 callr_3.0.0 ps_1.2.1 curl_3.2
[31] parallel_3.5.1 AnnotationDbi_1.42.1 Rcpp_1.0.0 scales_1.0.0 backports_1.1.2 S4Vectors_0.18.3
[37] desc_1.2.0 pkgload_1.0.2 fs_1.2.6 bit_1.1-14 hms_0.4.2 ggplot2_3.1.0
[43] digest_0.6.18 stringi_1.2.4 processx_3.2.1 grid_3.5.1 rprojroot_1.3-2 cli_1.0.1
[49] tools_3.5.1 bitops_1.0-6 magrittr_1.5 lazyeval_0.2.1 RCurl_1.95-4.11 tibble_1.4.2
[55] RSQLite_2.1.1 crayon_1.3.4 pkgconfig_2.0.2 data.table_1.11.8 prettyunits_1.0.2 httr_1.3.1
[61] assertthat_0.2.0 rstudioapi_0.8 Rhdf5lib_1.2.1 R6_2.3.0 compiler_3.5.1
This is gcc: Apple LLVM version 10.0.0 (clang-1000.10.44.4)
OS: macOS High Sierra version 10.13.4
This is the version of rhdf5:
rhdf5::h5version() This is Bioconductor rhdf5 2.24.0 linking to C-library HDF5 1.8.19
Any help is appreciated.
@warrenmcg using a kallisto built against an older HDF5 than rhdf5's still produces the same error.
Seems like the issue is rather a bug in rhdf5 2.24 or the bioconda build thereof. It is gone with 2.26. PR bioconda/bioconda-recipes#12597 will update the package so that the error hopefully disappears.
Hi @johanneskoester,
User @shirlicohen is having the same issue with rhdf5 2.26 (HDF5 1.10.3) reading files from kallisto 0.44.0 (HDF5 1.10.1).
Would both of you mind sending me a kallisto abundance.h5 file that is causing this error so I can reproduce the error and do troubleshooting on my end? My contact is on my main GitHub page.
I have looked into this further, and I now believe the reason is that rhdf5lib was built without zlib support. I am working on another fix for the bioconda rhdf5lib package.
@johanneskoester: do you mind elaborating on why this would lead to the bug in question, as this is well outside my comfort zone? And would it make sense to submit an issue report for rhdf5lib?
@warrenmcg here is my theory: kallisto stores e.g. the bootstrap values with the gzip filter applied (already checked). If the reading rhdf5lib does not find its zlib, it fails with the reported error. I am still working on getting the zlib dependency to work properly, though.
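One way to check that theory on a file (a sketch; the dataset path is an assumption about kallisto's HDF5 layout, and the FILTERS line below is a stand-in for real h5dump output):

```shell
# 'h5dump -p' prints per-dataset storage properties, including filters;
# a gzip-compressed dataset shows a DEFLATE entry. On a real file:
#   h5dump -p -d /bootstrap/bs0 abundance.h5 | grep -A2 FILTERS
sample='FILTERS { COMPRESSION DEFLATE { LEVEL 1 } }'
if echo "$sample" | grep -q DEFLATE; then
  echo "gzip filter present: reading requires zlib support in the HDF5 library"
fi
```

An HDF5 library built without zlib can still open such a file and list its structure, but any read of the compressed dataset fails, which matches the "Dataset. Read failed." errors reported earlier in this thread.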
Some users have reported issues reading the H5 files.
Here is the error:
I would like to track this down, so if you are having this issue please respond with the following:
gcc --version
And any other information you think might be informative.
Thanks,
Harold