pachterlab / sleuth

Differential analysis of RNA-Seq
http://pachterlab.github.io/sleuth
GNU General Public License v3.0

Issue reading h5 files #120

Open pimentel opened 7 years ago

pimentel commented 7 years ago

Some users have reported having issues reading the H5 files.

Here is the error:

> so <- sleuth_prep(s2c, ~ condition)
reading in kallisto results
..Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessability. Unable to open file.

I would like to track this down, so if you are having this issue please respond with the following:

And any other information you think might be informative.
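For example, something like the following collects the system details in one go (a sketch; adapt to your setup, and skip any tool you don't have installed):

```shell
# Collect the diagnostics typically reported in this thread: OS, gcc
# version, and R version. Each tool is optional and may be absent.
uname -srm
if command -v gcc >/dev/null 2>&1; then
  gcc --version | head -n1
fi
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e 'cat(R.version.string, "\n")'
fi
```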

Thanks,

Harold

rachelzoeb commented 7 years ago

Hi Harold,

I am getting this error as well.

sessionInfo()

R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.5 (unknown)

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] bindrcpp_0.1 sleuth_0.29.0 dplyr_0.7.0 ggplot2_2.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11 tidyr_0.6.3 assertthat_0.2.0 grid_3.3.0 plyr_1.8.4 R6_2.2.1 gtable_0.2.0 magrittr_1.5 scales_0.4.1 zlibbioc_1.16.0
[11] rlang_0.1.1 lazyeval_0.2.0 data.table_1.10.4 tools_3.3.0 glue_1.0.0 munsell_0.4.3 parallel_3.3.0 rhdf5_2.14.0 pkgconfig_2.0.1 colorspace_1.3-2
[21] bindr_0.1 tibble_1.3.3

Operating system: macOS Sierra version 10.12.5
RStudio version: 1.0.136
gcc version: Apple LLVM version 8.1.0 (clang-802.0.42)

Hope this helps!

Rachel

jmcribeiro commented 7 years ago

Hi Harold and Rachel,

Same problem here.

so <- sleuth_prep(s2c, full_model = full_design)
reading in kallisto results
dropping unused factor levels
........................................................................
normalizing est_counts
72457 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessability. Unable to open file.

#########################################################

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2008 R2 x64 (build 7601) Service Pack 1

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages: [1] splines stats graphics grDevices utils datasets methods base

other attached packages: [1] sleuth_0.29.0 dplyr_0.5.0 ggplot2_2.2.1 BiocInstaller_1.24.0

loaded via a namespace (and not attached): [1] Rcpp_0.12.11 magrittr_1.5 zlibbioc_1.20.0 devtools_1.13.2
[5] munsell_0.4.3 colorspace_1.3-2 R6_2.2.1 rlang_0.1.1
[9] httr_1.2.1 plyr_1.8.4 tools_3.3.2 parallel_3.3.2
[13] grid_3.3.2 rhdf5_2.18.0 data.table_1.10.4 gtable_0.2.0
[17] DBI_0.6-1 git2r_0.18.0 withr_1.0.2 lazyeval_0.2.0
[21] digest_0.6.12 assertthat_0.2.0 tibble_1.3.3 tidyr_0.6.3
[25] curl_2.6 memoise_1.1.0 scales_0.4.1

#############################################################

I tried another drive using setwd(), without success. I also tried

jmcribeiro commented 7 years ago

...continuing...

options(max.print=10000000)

also without success.

Regards,

Jose

jmcribeiro commented 7 years ago

Hi,

I ran the same script on a Linux machine. This time I got errors/warnings, including:

1: In read_kallisto(path, read_bootstrap = TRUE, max_bootstrap = max_bootstrap) : You specified to read bootstraps, but we won't do so for plaintext

Indeed, I had run kallisto with the --plaintext option.

Now I am re-running kallisto without the option, and we will see what happens.

Perhaps the R versions of sleuth on Mac and Windows are not reporting the errors/warnings above.

Regards,

Jose

jmcribeiro commented 7 years ago

Hi,

I reran kallisto without the --plaintext option.

Now the .h5 files were created in the expected subdirectories; they were not there before.

when running the command

so <- sleuth_prep(s2c, full_model = full_design)

on a Windows machine I now get

reading in kallisto results
dropping unused factor levels
........................................................................
normalizing est_counts
72457 targets passed the filter
normalizing tpm
merging in metadata
summarizing bootstraps
Error in parallel::mclapply(x, y, mc.cores = num_cores) :
  'mc.cores' > 1 is not supported on Windows
###################################################

Looking at the documentation for sleuth_prep at

https://pachterlab.github.io/sleuth/docs/sleuth_prep.html

SUGGESTION 1: I cannot find an option to limit sleuth_prep to a single core, so I suggest either adding such a switch to sleuth_prep, or having a new version of the function detect the environment and decide how many CPUs to use.

SUGGESTION 2:

On

https://pachterlab.github.io/kallisto/manual

The text

Optional arguments:
      --bias                   Perform sequence based bias correction
  -b, --bootstrap-samples=INT  Number of bootstrap samples (default: 0)
      --seed=INT               Seed for the bootstrap sampling (default: 42)
      --plaintext              Output plaintext instead of HDF5
      --fusion                 Search for fusions for Pizzly

could be changed to (added text in capitals):

Optional arguments:
      --bias                   Perform sequence based bias correction
  -b, --bootstrap-samples=INT  Number of bootstrap samples (default: 0)
      --seed=INT               Seed for the bootstrap sampling (default: 42)
      --plaintext              Output plaintext instead of HDF5 (NOT COMPATIBLE WITH SLEUTH)
      --fusion                 Search for fusions for Pizzly

On the other hand, running the same script on a Linux machine after rerunning kallisto I got no error messages! Bingo!

so <- sleuth_prep(s2c, full_model = full_design)
reading in kallisto results
........................................................................
normalizing est_counts
72457 targets passed the filter
normalizing tpm
merging in metadata
normalizing bootstrap samples
summarizing bootstraps

Regards,

Jose

warrenmcg commented 7 years ago

@jmcribeiro, the documentation on the website is generated separately and is not up to date with the current version, so it doesn't show the new options. If you go into R and run ?sleuth_prep you'll see the most up-to-date documentation.

The option you want for sleuth_prep is num_cores. So so <- sleuth_prep(s2c, full_model = full_design, num_cores = 1).

jmcribeiro commented 7 years ago

Hi Warren,

Thanks for your comment. Your recommendation worked!

Please also see my suggestion above to make sure sleuth on Windows flags the --plaintext issue, to avoid other users getting lost.

Regards,

Jose

warrenmcg commented 7 years ago

Hello!

Those are two great suggestions.

For the Windows issue, we can add a quick patch that warns users that Windows does not support mclapply and switches num_cores to 1. Moving forward, we can explore switching to the future package, which would allow Windows users to use multiple cores too.

For the text files issue, I wonder if this is the reason most people are having issues? I think it would make sense for sleuth_prep to check for abundance.tsv files if the abundance.h5 is absent, and use the appropriate read method.
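In the meantime, affected users can check which of their kallisto output directories contain an h5 file versus only a plaintext tsv (a sketch of the proposed check, done outside sleuth; the directory arguments are assumptions):

```shell
# Sketch: report, for each kallisto output directory given, whether it
# has an abundance.h5, only a plaintext abundance.tsv, or neither.
classify_kallisto_dir() {
  for d in "$@"; do
    if [ -f "$d/abundance.h5" ]; then
      echo "$d: abundance.h5 present"
    elif [ -f "$d/abundance.tsv" ]; then
      echo "$d: plaintext only (no bootstraps; rerun kallisto without --plaintext)"
    else
      echo "$d: no kallisto abundances found"
    fi
  done
}

# e.g.: classify_kallisto_dir results/sample_*
```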

What do you think @pimentel of these two options?

rachelzoeb commented 7 years ago

I have rerun kallisto without the --plaintext flag, which removed the h5 error. However, now I get this error:

.Error in process_bootstrap(i, samp_name, kal_path, num_transcripts, est_counts_sf[[i]], : File h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".File ../analysis/data/kallisto/Deer_R1_S22/abundance.h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".

Any help is greatly appreciated,

Rachel

warrenmcg commented 7 years ago

Hi @rachelzoeb,

What was your full kallisto command, and what version of kallisto did you use?

It seems that you did not use the -b option when running kallisto, which is a requirement to take full advantage of sleuth.

If you did use the -b option and still got this error, maybe there is something wrong with how sleuth is interacting with your particular version of kallisto.

If you are using the latest version of kallisto, then it would be helpful if you shared your OS and gcc version (use gcc --version), as Harold suggested above, and emailed your abundance.h5 file to him or posted it here so that other users and I can take a look and help you out.

pimentel commented 7 years ago

@warrenmcg thanks so much for fielding these questions.

regarding the windows patch: that sounds like a great idea

Unfortunately the bootstraps are not available via plaintext at all. This is because HDF5 provides nice compression that is a bit of a pain to get otherwise. Initially, the plaintext abundance.tsv was only intended for quick sanity checks. However, we have been discussing changing the format to remove the dependency on HDF5, which has proven to be an issue for some time now...

More on this soon.

xindiguo commented 7 years ago

Hi,

I ran kallisto with quant --bootstrap-samples=100 --threads=16, and 4 out of 8 of my h5 files had the cannot-open error. kallisto ran on a Linux server, and I then downloaded the h5 files to my local machine (macOS) to run sleuth in R. Do you think there might have been an error during the file transfer? Also, I checked the file sizes of the failing h5 files: for 3 of the 4, the h5 file is smaller than the corresponding tsv file. Not sure if that is related. Thanks in advance for any help!

sessionInfo() in R -

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.20.0         BiocInstaller_1.26.0 bindrcpp_0.2         synapseClient_1.15-0 sleuth_0.29.0       
[6] dplyr_0.7.2          ggplot2_2.2.1        biomaRt_2.32.1      

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.12         compiler_3.4.1       plyr_1.8.4           bindr_0.1            zlibbioc_1.22.0     
 [6] bitops_1.0-6         tools_3.4.1          digest_0.6.12        bit_1.1-12           RSQLite_2.0         
[11] memoise_1.1.0        tibble_1.3.4         gtable_0.2.0         pkgconfig_2.0.1      rlang_0.1.2         
[16] DBI_0.7              parallel_3.4.1       stringr_1.2.0        S4Vectors_0.14.3     IRanges_2.10.2      
[21] stats4_3.4.1         bit64_0.9-7          grid_3.4.1           glue_1.1.1           Biobase_2.36.2      
[26] data.table_1.10.4    R6_2.2.2             AnnotationDbi_1.38.2 XML_3.98-1.9         tidyr_0.7.0         
[31] reshape2_1.4.2       blob_1.1.0           magrittr_1.5         matrixStats_0.52.2   scales_0.5.0        
[36] BiocGenerics_0.22.0  assertthat_0.2.0     colorspace_1.3-2     stringi_1.1.5        RCurl_1.95-4.8      
[41] lazyeval_0.2.0       munsell_0.4.3        rjson_0.2.15        
> 

OS - kallisto was run on 3.2.0-29-generic GNU/Linux; sleuth was run in R on macOS Sierra version 10.12.5

gcc -

$ gcc --version
Apple LLVM version 8.1.0 (clang-802.0.42)

miguelroboso commented 7 years ago

I am currently having this issue.

I have a data frame built as it is in the walkthrough, and it looks like this:

    sample condition           path
 1:     P1        ns  expression/P1
 2:     P2        ns  expression/P2
 3:     P3         s  expression/P3
 4:     P4         s  expression/P4
 5:     P5        ns  expression/P5
 6:     P6        ns  expression/P6
 7:     P7         s  expression/P7
 8:     P8         s  expression/P8
 9:     P9        ns  expression/P9
10:    P10        ns expression/P10
11:    P11         s expression/P11
12:    P12         s expression/P12
>sessionInfo() 
R version 3.3.1 (2016-06-21) 
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8         
[4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8     
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C         

attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base       

other attached packages: 
[1] bindrcpp_0.2        sleuth_0.29.0       dplyr_0.7.4         ggplot2_2.2.1       edgeR_3.16.5       
 [6] biomaRt_2.30.0      limma_3.30.13       data.table_1.10.4-3  

loaded via a namespace (and not attached):
 [1] locfit_1.5-9.1       tidyselect_0.2.2     purrr_0.2.4          lattice_0.20-34     
 [5] rhdf5_2.18.0         colorspace_1.3-2     htmltools_0.3.6      stats4_3.3.1        
 [9] viridisLite_0.2.0    yaml_2.1.14          base64enc_0.1-3      XML_3.98-1.7        
[13] plotly_4.7.1         rlang_0.1.2          glue_1.1.1           DBI_0.6-1           
[17] BiocGenerics_0.20.0  bindr_0.1            plyr_1.8.4           stringr_1.2.0       
[21] zlibbioc_1.20.0      munsell_0.4.3        gtable_0.2.0         htmlwidgets_0.9     
[25] memoise_1.1.0        evaluate_0.10        Biobase_2.34.0       knitr_1.15.1        
[29] IRanges_2.8.2        parallel_3.3.1       AnnotationDbi_1.36.2 Rcpp_0.12.13        
[33] scales_0.5.0         backports_1.1.0      S4Vectors_0.12.2     jsonlite_1.5        
[37] digest_0.6.12        stringi_1.1.5        grid_3.3.1           rprojroot_1.2       
[41] tools_3.3.1          bitops_1.0-6         magrittr_1.5         lazyeval_0.2.0      
[45] RCurl_1.95-4.8       tibble_1.3.4         RSQLite_1.1-2        tidyr_0.7.2         
[49] pkgconfig_2.0.1      assertthat_0.1       rmarkdown_1.6        httr_1.2.1          
[53] R6_2.2.2

OS is CENTOS, 2.6.32-696.10.2.el6.x86_64

bash-4.1$ gcc --version
gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

However, I don't know why it is trying to read H5 files. In the expression directories I only have tsv files (I ran kallisto with --plaintext output).

Lastly, the error is:

It appears that you are running Sleuth from within Rstudio.
Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
If you wish to take advantage of multiple cores, please consider running sleuth from the command line.
reading in kallisto results
dropping unused factor levels
............
normalizing est_counts
59202 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessability. Unable to open file.

Sames-Jtudd commented 7 years ago

Hi, I had the same problem and solved it.

The issue was due to the file structure I was using. Clearly this may not be the issue for everybody.

When setting up the kr_dirs data frame as per the instructions (https://pachterlab.github.io/sleuth_walkthroughs/trapnell/analysis.html), the program assumes that each sample is in its own directory, which contains both

abundance.tsv and abundance.h5

with the file names unedited. When I arranged the files like this, the error was no longer tripped.

hope that helps

sarahharvey88 commented 7 years ago

Hello

I am also having the same error message with one of my files (I have 46, and it only seems to affect this one, which I have re-generated by re-running kallisto)

Error in process_bootstrap(i, samp_name, kal_path, num_transcripts, est_counts_sf[[i]], : File h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".File ../quant/WTCHG_412393_006/abundance.h5 has no bootstraps.Please generate bootstraps using "kallisto quant -b".

However, I used 100 bootstraps when I ran kallisto, and the run info file also produced by kallisto confirms this for this sample.

{
  "n_targets": 60054,
  "n_bootstraps": 100,
  "n_processed": 15681904,
  "kallisto_version": "0.43.1",
  "index_version": 10,
  "start_time": "Wed Nov 22 12:14:52 2017",
  "call": "kallisto quant -i transcripts.idx -o quant/WTCHG_403319_006 -b 100 ../../data/reads/WTCHG_403319_006_1.fastq.gz ../../data/reads/WTCHG_403319_006_2.fastq.gz"
}
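A quick way to check the recorded bootstrap counts across every sample is to pull them out of each run_info.json (a sketch; the quant/*/ layout matches the call above but is an assumption):

```shell
# Sketch: print the n_bootstraps recorded in each run_info.json given.
report_bootstraps() {
  for j in "$@"; do
    n=$(sed -n 's/.*"n_bootstraps": *\([0-9][0-9]*\).*/\1/p' "$j")
    echo "$j: n_bootstraps=${n:-unknown}"
  done
}

# e.g.: report_bootstraps quant/*/run_info.json
```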

My sleuth prep command is this: so <- sleuth_prep(sample_to_condition, target_mapping = ttg, aggregation_column = 'gene_id', extra_bootstrap_summary = TRUE, num_cores=1)

Any help appreciated! I used kallisto v0.43.1 on our uni Linux server and am running sleuth (latest version) on my MacBook.

Sarah

warrenmcg commented 7 years ago

@sarahharvey88, that is odd. Could you send the problematic h5 file so I can reproduce the error on my side? Email me at:

warren-mcgee at fsm.northwestern.edu (replace at with @ and remove spaces)

warrenmcg commented 7 years ago

@miguelroboso, as has been mentioned previously, the plain text files do not have the bootstraps included. You should rerun kallisto without the --plaintext option included. The error you are seeing is because there is a line within sleuth_prep that expects an h5 file to be present.

pinging @pimentel: the offending line causing Miguel's user-unfriendly error is this one. The current version expects an H5 file to be present, so should we be more explicit about that requirement in sleuth_prep?

brucemoran commented 6 years ago

Also getting this error. NB: samples were run using Nextflow and executed by PBS/Torque. When I rerun the offending samples 'interactively' they all work. Not ideal though...

Kallisto command:

kallisto quant \
-l ${params.fragment_len} \
-s ${params.fragment_sd} \
-b ${params.bootstrap} \
-i ${index} \
-t ${task.cpus} \
-o ./ \
${reads1} ${reads2}
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /apps/software/R/3.4.0/lib64/R/lib/libRblas.so
LAPACK: /apps/software/R/3.4.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_IE.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_IE.UTF-8        LC_COLLATE=en_IE.UTF-8
 [5] LC_MONETARY=en_IE.UTF-8    LC_MESSAGES=en_IE.UTF-8
 [7] LC_PAPER=en_IE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] bindrcpp_0.2   rhdf5_2.20.0   biomaRt_2.32.1 sleuth_0.29.0  dplyr_0.7.4
[6] ggplot2_2.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13         compiler_3.4.0       plyr_1.8.4
 [4] bindr_0.1            zlibbioc_1.22.0      bitops_1.0-6
 [7] digest_0.6.12        bit_1.1-12           RSQLite_2.0
[10] memoise_1.1.0        tibble_1.3.4         gtable_0.2.0
[13] pkgconfig_2.0.1      rlang_0.1.2          DBI_0.7
[16] parallel_3.4.0       IRanges_2.10.5       S4Vectors_0.14.7
[19] stats4_3.4.0         bit64_0.9-7          grid_3.4.0
[22] glue_1.1.1           data.table_1.10.4-2  Biobase_2.36.2
[25] R6_2.2.2             AnnotationDbi_1.38.2 XML_3.98-1.9
[28] blob_1.1.0           magrittr_1.5         scales_0.5.0
[31] BiocGenerics_0.22.1  assertthat_0.2.0     colorspace_1.3-2
[34] RCurl_1.95-4.8       lazyeval_0.2.0       munsell_0.4.3

cat /etc/*-release | head -n1
CentOS Linux release 7.3.1611 (Core)
gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)

brucemoran commented 6 years ago

NB: to find offending h5 files, you can use h5ls(<path/to/abundance.h5>) from the rhdf5 package. From this it seems that dim(h5ls(<path/to/abundance.h5>))[1] should be 115. So something like the code below will show which samples fail.

apply(s2c, 1, function(f) {
  dh5 <- try(dim(h5ls(paste0(f[3], "/abundance.h5")))[1])
  if (dh5 != 115) {
    dh5 <- "ERROR"
  }
  paste0(f[3], " -> ", dh5)
})

TBradley27 commented 6 years ago

Hello,

I am experiencing a similar problem.

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2  sleuth_0.29.0 dplyr_0.7.4   ggplot2_2.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15        rstudioapi_0.7      bindr_0.1           magrittr_1.5       
 [5] zlibbioc_1.24.0     devtools_1.13.4     munsell_0.4.3       colorspace_1.3-2   
 [9] R6_2.2.2            rlang_0.1.6         plyr_1.8.4          tools_3.4.3        
[13] parallel_3.4.3      grid_3.4.3          rhdf5_2.22.0        data.table_1.10.4-3
[17] gtable_0.2.0        utf8_1.1.3          cli_1.0.0           withr_2.1.1        
[21] lazyeval_0.2.1      assertthat_0.2.0    digest_0.6.14       tibble_1.4.2       
[25] crayon_1.3.4        memoise_1.1.0       glue_1.2.0          compiler_3.4.3     
[29] pillar_1.1.0        scales_0.5.0        pkgconfig_2.0.1 

Operating System:

Linux ubuntu 4.13.0-32-generic x86_64 GNU/Linux

GCC version:

gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609

The R instance is being run within a virtual machine hosted by a Windows OS, but I am not sure if that tells you anything or not.

lydiarck commented 6 years ago

I get a slightly different H5-related error message:

Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, : HDF5. Dataset. Read failed.

Like Bruce's experience above, it only happens for some of my files, and if I re-run kallisto interactively for these files (instead of from a shell script), the resulting files can be read using sleuth with no issues.

> sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)

Matrix products: default
BLAS: /usr/analysis/src/R/R-3.4.3/lib/libRblas.so
LAPACK: /usr/analysis/src/R/R-3.4.3/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2         sleuth_0.29.0        dplyr_0.7.4         
[4] ggplot2_2.2.1        BiocInstaller_1.28.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15        bindr_0.1           magrittr_1.5       
 [4] zlibbioc_1.24.0     tidyselect_0.2.3    munsell_0.4.3      
 [7] colorspace_1.3-2    R6_2.2.2            rlang_0.1.6        
[10] stringr_1.2.0       plyr_1.8.4          tools_3.4.3        
[13] parallel_3.4.3      grid_3.4.3          rhdf5_2.22.0       
[16] data.table_1.10.4-3 gtable_0.2.0        lazyeval_0.2.1     
[19] assertthat_0.2.0    tibble_1.4.2        reshape2_1.4.3     
[22] purrr_0.2.4         tidyr_0.8.0         glue_1.2.0         
[25] stringi_1.1.6       compiler_3.4.3      pillar_1.1.0       
[28] scales_0.5.0        pkgconfig_2.0.1    
Red Hat Enterprise Linux Server release 6.9 (Santiago)

gcc --version
gcc (GCC) 4.7.4

cajames2 commented 6 years ago

Hello!

I am still receiving this error message:

reading in kallisto results
dropping unused factor levels
....................................................................................................................................................Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessability. Unable to open file.
In addition: Warning message:
In check_num_cores(num_cores) :
  It appears that you are running Sleuth from within Rstudio.
Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
If you wish to take advantage of multiple cores, please consider running sleuth from the command line.

I am using kallisto 0.44.0. I ran the initial kallisto script using this command:

kallisto quant -i transcripts.idx -o output -b 100 READ1.fastq READ2.fastq

I then tried to run the sleuth_prep command in a couple of ways and got the same error both times.

so <- sleuth_prep(s2c, extra_bootstrap_summary = TRUE) and

> mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
+                          dataset = "hsapiens_gene_ensembl",
+                          host = 'ensembl.org')
> t2g <- biomaRt::getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id",
+                                      "external_gene_name"), mart = mart)
> t2g <- dplyr::rename(t2g, target_id = ensembl_transcript_id,
+                      ens_gene = ensembl_gene_id, ext_gene = external_gene_name)
> so <- sleuth_prep(s2c, target_mapping = t2g)
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.30.0       hexbin_1.27.1        sleuth_0.29.0        ggplot2_2.2.1        data.table_1.11.2    BiocInstaller_1.24.0
[7] bindrcpp_0.2.2       dplyr_0.7.5         

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17         git2r_0.21.0         plyr_1.8.4           bindr_0.1.1          bitops_1.0-6         tools_3.3.0         
 [7] zlibbioc_1.20.0      bit_1.1-13           digest_0.6.15        lattice_0.20-35      RSQLite_2.1.1        memoise_1.1.0       
[13] tibble_1.4.2         gtable_0.2.0         rhdf5_2.18.0         pkgconfig_2.0.1      rlang_0.2.0          DBI_1.0.0           
[19] curl_3.2             yaml_2.1.19          parallel_3.3.0       withr_2.1.2          httr_1.3.1           knitr_1.20          
[25] IRanges_2.8.2        S4Vectors_0.12.2     devtools_1.13.5      bit64_0.9-7          stats4_3.3.0         grid_3.3.0          
[31] tidyselect_0.2.4     Biobase_2.34.0       glue_1.2.0           R6_2.2.2             AnnotationDbi_1.36.2 XML_3.98-1.11       
[37] blob_1.1.1           tidyr_0.8.1          purrr_0.2.4          magrittr_1.5         BiocGenerics_0.20.0  scales_0.5.0        
[43] assertthat_0.2.0     colorspace_1.3-2     RCurl_1.95-4.10      lazyeval_0.2.1       munsell_0.4.3       
> 
$ gcc --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin14.5.0
Thread model: posix

I checked each one of my abundance.h5 files (384 total), and none of them seem to be the obvious offender. Is there anything obvious I missed that is preventing my analysis?

Thank you!

warrenmcg commented 6 years ago

@cajames2, a few questions: 1) what version of sleuth are you running? Version 0.29.0 could be the current master version or the devel version, and it will help to know what you're working with.

2) did you run the suggested code from brucemoran above? Did that identify any samples with an unexpected dimension?

3) if the answer is 'no', what is the RAM available for your computer? It is possible that 384 samples (which is a lot) is too much for your system to handle at once, and the cryptic error message is indicating that your machine ran out of RAM and swap memory. I know I have worked with a dataset that has 600 samples, and that still uses 60-80 GB of RAM on a machine with 128 GB. If you're working off of a laptop, that is likely the issue.

4) if RAM is not the problem and none of your kallisto files are corrupted, that's when we'll have to explore exactly what happened. There is probably a way for us to run the "reading in kallisto files" step of sleuth_prep while still keeping track of which file we're reading.

warrenmcg commented 6 years ago

@lydiarck: sorry for the delayed response. It seems like in your situation, something is failing with kallisto or with your script. Depending on how exactly you're running the script, you might also be running into a memory issue that is causing certain kallisto runs to fail. Did you see anything suspicious with the log messages, or with the auxiliary files accompanying the corrupted runs?

cajames2 commented 6 years ago

@warrenmcg: Thanks for your quick reply. I am using package 'sleuth' version 0.29.0. When I run the code suggested by brucemoran, each one of my .h5 files returns an error. This makes me think there may have been an issue with the initial kallisto run. However, I spot-checked some of the abundance.tsv files and they are populated, so in practice the kallisto run worked as expected.

An example:

> apply(s2c,1,function(f){ dh5 <- try(dim(h5ls(paste0(f[3],"/abundance.h5")))[1]); if(dh5!=115){ dh5<-"ERROR" }; return(paste0(f[3]," -> ",dh5)) })

Error in try(dim(h5ls(paste0(f[3], "/abundance.h5")))[1]) : 
  could not find function "h5ls"
 [1] "../output/Plate1A01 -> ERROR"

However, the abundance.tsv file for this sample shows the transcript IDs aligned to my data set as expected for that sample.

For what it's worth, when I ran kallisto on the .fastq.gz file of my entire data set, my computer could not handle it. To get around that, I unzipped the file and demultiplexed all my samples and wrote a loop so that kallisto would run on each sample individually. It took about 8 hours but seemed to work fine. Do you think that maybe this was the issue? If not, I'm inclined to think my computer might not have sufficient RAM to handle this data set.

Thanks for all your help.

warrenmcg commented 6 years ago

@cajames2, The problem is not with your files, but with the rhdf5 package and the h5ls function. I would make sure these lines work:

library(rhdf5)
?h5ls

If they don't work, that's the problem. Once those lines work, try repeating the suggested code above.

In the meantime, my suspicion is that your computer can't handle the dataset on its own with the available RAM. This will be especially true if you're handling 384 samples while also sending data out to multiple cores. Because of how R does forking, a full copy of all data currently in the R workspace will be sent to each worker, and so RAM can balloon quite a lot if you have a lot of data already present. Unfortunately, not much we can do about that...

To confirm that RAM is the issue, I would pull up the activity monitor and watch your RAM usage while the sleuth run is going. You could try processing the bootstraps using just one core -- it will take a while, but it may have a better chance of succeeding.
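To put numbers on that, Linux users could log memory use while sleuth_prep runs (a sketch; `free` is not available on macOS, where Activity Monitor is the equivalent):

```shell
# Sketch: print current memory usage (MB) from `free`; call it in a loop
# (e.g. `while sleep 10; do mem_snapshot; done`) alongside a sleuth run.
mem_snapshot() {
  free -m | awk 'NR == 2 { print "used_mb=" $3, "free_mb=" $4 }'
}
```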

SRenan commented 6 years ago

Also experiencing the original error:

reading in kallisto results
dropping unused factor levels
.Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) : 
  HDF5. File accessibilty. Unable to open file.

I believe the hdf5 files are corrupted and this has nothing to do with sleuth but here is the requested info.

This happens with gcc 4.4.7

gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and 7.3.1

gcc --version
gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

Using sleuth 0.30.0

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.10 (Santiago)

Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] sleuth_0.30.0       DT_0.4              limma_3.36.5       
 [4] Biobase_2.40.0      BiocGenerics_0.26.0 biomaRt_2.36.1     
 [7] ggplot2_3.0.0       XCIR_0.1.25         PSUmisc_0.0.11     
[10] data.table_1.11.8 

Here is the kallisto 0.44.0 command used to generate the hdf5 files

kallisto quant -t 20 -i kal_idx   samp_1.fastq samp_2.fastq -o samp_out -b 100

Now, sorting the runs by the size of their abundance.h5 file and running sleuth_prep file by file:

  1. The error happens only for the smaller files.
  2. kallisto reports no error, but most of the failing samples report fewer bootstrap EM iterations than the requested number, while the files that sleuth can read consistently report 100 iterations.
  3. I don't know anything about hdf5, but opening the h5 files with less shows that all successful files start with an "<89>HDF" tag, while none of the failing files do.
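The `<89>HDF` bytes noted in point 3 are the start of the standard 8-byte HDF5 file signature (`\x89 H D F \r \n \x1a \n`). As a quick sanity check before handing files to sleuth, something like the following sketch could flag truncated files (the `check_hdf5_magic` helper name and file path are illustrative, not part of sleuth or kallisto):

```shell
# Compare the first 8 bytes of a file against the HDF5 signature
# (hex: 89 48 44 46 0d 0a 1a 0a). A file missing this signature was
# truncated or never finished being written.
check_hdf5_magic() {
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  if [ "$sig" = "894844460d0a1a0a" ]; then
    echo "$1: valid HDF5 signature"
  else
    echo "$1: missing HDF5 signature (truncated or corrupted?)"
  fi
}

# Usage (placeholder path):
# check_hdf5_magic samp_out/abundance.h5
```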

So my take is that sleuth is fine and the hdf5 files are simply corrupted. This is supported by running kallisto's h5dump:

kallisto h5dump samp/abundance.h5 --output-dir="./"
HDF5-DIAG: Error detected in HDF5 (1.8.15-patch1) thread 0:
  #000: H5F.c line 604 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file

On my end, I'm thinking this may be the batch system killing jobs, which would explain the lack of any error reported by kallisto. Looking at some of the scripts in this issue, I suspect some other users may be in the same situation.

warrenmcg commented 6 years ago

Hi @SRenan,

If h5dump is not working, then I think your diagnosis that this is related to your batch system is correct. You can confirm this if you are able to successfully run kallisto interactively on one of the problematic samples. If kallisto fails interactively as well, please submit an issue to kallisto here with the details of your set-up and the error.

If it turns out to be an issue with your batch system, consult with the IT team at your institution to see what you can do to monitor your batch jobs. It may be as simple as appending &> log_file.txt ("redirect all shell output to 'log_file.txt'") to the end of your kallisto command (see here), or something else depending on your cluster and your script. The most common reason for batch jobs getting killed is miscalculating the RAM and core requirements when submitting a job, so the IT team will also be able to troubleshoot with you whether those need to be adjusted, or whether something else is happening, so that this problem is prevented in the future.
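To make the logging suggestion concrete, here is a minimal sketch of wrapping a batch command so that both its output and its exit status survive even if the scheduler kills the job. The `run_logged` helper name and the paths are made up for illustration; the POSIX redirection `> log 2>&1` is equivalent to bash's `&> log`:

```shell
# Run a command with stdout and stderr redirected to a log file, then append
# the exit status so a silently killed job still leaves evidence in the log.
run_logged() {
  log=$1
  shift
  "$@" > "$log" 2>&1
  echo "exit status: $?" >> "$log"
}

# Example (placeholder paths, mirroring the kallisto call earlier in the thread):
# run_logged samp_out.log kallisto quant -t 20 -i kal_idx samp_1.fastq samp_2.fastq -o samp_out -b 100
```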

pmelsted commented 6 years ago

@SRenan If this is happening with h5dump in kallisto then this is an issue with the HDF5 library. If you want to fix it, you need to find the version of HDF5 that kallisto is linking to with ldd kallisto; this library should match the version used to write the files. I would recommend downloading the kallisto binary instead, since that has a working hdf5 library statically compiled.

johanneskoester commented 6 years ago

Hi guys, I just tried to simply load the file tests/testthat/small_test_data/kallisto/abundance.h5 from this repo with

read_kallisto_h5("testthat/small_test_data/kallisto/abundance.h5")

What I get is:

Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem,  : 
  HDF5. Dataset. Read failed.
Error in if (num_bootstrap > 0) { : argument is of length zero

sessionInfo():

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora 28 (Workstation Edition)

Matrix products: default
BLAS: /home/johannes/.local/opt/miniconda3/envs/sleuth/lib/R/lib/libRblas.so
LAPACK: /home/johannes/.local/opt/miniconda3/envs/sleuth/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.24.0   bindrcpp_0.2.2 sleuth_0.29.0  dplyr_0.7.6    ggplot2_3.1.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0        withr_2.1.2       crayon_1.3.4      assertthat_0.2.0 
 [5] grid_3.4.1        plyr_1.8.4        R6_2.2.2          gtable_0.2.0     
 [9] magrittr_1.5      scales_1.0.0      pillar_1.3.0      rlang_0.3.0.1    
[13] lazyeval_0.2.1    data.table_1.11.4 Rhdf5lib_1.2.1    tools_3.4.1      
[17] glue_1.3.0        purrr_0.2.5       munsell_0.5.0     compiler_3.4.1   
[21] pkgconfig_2.0.2   colorspace_1.3-2  tidyselect_0.2.4  bindr_0.1.1      
[25] tibble_1.4.2

This was run with sleuth 0.29 from bioconda:

bioconductor-rhdf5        2.24.0           r341hfc679d8_0    bioconda
bioconductor-rhdf5lib     1.2.1            r341h470a237_0    bioconda
bzip2                     1.0.6                h470a237_2    conda-forge
ca-certificates           2018.10.15           ha4d7672_0    conda-forge
cairo                     1.14.12              he6fea26_5    conda-forge
curl                      7.62.0               h74213dd_0    conda-forge
fontconfig                2.13.1               h65d0f4c_0    conda-forge
freetype                  2.9.1                h6debe1e_4    conda-forge
gettext                   0.19.8.1             h5e8e0c9_1    conda-forge
glib                      2.55.0               h464dc38_2    conda-forge
graphite2                 1.3.12               hfc679d8_1    conda-forge
gsl                       2.1                           2    conda-forge
harfbuzz                  1.9.0                h08d66d9_0    conda-forge
icu                       58.2                 hfc679d8_0    conda-forge
jpeg                      9c                   h470a237_1    conda-forge
krb5                      1.16.2               hbb41f41_0    conda-forge
libcurl                   7.62.0               hbdb9355_0    conda-forge
libedit                   3.1.20170329                  0    conda-forge
libffi                    3.2.1                hfc679d8_5    conda-forge
libgcc                    7.2.0                h69d50b8_2    conda-forge
libgcc-ng                 7.2.0                hdf63c60_3    conda-forge
libgfortran               3.0.0                         1    conda-forge
libiconv                  1.15                 h470a237_3    conda-forge
libpng                    1.6.34               ha92aebf_2    conda-forge
libssh2                   1.8.0                h5b517e9_3    conda-forge
libstdcxx-ng              7.2.0                hdf63c60_3    conda-forge
libtiff                   4.0.9                he6b73bb_2    conda-forge
libuuid                   2.32.1               h470a237_2    conda-forge
libxcb                    1.13                 h470a237_2    conda-forge
libxml2                   2.9.8                h422b904_5    conda-forge
ncurses                   5.9                          10    conda-forge
openssl                   1.0.2p               h470a237_1    conda-forge
pango                     1.40.14              he752989_2    conda-forge
pcre                      8.39                          0    conda-forge
pixman                    0.34.0               h470a237_3    conda-forge
pthread-stubs             0.4                  h470a237_1    conda-forge
r-assertthat              0.2.0            r341h6115d3f_1    conda-forge
r-base                    3.4.1                         4    conda-forge
r-bh                      1.66.0_1              r341_1001    conda-forge
r-bindr                   0.1.1            r341h6115d3f_1    conda-forge
r-bindrcpp                0.2.2            r341h9d2a408_1    conda-forge
r-cli                     1.0.0            r341h6115d3f_1    conda-forge
r-colorspace              1.3_2            r341hc070d10_2    conda-forge
r-crayon                  1.3.4            r341h6115d3f_1    conda-forge
r-data.table              1.11.4           r341hc070d10_2    conda-forge
r-digest                  0.6.18           r341hc070d10_0    conda-forge
r-dplyr                   0.7.6            r341h9d2a408_1    conda-forge
r-fansi                   0.3.0            r341hc070d10_0    conda-forge
r-ggplot2                 3.1.0            r341h6115d3f_0    conda-forge
r-glue                    1.3.0            r341h470a237_2    conda-forge
r-gtable                  0.2.0            r341h6115d3f_1    conda-forge
r-htmltools               0.3.6            r341hfc679d8_2    conda-forge
r-httpuv                  1.4.5            r341hfc679d8_1    conda-forge
r-jsonlite                1.5              r341hc070d10_2    conda-forge
r-labeling                0.3              r341h6115d3f_1    conda-forge
r-later                   0.7.3            r341h9d2a408_0    conda-forge
r-lattice                 0.20_35          r341hc070d10_0    conda-forge
r-lazyeval                0.2.1            r341hc070d10_2    conda-forge
r-magrittr                1.5              r341h6115d3f_1    conda-forge
r-mass                    7.3_50           r341hc070d10_2    conda-forge
r-matrix                  1.2_14           r341hc070d10_2    conda-forge
r-matrixstats             0.54.0           r341hc070d10_0    conda-forge
r-mgcv                    1.8_24           r341hc070d10_2    conda-forge
r-mime                    0.5              r341hc070d10_2    conda-forge
r-munsell                 0.5.0            r341h6115d3f_1    conda-forge
r-nlme                    3.1_137          r341h364d78e_0    conda-forge
r-pillar                  1.3.0            r341h6115d3f_0    conda-forge
r-pkgconfig               2.0.2            r341h6115d3f_1    conda-forge
r-plogr                   0.2.0            r341h6115d3f_1    conda-forge
r-plyr                    1.8.4            r341h9d2a408_2    conda-forge
r-praise                  1.0.0            r341h6115d3f_1    conda-forge
r-promises                1.0.1            r341h9d2a408_0    conda-forge
r-purrr                   0.2.5            r341hc070d10_1    conda-forge
r-r6                      2.2.2            r341h6115d3f_1    conda-forge
r-rcolorbrewer            1.1_2            r341h6115d3f_1    conda-forge
r-rcpp                    1.0.0            r341h9d2a408_0    conda-forge
r-reshape2                1.4.3            r341h9d2a408_2    conda-forge
r-rlang                   0.3.0.1          r341h470a237_0    conda-forge
r-scales                  1.0.0            r341h9d2a408_1    conda-forge
r-shiny                   1.2.0                    r341_0    conda-forge
r-sleuth                  0.29.0                 r3.4.1_0    bioconda
r-sourcetools             0.1.7            r341hfc679d8_0    conda-forge
r-stringi                 1.2.4            r341h9d2a408_1    conda-forge
r-stringr                 1.3.1            r341h6115d3f_1    conda-forge
r-testthat                2.0.1            r341h9d2a408_0    conda-forge
r-tibble                  1.4.2            r341hc070d10_2    conda-forge
r-tidyr                   0.8.1            r341h9d2a408_2    conda-forge
r-tidyselect              0.2.4            r341h9d2a408_2    conda-forge
r-utf8                    1.1.4            r341hc070d10_0    conda-forge
r-viridislite             0.3.0            r341h6115d3f_1    conda-forge
r-withr                   2.1.2            r341h6115d3f_0    conda-forge
r-xtable                  1.8_3                 r341_1000    conda-forge
readline                  7.0                           0    conda-forge
tk                        8.6.9                ha92aebf_0    conda-forge
xorg-kbproto              1.0.7                h470a237_2    conda-forge
xorg-libice               1.0.9                h470a237_4    conda-forge
xorg-libsm                1.2.3                h8c8a85c_0    conda-forge
xorg-libx11               1.6.6                h470a237_0    conda-forge
xorg-libxau               1.0.8                h470a237_6    conda-forge
xorg-libxdmcp             1.1.2                h470a237_7    conda-forge
xorg-libxext              1.3.3                h470a237_4    conda-forge
xorg-libxrender           0.9.10               h470a237_2    conda-forge
xorg-renderproto          0.11.1               h470a237_2    conda-forge
xorg-xextproto            7.3.0                h470a237_2    conda-forge
xorg-xproto               7.0.31               h470a237_7    conda-forge
xz                        5.2.4                h470a237_1    conda-forge
zlib                      1.2.11               h470a237_3    conda-forge
johanneskoester commented 6 years ago

When directly using rhdf5, I get the same error. When using the system hdfview, all is fine. Hence, I think the problem is with the rhdf5 package from Bioconda. The question is whether it is the specific version or something else in the build.

johanneskoester commented 6 years ago

xref: https://github.com/bioconda/bioconda-recipes/issues/12402

johanneskoester commented 6 years ago

Ok, it turns out that when I create an HDF5 file with rhdf5 itself, I can afterwards also read it. Maybe it is an incompatibility between the different versions/systems on which the files were created?

warrenmcg commented 6 years ago

Hi @johanneskoester,

If you load rhdf5 in R on your machine, what is the output if you do rhdf5::h5version()? I know that the test abundance.h5 file was built with HDF5 Version 1.8.11. You can confirm this yourself by running the following code on the command line:

h5cc -showconfig testthat/small_test_data/kallisto/abundance.h5 | head -6 | tail -1

If there is a mismatch between the versions (especially if kallisto's HDF5 version is more recent than rhdf5's), that is the likely cause of the error.

johanneskoester commented 6 years ago

Thanks for responding! So, the output is:

rhdf5::h5version()
This is Bioconductor rhdf5 2.24.0 linking to C-library HDF5 1.8.19

Whereas abundance.h5 seems indeed to be version 1.8.20 (not 1.8.11 as you thought):

h5cc -showconfig testthat/small_test_data/kallisto/abundance.h5 | head -6 | tail -1
           HDF5 Version: 1.8.20

Very disappointing to see such incompatibilities in such a mature thing as HDF5...

johanneskoester commented 6 years ago

Even an incompatibility in a patch release!? So, maybe I should ensure that kallisto and rhdf5 are pointing to the very same hdf5 version in Bioconda.

warrenmcg commented 6 years ago

Hi @johanneskoester,

I forgot that the test dataset is made when the package is installed, so it will depend on the version of kallisto you have (when I added the test dataset to the repo, I used an older version of kallisto).

The HDF5 team tries to maintain a guarantee of backward compatibility (later versions of the library can read any files made with earlier versions), but it is difficult to maintain forward compatibility (an older version of the library reading files made with newer versions). We have seen this issue before (#175), and since it has now been reported independently by two users, I am wondering whether, at the very least, we could help users understand the nature of this error. I'm going to open up a feature request for kallisto to address this.

Kallisto is bundled with a particular version of the HDF5 library, so it would be hard to change that without digging into installation from source code. In the meantime, according to the thread on #175, the creator of rhdf5 said that version 2.24 can be installed against an arbitrary HDF5 library, so that might be your best bet. rhdf5's HDF5 library would have to be at least as new as kallisto's.

Finally, for sleuth, we could add error-detection code to check whether the HDF5 library versions used are incompatible (i.e., in the case where the rhdf5 version is older than the kallisto version).

johanneskoester commented 6 years ago

Another possibility would be to build kallisto on bioconda with hdf5 <=1.8.19. I'll think about that. I think using the hdf5-agnostic branch of rhdf5 is too early now, as it still seems experimental.

warrenmcg commented 6 years ago

Hi @johanneskoester,

My mistake about prematurely suggesting the hdf5-agnostic branch of rhdf5. I can't speak to the HDF5 library versions used by different versions of kallisto. However, I have done some digging on which HDF5 library versions are being used for different versions of rhdf5. Here's what I have found:

| rhdf5 version | Bioconductor release | R version | HDF5 library version | HDF5 release date | note |
| --- | --- | --- | --- | --- | --- |
| <2.24.0 | ≤3.6 | ≤3.4 | 1.8.7 | May 2011 | up to this version, rhdf5 used its own internal library |
| 2.24.0 | 3.7 | 3.5 | 1.8.19 | June 2017 | this version and above depend on Rhdf5lib for the HDF5 library |
| ≥2.26.0 | ≥3.8 | ≥3.5 | ≥1.10.2 | March 2018 | all versions beyond this point use 1.10.x versions of HDF5 |

Version 2.26.0 is the current stable release for rhdf5 on bioconductor.

There were major changes to HDF5 from 1.8.x to 1.10.x, so there are obvious incompatibilities when reading 1.10 files with a 1.8 reader. However, it seems that this HDF5 read error will occur whenever the HDF5 version used to write the kallisto files is newer than the HDF5 version used by rhdf5. Any decision about enforcing HDF5 versions in kallisto will depend on whether kallisto uses any features introduced in newer versions when writing to file. If not, then it would make sense to pin a low version of HDF5 (1.8.7 appears to be the version used by rhdf5 as far back as 2011) to maximize compatibility with any reasonable version of rhdf5.
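The rule above ("the reader's HDF5 library must be at least as new as the writer's") can be checked with a simple dotted-version comparison. A sketch using GNU `sort -V`; the version strings are placeholders you would take from `rhdf5::h5version()` and `h5cc -showconfig`:

```shell
# version_le A B: succeeds if version A <= version B, comparing as
# dotted version numbers (so 1.8.19 < 1.10.1, unlike a plain string sort).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

writer="1.10.1"   # HDF5 version that wrote the kallisto file (placeholder)
reader="1.8.19"   # HDF5 version linked by rhdf5 (placeholder)

if version_le "$writer" "$reader"; then
  echo "compatible: reader is at least as new as writer"
else
  echo "likely incompatible: reader ($reader) is older than writer ($writer)"
fi
```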

warrenmcg commented 6 years ago

As an update, I did some hunting through the kallisto source code, and cross-referenced possible HDF5-related functions and objects in H5Writer.h/.cpp and h5utils.h/.cpp with the release notes from the HDF group (link).

I did not identify any C++ functions used by kallisto that weren't available in HDF5 version 1.8.7, and none of those functions have undergone major announced changes since.

shirlicohen commented 5 years ago

Hi everyone,

I'm running into the same error when reading the H5 files:

> so <- sleuth_prep(s2c, full_model=d, num_cores=1)
reading in kallisto results
dropping unused factor levels
.....................................................................
normalizing est_counts
10010 targets passed the filter
normalizing tpm
merging in metadata
Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) : 
  HDF5. File accessibilty. Unable to open file.

This is my sessionInfo():

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rhdf5_2.26.0   bindrcpp_0.2.2 sleuth_0.30.0  edgeR_3.24.1   limma_3.38.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0         rstudioapi_0.8     bindr_0.1.1        magrittr_1.5
 [5] tidyselect_0.2.5   munsell_0.5.0      colorspace_1.3-2   lattice_0.20-38
 [9] R6_2.3.0           rlang_0.3.0.1      plyr_1.8.4         dplyr_0.7.8
[13] tools_3.5.1        parallel_3.5.1     grid_3.5.1         data.table_1.11.8
[17] gtable_0.2.0       lazyeval_0.2.1     assertthat_0.2.0   tibble_1.4.2
[21] crayon_1.3.4       BiocManager_1.30.4 tidyr_0.8.2        Rhdf5lib_1.4.2
[25] purrr_0.2.5        ggplot2_3.1.0      glue_1.3.0         compiler_3.5.1
[29] pillar_1.3.0       scales_1.0.0       locfit_1.5-9.1     pkgconfig_2.0.2

This is my version of gcc:

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

This is my version of rhdf5:

> rhdf5::h5version()
This is Bioconductor rhdf5 2.26.0 linking to C-library HDF5 1.10.3

I'm using RStudio on OS X.

I checked my H5 files from kallisto and they are all populated and have the same dimensions. Do you have any suggestions?

Thank you.

warrenmcg commented 5 years ago

Hi @shirlicohen,

What is the output of h5cc -showconfig [[abundance.h5 file]] | head -6 | tail -1?

shirlicohen commented 5 years ago

The output is HDF5 Version: 1.10.1

inwon2 commented 5 years ago

Hi,

I am also having the same issue:

so <- sleuth_prep(s2c, ~ condition, target_mapping = t2g, num_cores=1)
reading in kallisto results
dropping unused factor levels
.Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) : 
  HDF5. File accessibilty. Unable to open file.

This is my sessioninfo:

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rhdf5_2.24.0         BiocInstaller_1.30.0 biomaRt_2.36.1       dplyr_0.7.8          sleuth_0.30.0        usethis_1.4.0
[7] devtools_2.0.1

loaded via a namespace (and not attached):
[1] progress_1.2.0       tidyselect_0.2.5     remotes_2.0.2        purrr_0.2.5          colorspace_1.3-2     stats4_3.5.1
[7] base64enc_0.1-3 blob_1.1.1 XML_3.98-1.16 rlang_0.3.0.1 pkgbuild_1.0.2 pillar_1.3.0
[13] glue_1.3.0 withr_2.1.2 DBI_1.0.0 BiocGenerics_0.26.0 bit64_0.9-7 sessioninfo_1.1.1
[19] bindrcpp_0.2.2 bindr_0.1.1 plyr_1.8.4 stringr_1.3.1 munsell_0.5.0 gtable_0.2.0
[25] memoise_1.1.0 Biobase_2.40.0 IRanges_2.14.12 callr_3.0.0 ps_1.2.1 curl_3.2
[31] parallel_3.5.1 AnnotationDbi_1.42.1 Rcpp_1.0.0 scales_1.0.0 backports_1.1.2 S4Vectors_0.18.3
[37] desc_1.2.0 pkgload_1.0.2 fs_1.2.6 bit_1.1-14 hms_0.4.2 ggplot2_3.1.0
[43] digest_0.6.18 stringi_1.2.4 processx_3.2.1 grid_3.5.1 rprojroot_1.3-2 cli_1.0.1
[49] tools_3.5.1 bitops_1.0-6 magrittr_1.5 lazyeval_0.2.1 RCurl_1.95-4.11 tibble_1.4.2
[55] RSQLite_2.1.1 crayon_1.3.4 pkgconfig_2.0.2 data.table_1.11.8 prettyunits_1.0.2 httr_1.3.1
[61] assertthat_0.2.0 rstudioapi_0.8 Rhdf5lib_1.2.1 R6_2.3.0 compiler_3.5.1

This is gcc: Apple LLVM version 10.0.0 (clang-1000.10.44.4)

OS: macOS High Sierra version 10.13.4

This is the version of rhdf5:

rhdf5::h5version()
This is Bioconductor rhdf5 2.24.0 linking to C-library HDF5 1.8.19

Any help is appreciated.

johanneskoester commented 5 years ago

@warrenmcg using a kallisto built against an older hdf5 than rhdf5 does produce the same error.

johanneskoester commented 5 years ago

Seems like the issue is rather a bug in rhdf5 2.24 or the bioconda build thereof. It is gone with 2.26. PR bioconda/bioconda-recipes#12597 will update the package so that the error hopefully disappears.

warrenmcg commented 5 years ago

Hi @johanneskoester,

User @shirlicohen is having the same issue with rhdf5 2.26 (HDF5 1.10.3) reading kallisto 0.44.0 files (HDF5 1.10.1).

Would both of you mind sending me a kallisto abundance.h5 file that is causing this error so I can reproduce the error and do troubleshooting on my end? My contact is on my main GitHub page.

johanneskoester commented 5 years ago

I have looked into this further, and I now believe the reason is that rhdf5lib was built without zlib support. I am working on another fix for the bioconda rhdf5lib package.

warrenmcg commented 5 years ago

@johanneskoester: do you mind elaborating on why this would lead to the bug in question, as this is well outside my comfort zone now? and would it make sense to submit an issue report for rhdf5lib?

johanneskoester commented 5 years ago

@warrenmcg here is my theory: kallisto stores e.g. the bootstrap values with the gzip filter applied (already checked). If the reading rhdf5lib does not find its zlib, it fails with the reported error. I am still working on getting the zlib dependency to work properly, though.
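One way to test this theory (a sketch; it assumes the HDF5 command-line tools are installed, and that the bootstraps live at `/bootstrap/bs0`, which may differ between kallisto versions) is to ask h5dump for the dataset's filter properties and look for the DEFLATE (gzip) entry:

```shell
# has_deflate_filter: read `h5dump -p` output on stdin and report whether the
# DEFLATE (gzip) filter is listed; if so, the reading library needs zlib.
has_deflate_filter() {
  if grep -q 'COMPRESSION DEFLATE'; then
    echo "gzip filter present: reading library needs zlib support"
  else
    echo "no gzip filter detected"
  fi
}

# Usage sketch (placeholder path and dataset name):
# h5dump -p -d /bootstrap/bs0 abundance.h5 | has_deflate_filter
```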