thelovelab / tximport

Transcript quantification import for modular pipelines
134 stars 33 forks source link

cann't import kallisto abundance.h5/.tsv #41

Closed Ci-TJ closed 4 years ago

Ci-TJ commented 4 years ago

Hi! I just want to import the kallisto abundance.h5/abundance.tsv by tximport, but I failed.

 ~/RNA-seq/Analysis/GSE112055/quant_out/SRR6868519]$ cat run_info.json
{
        "n_targets": 141398,
        "n_bootstraps": 100,
        "n_processed": 59161050,
        "n_pseudoaligned": 50375628,
        "n_unique": 18126755,
        "p_pseudoaligned": 85.1,
        "p_unique": 30.6,
        "kallisto_version": "0.46.2",
        "index_version": 10,
        "start_time": "Mon May  4 17:44:24 2020",
        "call": "kallisto quant -i /home2/ymwang/linqin/RNA-seq/fasta/release99/transcripts_Mus.idx -o quant_out/SRR6868519 -t 24 -b 100 download_GSE112055/SRR6868519.sra_1.fastq download_GSE112055/SRR6868519.sra_2.fastq"
}

~/RNA-seq/Analysis/GSE112055/quant_out/SRR6868519]$ head abundance.tsv
target_id       length  eff_length      est_counts      tpm
ENSMUST00000177564.1    16      8.71429 0       0
ENSMUST00000196221.1    9       8.66667 0       0
ENSMUST00000179664.1    11      8.25    0       0
ENSMUST00000178537.1    12      9.25    0       0
ENSMUST00000178862.1    14      8       0       0
ENSMUST00000179520.1    11      8.25    0       0
ENSMUST00000179883.1    16      8.71429 0       0
ENSMUST00000195858.1    10      9.66667 0       0
ENSMUST00000179932.1    12      9.25    0       0
> library(tximport)
> library(rhdf5)
> sample_names = c("SRR6868519",paste0("SRR686852", 0:8))
> quant_files = file.path("/home2/ymwang/linqin/RNA-seq/Analysis/GSE112055/quant_out", sample_names, "abundance.h5")
> file.exists(quant_files)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> txi = tximport(files = quant_files, type = "kallisto", txOut = TRUE)
Error: Unable to read dataset.
Not all required filters available.
Missing filters: deflate

> quant_files = file.path("/home2/ymwang/linqin/RNA-seq/Analysis/GSE112055/quant_out", sample_names, "abundance.tsv")
> file.exists(quant_files)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> txi = tximport(files = quant_files, type = "kallisto", txOut = TRUE)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
Error: Unable to read dataset.
Not all required filters available.
Missing filters: deflate
In addition: Warning messages:
1: In h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) :
  An open HDF5 file handle exists. If the file has changed on disk meanwhile, the function may not work properly. Run 'h5closeAll()' to close all open HDF5 object handles.
2: In h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) :
  An open HDF5 file handle exists. If the file has changed on disk meanwhile, the function may not work properly. Run 'h5closeAll()' to close all open HDF5 object handles.
>

Then, I reproduced the example(https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html). Interestingly, it had different results for two kallisto files in package tximportData.

#kallisto_boot dir
> files <- file.path(dir, "kallisto_boot", samples$run, "abundance.h5")
> names(files) <- paste0("sample", 1:6)
> all(file.exists(files))
[1] TRUE
> txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
Error: Unable to read dataset.
Not all required filters available.
Missing filters: deflate
In addition: Warning messages:
1: In h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) :
  An open HDF5 file handle exists. If the file has changed on disk meanwhile, the function may not work properly. Run 'h5closeAll()' to close all open HDF5 object handles.
2: In h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) :
  An open HDF5 file handle exists. If the file has changed on disk meanwhile, the function may not work properly. Run 'h5closeAll()' to close all open HDF5 object handles.

> files <- file.path(dir, "kallisto_boot", samples$run, "abundance.tsv.gz")
> names(files) <- paste0("sample", 1:6)
> all(file.exists(files))
[1] TRUE
> txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
Error: Unable to read dataset.
Not all required filters available.
Missing filters: deflate
In addition: Warning messages:
1: In h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) :
  An open HDF5 file handle exists. If the file has changed on disk meanwhile, the function may not work properly. Run 'h5closeAll()' to close all open HDF5 object handles.
2: In h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) :
  An open HDF5 file handle exists. If the file has changed on disk meanwhile, the function may not work properly. Run 'h5closeAll()' to close all open HDF5 object handles.

#kallisto dir
> files <- file.path(dir, "kallisto", samples$run, "abundance.tsv.gz")
> file.exists(files)
[1] TRUE TRUE TRUE TRUE TRUE TRUE
> txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreAfterBar = TRUE)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 2 3 4 5 6
summarizing abundance
summarizing counts
summarizing length
Warning message:
In h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) :
  An open HDF5 file handle exists. If the file has changed on disk meanwhile, the function may not work properly. Run 'h5closeAll()' to close all open HDF5 object handles.
> txi.kallisto.tsv <- tximport(files, type = "kallisto", txOut=T)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 2 3 4 5 6
Warning message:
In h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) :
  An open HDF5 file handle exists. If the file has changed on disk meanwhile, the function may not work properly. Run 'h5closeAll()' to close all open HDF5 object handles.

What's the problem with it? does tximport need special outputs from kallisto with special arguments?

mikelove commented 4 years ago

sessionInfo()?

Ci-TJ commented 4 years ago

sessionInfo() R version 3.6.2 (2019-12-12) Platform: x86_64-conda_cos6-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS/LAPACK: /home2/ymwang/miniconda3/envs/rna/lib/libopenblasp-r0.3.7.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] rhdf5_2.30.1 tximport_1.14.2

loaded via a namespace (and not attached): [1] SummarizedExperiment_1.16.1 progress_1.2.2 [3] tidyselect_1.0.0 purrr_0.3.4 [5] lattice_0.20-41 vctrs_0.2.4 [7] stats4_3.6.2 BiocFileCache_1.10.2 [9] rtracklayer_1.46.0 GenomicFeatures_1.38.2 [11] blob_1.2.1 XML_3.99-0.3 [13] rlang_0.4.6 pillar_1.4.4 [15] glue_1.4.0 DBI_1.1.0 [17] BiocParallel_1.20.1 rappdirs_0.3.1 [19] BiocGenerics_0.32.0 bit64_0.9-7 [21] dbplyr_1.4.3 matrixStats_0.56.0 [23] GenomeInfoDbData_1.2.2 lifecycle_0.2.0 [25] stringr_1.4.0 zlibbioc_1.32.0 [27] Biostrings_2.54.0 memoise_1.1.0 [29] Biobase_2.46.0 IRanges_2.20.2 [31] biomaRt_2.42.1 GenomeInfoDb_1.22.1 [33] parallel_3.6.2 curl_4.3 [35] AnnotationDbi_1.48.0 Rcpp_1.0.4.6 [37] readr_1.3.1 openssl_1.4.1 [39] DelayedArray_0.12.3 S4Vectors_0.24.4 [41] XVector_0.26.0 bit_1.1-15.2 [43] Rsamtools_2.2.3 hms_0.5.3 [45] askpass_1.1 digest_0.6.25 [47] stringi_1.4.6 dplyr_0.8.5 [49] GenomicRanges_1.38.0 grid_3.6.2 [51] tools_3.6.2 bitops_1.0-6 [53] magrittr_1.5 RCurl_1.98-1.2 [55] RSQLite_2.2.0 tibble_3.0.1 [57] crayon_1.3.4 pkgconfig_2.0.3 [59] ellipsis_0.3.0 Matrix_1.2-18 [61] prettyunits_1.1.1 assertthat_0.2.1 [63] httr_1.4.1 Rhdf5lib_1.8.0 [65] R6_2.4.1 compiler_3.6.2 [67] GenomicAlignments_1.22.1

mikelove commented 4 years ago

It's possible that you problem is fixed with the current release of rhdf5 (v2.32).

Also, 10 days ago there was a support site post that may be related. My advice there will also help you: cut out tximport from the equation by just trying to read in the h5 files into R. This will help you debug.

https://support.bioconductor.org/p/130419/#130420

Ci-TJ commented 4 years ago

Thanks! I will try later, but now my IP has been limited by bioconductor.