pachterlab / sleuth

Differential analysis of RNA-Seq
http://pachterlab.github.io/sleuth
GNU General Public License v3.0

sleuth_prep fails to summarize bootstraps #243

Open dimitras opened 4 years ago

dimitras commented 4 years ago

Hi, I am trying to run sleuth on kallisto abundances, but it fails at the preparation stage.

This is how I run kallisto:

kallisto quant -i annotation/mm10.idx -o S1 -t 8 --pseudobam -b 100 --single -l 250 -s 80 S1.fastq.gz

When I read in the annotation dictionary, I use stringsAsFactors = F, as recommended in one of the earlier fixes.

tx2gene_dict = read.table("transcripts_to_genes.txt", sep = "\t", header = F, stringsAsFactors = F)

> class(tx2gene_dict)
[1] "data.frame"

> head(tx2gene_dict)
             target_id             ens_gene      ext_gene
1 ENSMUST00000193812.1 ENSMUSG00000102693.1 4933401J01Rik
2 ENSMUST00000082908.1 ENSMUSG00000064842.1       Gm26206
3 ENSMUST00000162897.1 ENSMUSG00000051951.5          Xkr4
4 ENSMUST00000159265.1 ENSMUSG00000051951.5          Xkr4
5 ENSMUST00000070533.4 ENSMUSG00000051951.5          Xkr4
6 ENSMUST00000192857.1 ENSMUSG00000102851.1       Gm18956
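
(Since header = F, read.table loads the columns as V1/V2/V3; I rename them right after loading, roughly like this, so that target_id and ens_gene are available for aggregation:)

# rename the default V1/V2/V3 columns so sleuth can find target_id and ens_gene
colnames(tx2gene_dict) <- c("target_id", "ens_gene", "ext_gene")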

This is my sample to covariates dataframe:

> class(sample2cov)
[1] "data.frame"

> head(sample2cov)
  sample   age treatment timepoint path
1     S1 young  baseline       1mo   S1
2     S2 young  baseline       1mo   S2
3     S3 young  baseline       1mo   S3
4     S4 young  baseline       1mo   S4
5     S5 young  baseline       3mo   S5
6     S6 young  baseline       3mo   S6
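
For reference, the covariates table is put together roughly like this (the metadata file name below is just a placeholder); the path column points at the kallisto output directories:

# roughly how the covariates table is assembled; "samples.txt" is a placeholder name
sample2cov <- read.table("samples.txt", sep = "\t", header = T, stringsAsFactors = F)
# path must point at the kallisto output directory for each sample (kallisto -o S1, -o S2, ...)
sample2cov$path <- sample2cov$sample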

And when I try to create the sleuth object, I get the following error:

so <- sleuth_prep(sample2cov, target_mapping = tx2gene_dict, full_model = ~ treatment*timepoint, aggregation_column = 'ens_gene', extra_bootstrap_summary = T)

reading in kallisto results
dropping unused factor levels
...............
normalizing est_counts
59550 targets passed the filter
normalizing tpm
merging in metadata
summarizing bootstraps
.Error in bs_mat[, which_ids] : subscript out of bounds

I checked whether my .h5 files were corrupted using hdf5-tester, and they passed the check.
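
In case it is useful, the files can also be inspected from R with rhdf5 (which sleuth uses internally); as far as I understand the kallisto HDF5 layout, something like this lists the stored target IDs and the number of bootstraps:

library(rhdf5)
# list everything stored in one abundance file; with -b 100 the bootstrap group should hold 100 datasets
h5ls("S1/abundance.h5")
# number of transcript IDs written by kallisto
length(h5read("S1/abundance.h5", "aux/ids"))
# number of bootstrap rounds recorded in the file
h5read("S1/abundance.h5", "aux/num_bootstrap")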

I also tried running it outside RStudio, in case there was a conflict, but I still get the same error, just with a bit more detail:

reading in kallisto results
dropping unused factor levels
...............
normalizing est_counts
59550 targets passed the filter
normalizing tpm
merging in metadata
summarizing bootstraps
.......Sample 'S1' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S2' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S3' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S4' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S5' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S6' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S7' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S8' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S9' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S10' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S11' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S12' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S13' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S14' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds
Sample 'S15' had this error message: Error in bs_mat[, which_ids] : subscript out of bounds

Error in sleuth_prep(sample2cov, target_mapping = tx2gene_dict,  :
  At least one core from mclapply had an error. See the above error message(s) for more details.

I have also tried creating the sleuth object with each option removed in turn, but I get the exact same error.

so <- sleuth_prep(sample2cov, target_mapping = tx2gene_dict, full_model = ~ treatment*timepoint)

so <- sleuth_prep(sample2cov, target_mapping = tx2gene_dict)

so <- sleuth_prep(sample2cov)
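
If a fuller traceback would help, I can rerun with num_cores = 1 so the error is not wrapped by mclapply, e.g.:

# single-core run; num_cores = 1 keeps the error out of the mclapply wrapper
so <- sleuth_prep(sample2cov, target_mapping = tx2gene_dict, num_cores = 1)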

I also tried running it on a cluster, in case memory was the issue, but I keep getting the same error.

Any ideas would be highly appreciated, thank you!

My session info is:

> sessionInfo() 
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sleuth_0.30.0 tibble_2.1.3  tidyr_1.0.2   dplyr_0.8.4  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.3          pillar_1.4.3        compiler_3.6.1      BiocManager_1.30.10 remotes_2.1.1       prettyunits_1.1.1   tools_3.6.1        
 [8] testthat_2.3.1      pkgload_1.0.2       digest_0.6.25       pkgbuild_1.0.6      memoise_1.1.0       lifecycle_0.1.0     gtable_0.3.0       
[15] rhdf5_2.30.1        pkgconfig_2.0.3     rlang_0.4.4         cli_2.0.2           rstudioapi_0.11     parallel_3.6.1      withr_2.1.2        
[22] fs_1.3.1            desc_1.2.0          vctrs_0.2.3         devtools_2.2.2      rprojroot_1.3-2     grid_3.6.1          tidyselect_1.0.0   
[29] glue_1.3.1          data.table_1.12.8   R6_2.4.1            processx_3.4.2      fansi_0.4.1         sessioninfo_1.1.1   ggplot2_3.2.1      
[36] purrr_0.3.3         Rhdf5lib_1.8.0      callr_3.4.2         magrittr_1.5        usethis_1.5.1       backports_1.1.5     ps_1.3.2           
[43] scales_1.1.0        ellipsis_0.3.0      assertthat_0.2.1    colorspace_1.4-1    lazyeval_0.2.2      munsell_0.5.0       crayon_1.3.4 
dimitras commented 4 years ago

Hi all! Any update on this one? Thank you!