ratan-lab / sumo

Subtyping tool for multi-omic data
https://pypi.org/project/python-sumo
MIT License

Problems reading in the output npz file #6

Closed: aakrosh closed this issue 4 years ago

aakrosh commented 4 years ago

When I read the output npz file (sumo_results.npz) using reticulate, I am unable to print the contents of the file named "clusters". I am able to print the contents of all the other files, including "quality", "consensus", "cophenet", "unfiltered_consensus", and "summary". To reproduce this issue, you should be able to run the following:

library(survival)
library(survminer)
library(reticulate)
np <- import("numpy")
data <- np$load("sumo_results.npz")
data$files
data$f["clusters"]

The above should fail with the error:

Error: Python object has no '__getitem__' method

I am using python 3.7.0, and here is the output of my sessionInfo() in R

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reticulate_1.13   survminer_0.4.6   ggpubr_0.2.3      magrittr_1.5     
[5] ggplot2_3.2.1     survival_2.44-1.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2        pillar_1.4.2      compiler_3.6.1    tools_3.6.1      
 [5] zeallot_0.1.0     jsonlite_1.6      tibble_2.1.3      lifecycle_0.1.0  
 [9] gtable_0.3.0      nlme_3.1-141      lattice_0.20-38   pkgconfig_2.0.3  
[13] rlang_0.4.0       Matrix_1.2-17     xfun_0.10         gridExtra_2.3    
[17] withr_2.1.2       dplyr_0.8.3       knitr_1.25        generics_0.0.2   
[21] vctrs_0.2.0       survMisc_0.5.5    grid_3.6.1        tidyselect_0.2.5 
[25] data.table_1.12.4 glue_1.3.1        R6_2.4.0          KMsurv_0.1-5     
[29] km.ci_0.5-2       purrr_0.3.2       tidyr_1.0.0       scales_1.0.0     
[33] backports_1.1.5   splines_3.6.1     assertthat_0.2.1  xtable_1.8-4     
[37] colorspace_1.4-1  ggsignif_0.6.0    lazyeval_0.2.2    munsell_0.5.0    
[41] broom_0.5.2       crayon_1.3.4      zoo_1.8-6        

sienkie commented 4 years ago

I have encountered this issue before. "reticulate" has some problems with files pickled with Python 3. To work around this, select python3 with use_python() before attaching the "reticulate" package, and enable allow_pickle when loading the npz file.

reticulate::use_python(Sys.which('python3'), required = TRUE)
library(reticulate)
np <- import("numpy")
data <- np$load("sumo_results.npz", allow_pickle = TRUE)
data$files
data$f["clusters"]
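
For anyone wondering why only one entry is affected, the sketch below reproduces the allow_pickle behaviour in plain Python, without reticulate. It assumes that "clusters" is stored as an object-dtype (pickled) array, which this thread does not confirm. The file name example_results.npz, the helper demo_allow_pickle, and the array contents are made up for illustration and are not sumo output. With NumPy 1.16.3 or later, np.load defaults to allow_pickle=False, so numeric arrays load normally while object arrays raise a ValueError on access.

```python
import os
import tempfile

import numpy as np


def demo_allow_pickle():
    """Show that only object-dtype entries of an .npz need allow_pickle=True."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file and contents, not real sumo results.
        path = os.path.join(tmp, "example_results.npz")
        clusters = np.array([["sample_a", 0], ["sample_b", 1]], dtype=object)
        quality = np.array([0.87])
        np.savez(path, clusters=clusters, quality=quality)

        # Default load: numeric entries are readable, object entries are not.
        with np.load(path) as data:
            numeric_ok = data["quality"].shape == (1,)
            try:
                data["clusters"]
                blocked = False
            except ValueError:
                # Raised for pickled arrays when allow_pickle is False.
                blocked = True

        # Opting in to pickle makes the object array readable.
        with np.load(path, allow_pickle=True) as data:
            loaded = data["clusters"]

    return numeric_ok, blocked, loaded


if __name__ == "__main__":
    numeric_ok, blocked, loaded = demo_allow_pickle()
    print(numeric_ok, blocked, loaded.shape)
```

Since unpickling can execute arbitrary code, allow_pickle is best enabled only for npz files from a source you trust, such as your own sumo runs.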

aakrosh commented 4 years ago

Thanks. That works perfectly.