saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0

decoupler script crashes because it uses all CPUs #136

Open ahmedasadik opened 3 weeks ago

ahmedasadik commented 3 weeks ago

Hi, I have been facing this weird issue. Every time I run the run_ulm function, all my cores are suddenly used and the machine crashes.

This is the code I was trying to run from the CollecTRI package:

library(tidyverse)
library(decoupleR)
library(magrittr)

## Load files
dorothea_ABC <- read.csv("data/networks/dorothea_ABC.csv")
CollecTRI <- read.csv("output/CollecTRI/CollecTRI_GRN.csv") %>% rename(mor = weight)

# In this use case we use data from CPTAC and three cancer types:
# UCEC: Uterine Corpus Endometrial Carcinoma
# LUAD: Lung Adenocarcinoma
# CCRCC: Clear Cell Renal Cell Carcinoma
download.file("https://zenodo.org/record/7773985/files/ucec_counts_tvalues.csv?download=1", file.path("data", "CPTAC_DEGs", "ucec_counts_tvalues.csv"))
download.file("https://zenodo.org/record/7773985/files/luad_counts_tvalues.csv?download=1", file.path("data", "CPTAC_DEGs", "luad_counts_tvalues.csv"))
download.file("https://zenodo.org/record/7773985/files/ccrcc_counts_tvalues.csv?download=1", file.path("data", "CPTAC_DEGs", "ccrcc_counts_tvalues.csv"))

# Read data in a list of dataframes
file_names = list.files(path = "data/CPTAC_DEGs", pattern = "\\.csv$", full.names = TRUE)
file_list = lapply(file_names, read.csv)
file_list = setNames(file_list, gsub("data/CPTAC_DEGs/|\\.csv|_counts_tvalues", "", file_names))

# Format input for decoupleR
decoupler_inputs <- lapply(file_list, function(x) as.data.frame(x) %>%
                             set_rownames(.$ID) %>% dplyr::select(NATvsTUM_t) %>% filter(!is.na(NATvsTUM_t)))

res_decoupler_CollecTRI <- lapply(decoupler_inputs, function(x) run_ulm(as.matrix(x),network = CollecTRI,.source='source',
                                                                         .target='target', minsize = 5))

After this point I get an R session error and the session crashes.

This is the output of the session info:

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] magrittr_2.0.3  decoupleR_2.8.0 lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4    
 [7] purrr_1.0.2     readr_2.1.5     tidyr_1.3.1     tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       cli_3.6.2         rlang_1.1.3       stringi_1.8.3     generics_0.1.3   
 [6] glue_1.7.0        colorspace_2.1-0  hms_1.1.3         scales_1.3.0      fansi_1.0.6      
[11] grid_4.3.3        munsell_0.5.1     tzdb_0.4.0        lifecycle_1.0.4   compiler_4.3.3   
[16] timechange_0.3.0  pkgconfig_2.0.3   rstudioapi_0.16.0 lattice_0.22-6    R6_2.5.1         
[21] tidyselect_1.2.1  utf8_1.2.4        pillar_1.9.0      Matrix_1.6-5      tools_4.3.3      
[26] withr_3.0.0       gtable_0.3.5 

Your help is much appreciated.

PauBadiaM commented 3 weeks ago

Hi @ahmedasadik,

It looks like it might be a memory issue. Can you check your RAM usage while running the script? If memory is the problem, you could try running the same code with the Python version of decoupler, which is much more memory-efficient.
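
One rough way to check this from inside R is sketched below. It uses base R's gc() counters to report the peak memory R allocated while a single expression runs. The helper name peak_memory_mb is hypothetical and not part of decoupleR. The figure covers only memory managed by R itself, so memory allocated by BLAS or other native code will not show up; watching a system monitor such as top or htop alongside it is still worthwhile.

```r
# Hypothetical helper (not part of decoupleR): evaluate an expression and
# report the peak R-managed memory observed while it ran, based on gc().
peak_memory_mb <- function(expr) {
  gc(reset = TRUE)        # reset the "max used" counters to current usage
  value <- force(expr)    # evaluate the lazily passed expression now
  usage <- gc()           # read the counters after evaluation
  # The last column of the gc() matrix is "max used" in Mb;
  # add the Ncells and Vcells rows together.
  peak <- sum(usage[, ncol(usage)])
  list(value = value, peak_mb = peak)
}

# Toy check with a 1000 x 1000 numeric matrix (about 8 MB of doubles).
res <- peak_memory_mb(matrix(0, nrow = 1000, ncol = 1000))
print(res$peak_mb)
```

To apply it to the script above, wrap a single call, for example peak_memory_mb(run_ulm(as.matrix(decoupler_inputs[[1]]), network = CollecTRI, .source = 'source', .target = 'target', minsize = 5)), and compare peak_mb with the RAM available on the machine. Running one cancer type at a time instead of the whole lapply makes it easier to see which input triggers the crash.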