saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0

run_ulm taking very long to run and no feedback on progress #127

Closed gilstel closed 3 months ago

gilstel commented 4 months ago

I used run_ulm on data from a Seurat object (I tried the RNA and the SCT assay data separately) and it took several hours to run in RStudio (we have a very powerful RStudio-dedicated server with 52 cores and 512 GB of memory). I left it running overnight, and when I checked the next morning the prompt had not come back, so I aborted the command using Escape. It would be very helpful to have some sort of progress bar indicating how the function is progressing. It would also be very helpful to be able to set the number of cores the function uses, as in run_dorothea, or maybe you could use the future package (which Seurat uses to run various functions faster) to speed it up.
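Until such an option exists, one possible workaround is to process the cells in column chunks and report progress from the calling code. The following is a minimal base R sketch, assuming that the score for one cell does not depend on the other cells in the matrix. `run_in_chunks` and `score_fun` are hypothetical names, not part of decoupleR, and `colSums` stands in for the real scoring call.

```r
# Hypothetical helper (not part of decoupleR): apply `score_fun` to
# column chunks of `mat` and update a text progress bar after each chunk.
run_in_chunks <- function(mat, score_fun, chunk_size = 1000) {
  n <- ncol(mat)
  stopifnot(n > 0, chunk_size > 0)
  starts <- seq(1, n, by = chunk_size)
  pb <- utils::txtProgressBar(min = 0, max = length(starts), style = 3)
  on.exit(close(pb))
  results <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    cols <- starts[i]:min(starts[i] + chunk_size - 1, n)
    # drop = FALSE keeps a matrix even when a chunk has a single column
    results[[i]] <- score_fun(mat[, cols, drop = FALSE])
    utils::setTxtProgressBar(pb, i)
  }
  results
}

# Toy data: 5 genes x 7 cells; colSums stands in for the real scoring call
toy <- matrix(1:35, nrow = 5,
              dimnames = list(paste0("g", 1:5), paste0("c", 1:7)))
chunk_sums <- unlist(run_in_chunks(toy, colSums, chunk_size = 3))
```

With decoupleR, `score_fun` could wrap the run_ulm call on each chunk and the resulting tibbles could be row-bound afterwards. The same chunk list could also be handed to a parallel backend, although that is untested here.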

Many thanks

PauBadiaM commented 4 months ago

Hi @gilstel,

Which version of decoupleR are you running? It could be the case you are running a very old one. Try installing the latest version of decoupleR:

install.packages('remotes')
remotes::install_github('saezlab/decoupleR')

Alternatively, you can switch to the Python version of decoupler, since it is more scalable than the R one. Let me know how it goes!

gilstel commented 4 months ago

Hi @PauBadiaM

We are using decoupleR version 2.6.0 (I looked on the GitHub page to see whether this is the latest version but couldn't find it easily). I like having decoupleR in R so that I can use the data from my Seurat object.
I assume that if I use the Python version, I would first need to export the (SCT) assay data from R to a file and then import it into the Python environment in order to use it there.
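A minimal sketch of that export step in base R, assuming a plain CSV hand-off is acceptable for the data size. The toy matrix stands in for the SCT assay data and the temporary file path is only illustrative.

```r
# Toy genes x cells matrix standing in for the SCT assay data
expr <- matrix(c(0, 1.5, 2, 0, 0.5, 3), nrow = 3,
               dimnames = list(c("Myc", "Spi1", "Smad3"),
                               c("cell_1", "cell_2")))

# Write the matrix with gene symbols as the first column
out_file <- tempfile(fileext = ".csv")
write.csv(expr, out_file, quote = FALSE)

# Read it back to confirm that values and names survive the round trip
back <- as.matrix(read.csv(out_file, row.names = 1, check.names = FALSE))
```

On the Python side, a file like this can be read with pandas.read_csv using the first column as the index. For a full single-cell dataset a dense CSV may become very large, so a sparse format may be a better fit.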

Regardless, since I have mouse data, I used the following command:

> net = get_collectri(organism="mouse", split_complexes=FALSE)
[2024-05-22 14:19:34] [SUCCESS] [OmnipathR] Downloaded 64495 interactions.
> net
# A tibble: 42,595 × 3
   source target   mor
   <chr>  <chr>  <dbl>
 1 MYC    TERT       1
 2 SPI1   BGLAP      1
 3 SMAD3  JUN        1
 4 SMAD4  JUN        1
 5 STAT5A IL2        1
 6 STAT5B IL2        1
 7 RELA   FAS        1
 8 WT1    NR0B1      1
 9 NR0B2  CASP1      1
10 SP1    ALDOA      1

Later on, when I used run_ulm (which took several minutes), it found only one source:

> mat.sct.assay = as.matrix(named.clust.obj.minus.clust.32.33@assays$SCT@data)
> dim(mat.sct.assay)
> # Run ulm
> acts.sct.assay.minsize.5 = run_ulm(mat=mat.sct.assay, network = net, .source='source', .target='target', .mor='mor', minsize = 5)
> unique(acts.sct.assay.minsize.5$source)
[1] "HNF4A"
> acts.sct.assay.minsize.5
# A tibble: 30,459 × 5
   statistic source condition                      score p_value
   <chr>     <chr>  <chr>                          <dbl>   <dbl>
 1 ulm       HNF4A  channel.1_AAACCCAAGAGGGTCT-1  1.08    0.280 
 2 ulm       HNF4A  channel.1_AAACCCAGTCTTCATT-1  0.890   0.374 
 3 ulm       HNF4A  channel.1_AAACCCATCTGGGTCG-1  0.0927  0.926 
 4 ulm       HNF4A  channel.1_AAACGAAAGAAACTCA-1  0.575   0.566 
 5 ulm       HNF4A  channel.1_AAACGAACACGAGGAT-1  0.0393  0.969 
 6 ulm       HNF4A  channel.1_AAACGAAGTTCAGCTA-1  0.624   0.532 
 7 ulm       HNF4A  channel.1_AAACGAATCACTGGGC-1  0.927   0.354 
 8 ulm       HNF4A  channel.1_AAACGCTAGATGTTAG-1  1.84    0.0653
 9 ulm       HNF4A  channel.1_AAACGCTCAAGGTACG-1  1.08    0.280 
10 ulm       HNF4A  channel.1_AAACGCTTCGTCAGAT-1 -0.342   0.733 

However, when I first translated gene symbols in net from UPPERCASE to Sentence Case

library(snakecase)
net$source = to_sentence_case(string = net$source, sep_out = "")
net$target = to_sentence_case(string = net$target, sep_out = "")

followed by run_ulm (which took several hours to run)

> length(unique(curr.condition.acts.sct.assay.minsize.5$source))
[1] 690
> head(unique(curr.condition.acts.sct.assay.minsize.5$source))
[1] "Myc"    "Spi1"   "Smad3"  "Smad4"  "Stat5a" "Stat5b"

Maybe the network file should be updated with gene symbols in mouse format?
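A quick way to confirm that symbol case is the cause is to count how many network targets appear in the row names of the expression matrix, before and after conversion. The following is a base R sketch on toy data. `to_mouse_case` is a hypothetical helper that assumes the simple convention of an uppercase first letter followed by lowercase letters, which does not hold for every mouse gene symbol.

```r
# Hypothetical helper: first character uppercase, the rest lowercase
to_mouse_case <- function(x) {
  paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
}

# Toy network with human-style symbols and toy matrix row names in mouse style
net_toy <- data.frame(source = c("MYC", "SPI1", "NR0B2"),
                      target = c("TERT", "BGLAP", "CASP1"))
mat_genes <- c("Tert", "Bglap", "Casp1", "Jun")

# Fraction of network targets present in the matrix, before and after conversion
overlap_before <- mean(net_toy$target %in% mat_genes)
overlap_after  <- mean(to_mouse_case(net_toy$target) %in% mat_genes)
```

An overlap close to zero before conversion and a much higher one afterwards would point to a naming mismatch between the network and the matrix rather than a problem in the scoring itself.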

PauBadiaM commented 4 months ago

Hi @gilstel,

Did you install from GitHub? The current R version of decoupleR is 2.9.7, and the Python version is 2.6.0. Double-check that you installed the latest R version by running packageVersion("decoupleR"). It looks like the mouse conversion did not work, which is an old bug in decoupleR; update the package and try again (remember to restart your R session). Let me know how it goes.
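The version check can also be scripted so that it fails gracefully when the package is missing. A small sketch follows; `has_min_version` is a hypothetical helper, and the 2.9.7 threshold is simply the version quoted in this thread.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

# FALSE means decoupleR is either not installed or older than 2.9.7
has_min_version("decoupleR", "2.9.7")
```

Run this in a fresh R session after reinstalling, since a session started before the update can still have the old namespace loaded.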