saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0
190 stars 24 forks source link

Error when running run_wmean for transcription factor activity inference #85

Closed TrmMelanoma closed 1 year ago

TrmMelanoma commented 1 year ago

Dear decoupleR development team,

Thank you for your efforts developing such a useful package. I was following the tutorial Transcription factor activity inference from scRNA-seq](https://saezlab.github.io/decoupleR/articles/tf_sc.html) and encountered an error when running run_wmean which seems like some internal issues with some packages.

net <- get_collectri(organism='human', split_complexes=FALSE) [2023-05-24 00:18:05] [SUCCESS] [OmnipathR] Loaded 68010 interactions from cache. Warning messages: 1: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0. ℹ Please use "source_genesymbol" instead of .data$source_genesymbol This warning is displayed once every 8 hours. Call lifecycle::last_lifecycle_warnings() to see where this warning was generated. 2: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0. ℹ Please use "target_genesymbol" instead of .data$target_genesymbol This warning is displayed once every 8 hours. Call lifecycle::last_lifecycle_warnings() to see where this warning was generated. 3: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0. ℹ Please use "weight" instead of .data$weight This warning is displayed once every 8 hours. Call lifecycle::last_lifecycle_warnings() to see where this warning was generated.

net

A tibble: 45,223 × 3

source target mor

1 MYC TERT 1 2 SPI1 BGLAP 1 3 SMAD3 ADAM2 1 4 SMAD3 JUN 1 5 SMAD4 ADAM2 1 6 SMAD4 JUN 1 7 STAT5A IL2 1 8 STAT5B IL2 1 9 RELA FAS 1 10 WT1 NR0B1 1 ℹ 45,213 more rows ℹ Use `print(n = ...)` to see more rows

mat <- as.matrix(PW_Adult_only.integrated@assays$RNA@data) acts <- run_wmean(mat=mat, net=net, .source='source', .target='target', .mor='mor', times = 100, minsize = 5)

Error in vec_rbind(): ! Negative n in compact_rep(). ℹ In file utils.c at line 897. ℹ Install the winch package to get additional debugging info the next time you get this error. ℹ This is an internal error that was detected in the vctrs package. Please report it at https://github.com/r-lib/vctrs/issues with a reprex (https://tidyverse.org/help/) and the full backtrace. Backtrace: ▆

  1. ├─decoupleR::run_wmean(...)
  2. │ ├─withr::with_seed(...)
  3. │ │ └─withr::with_preserve_seed(...)
  4. │ └─decoupleR:::.wmean_analysis(...)
  5. │ ├─... %>% ...
  6. │ └─purrr::map_dfr(seq_len(times), ~wmean_run(random = TRUE))
  7. │ └─dplyr::bind_rows(res, .id = .id)
  8. │ └─vctrs::vec_rbind(!!!dots, .names_to = .id, .error_call = current_env())
  9. ├─dplyr::select(...)
  10. ├─dplyr::arrange(., .data$statistic, .data$source, .data$condition)
  11. ├─tidyr::pivot_longer(...)
  12. ├─dplyr::rename(...)
  13. ├─dplyr::select(., -contains("null"))
  14. ├─dplyr::mutate(...)
  15. ├─dplyr::left_join(...)
  16. ├─dplyr::summarise(...)
  17. ├─dplyr::group_by(., .data$source, .data$condition)
  18. └─rlang:::stop_internal_c_lib(...)
  19. └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)

Look forward to your reply! Thank you so much!!

danakrs commented 1 year ago

Dear decoupleR development team,

Thank you for your efforts developing such a useful package. I was following the tutorial Transcription factor activity inference from scRNA-seq](https://saezlab.github.io/decoupleR/articles/tf_sc.html) and encountered an error when running run_wmean which seems like some internal issues with some packages.

net <- get_collectri(organism='human', split_complexes=FALSE) [2023-05-24 00:18:05] [SUCCESS] [OmnipathR] Loaded 68010 interactions from cache. Warning messages: 1: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0. ℹ Please use "source_genesymbol" instead of .data$source_genesymbol This warning is displayed once every 8 hours. Call lifecycle::last_lifecycle_warnings() to see where this warning was generated. 2: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0. ℹ Please use "target_genesymbol" instead of .data$target_genesymbol This warning is displayed once every 8 hours. Call lifecycle::last_lifecycle_warnings() to see where this warning was generated. 3: Use of .data in tidyselect expressions was deprecated in tidyselect 1.2.0. ℹ Please use "weight" instead of .data$weight This warning is displayed once every 8 hours. Call lifecycle::last_lifecycle_warnings() to see where this warning was generated.

net

A tibble: 45,223 × 3

source target mor

1 MYC TERT 1 2 SPI1 BGLAP 1 3 SMAD3 ADAM2 1 4 SMAD3 JUN 1 5 SMAD4 ADAM2 1 6 SMAD4 JUN 1 7 STAT5A IL2 1 8 STAT5B IL2 1 9 RELA FAS 1 10 WT1 NR0B1 1 ℹ 45,213 more rows ℹ Use print(n = ...) to see more rows

mat <- as.matrix(PW_Adult_only.integrated@assays$RNA@data) acts <- run_wmean(mat=mat, net=net, .source='source', .target='target', .mor='mor', times = 100, minsize = 5)

Error in vec_rbind(): ! Negative n in compact_rep(). ℹ In file utils.c at line 897. ℹ Install the winch package to get additional debugging info the next time you get this error. ℹ This is an internal error that was detected in the vctrs package. Please report it at https://github.com/r-lib/vctrs/issues with a reprex (https://tidyverse.org/help/) and the full backtrace. Backtrace: ▆

  1. ├─decoupleR::run_wmean(...)
  2. │ ├─withr::with_seed(...)
  3. │ │ └─withr::with_preserve_seed(...)
  4. │ └─decoupleR:::.wmean_analysis(...)
  5. │ ├─... %>% ...
  6. │ └─purrr::map_dfr(seq_len(times), ~wmean_run(random = TRUE))
  7. │ └─dplyr::bind_rows(res, .id = .id)
  8. │ └─vctrs::vec_rbind(!!!dots, .names_to = .id, .error_call = current_env())
  9. ├─dplyr::select(...)
  10. ├─dplyr::arrange(., .data$statistic, .data$source, .data$condition)
  11. ├─tidyr::pivot_longer(...)
  12. ├─dplyr::rename(...)
  13. ├─dplyr::select(., -contains("null"))
  14. ├─dplyr::mutate(...)
  15. ├─dplyr::left_join(...)
  16. ├─dplyr::summarise(...)
  17. ├─dplyr::group_by(., .data$source, .data$condition)
  18. └─rlang:::stop_internal_c_lib(...)
  19. └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)

Look forward to your reply! Thank you so much!!

I'll second that, getting the same exact error!

PauBadiaM commented 1 year ago

Hi @TrmMelanoma and @danakrs , Could you try to install the latest versions of decoupleR and OmnipathR from GitHub? Here for decoupleR:

remotes::install_github('saezlab/decoupleR')

Here for OmnipathR:

remotes::install_github('saezlab/OmnipathR')

BTW, I would use instead run_ulm since it is the analytical solution to norm_wmean with infinite permutations. Hope this is helpful!

MarcElosua commented 1 year ago

Hi @PauBadiaM

Very very nice work with decoupleR! One quick follow-up question from your previous comment. What do you mean with run_ulm being the analytical solution to norm_wmean? What is the intuition behind that?

Also, when running wmean is there any scenario in which you would use wmean over norm_wmean? If not, should one always go straight to ulm?

Sorry for so many questions... Gracies I Salut!!!

PauBadiaM commented 1 year ago

Hi @MarcElosua

Many thanks ;) Regarding your first question, we found that when we increase the number of permutations of wmean, the obtained norm_wmean scores approximate the results of ulm. Here is some code if interested:

library(decoupleR)
library(tidyverse)
library(ggplot2)

# Read data
inputs_dir <- system.file("extdata", package = "decoupleR")
data <- readRDS(file.path(inputs_dir, "sc_data.rds"))
net <- get_collectri(organism='human', split_complexes=FALSE)
mat <- as.matrix(data@assays$RNA@data)

# Run ulm
ulm <- run_ulm(
  mat=mat,
  net=net,
  .source='source',
  .target='target',
  .mor='weight',
  minsize = 5) %>%
  rename(ulm=score) %>%
  select(source, condition, ulm)

# Run wmean
i <- 1000 # Modify to see the effect
norm_wmean <- decoupleR::run_wmean(
  mat,
  net,
  .source='source',
  .target='target',
  .mor='weight',
  minsize = 5,
  times=i) %>%
  filter(statistic == 'norm_wmean') %>%
  mutate(times=i) %>%
  rename(norm_wmean=score) %>%
  select(source, condition, norm_wmean, times)

# Merge
df <- ulm %>%
  left_join(norm_wmean)

# Plot
df %>%
  ggplot(aes(x=ulm, y=norm_wmean)) + 
  geom_point()

# Corr
cor(df$ulm, df$norm_wmean)
# 0.9992998

image

If you increase the number of permutations, the correlation increases, and the opposite if you decrease it. For now we only have empirical evidence that this is the case, but I hadn't had the time to sit down and try to prove it mathematically.

Regarding your second question, generally no since wmean is just the raw estimate, no statistical test was applied. The nice thing about norm_wmean and ulm is that in a single statistic it provides the direction of activity (either + or -) and its significance (the magnitude of the score). Therefore yes, I would recommend to go straight to ulm since it's way faster than norm_wmean.

Let me know if you have any more questions. Salut!

MarcElosua commented 1 year ago

Thank you so much!!