saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0

colinearity #71

Closed imerelli closed 1 year ago

imerelli commented 1 year ago

Hi, I'm trying decoupleR with dorothea on a proteomics experiment. I'm loading a table with gene/protein names, logFC and p-values, but I'm running into the problem below. It may be something simple, but I don't know how to proceed.

    dorothea <- get_dorothea(organism='human', levels=c('A', 'B', 'C'))
    [2023-02-20 09:18:29] [SUCCESS] [OmnipathR] Loaded 278482 interactions from cache.
    mat<-read.delim("first_list.txt",row.names = 1)
    head(mat)
                logFC      PValue
    ACTA1    37.80482 4.17544e-05
    ACTB    276.01063 8.38850e-09
    ACTBL2  436.90369 1.09067e-09
    ACTN2    34.11805 6.29361e-05
    ACTN4    45.91130 1.89706e-05
    ARHGDIA  41.13212 2.96988e-05
    res_decouple <- decouple(mat,
    +                        dorothea,
    +                        .source='source',
    +                        .target='target',
    +                        minsize = 0)
    Error in map2():
    ℹ In index: 1.
    Caused by error in mutate():
    ℹ In argument: model = list(mlm_evaluate_model(.data$condition)).
    ℹ In row 1.
    Caused by error in .mlm_evaluate_model():
    ! After intersecting mat and network, at least 56 sources in the network are colinear with other sources. Cannot fit a linear model with colinear covariables, please remove them. Please run decoupleR::check_corr to see what regulators are correlated.
    Run rlang::last_error() to see where the error occurred.

PauBadiaM commented 1 year ago

Hi @imerelli,

There are some things you could do:

  1. Set the parameter minsize to something bigger than 0, for example 5. This argument removes TFs that have fewer than minsize target genes in your input matrix. Activities inferred from fewer than 5 targets tend to be rather noisy.
  2. If the error persists, run decoupleR::check_corr, as the error message suggests, to identify which TFs are highly correlated after filtering by your data. If two TFs are highly correlated (> 0.95), you cannot fit a multivariate linear model because the covariates are colinear. Once you have identified which TFs are colinear, keep one TF from each pair and remove the other.
  3. Additionally, you should remove the p-value column from mat, since decouple treats every column of mat as a separate sample or contrast.
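
The three steps above can be sketched in base R on a toy example. This is only an illustration under stated assumptions: the toy `net` and `df` objects, the `minsize` of 3 and the greedy "drop the second TF of each pair" rule are all made up for the example, and decoupleR::check_corr may compute the correlations differently.

```r
# Toy network in long format (source = TF, target = gene, mor = sign of regulation).
# All values here are invented for illustration.
net <- data.frame(
  source = c("TF1", "TF1", "TF1", "TF2", "TF2", "TF2", "TF3", "TF3", "TF3", "TF4"),
  target = c("G1", "G2", "G3", "G1", "G2", "G3", "G3", "G4", "G5", "G1"),
  mor    = c(1, 1, -1, 1, 1, -1, 1, -1, 1, 1),
  stringsAsFactors = FALSE
)

# Toy input table with a logFC and a p-value column, gene names as row names.
df <- data.frame(
  logFC  = c(2.1, 1.4, -0.7, 0.3, -1.2),
  PValue = c(0.01, 0.03, 0.2, 0.6, 0.04),
  row.names = paste0("G", 1:5)
)

# Step 3: keep only the statistic to be scored, as a numeric matrix.
mat <- as.matrix(df[, "logFC", drop = FALSE])

# Step 1: keep targets present in mat, then drop TFs with fewer than minsize targets.
minsize <- 3
net <- net[net$target %in% rownames(mat), ]
n_targets <- table(net$source)
net <- net[net$source %in% names(n_targets)[n_targets >= minsize], ]

# Step 2: build a targets x TFs weight matrix and correlate its columns.
sources <- unique(net$source)
targets <- unique(net$target)
w <- matrix(0, nrow = length(targets), ncol = length(sources),
            dimnames = list(targets, sources))
w[cbind(net$target, net$source)] <- net$mor

cors <- cor(w)
cors[upper.tri(cors, diag = TRUE)] <- 0  # look at each pair only once

# Greedy rule (an assumption): drop the later TF of every pair with |r| > 0.95.
to_drop <- rownames(cors)[apply(abs(cors) > 0.95, 1, any)]
net_clean <- net[!net$source %in% to_drop, ]
```

In this toy case TF4 is removed by the minsize filter and TF2 is removed because its targets and signs are identical to those of TF1. Which member of a colinear pair to keep is a judgment call that depends on the biology of interest.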

Hope this is helpful!

imerelli commented 1 year ago

Hi, thank you for your help. After removing the p-values and increasing minsize to 5, it worked. In the next test, I split the overall logFC column into one column per comparison for each protein (case1/ctrl and case2/ctrl), and I got the error below. Any suggestions?

    head(mat)
            GNE.CASE1.vs.CTR GNE.CASE2.vs.CTR
    ACTA1         -1.2720848      -0.62230089
    ACTB          -1.2162731      -0.14167834
    ACTBL2        -1.4490482      -0.63023057
    ACTN2         -1.4759825      -0.01010101
    ACTN4         -1.6024540       0.10946555
    ARHGDIA       -0.5806452       0.91156463
    res_decouple <- decouple(mat,
    +                        dorothea,
    +                        .source='source',
    +                        .target='target',
    +                        minsize = 5)
    Error in map2():
    ℹ In index: 3.
    Caused by error in map():
    ℹ In index: 1.
    Caused by error in weight_mat %*% mat:
    ! requires numeric/complex matrix/vector arguments
    Run rlang::last_error() to see where the error occurred.

PauBadiaM commented 1 year ago

Hi @imerelli

Could you check whether your mat is a matrix or a data frame? It needs to be a matrix. Also check that every element in your matrix is numeric; you may have a string somewhere. Just in case, you could also install the latest version of decoupleR from GitHub:

    remotes::install_github("saezlab/decoupleR")
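
The matrix and numeric checks mentioned above could look roughly like this in base R. It is a sketch on a made-up data frame: the stray "n/a" entry is an assumption about what might have gone wrong, and dropping incomplete rows is only one possible way to handle it.

```r
# Toy stand-in for the table returned by read.delim(); one entry is stored as
# text ("n/a"), which makes the whole column character instead of numeric.
df <- data.frame(
  GNE.CASE1.vs.CTR = c(-1.27, -1.22, -1.45),
  GNE.CASE2.vs.CTR = c("-0.62", "n/a", "-0.63"),
  row.names = c("ACTA1", "ACTB", "ACTBL2"),
  stringsAsFactors = FALSE
)

# read.delim() returns a data frame, not a matrix.
was_matrix <- is.matrix(df)

# Find columns that are not numeric.
non_numeric <- names(df)[!sapply(df, is.numeric)]

# Convert every column to numeric; entries that cannot be parsed become NA.
num <- lapply(df, function(x) suppressWarnings(as.numeric(x)))
mat <- as.matrix(as.data.frame(num, row.names = rownames(df)))

# Locate the offending cells before deciding what to do with them.
bad_cells <- which(is.na(mat), arr.ind = TRUE)

# One option (an assumption, not a recommendation): drop rows with missing values.
mat <- mat[complete.cases(mat), , drop = FALSE]
```

Inspecting `non_numeric` and `bad_cells` first shows where the string is, so the source file can be fixed instead of silently discarding rows.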
imerelli commented 1 year ago

Thank you for your suggestions.