saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0

Network is empty after intersecting it with mat #98

Closed. JinSooJoo closed this issue 9 months ago

JinSooJoo commented 9 months ago

Hello, first of all, thank you so much for the excellent package.

I'm having a problem running the statistics functions, such as run_aucell, run_ulm or run_wmean. They work with some public GEO datasets but not with others, in which case the error below is shown.

Error in filt_minsize(rownames(counts), net, minsize = 5) : 
  Network is empty after intersecting it with mat and
filtering it by sources with at least 5 targets. Make sure mat and 
network have shared target features or reduce the number assigned to minsize

If this is caused by a mismatch between counts and net, what would be a possible solution? Thank you for your time and help!

deeenes commented 9 months ago

Hi @JinSooJoo,

This only means that the feature labels in counts don't match the identifier (target) column in net. For example, if your counts data uses mouse gene symbols and your net contains human gene symbols, you get this error message. If you inspect the row names of counts and the contents of net yourself, you will very likely spot the issue.
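As a rough illustration of the condition the error message describes, here is a sketch in base R on toy data. It is not decoupleR's actual implementation: the helper name `sources_passing_minsize` and the toy objects are hypothetical, and the `source` column name is an assumption based on the wording of the error message.

```r
# Hypothetical helper: which sources keep at least `minsize` distinct targets
# after restricting the network to features present in the matrix?
sources_passing_minsize <- function(features, net, minsize = 5) {
    shared <- net[net$target %in% features, , drop = FALSE]
    kept <- unique(shared$source)
    n_targets <- vapply(
        kept,
        function(s) length(unique(shared$target[shared$source == s])),
        integer(1),
        USE.NAMES = FALSE
    )
    kept[n_targets >= minsize]
}

# Toy network: TF1 has 5 targets, TF2 has only 2
toy_net <- data.frame(
    source = c(rep("TF1", 5), rep("TF2", 2)),
    target = c("A2M", "AACS", "AANAT", "AAR2", "A4GALT", "A2M", "AACS"),
    stringsAsFactors = FALSE
)

# Matching labels (gene symbols): TF1 passes the filter
symbol_features <- c("A2M", "AACS", "AANAT", "AAR2", "A4GALT", "A1BG")
passing_symbols <- sources_passing_minsize(symbol_features, toy_net)

# Non-matching labels (e.g. row indices): no source passes,
# which is the situation the error message reports
index_features <- as.character(1:6)
passing_indices <- sources_passing_minsize(index_features, toy_net)
```

If the second call returns nothing for real data, either the labels are of a different kind or minsize is too strict for the overlap that exists.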

JinSooJoo commented 9 months ago

Hi @deeenes, I appreciate your help. Both my counts and net are from human resources. counts is from GSE186352, and I created net with net <- get_collectri(organism='human', split_complexes=FALSE).

deeenes commented 9 months ago

Are you saying that length(intersect(rownames(counts), net$target)) > 5L?

JinSooJoo commented 9 months ago

The row names of counts and the contents of net do match, because they are both from human data. Regarding your comment, I got the following output:

> length(intersect(rownames(counts), net$target)) > 5L
[1] FALSE

deeenes commented 9 months ago

The row names of counts and the contents of net do match, because they are both from human data.

There might be many other reasons why these don't match, e.g. different ID types
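To make "different ID types" concrete, here is a small heuristic sketch in base R. The function name `guess_id_type` and the regular expressions are my own rough assumptions, not part of decoupleR, and they are far from an exhaustive classifier.

```r
# Hypothetical heuristic: guess what kind of labels a character vector holds.
# The patterns below are rough assumptions for illustration only.
guess_id_type <- function(ids) {
    ids <- as.character(ids)
    stopifnot(length(ids) > 0)
    if (all(grepl("^[0-9]+$", ids))) {
        # purely numeric labels: default row indices or Entrez-like IDs
        "numeric (row indices or Entrez-like IDs)"
    } else if (all(grepl("^ENS[A-Z]*G[0-9]+", ids))) {
        # e.g. ENSG... (human) or ENSMUSG... (mouse)
        "Ensembl gene IDs"
    } else {
        "other (possibly gene symbols)"
    }
}

type_numeric <- guess_id_type(c("1", "10", "100"))
type_ensembl <- guess_id_type(c("ENSG00000141510", "ENSMUSG00000059552"))
type_symbols <- guess_id_type(c("A2M", "A2ML1", "A4GALT"))
```

Running it on both rownames(counts) and net$target should show quickly whether the two sides use the same kind of identifier.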

Regarding your comment, I got the following output:

> length(intersect(rownames(counts), net$target)) > 5L
[1] FALSE

Well, if there are not even 5 common elements, very likely these two vectors contain completely different kinds of identifiers. Best to check it manually:

head(sort(unique(rownames(counts))))
head(sort(unique(net$target)))

JinSooJoo commented 9 months ago

The two vectors do contain common elements (human genes), but the row names of my counts are numeric:

> head(sort(unique((rownames(counts)))))
[1] "1"     "10"    "100"   "1000"  "10000" "10001"
> head(sort(unique(net$target)))
[1] "A2M"    "A2ML1"  "A4GALT" "AACS"   "AANAT"  "AAR2"  

Is there a way to convert the numeric row names into gene names? I also attach the output for the gene column:

> head(sort(unique((counts$gene))))
[1] "A1BG"     "A1BG.AS1" "A2M"      "A2M.AS1"  "A4GALT"   "AAAS"    
> head(sort(unique(net$target)))
[1] "A2M"    "A2ML1"  "A4GALT" "AACS"   "AANAT"  "AAR2"

deeenes commented 9 months ago

Indeed, for decoupleR to work, counts should be a numeric matrix with row names matching net$target. I suggest setting the row names on your data frame and converting it to a numeric matrix:

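# use the gene symbols as row names (this requires them to be unique)
rownames(counts) <- counts$gene
# keep only the numeric columns; drop = FALSE keeps a single column two-dimensional
counts <-
    as.matrix(counts[, Filter(function(x){is.numeric(counts[[x]])}, colnames(counts)), drop = FALSE])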

# you can check if the result is indeed a numeric matrix:
is.numeric(counts)
# [1] TRUE
is.matrix(counts)
# [1] TRUE

# and if the row names are correct:
length(intersect(rownames(counts), net$target))
# [1] 14223

If all looks fine, try the decoupleR call with this counts matrix.
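One caveat, as a sketch: assigning a gene column as the row names of a data frame fails if a symbol occurs more than once. Assuming that summing the duplicated rows is acceptable for your data (that is an assumption on my part, so check it for your dataset), base R's `rowsum()` can collapse duplicates while building the numeric matrix. The toy data frame and variable names below are hypothetical.

```r
# Toy data frame with a duplicated gene symbol (hypothetical values)
toy_counts <- data.frame(
    gene = c("A2M", "AACS", "A2M", "AANAT"),
    s1 = c(1, 2, 3, 4),
    s2 = c(10, 20, 30, 40),
    stringsAsFactors = FALSE
)

# Select the numeric columns only
num_cols <- vapply(toy_counts, is.numeric, logical(1))

# Sum rows that share a gene symbol; the result is a numeric matrix
# whose row names are the unique gene symbols
toy_mat <- rowsum(
    as.matrix(toy_counts[, num_cols, drop = FALSE]),
    group = toy_counts$gene
)

# Same sanity checks as above
is.numeric(toy_mat)
is.matrix(toy_mat)
rownames(toy_mat)
```

Whether to sum, average, or keep only one of the duplicated rows depends on how the data was generated, so treat the summing here as a placeholder choice.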

JinSooJoo commented 9 months ago

Dear deeenes,

Thank you for your enormous help - the issue is resolved now!

Regards, Jin Soo Joo