saezlab / dorothea

R package to access DoRothEA's regulons
https://saezlab.github.io/dorothea/
GNU General Public License v3.0

Please, check the CTFRs data to make it compatible with the library #1

Closed jperales closed 4 years ago

jperales commented 7 years ago

Hi,

The current version of the Consensus TF regulons data (CTFRs_v122016.rdata) is not compatible with the functions in lib_enrichment_scores.r. CTFRs_v122016.rdata is a list of two elements: geneset$NAME is a vector and geneset$GENES is a list of vectors.

The first step in SLEA() is SLEA.clean_genesets(). This function expects a list of vectors in which every element is named with its gene name (i.e. it reads the gene names from names(geneset$GENES[[TF]])). However, lapply(geneset$GENES, names) returns NULL for every regulon because the vectors are unnamed. As a result, no regulons are kept for the analysis.

> load("./data/CTFRs_v122016.rdata")
> source("./src/lib_enrichment_scores.r")
> all(unlist(lapply(geneset$GENES,function(x) is.null(names(x)))))
[1] TRUE
# This is the source of the following issue:
> SLEA.clean_genesets(genesets=geneset, E=exprs(eSet))
Removing targets under more than 10  TF
     0  targets keept
     0  targets removed
Filtering genesets: removing targets not in the expression matrix
Removing gene sets with less than  3  genes
     793  gene sets removed
     0  gene sets used covering  0  genes in the expression matrix
$NAME
character(0)

$GENES
named list()
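
The same behaviour can be reproduced without the real data. Below is a minimal sketch, assuming a made-up `toy` list that mimics the structure described above (the TF and gene names are hypothetical, not taken from CTFRs_v122016.rdata):

```r
# Hypothetical stand-in for the structure of the regulons object:
# NAME is a character vector, GENES is a list of unnamed character vectors.
toy <- list(
  NAME  = c("TF_A", "TF_B"),
  GENES = list(TF_A = c("G1", "G2", "G3"),
               TF_B = c("G2", "G4", "G5"))
)

# names() of an unnamed vector is NULL, so any code that reads the gene
# names from names(x) finds no targets at all.
has_no_names <- vapply(toy$GENES, function(x) is.null(names(x)), logical(1))

# Number of gene names recovered per regulon via names(): zero for each.
n_recovered <- vapply(toy$GENES, function(x) length(names(x)), integer(1))
```

Here `has_no_names` is TRUE for both toy regulons, which mirrors the all(...) check on the real object.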

If we name them, SLEA.clean_genesets() works well, and some of the methods then finish properly. For instance, for GSVA:

# To name each vector from the list
> geneset2 <- geneset
> geneset2$GENES <- lapply(geneset$GENES,function(x) setNames(x,x))
# Test if it works out with this new 'geneset2'
> SLEA.mat <- SLEA(E=exprs(eSet),genesets=geneset2,method="GSVA",M = NULL, permutations = 1000, filter_E = F)
Removing targets under more than 10  TF
     7978  targets keept
     18  targets removed
Filtering genesets: removing targets not in the expression matrix
Removing gene sets with less than  3  genes
     667  gene sets removed
     126  gene sets used covering  6124  genes in the expression matrix
Getting Enrichment Scores
Calculating SLEA scores using GSVA

Done!

However, VIPER still does not work. It seems to expect geneset$GENES to be a list of numeric vectors with the gene names as the names of each vector:

> SLEA.mat <- SLEA(E=exprs(eSet),genesets=geneset2,method="VIPER",M = NULL, permutations = 1000, filter_E = F)
Removing targets under more than 10  TF
     7978  targets keept
     18  targets removed
Filtering genesets: removing targets not in the expression matrix
Removing gene sets with less than  3  genes
     667  gene sets removed
     126  gene sets used covering  6124  genes in the expression matrix
Getting Enrichment Scores
Calculating SLEA scores using VIPER
Error in x/length(x) : non-numeric argument to binary operator
Called from: matrix(x/length(x), nrow = 1, ncol = length(x))
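
A possible workaround, not tested against lib_enrichment_scores.r, would be to build numeric vectors named by gene. The sketch below rests on two assumptions: that the VIPER branch only needs numeric values with gene names as names, and that a placeholder weight of 1 is acceptable when the real interaction weights are unknown. The helper `to_numeric_regulon` and the toy names are hypothetical:

```r
# Hypothetical toy regulons: unnamed character vectors of target genes.
toy_genes <- list(TF_A = c("G1", "G2", "G3"),
                  TF_B = c("G2", "G4", "G5"))

# Hypothetical helper: turn a character vector of genes into a numeric
# vector named by gene. The weight of 1 is a placeholder assumption.
to_numeric_regulon <- function(genes, weight = 1) {
  setNames(rep(weight, length(genes)), genes)
}

toy_numeric <- lapply(toy_genes, to_numeric_regulon)

# The expression from the error message, x / length(x), is now valid
# because x is numeric rather than character.
scaled <- toy_numeric$TF_A / length(toy_numeric$TF_A)
```

Whether equal weights give meaningful VIPER scores depends on how the library uses these values, so this only addresses the type error shown above.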

Hope this helps! Thank you very much for your attention :)

All the best, Javier

luzgaral commented 7 years ago

Dear Javier,

Thanks for the comment. You're right, we didn't update the TF regulons object to the correct format. This has now been fixed, and example code has been added. You can pull the object again.

Kind regards,

Luz