saezlab / decoupleR

R package to infer biological activities from omics data using a collection of methods.
https://saezlab.github.io/decoupleR/
GNU General Public License v3.0

Discrepancy between the TF-target databases downloaded from DoRothEA and decoupleR #68

Closed. harimchun closed this issue 1 year ago.

harimchun commented 1 year ago

Hi, I am using decoupleR with my scRNA-seq data.

While inferring transcription factor activities with DoRothEA, I found a discrepancy in the size of the database.

If I download the TF-target interactions (confidence levels A, B, C and D) from the dorothea package with the R code below,

```r
library(dplyr)  # provides %>%, filter() and pull()
net_dorothea <- dorothea::dorothea_hs
net_dorothea %>% filter(confidence %in% c('A', 'B', 'C', 'D')) %>% pull(tf) %>% unique() %>% length()
```

I get 361 unique TFs.

However, when I download the same confidence levels (A, B, C and D) through the decoupleR package with the R code below,

```r
net_decoupler <- decoupleR::get_dorothea(organism='human', levels=c('A', 'B', 'C', 'D'))
net_decoupler$source %>% unique() %>% length()
```

I get 643 unique TFs.

The column names also differ between the two tables (for example, the regulator column is tf in dorothea but source in decoupleR). I don't understand why these discrepancies occur.

In this case, which database should I use to infer transcription factor activities?

PauBadiaM commented 1 year ago

Hi @harimchun

There are currently two versions of DoRothEA. The original dorothea package ships the old one (dorothea::dorothea_hs), while OmniPath hosts the new one, which decoupleR retrieves with decoupleR::get_dorothea.

We recommend using the new one through decoupleR. In a future Bioconductor release we will update the dorothea package so that the two versions match.
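A minimal base-R sketch for comparing the two versions once both tables are loaded: it renames the old table's tf column to source (the name decoupleR uses, as in the code in the question) and lists which regulators are shared or unique to each version. The two small data frames are hypothetical placeholders, not real DoRothEA rows; in practice old_net would be dorothea::dorothea_hs filtered to the chosen confidence levels and new_net the output of decoupleR::get_dorothea(). It also assumes the remaining columns (confidence, target, mor) already share names in both tables.

```r
# Hypothetical toy stand-ins for the two tables (placeholder names, not real data).
old_net <- data.frame(
  tf         = c("TF1", "TF1", "TF2"),
  confidence = c("A", "B", "C"),
  target     = c("geneA", "geneB", "geneC"),
  mor        = c(1, -1, 1),
  stringsAsFactors = FALSE
)
new_net <- data.frame(
  source     = c("TF1", "TF3", "TF3"),
  confidence = c("A", "B", "D"),
  target     = c("geneA", "geneD", "geneE"),
  mor        = c(1, 1, -1),
  stringsAsFactors = FALSE
)

# Rename the old regulator column so both tables use decoupleR's naming.
names(old_net)[names(old_net) == "tf"] <- "source"

# Compare the sets of regulators in each version.
old_tfs  <- unique(old_net$source)
new_tfs  <- unique(new_net$source)
shared   <- intersect(old_tfs, new_tfs)  # present in both versions
only_old <- setdiff(old_tfs, new_tfs)    # only in the old version
only_new <- setdiff(new_tfs, old_tfs)    # only in the new version
```

With the real tables, length(only_new) - length(only_old) equals the difference between the two unique-TF counts, so the sketch shows how much of the gap comes from added versus dropped regulators.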

harimchun commented 1 year ago

Thanks! I will use decoupleR::get_dorothea.

Have a great day.