songw01 / MEGENA

Multiscale embedded gene co-expression network analysis
GNU General Public License v3.0

Should the expression data be log-transformed for the co-expression analysis? #5

Closed: liu-ying-jun closed this issue 3 years ago

liu-ying-jun commented 3 years ago

Hi, I am using MEGENA for co-expression analysis, and the package is very cool. My gene expression data is in CPM format. I tried running the co-expression analysis on both the CPM values and the log2(CPM + 1) values, and unexpectedly, the results were completely different between the two approaches. Which values should be used for MEGENA: the CPM or log2(CPM + 1)? Thanks.

songw01 commented 3 years ago

You should run on log-transformed data.
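A small illustration of why the two runs can disagree. This is a Python sketch that is not part of MEGENA, and it assumes the network is built from Pearson correlations (an assumption here, since the thread does not name the similarity measure). On the raw CPM scale a single very high sample can dominate the correlation, while log2(CPM + 1) compresses it. The gene values and the helper names `log2_cpm1` and `pearson` are hypothetical.

```python
import math

def log2_cpm1(values):
    """Apply the log2(CPM + 1) transform; the +1 keeps zero CPM values finite."""
    return [math.log2(v + 1.0) for v in values]

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical CPM values for two genes across six samples: opposite trends
# in samples 1-5, but both very high in sample 6.
gene_a = [10.0, 20.0, 30.0, 40.0, 50.0, 1000.0]
gene_b = [50.0, 40.0, 30.0, 20.0, 10.0, 1000.0]

r_raw = pearson(gene_a, gene_b)                        # on the CPM scale
r_log = pearson(log2_cpm1(gene_a), log2_cpm1(gene_b))  # on log2(CPM + 1)
print(f"r on CPM:           {r_raw:.3f}")
print(f"r on log2(CPM + 1): {r_log:.3f}")
```

With these made-up profiles the two genes look almost perfectly correlated on the CPM scale (r above 0.99) because both peak in the last sample, but after the transform r drops to roughly 0.77, since their opposite trends in the first five samples now carry more weight.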

liu-ying-jun commented 3 years ago

I see, thanks a lot. For the co-expression analysis, would you recommend running it only on the significant gene list (say, FDR < 0.05) or on all genes that have expression values in the dataset?

songw01 commented 3 years ago

You should run it on all genes that have substantial variance across the samples (e.g., a coefficient of variation (CV) > 0.1 is a widely used threshold).
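A minimal Python sketch of the CV filter described above, assuming expression is stored as a dict mapping gene name to a list of per-sample log2(CPM + 1) values. The helper names (`coefficient_of_variation`, `filter_by_cv`), gene names, and values are hypothetical, and CV is taken here as the sample standard deviation divided by the mean.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean.

    On log2(CPM + 1) data every value is >= 0, so a mean of 0 means the
    gene is zero in every sample; report CV = 0 so it gets filtered out.
    """
    m = mean(values)
    if m == 0:
        return 0.0
    return stdev(values) / m

def filter_by_cv(expr, threshold=0.1):
    """Keep genes whose CV across samples exceeds `threshold` (hypothetical helper)."""
    return {gene: vals for gene, vals in expr.items()
            if coefficient_of_variation(vals) > threshold}

# Hypothetical log2(CPM + 1) values for three genes across six samples.
expr = {
    "GENE1": [5.0, 5.1, 4.9, 5.0, 5.05, 4.95],  # nearly constant
    "GENE2": [2.0, 4.0, 3.0, 5.0, 1.0, 3.0],    # varies across samples
    "GENE3": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # never expressed
}

kept = filter_by_cv(expr, threshold=0.1)
print(sorted(kept))
```

Only GENE2 survives the 0.1 cutoff here: GENE1 barely varies and GENE3 is zero in every sample.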

liu-ying-jun commented 3 years ago

Thanks. Should the CV be calculated before or after the log transformation?

songw01 commented 3 years ago

You are assuming a normal distribution when calculating the CV, so you should compute it after the log transformation.
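To see that the order matters, the Python sketch below computes the CV of one highly expressed gene before and after log2(CPM + 1). The CPM values and helper names are hypothetical, and CV is again taken as sample standard deviation over the mean.

```python
import math
from statistics import mean, stdev

def cv(values):
    """Sample standard deviation over the mean (0 if the mean is not positive)."""
    m = mean(values)
    return stdev(values) / m if m > 0 else 0.0

def log2_cpm1(values):
    """Apply the log2(CPM + 1) transform."""
    return [math.log2(v + 1.0) for v in values]

# Hypothetical CPM profile of a highly expressed gene across six samples.
cpm = [1000.0, 1200.0, 800.0, 1100.0, 900.0, 1000.0]

cv_before = cv(cpm)            # CV on the raw CPM scale
cv_after = cv(log2_cpm1(cpm))  # CV after log2(CPM + 1), the order recommended above
print("passes CV > 0.1 before log:", cv_before > 0.1)
print("passes CV > 0.1 after log: ", cv_after > 0.1)
```

On the CPM scale this gene clears the 0.1 cutoff, but after the transform it does not, so filtering before versus after the log transformation keeps different gene sets.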