Closed andreyurch closed 4 years ago
Dear @andreyurch,
thanks for your interest in our package.
Please note that the `dorothea` package is an experimental data package whose main purpose is to provide the TF-target interaction database (regulons). For convenience, we also developed a wrapper for the statistical method `viper`. However, `dorothea` is not limited to `viper`; it can be used with any statistical method that analyses gene sets. Hence, the format of the gene expression matrix depends only on the underlying statistic, not on `dorothea`'s regulons.
If you would like to use our wrapper for `viper`, I would suggest using log-normalised counts (e.g. logCPM). Optionally, you can also scale the data gene-wise by setting `viper`'s `method` argument to `scale`.
I also strongly recommend filtering out lowly expressed genes. This step should be performed regardless of whether your annotation covers only 10,000 or up to 50,000 genes.
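To make the suggested preprocessing concrete, here is a minimal, language-neutral sketch in plain Python of the three steps mentioned above: filtering lowly expressed genes, computing logCPM, and optional gene-wise scaling. It is not part of `dorothea` or `viper`. The function names, the thresholds (`min_cpm`, `min_samples`), the `log2(CPM + 1)` form, and the toy counts are all assumptions chosen for illustration; in an R workflow, established tools such as edgeR's `cpm` and `filterByExpr` would normally handle the first two steps.

```python
import math
import statistics

def cpm(counts):
    """Counts per million. counts: dict gene -> list of raw counts (one per sample)."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts per sample, computed over all genes.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {g: [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
            for g, row in counts.items()}

def filter_low_expression(cpm_values, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples (assumed thresholds)."""
    return {g: row for g, row in cpm_values.items()
            if sum(v >= min_cpm for v in row) >= min_samples}

def log_cpm(cpm_values):
    """log2(CPM + 1); the pseudocount of 1 avoids log(0)."""
    return {g: [math.log2(v + 1.0) for v in row] for g, row in cpm_values.items()}

def scale_gene_wise(expr):
    """Z-score each gene across samples; constant genes are set to 0."""
    scaled = {}
    for g, row in expr.items():
        mean, sd = statistics.mean(row), statistics.stdev(row)
        scaled[g] = [(v - mean) / sd if sd > 0 else 0.0 for v in row]
    return scaled

# Toy raw counts for three hypothetical genes in two samples.
raw = {"A": [500000, 250000], "B": [500000, 750000], "C": [0, 0]}
kept = filter_low_expression(cpm(raw))   # gene C is never expressed and is dropped
expr = log_cpm(kept)                     # matrix to pass to the chosen statistic
expr_scaled = scale_gene_wise(expr)      # optional gene-wise scaling
```

Library sizes are computed before filtering here, so dropping genes does not change the CPM values of the genes that remain. Whether to scale yourself or leave it to the statistical method is a design choice; doing both would scale the data twice.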
Best wishes, Christian
Dear developers,
What is the optimal input format for Dorothea (bulk transcriptome analysis)? Is it FPKM, normalised counts, or log-normalised counts? Modern annotations contain up to 50,000 genes, many of which are not expressed. Should I filter out the lowly expressed genes before the analysis?