Closed JanHammarlund closed 8 months ago
@lvclark
One of the pre-processing steps is to normalize the data:
$$S_{i,j}=\frac{X_{i,j}-M_i}{M_i} \quad\quad (f1)$$
Here $X_{i,j}$ is the expression of probe $i$ in sample $j$. When ':norm_disc => false' (the default), the value of ':norm_disc_cov' does not matter, and $M_i$ is the mean expression of probe $i$ across all samples. When ':norm_disc => true', ':norm_disc_cov' selects the covariate used to group the samples: $M_i$ is then calculated separately within each group, so the expression of probe $i$ is normalized within the groups of covariate ':norm_disc_cov'.
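To make the two cases concrete, here is a minimal Python sketch of (f1). It is not the CYCLOPS code (which is written in Julia); the function name `normalize`, the toy values, and the batch labels are all made up for illustration.

```python
from collections import defaultdict

def normalize(expr, groups=None):
    """Return S[i][j] = (X[i][j] - M_i) / M_i for each probe i and sample j.

    expr: list of rows, one per probe; each row holds one value per sample.
    groups: optional list with one covariate label per sample. When given,
    M_i is the mean of probe i within the sample's own group (the
    ':norm_disc => true' case); otherwise M_i is the mean over all samples.
    """
    n_samples = len(expr[0])
    labels = groups if groups is not None else [None] * n_samples

    # Collect the sample indices that belong to each group.
    members = defaultdict(list)
    for j, label in enumerate(labels):
        members[label].append(j)

    normalized = []
    for row in expr:
        out = [0.0] * n_samples
        for idx in members.values():
            mean = sum(row[j] for j in idx) / len(idx)
            for j in idx:
                out[j] = (row[j] - mean) / mean
        normalized.append(out)
    return normalized

expr = [[2.0, 4.0, 10.0, 30.0]]   # one probe, four samples (made-up values)
batch = ["A", "A", "B", "B"]      # hypothetical batch labels

print(normalize(expr))            # M_i is the global mean, 11.5
print(normalize(expr, batch))     # M_i is 3.0 in batch A and 20.0 in batch B
```

With grouping, the large offset between the two batches no longer dominates the normalized values, which is the point of normalizing within the covariate groups.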
To determine the appropriate number of eigengenes for training, we consider the variance contributed by each individual eigengene, the total variance captured by all eigengenes, and the amount of variance within an eigengene that is explained by a covariate. ':eigen_reg_disc_cov' determines which covariate is used for that last calculation. Usually, I set this to the covariate with the greatest difference between groups; in your case, that is likely the batch effect covariate.
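As a rough illustration of "variance explained within an eigengene by a covariate", the Python sketch below uses the between-group sum of squares divided by the total sum of squares. This is one common definition and is only an assumption here; the exact statistic CYCLOPS computes may differ. The function name and all values are hypothetical.

```python
def variance_explained(values, groups):
    """Fraction of the variance in `values` explained by group membership
    (between-group sum of squares divided by total sum of squares)."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0

    # Gather the values that fall into each covariate group.
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)

    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total

eigengene = [-1.1, -0.9, 0.9, 1.1]    # made-up eigengene values, one per sample
batch = ["A", "A", "B", "B"]          # hypothetical batch covariate
interest = ["x", "y", "x", "y"]       # hypothetical covariate of interest

print(variance_explained(eigengene, batch))     # close to 1: batch dominates
print(variance_explained(eigengene, interest))  # close to 0
```

In this toy case the batch covariate explains almost all of the eigengene's variance, which is the kind of covariate the reply suggests using for ':eigen_reg_disc_cov'.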
While some covariates may be appropriate for pre-processing, they may not be appropriate for training. At this step, the user can include any, all, or none of the covariates in the dataset.
':out_covariates' determines whether any covariates, continuous or discontinuous, are included in the eigengene data for training (by default 'true').
If ':out_covariates => true', then ':out_use_disc_cov' determines whether discontinuous covariates are included in the eigengene data for training (by default 'true').
If ':out_use_disc_cov => true', then ':out_all_disc_cov' specifies whether all the discontinuous covariates found in the dataset are included with the eigengene data for training (by default 'true').
Only if ':out_all_disc_cov => false' is ':out_disc_cov' used, to select specifically which discontinuous covariate is included in the eigengene data for training.
With these defaults, all covariates are included for training, so you don't need to specify which ones to include.
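The nesting of these four options can be summarized in a short Python sketch. It only mirrors the decision order described above and is not CYCLOPS code; the function name is hypothetical, and ':out_disc_cov' is given as a covariate name purely for readability, since the form CYCLOPS expects for it is not stated in this thread.

```python
def discontinuous_covariates_for_training(params, available):
    """Return the discontinuous covariates included with the eigengene data.

    params: dict mirroring the training dictionary keys discussed above.
    available: names of the discontinuous covariates found in the dataset.
    Missing keys fall back to the defaults given in the thread (all True).
    """
    if not params.get("out_covariates", True):
        return []                      # no covariates of any kind
    if not params.get("out_use_disc_cov", True):
        return []                      # discontinuous covariates excluded
    if params.get("out_all_disc_cov", True):
        return list(available)         # every discontinuous covariate
    # Only reached when out_all_disc_cov is False.
    chosen = params.get("out_disc_cov")
    return [chosen] if chosen in available else []

available = ["batch", "condition"]     # hypothetical covariate names

print(discontinuous_covariates_for_training({}, available))
print(discontinuous_covariates_for_training(
    {"out_all_disc_cov": False, "out_disc_cov": "condition"}, available))
print(discontinuous_covariates_for_training(
    {"out_covariates": False, "out_disc_cov": "condition"}, available))
```

The last call shows that ':out_disc_cov' has no effect unless the options above it allow it to be consulted.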
':align_disc_cov' is an oversight on my part. This variable is no longer being used and can safely be removed from the training dictionary; its presence will not influence the algorithm.
Thank you for the explanation!
I couldn't find any covariate effects in the output. Is there a way to know, for example, the effect of each covariate on the sine and cosine coefficients for genes?
(Or maybe Fit_Average_1 and Fit_Average_2 are effects of the covariates on the average?)
The "Cosine Fit" file does not include covariate effects for the sine and cosine coefficients. As you correctly pointed out, the Fit_Average_1 and Fit_Average_2 are the covariate effects on the expression average. Please note that "Fit_Average" is the average for condition one, and "Fit_Average_1" and "Fit_Average_2" are the offsets of conditions two and three relative to condition one.
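Read this way, the per-condition averages can be recovered by adding each offset to the baseline. A small Python sketch with made-up numbers (only the column names come from the output described above; the function name is hypothetical):

```python
def condition_averages(fit_average, offsets):
    """Average expression per condition: the baseline (condition one)
    followed by baseline + offset for each further condition."""
    return [fit_average] + [fit_average + o for o in offsets]

# One hypothetical row of the fit output for a single gene.
row = {"Fit_Average": 8.0, "Fit_Average_1": 0.5, "Fit_Average_2": -1.25}

averages = condition_averages(
    row["Fit_Average"], [row["Fit_Average_1"], row["Fit_Average_2"]]
)
print(averages)  # [8.0, 8.5, 6.75] for conditions one, two, and three
```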
I have one discontinuous covariate that is a batch effect, and one discontinuous covariate that is an effect of interest. I am a little confused about which should go into norm_disc_cov vs. eigen_reg_disc_cov vs. out_disc_cov vs. align_disc_cov.
Originally posted by @lvclark in https://github.com/ranafi/CYCLOPS-2.0/issues/1#issuecomment-1978969042