`norm_disc_cov` vs. `eigen_reg_disc_cov` vs. `out_disc_cov` vs. `align_disc_cov`

JanHammarlund commented 8 months ago

I have one discontinuous covariate that is a batch effect, and one discontinuous covariate that is an effect of interest. I am a little confused about which should go into norm_disc_cov vs. eigen_reg_disc_cov vs. out_disc_cov vs. align_disc_cov.

Originally posted by @lvclark in https://github.com/ranafi/CYCLOPS-2.0/issues/1#issuecomment-1978969042

JanHammarlund commented 8 months ago

@lvclark

norm_disc_cov and norm_disc

One of the pre-processing steps is to normalize the data:

$$S_{i,j}=\frac{X_{i,j}-M_i}{M_i} \quad\quad (f1)$$

When `:norm_disc => false` (the default), the value of `:norm_disc_cov` does not matter, and $M_i$ is the mean expression of probe $i$ across all samples. When `:norm_disc => true`, `:norm_disc_cov` selects the covariate used to group the samples, and $M_i$ is calculated separately for each group, so the expression of probe $i$ is normalized within the groups of that covariate.
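As a worked illustration of (f1) with made-up numbers (not CYCLOPS output), suppose probe $i$ has expression 10 and 12 in the two samples of batch A, and 20 and 24 in the two samples of batch B. The superscript notation $M_i^{(A)}$, $M_i^{(B)}$ for the per-group means is introduced here only for this example.

With `:norm_disc => false`, one mean is used for all samples:

$$M_i=\frac{10+12+20+24}{4}=16.5,\qquad S_{i,\cdot}=\left(\tfrac{10-16.5}{16.5},\ \tfrac{12-16.5}{16.5},\ \tfrac{20-16.5}{16.5},\ \tfrac{24-16.5}{16.5}\right)\approx(-0.39,\ -0.27,\ 0.21,\ 0.45)$$

With `:norm_disc => true` and `:norm_disc_cov` pointing at the batch covariate, each batch uses its own mean:

$$M_i^{(A)}=11,\quad M_i^{(B)}=22,\qquad S_{i,\cdot}=\left(\tfrac{10-11}{11},\ \tfrac{12-11}{11},\ \tfrac{20-22}{22},\ \tfrac{24-22}{22}\right)\approx(-0.09,\ 0.09,\ -0.09,\ 0.09)$$

In the first case the normalized values are dominated by the difference between batches; in the second, that difference is removed and only the within-batch variation remains.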

eigen_reg_disc_cov

To determine the appropriate number of eigengenes for training, we consider three quantities: the variance contributed by each individual eigengene, the total variance captured by all eigengenes, and the variance within an eigengene that is explained by a covariate. `:eigen_reg_disc_cov` selects the covariate used for that last calculation. I usually set it to the covariate with the greatest difference between groups; in your case, that is likely the batch effect covariate.
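For the situation in the question (one batch covariate, one covariate of interest), the pre-processing keys could be set along these lines. This is only a sketch: the name `preprocessing_overrides` is hypothetical, the use of an integer index to refer to a covariate is an assumption that should be checked against the repository's example script, and whether to normalize within batch at all depends on the data.

```julia
# Hypothetical fragment of the training dictionary; only the keys come from this thread.
preprocessing_overrides = Dict(
    :norm_disc => true,        # normalize within covariate groups instead of across all samples
    :norm_disc_cov => 1,       # ASSUMPTION: integer index of the batch covariate
    :eigen_reg_disc_cov => 1,  # ASSUMPTION: same index; the covariate with the greatest between-group difference
)
```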

out_disc_cov, out_covariates, out_use_disc_cov, and out_all_disc_cov

While some covariates may be appropriate for pre-processing, they may not be appropriate for training. At this step, the user can include any, all, or none of the covariates in the dataset.

`:out_covariates` determines whether any covariates, continuous or discontinuous, are included with the eigengene data for training (by default `true`).

If `:out_covariates => true`, then `:out_use_disc_cov` determines whether discontinuous covariates are included with the eigengene data for training (by default `true`).

If `:out_use_disc_cov => true`, then `:out_all_disc_cov` specifies whether all the discontinuous covariates found in the dataset are included with the eigengene data for training (by default `true`).

Only if `:out_all_disc_cov => false` is `:out_disc_cov` used, to determine specifically which discontinuous covariate is included with the eigengene data for training.

With the default settings, all covariates are included for training, so you don't need to specify which ones to include.
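To make the precedence of these four keys concrete, here is a sketch of the settings that would restrict training to a single discontinuous covariate. The name `out_overrides` is hypothetical, and the integer index used for `:out_disc_cov` is an assumption about how covariates are referenced.

```julia
# Hypothetical fragment of the training dictionary; only the keys come from this thread.
out_overrides = Dict(
    :out_covariates => true,     # include covariates with the eigengene data at all
    :out_use_disc_cov => true,   # only consulted because :out_covariates is true
    :out_all_disc_cov => false,  # do not include every discontinuous covariate...
    :out_disc_cov => 2,          # ...only this one (ASSUMPTION: integer index of the covariate)
)
```

Leaving all four keys at their defaults includes every covariate, in which case the value of `:out_disc_cov` is never consulted.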

align_disc_cov

`:align_disc_cov` is an oversight on my part. This variable is no longer being used and can safely be removed from the training dictionary; its presence will not influence the algorithm.

lvclark commented 8 months ago

Thank you for the explanation!

lvclark commented 7 months ago

I couldn't find any covariate effects in the output. Is there a way to know, for example, the effect of each covariate on the sine and cosine coefficients for genes?

lvclark commented 7 months ago

(Or maybe Fit_Average_1 and Fit_Average_2 are effects of the covariates on the average?)

JanHammarlund commented 7 months ago

> (Or maybe Fit_Average_1 and Fit_Average_2 are effects of the covariates on the average?)

The "Cosine Fit" file does not include covariate effects for the sine and cosine coefficients. As you correctly pointed out, "Fit_Average_1" and "Fit_Average_2" are the covariate effects on the expression average. Please note that "Fit_Average" is the average for condition one, and "Fit_Average_1" and "Fit_Average_2" are the offsets of conditions two and three, respectively, relative to condition one.
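Read this way, and assuming the offsets are simply additive, the average for each condition can be recovered as follows. The symbols $\mu_1$, $\mu_2$, $\mu_3$ are notation introduced here for illustration and do not appear in the output file.

$$\mu_1=\text{Fit\_Average},\qquad \mu_2=\text{Fit\_Average}+\text{Fit\_Average\_1},\qquad \mu_3=\text{Fit\_Average}+\text{Fit\_Average\_2}$$

For example, with made-up values Fit_Average = 5.0, Fit_Average_1 = 0.5, and Fit_Average_2 = -1.0, the three condition averages would be 5.0, 5.5, and 4.0.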
