Hrovatin closed this issue 3 years ago
Never mind, I have just realised that setting up constraints in this way produces a design matrix that is not full rank.
Also, doing what I proposed would likely require optimising a Pareto front over coef_study_bin, both within each bin across studies and within each study across bins. (I am not sure about this.)
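For anyone landing here later, the rank problem can be checked numerically before fitting. Below is a minimal sketch on made-up toy data (8 samples, 2 studies, 2 bins; all names and values are hypothetical and not taken from my real dataset). It adds one indicator column per (study, bin) combination next to the intercept and P, without dropping a level or applying any constraint, and compares the matrix rank with the number of columns. This only illustrates one common way such a design becomes rank deficient; it is not necessarily the exact cause in my case.

```python
import numpy as np

# Hypothetical toy data: 8 samples from 2 studies, continuous process P in [0, 1).
P = np.array([0.05, 0.15, 0.30, 0.45, 0.55, 0.70, 0.85, 0.95])
study = np.array([0, 0, 1, 1, 0, 0, 1, 1])

# Bin P into equal-width bins on [0, 1).
n_bins = 2
bins = np.minimum((P * n_bins).astype(int), n_bins - 1)

# One indicator column per observed (study, bin) combination.
combo = study * n_bins + bins
labels = np.unique(combo)
dummies = (combo[:, None] == labels[None, :]).astype(float)

# Design for ~1+P, and the same design with all study-in-bin indicators appended.
base = np.column_stack([np.ones_like(P), P])
full = np.column_stack([base, dummies])

rank_base = np.linalg.matrix_rank(base)
rank_full = np.linalg.matrix_rank(full)

print("~1+P:           columns =", base.shape[1], "rank =", rank_base)
print("+ study-in-bin: columns =", full.shape[1], "rank =", rank_full)
```

In this toy case the indicator columns sum to the intercept column, so the extended design has more columns than its rank. Whether a given set of constraints removes that redundancy has to be checked on the actual design matrix.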
I have the following problem: Studies (each composed of multiple samples): Continuous process P: Distribution of P across samples.
If I fit only ~1+P, I get genes that may be expressed in only some of the studies in the high/low P region. Image of the top downregulated genes, sorted by lfc and then padj. All but the 2nd and 9th genes seem to be such examples.
As studies/samples are confounded with P, I cannot simply use them as covariates. Thus I was thinking of binning P into, let's say, 10 bins and constraining studies within each bin. However, I will have cells from the same study in different bins. So if the same base level were used in each bin for the constraints, the constraint coefficients for the same study should be similar across bins. Is there a way to enforce this?
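To make the binning idea concrete, here is a rough sketch of how P could be split into 10 equal-width bins and how the samples of each study spread over those bins. The data are simulated (3 hypothetical studies, each covering a different but overlapping range of P), so the numbers are only illustrative; the point is the study-by-bin count table, which shows which studies share a bin and which studies span several bins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (assumption): 3 studies with 50 samples each,
# every study covering a shifted, overlapping range of P.
n_studies, n_per_study, n_bins = 3, 50, 10
study = np.repeat(np.arange(n_studies), n_per_study)
P = rng.uniform(low=study * 0.25, high=study * 0.25 + 0.5)

# Equal-width bins over the observed range of P; using only the interior
# edges makes np.digitize return bin ids 0 .. n_bins-1.
edges = np.linspace(P.min(), P.max(), n_bins + 1)
bin_id = np.digitize(P, edges[1:-1])

# Study-by-bin table of sample counts.
counts = np.zeros((n_studies, n_bins), dtype=int)
np.add.at(counts, (study, bin_id), 1)

# Number of studies represented in each bin.
studies_per_bin = (counts > 0).sum(axis=0)

print(counts)
print(studies_per_bin)
```

Bins that contain a single study leave nothing to constrain within that bin, and studies that appear in many bins are the ones for which the per-bin coefficients would have to be tied together.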