Closed kimkh415 closed 1 year ago
Thank you for your interest in DAESC. DAESC automatically appends an intercept column (1's across all cells) to the design matrix. That probably caused your design matrix to be rank deficient. For now, I would suggest deleting one column from the design matrix. You could choose one (condition, layer) pair as the baseline; the coefficients can then be interpreted as differences relative to that baseline. I will update the software within a week or two to allow arbitrary design matrices.
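To illustrate the rank issue described above, here is a minimal sketch in base R. The toy data frame, its column names, and the factor levels are made up for illustration. The `cbind(1, ...)` step only mimics the intercept that DAESC is said to append; it is not taken from the package code.

```r
# Toy data: 2 conditions x 2 layers, two observations per combination.
# All names and levels here are hypothetical.
df <- data.frame(
  condition = factor(rep(c("ctrl", "case"), each = 4), levels = c("ctrl", "case")),
  layer     = factor(rep(c("L1", "L2"), times = 4))
)

# With "+ 0", the first factor is fully dummy-coded, so its indicator
# columns sum to a column of 1's.
x_no_int <- model.matrix(~ condition * layer + 0, data = df)

# Mimic an intercept column being appended (assumed behaviour, see above).
x_aug <- cbind(1, x_no_int)
rank_deficient <- qr(x_aug)$rank < ncol(x_aug)

# Alternative: default treatment coding, then drop the intercept by hand.
# The baseline is the first level of each factor (here ctrl, L1).
x_full <- model.matrix(~ condition * layer, data = df)
x <- x_full[, colnames(x_full) != "(Intercept)", drop = FALSE]
full_rank <- qr(cbind(1, x))$rank == ncol(x) + 1
```

Under these assumptions, `x` is the matrix one would pass as the `x` argument of `daesc_bb`, and each remaining column is a contrast against the (ctrl, L1) baseline.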
There should not be any problem incorporating continuous covariates such as cell type abundance into the model. You can append the cell type abundance to the current design matrix.
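As a rough sketch of appending continuous covariates, the abundance columns can be bound to the categorical part of the design matrix with `cbind`. The column names `abund_typeA` and `abund_typeB` and the random values are placeholders, not real data.

```r
set.seed(1)

# Hypothetical toy data: categorical covariates plus two abundance columns.
df <- data.frame(
  condition   = factor(rep(c("ctrl", "case"), each = 4), levels = c("ctrl", "case")),
  layer       = factor(rep(c("L1", "L2"), times = 4)),
  abund_typeA = runif(8),
  abund_typeB = runif(8)
)

# Categorical part: treatment coding with the intercept column removed,
# since an intercept is appended by the software (per the reply above).
x_cat <- model.matrix(~ condition * layer, data = df)
x_cat <- x_cat[, colnames(x_cat) != "(Intercept)", drop = FALSE]

# Append the continuous abundance columns to the design matrix.
x <- cbind(x_cat, as.matrix(df[, c("abund_typeA", "abund_typeB")]))

# Sanity check: the matrix plus an intercept should have full column rank.
ok <- qr(cbind(1, x))$rank == ncol(x) + 1
```

One caveat worth checking: if the abundances are proportions that sum to 1 for every spot, including all cell types would again be collinear with the intercept, so one cell type may need to be left out.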
Let me know if you have further questions!
Thank you very much for your reply. Will proceed as you suggested.
Hello DAESC dev team!
First of all, thank you for developing such a flexible framework to test for allelic imbalance.
I am trying to incorporate DAESC into my analysis pipeline that tests for differential ASE across multiple individuals accounting for the disease status (condition), spatial location (cortical layers) and cell type.
To begin with, I applied DAESC to a toy dataset with one gene, two conditions, and two individuals per condition. Here, I fit the baseline model (DAESC-BB), specifying the design matrix `x` as a binary numeric array denoting the condition.
Now, I want to extend this by also taking into account the spatial information and the interaction terms. Since both condition and spatial location (cortical layer) are categorical variables, I tried using the `model.matrix` function to encode the information in a one-hot matrix. Here is the structure of the data frame I am working with and how I am invoking the `daesc_bb` function.
myformula = ~ cur.df$condition + cur.df$layer + cur.df$condition:cur.df$layer + 0
one_hot = model.matrix(myformula)
res = daesc_bb(y=cur.df$allele1_count, n=cur.df$total_count, subj=cur.df$sample_id, x=one_hot)
When I run this as is, I get the following error:
I think the reason is in the design matrix, where the first two columns have essentially the same information. After dropping the first column in `one_hot`, it runs without error, but I want to double check whether this is the intended use of this variable when supplying multiple categorical variables.
Finally, I also want to add cell type information as another independent variable. In my case, cell type is not a categorical variable, since the data was generated using Visium. So for each cell type, the input data will be its estimated abundance. If I were to input all of (1) condition, (2) layer and (3) cell type into `x`, how should I structure it?
Thank you!