Open h1hui opened 1 year ago
Hi Ashley,
This error means that, besides the intercept, some columns in your design matrix have no variation (i.e., all cells share the same value). This makes the intercept term non-identifiable. You can identify these columns with, e.g., which(apply(df,2,sd)==0), where df is your design matrix. Keep the first column, which is the intercept, and remove the other columns that this returns.
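A minimal sketch of that check on a toy design matrix, in case it helps. The data frame `meta` and its columns `age` and `batch` are made up for illustration; only the `which(apply(df,2,sd)==0)` step comes from the suggestion above.

```r
# Toy metadata (hypothetical): 'batch' is constant, so it has zero variation.
meta <- data.frame(
  age   = c(34, 51, 47, 62),
  batch = c(1, 1, 1, 1)
)
df <- model.matrix(~ age + batch, data = meta)

# Columns whose standard deviation is exactly zero.
# The intercept is constant as well, so it also shows up here.
zero_var <- which(apply(df, 2, sd) == 0)

# Keep the intercept (first column) and drop the other constant columns.
drop_cols <- setdiff(zero_var, 1)
df_clean <- df[, setdiff(seq_len(ncol(df)), drop_cols), drop = FALSE]

colnames(df_clean)
```

Using `drop = FALSE` keeps the result a matrix even if only one column survives, so it can still be passed as the `pred` argument.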
Best regards, Liang
On Mon, Aug 7, 2023 at 1:12 AM Ashley Hui wrote:
Hello,
I encountered this error when running the nebula function on one of my datasets.
Error in nebula(count, sort$sample, pred = df, ncore = 2) : Some predictors have zero variation or a zero vector.
Also, I found that some numeric predictors were turned into boolean columns when the model matrix was constructed: [image: Screen Shot 2023-08-06 at 10 09 04 PM] https://user-images.githubusercontent.com/33336127/258705085-1c849610-8843-4d6c-8aa0-d9a53aaea724.png
However, everything ran successfully on another dataset, for which the model matrix was built correctly: [image: Screen Shot 2023-08-06 at 10 10 30 PM] https://user-images.githubusercontent.com/33336127/258705249-8675a7a6-dae8-4d1c-be1a-c47fc08b4ad5.png
What's the reason behind this? How can I fix that for the first dataset mentioned?
Thanks!
Thank you Liang.
I was able to get rid of the no-variation columns and ran nebula on the first dataset. However, the age and well columns were still turned into several boolean columns instead of one numeric column each. In the result, I got a lot of NaNs for se and many 1s for the p-value. I am a little worried: would the result be the same as if they were numeric? How can I do this correctly? I tried troubleshooting by comparing against my second dataset (the correct one), but I don't see any difference between the two meta.data matrices.
Thanks, Hui
Hi Ashley,
I'm not sure whether the following answers your question. In your first design matrix, age is treated as a categorical variable and is therefore coded by multiple dummy variables. A categorical variable differs from an ordinal variable in that its levels have no order. In your case, age is ordinal, so it is better treated as a numerical variable. By default, the function model.matrix converts a factor or character variable into 0/1 dummy variables, so the age variable in your first dataset is probably a factor or character variable. You could first run as.numeric(as.character(age)) to convert it to a numerical variable before running model.matrix.
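As a rough illustration of that conversion, here is a small sketch. The data frame `meta2` and its `age` and `sex` columns are hypothetical stand-ins for the real metadata; the key step is as.numeric(as.character(...)).

```r
# Hypothetical metadata in which 'age' was read in as a factor.
meta2 <- data.frame(
  age = factor(c("34", "51", "47", "62")),
  sex = c("F", "M", "F", "M")
)

# Check how each column is stored; a factor or character 'age' explains the dummies.
sapply(meta2, class)

# As a factor, 'age' expands into one 0/1 column per non-reference level.
df_factor <- model.matrix(~ age + sex, data = meta2)

# Convert through character first: as.numeric() applied directly to a factor
# returns the internal level codes (1, 2, ...), not the original values.
meta2$age <- as.numeric(as.character(meta2$age))

# Now 'age' enters the design matrix as a single numeric column.
df_numeric <- model.matrix(~ age + sex, data = meta2)
colnames(df_numeric)
```

Comparing `sapply(meta2, class)` between the two datasets may show where they differ, since two metadata tables can print identically while storing a column as different types.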
Best regards, Liang