lhe17 / nebula

GNU General Public License v2.0

Getting Error "Some predictors have zero variation or a zero vector" #32

Open h1hui opened 1 year ago

h1hui commented 1 year ago

Hello,

I encountered this error when running the nebula function on one of my datasets.

Error in nebula(count, sort$sample, pred = df, ncore = 2) : 
  Some predictors have zero variation or a zero vector.

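I also found that some numeric predictors were converted to boolean columns when the model matrix was constructed:

[image: Screen Shot 2023-08-06 at 10 09 04 PM] https://user-images.githubusercontent.com/33336127/258705085-1c849610-8843-4d6c-8aa0-d9a53aaea724.png

However, everything ran successfully on another dataset, for which the model matrix was built correctly:

[image: Screen Shot 2023-08-06 at 10 10 30 PM] https://user-images.githubusercontent.com/33336127/258705249-8675a7a6-dae8-4d1c-be1a-c47fc08b4ad5.png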

What's the reason behind this? How can I fix it for the first dataset?

Thanks!

lhe17 commented 1 year ago

Hi Ashley,

This error means that, besides the intercept, some columns in your design matrix have no variation (i.e., all cells share the same value). Such a column is collinear with the intercept, which makes the intercept term non-identifiable. You can identify these columns using, e.g., which(apply(df,2,sd)==0), where df is your design matrix. Note that the intercept itself will also be flagged, since it is constant by construction. You should keep the first column, which is the intercept, and remove the other flagged columns.
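
The check described above could look roughly like the following sketch. The toy metadata (`meta`, with columns `age` and `batch`) and the name `df_clean` are hypothetical and only for illustration; `df` plays the role of the design matrix that is passed to nebula as `pred`.

```r
# Hypothetical toy metadata: `batch` is constant, so it has zero variation.
meta <- data.frame(
  age   = c(61, 72, 68, 80),
  batch = c(1, 1, 1, 1)
)
df <- model.matrix(~ age + batch, data = meta)

# Standard deviation of every column of the design matrix.
# The intercept column is constant by construction, so it is flagged too.
col_sd <- apply(df, 2, sd)
zero_var <- which(col_sd == 0)

# Keep the intercept (column 1) and drop every other constant column.
drop_cols <- setdiff(zero_var, 1)
df_clean <- if (length(drop_cols) > 0) df[, -drop_cols, drop = FALSE] else df

colnames(df_clean)
```

Using `drop = FALSE` keeps the result a matrix even if only one column remains, and the `length(drop_cols) > 0` guard avoids indexing with an empty negative vector, which would otherwise select no columns at all.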

Best regards, Liang

h1hui commented 1 year ago

Thank you Liang.

I was able to remove the no-variation columns and run nebula on the first dataset. However, the age and well columns were still expanded into several boolean columns instead of one numeric column each. In the results, I got a lot of NaNs for se and many p-values equal to 1. I am a little worried: would the results be the same as if these columns were numeric? How can I do this correctly? I tried troubleshooting by comparing against my second dataset (the correct one), but I don't see any difference between the two meta.data matrices.

Thanks, Hui

lhe17 commented 1 year ago

Hi Ashley,

I'm not sure whether the following answers your question. In your first design matrix, age is treated as a categorical variable and is thus coded by multiple dummy variables. A categorical variable differs from an ordinal variable in that there is no order between its levels. In your case, age is ordinal and is thus better treated as a numerical variable. By default, the function model.matrix converts a factor or character variable into 0/1 dummy variables. So the age variable in your first dataset is probably a factor or character variable. You might first run as.numeric(as.character(age)) to convert it to a numerical variable before running model.matrix.
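
As a rough illustration of that suggestion, the sketch below uses hypothetical toy metadata (`meta`, with columns `age` and `sex`) in which `age` was stored as a factor. It first inspects the column classes, then converts `age` before rebuilding the design matrix.

```r
# Hypothetical metadata in which `age` was stored as a factor.
meta <- data.frame(
  age = factor(c("61", "72", "68", "80")),
  sex = factor(c("F", "M", "F", "M"))
)

# Inspect the column classes; a factor or character `age` explains the dummies.
sapply(meta, class)

# With a factor, model.matrix creates one 0/1 column per non-reference level.
colnames(model.matrix(~ age + sex, data = meta))

# Go through character first: as.numeric() applied directly to a factor
# returns the internal level codes, not the original values.
meta$age <- as.numeric(as.character(meta$age))

# Now `age` enters the design matrix as a single numeric column.
df <- model.matrix(~ age + sex, data = meta)
colnames(df)
```

Comparing the output of `sapply(meta, class)` between two datasets is one quick way to see why model.matrix handles a column with the same name differently in each.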

Best regards, Liang
