martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

Ordinal covariates #20

Closed: rbutleriii closed this issue 2 years ago

rbutleriii commented 2 years ago

For categorical covariates (e.g. batch), should they be numerically encoded in the covariates file? Or will they be handled as-is if they are character strings?

martinjzhang commented 2 years ago

Hi,

scDRS only accepts numerical covariates. Please use dummy (0/1) variables for categorical variables. If there are K categories, include K-1 dummy columns plus one all-ones (constant) column.

The current version of scDRS doesn't have a formal treatment for ordinal variables. You can convert an ordinal variable either to a single numerical variable or to several dummy variables.

The scDRS results are not sensitive to different choices of covariates unless the covariates only affect a subset of genes.

Best, Martin
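
For readers who want to see the two ordinal options side by side, here is a minimal sketch in plain Python. It is not part of scDRS; the covariate name `severity`, its levels, and both helper functions are hypothetical and only illustrate the idea.

```python
# Hypothetical ordinal covariate; the level names and their order are assumptions.
LEVELS = ["mild", "moderate", "severe"]  # ordered from lowest to highest


def ordinal_to_numeric(values, levels=LEVELS):
    """Option 1: map each level to its rank (0, 1, 2, ...)."""
    rank = {level: i for i, level in enumerate(levels)}
    return [rank[v] for v in values]


def ordinal_to_dummies(values, levels=LEVELS):
    """Option 2: K-1 dummy (0/1) columns, with the first level as the reference."""
    return {
        f"severity_{level}": [1 if v == level else 0 for v in values]
        for level in levels[1:]
    }


severity = ["mild", "severe", "moderate", "mild"]
numeric = ordinal_to_numeric(severity)   # one numerical column
dummies = ordinal_to_dummies(severity)   # two 0/1 columns for three levels
```

The numerical coding treats the levels as equally spaced, while the dummy coding makes no spacing assumption at the cost of extra columns.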

rbutleriii commented 2 years ago

Got it, thanks! In case anyone is using R, the dummies package (library(dummies)) is useful, although by default it produces all K dummy variables, so one has to be dropped:

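  library(data.table); library(dummies)      # `a` is a data.table with a `batch` column
  a$const = 1                                # all-ones (constant) column
  a = cbind(a, dummy("batch", data=a)[,-1])  # K dummies, minus the first, leaves K-1
  a[, batch := NULL]                         # drop the original string column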
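
For Python users, the same encoding can be sketched with the standard library alone. This is an illustration, not scDRS code: the function `encode_batch`, the column names, and the example rows are all hypothetical. Only K-1 dummies are kept because the K dummies always sum to the all-ones column, so including all of them together with the constant would make the covariate matrix rank-deficient.

```python
def encode_batch(rows, column="batch"):
    """Replace a string-valued column with a constant column and K-1 dummy columns."""
    # Sort the categories so the choice of reference level is deterministic.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical column itself.
        out = {k: v for k, v in row.items() if k != column}
        out["const"] = 1  # all-ones column
        for cat in categories[1:]:  # the first category is the reference level
            out[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(out)
    return encoded


# Hypothetical covariate rows; the field names are assumptions for illustration.
cells = [
    {"cell": "c1", "n_genes": 1200, "batch": "A"},
    {"cell": "c2", "n_genes": 950, "batch": "B"},
    {"cell": "c3", "n_genes": 1430, "batch": "C"},
    {"cell": "c4", "n_genes": 1010, "batch": "B"},
]
encoded = encode_batch(cells)
```

If pandas is available, `pandas.get_dummies(df, columns=["batch"], drop_first=True)` produces the K-1 dummy columns in one call; the constant column still has to be added separately.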