xihaoli / STAARpipeline-Tutorial

The tutorial for performing single-/multi-trait association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using FAVORannotator, STAARpipeline and STAARpipelineSummary
GNU General Public License v3.0
21 stars 17 forks source link

Fitting NULL model for binary outcomes #50

Closed GACGAMA closed 3 months ago

GACGAMA commented 3 months ago

Hello! I'm trying to fit my null_model (with genesis and STAAR). This is my phenotype:

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

sex | diagnosis_phenotype | ID -- | -- | -- f | Disease | 1 f | Disease | 2 f | Control | 3 m | Disease | 4 m | Disease | 5 f | Control | 6 f | Control | 7 m | Control | 8 f | Control | 9 m | Control | 10 f | Control | 11

When using:

fit_nullmodel(as.factor(diagnosis_phenotype)~as.factor(sex),
                               data=phenotype,kins=sgrm,use_sparse=TRUE,kins_cutoff=0.022,id="ID",
                               family=gaussian(link="identity"),verbose=TRUE)

I get

Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  :
  NA/NaN/Inf in 'y'
In addition: Warning messages:
1: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
2: In Ops.factor(eta, offset) : ‘-’ not meaningful for factors
3: In Ops.factor(y, mu) : ‘-’ not meaningful for factors

With GENESIS, i get:


fitNullModel(data_GENESIS,outcome="diagnosis_phenotype",
+                                       cov.mat=sgrm,AIREML.tol=1e-4,verbose=TRUE, family = "binomial")
Error in svd(x, 0, 0) : a dimension is zero

How can I generate the null model for binary/categorical outcomes?

xihaoli commented 3 months ago

Hi @GACGAMA,

Thanks for reaching out. When fitting the null model for binary outcomes, please use family=binomial(link="logit") instead of family=gaussian(link="identity"). Also, it is generally recommended to code your phenotype variable as 0/1 numerical values for control/case status instead of factor values.

Best, Xihao

GACGAMA commented 3 months ago

Converting the outcome to 0/1 and using binomial(link="logit") did work, thanks Xihao! For some covariables, I get the following error with the STAAR fitnull:

Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'chol2inv': LAPACK routine 'dpotrf': leading principal minor of order 4 is not positive

Do you have any suggestions for this?

xihaoli commented 3 months ago

Could I ask what you mean by covariables? Are they "outcomes" or "covariates"?

GACGAMA commented 3 months ago

Sorry, I meant covariates. I think the error means that my covariates don't have enough variance for the model to converge or something like that. My dataset is pretty small (60 WGS samples).

xihaoli commented 3 months ago

Hi @GACGAMA,

Thanks for clarifying. Yes, you are correct. Generally, the null model fitting step is more prone to convergence issues for very small datasets, since the number of observations (e.g. n = 60) may not be sufficiently larger than the number of parameters in the model.

Best, Xihao