xihaoli / STAARpipeline-Tutorial

The tutorial for performing single-/multi-trait association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using FAVORannotator, STAARpipeline and STAARpipelineSummary
GNU General Public License v3.0

Questions/issues in "Step 1: Fit STAAR null model" #13

Closed by daniel-hui 1 year ago

daniel-hui commented 1 year ago

Hi Xihao,

Thanks for helping me out earlier. I had a couple questions/issues when running "Step 1: Fit STAAR null model":

  1. If I try running STAARpipeline_Null_Model.r, I get an error:
Error in glmmkin(fixed = fixed, data = data, kins = kins, id = id, random.slope = random.slope,  :
  Error: "id" must be one of the variables in the names of "data".
Calls: fit_nullmodel -> glmmkin
Execution halted

The error seems to be in the line:

obj_nullmodel <- fit_nullmodel(BMI_IRNT~Age+AgeSq+Subject_Information.Sex..str+PC1+PC2,data=phenotype,kins=sgrm,use_sparse=TRUE,kins_cutoff=0.022,id="IID",family=gaussian(link="identity"),verbose=TRUE)

in the argument id="IID". However, IID is the name of the column that holds the sample IDs. I also tried running the script without the quotes around IID, and that doesn't work either. Would you know what the issue is?
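The error message says that the string passed as id is not among names(data), so one way to narrow this down is to inspect the column names exactly as R sees them. The following is only a sketch under assumptions: the toy phenotype table and the stray leading space in its header are hypothetical, and the thread does not say what the actual cause was.

```r
# Hypothetical phenotype table whose ID header carries a stray leading space
# (an assumed cause for illustration; the thread does not confirm it).
phenotype <- data.frame(c("S1", "S2"), c(0.1, -0.3), stringsAsFactors = FALSE)
names(phenotype) <- c(" IID", "BMI_IRNT")

id_col <- "IID"

# dput() shows the names exactly as stored, including hidden whitespace.
dput(names(phenotype))

# This is the condition the error message refers to.
found_before <- id_col %in% names(phenotype)

# Strip leading/trailing whitespace from the headers and check again.
names(phenotype) <- trimws(names(phenotype))
found_after <- id_col %in% names(phenotype)
```

If the check already succeeds on the real data, the mismatch lies elsewhere (for example, in how the phenotype object is loaded before the call).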

  2. I then tried using STAARpipeline_Null_Model_GENESIS.r and it gave an error:
Error in .checkSampleId(cov.mat, x) :
  all sample names in dimnames of cov.mat must be present in x$sample.id
Calls: fitNullModel -> fitNullModel -> .local -> .checkSampleId
Execution halted

I saw in another issue that this was caused by the IDs being converted to something like "ID_ID". I tried the fix from that issue but still had the problem. However, I do have samples in the sparse GRM that are not in the phenotype file: we will probably run ~20 phenotypes, each with a different number of individuals with available phenotypes. I suppose it would be preferable to make a single sparse GRM for all phenotypes, but it may not be much more effort to make a new sparse GRM for each phenotype. Do you have any recommendation here? Thanks.

Daniel

daniel-hui commented 1 year ago

Sorry, my last post may have been a little early: I am now able to run STAARpipeline_Null_Model_GENESIS.r without error (using only individuals with both genetic and phenotype data for the sGRM, and after renaming the ID column in the phenotype file to "sample.id"). However, if you are aware of any workaround that avoids making a separate sGRM for each set of individuals, it would be much appreciated. Thanks.
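The column rename mentioned above can be sketched in base R as follows. The toy phenotype table is hypothetical; only the column names IID and sample.id come from this thread.

```r
# Hypothetical phenotype table using the original ID column name.
phenotype <- data.frame(
  IID = c("S1", "S2", "S3"),
  BMI_IRNT = c(0.1, -0.3, 1.2),
  stringsAsFactors = FALSE
)

# Rename only the ID column; all other columns keep their names and order.
names(phenotype)[names(phenotype) == "IID"] <- "sample.id"
```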

xihaoli commented 1 year ago

Hi Daniel,

No problem. I have two general comments for you to consider:

  1. The common strategy is to generate one sparse GRM for all subjects in the study. Then, when you fit the null model for different phenotypes, it is OK to subset the sGRM to phenotype-specific sub-matrices. This way, you only need to generate the sGRM once.

  2. You mentioned that you could run STAARpipeline_Null_Model_GENESIS.r without error. You may now consider running STAARpipeline_Null_Model.r for your null model fitting, since the two scripts share the same statistical framework and with it you don't need to convert the null model object using genesis2staar_nullmodel.R.
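The subsetting described in the first comment could look roughly like the sketch below. It assumes the sGRM is a sparse matrix from the Matrix package with sample IDs as row and column names; the toy IDs, kinship values, and phenotype table are all made up for illustration. Whether the null model fitting function performs this alignment internally is not stated in this thread, so treat this as an explicit pre-processing step rather than a required one.

```r
library(Matrix)

# Toy study-wide sparse GRM for five subjects (hypothetical IDs and values).
all_ids <- c("S1", "S2", "S3", "S4", "S5")
sgrm <- sparseMatrix(
  i = c(1, 2, 3, 4, 5, 1),
  j = c(1, 2, 3, 4, 5, 2),
  x = c(1, 1, 1, 1, 1, 0.25),   # unit diagonal plus one related pair (S1, S2)
  dims = c(5, 5),
  dimnames = list(all_ids, all_ids),
  symmetric = TRUE
)

# Toy phenotype table: S3 and S5 lack this phenotype, S6 is not in the sGRM.
phenotype <- data.frame(
  IID = c("S1", "S2", "S4", "S6"),
  BMI_IRNT = c(0.1, -0.3, 1.2, 0.5),
  stringsAsFactors = FALSE
)

# Keep only individuals present in both the phenotype table and the sGRM.
keep_ids <- intersect(phenotype$IID, rownames(sgrm))

# Phenotype-specific rows and sub-matrix, in the same sample order.
phenotype_sub <- phenotype[match(keep_ids, phenotype$IID), , drop = FALSE]
sgrm_sub <- sgrm[keep_ids, keep_ids]

# Sanity check: both objects now list the same samples in the same order.
stopifnot(identical(rownames(sgrm_sub), phenotype_sub$IID))
```

Repeating the last few lines for each phenotype reuses the single study-wide sGRM, so the sGRM itself is generated only once.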

Best, Xihao