xihaoli / STAARpipeline-Tutorial

The tutorial for performing single-/multi-trait association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using FAVORannotator, STAARpipeline and STAARpipelineSummary
GNU General Public License v3.0

Error in fit STAAR null model #37

Closed zmli0831 closed 7 months ago

zmli0831 commented 8 months ago

Hello, may I ask why this error occurs?

obj_nullmodel <- fit_nullmodel(V6~age+sex+V5,
    data=phenotype,kins=as.matrix(sgrm),use_sparse=TRUE,kins_cutoff=0.022,id="V1",
    family=gaussian(link="identity"),verbose=TRUE)
[1] "kins is a dense matrix, transforming it into a sparse matrix using cutoff 0.022."
Using 500 samples provided
Identifying clusters of relatives...
13 relatives in 2 clusters; largest cluster = 9
Creating block matrices for clusters...
487 samples with no relatives included
Putting all samples together into one block diagonal matrix
Error in glmmkin(fixed = fixed, data = data, kins = kins_sp, id = id, :
Error: kins matrix 1 does not include all individuals in the data.

zmli0831 commented 8 months ago

obj_nullmodel <- fit_nullmodel(lung_ca~age+sex+typesmok+pca1+pca2+pca3,

xihaoli commented 8 months ago

Hi @zmli0831, could I ask how you generated your sgrm?

zmli0831 commented 8 months ago

Thank you very much for your reply! The sgrm matrix was generated using the following procedure:

/home/zmli/plink/plink --bfile /home/zmli/software/STAAR/Example/Example --indep-pairwise 50 5 0.1 --out /home/zmli/software/STAAR/Example/chrall.prunedlist

/home/zmli/plink/plink --bfile /home/zmli/software/STAAR/Example/Example --extract /home/zmli/software/STAAR/Example/chrall.prunedlist.prune.in --make-bed --out /home/zmli/software/STAAR/Example/chrall_pruned

/home/zmli/plink/plink --bfile /home/zmli/software/STAAR/Example/chrall_pruned --mac 5 --make-bed --out /home/zmli/software/STAAR/Example/chrall_pruned_mac5

/home/zmli/software/king/king -b /home/zmli/software/STAAR/Example/Example.bed --ibdseg --degree 4 --cpus 2 --prefix /home/zmli/software/STAAR/Example/output.king

R CMD BATCH --vanilla '--args --prefix.in /home/zmli/software/STAAR/Example/chrall_pruned_mac5 --file.seg /home/zmli/software/STAAR/Example/output.king.seg --num_threads 2 --prefix.out Exampleoutput.divergence' /home/zmli/software/STAAR/FastSparseGRM-main/extdata/getDivergence_wrapper.R getDivergence.Rout

R CMD BATCH --vanilla '--args --prefix.in /home/zmli/software/STAAR/Example/chrall_pruned_mac5 --file.seg /home/zmli/software/STAAR/Example/output.king.seg --file.div /home/zmli/software/STAAR/Example/Exampleoutput.divergence.div --prefix.out Exampleoutput.unrelated' /home/zmli/software/STAAR/FastSparseGRM-main/extdata/extractUnrelated_wrapper.R extractUnrelated.Rout

R CMD BATCH --vanilla '--args --prefix.in /home/zmli/software/STAAR/Example/chrall_pruned_mac5 --file.unrels /home/zmli/software/STAAR/Example/Exampleoutput.unrelated.unrels --prefix.out /home/zmli/software/STAAR/Example/Exampleoutput.pca --num_threads 10' /home/zmli/software/STAAR/FastSparseGRM-main/extdata/runPCA_wrapper.R runPCA.Rout

R CMD BATCH --vanilla '--args --prefix.in /home/zmli/software/STAAR/Example/chrall_pruned_mac5 --prefix.out /home/zmli/software/STAAR/Example/Exampleoutput.sparseGRM --file.train /home/zmli/software/STAAR/Example/Exampleoutput.unrelated.unrels --file.score /home/zmli/software/STAAR/Example/Exampleoutput.pca.score --file.seg /home/zmli/software/STAAR/Example/output.king.seg --block.size 500 --max.related.block 500 --num_threads 10' /home/zmli/software/STAAR/FastSparseGRM-main/extdata/calcSparseGRM_wrapper.R calcSparseGRM.Rout

xihaoli commented 7 months ago

Hi @zmli0831,

Thanks for following up. Based on your scripts, I can see two issues here:

(1) The output of FastSparseGRM is already a sparse matrix object, and STAAR supports sparse GRMs. In the null model fitting script it is therefore unnecessary to use kins=as.matrix(sgrm): this converts the GRM back to a dense matrix, which is computationally expensive because memory then grows with the square of the sample size. This could be what caused:

[1] "kins is a dense matrix, transforming it into a sparse matrix using cutoff 0.022."
Using 39408 samples provided
double free or corruption (!prev)
Aborted (core dumped)

Instead, you can pass the sparse object directly, i.e. kins=sgrm, for null model fitting.
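
To illustrate the cost of the dense conversion, here is a minimal sketch using only the Matrix package. The toy matrix `sgrm_toy` and its sample IDs are made up for illustration (a real sparse GRM also carries off-diagonal entries for relatives), and the `fit_nullmodel` call is shown only as a comment that mirrors the call from the original post with `kins` changed.

```r
library(Matrix)  # recommended package shipped with standard R installations

# Hypothetical stand-in for a sparse GRM: 1000 unrelated samples,
# so only the diagonal is non-zero.
n <- 1000
ids <- paste0("id", seq_len(n))
sgrm_toy <- sparseMatrix(i = seq_len(n), j = seq_len(n), x = 1,
                         dims = c(n, n), dimnames = list(ids, ids))

# This is what kins=as.matrix(sgrm) does: every zero is now stored explicitly.
dense_toy <- as.matrix(sgrm_toy)

sparse_bytes <- as.numeric(object.size(sgrm_toy))
dense_bytes  <- as.numeric(object.size(dense_toy))
cat("sparse:", sparse_bytes, "bytes; dense:", dense_bytes, "bytes\n")

# Not run here (needs STAARpipeline and real data). Same call as in the
# original post, passing the sparse object as-is:
# obj_nullmodel <- fit_nullmodel(V6~age+sex+V5,
#     data=phenotype, kins=sgrm, use_sparse=TRUE, kins_cutoff=0.022, id="V1",
#     family=gaussian(link="identity"), verbose=TRUE)
```

The dense copy needs n x n doubles regardless of how many entries are zero, which is why the conversion becomes a problem at tens of thousands of samples.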

(2) In null model fitting, the subjects in the phenotype data determine the final sample size. The sample IDs (rownames and colnames) in the sparse GRM should be the same as those in the phenotype data, or a superset of them, but not a subset. In your output, the phenotype data include samples that are not in your GRM file, which could be what caused:

Error in glmmkin(fixed = fixed, data = data, kins = kins_sp, id = id, :
Error: kins matrix 1 does not include all individuals in the data.
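
A quick way to check for this before fitting is to compare the phenotype IDs against the GRM row names. The sketch below uses made-up toy objects (`sgrm_toy`, `phenotype_toy`) and assumes the ID column is named `V1` as in the original call. Dropping the unmatched phenotype rows is only one possible remedy; regenerating the GRM so that it covers all phenotyped samples is another.

```r
library(Matrix)  # recommended package shipped with standard R installations

# Hypothetical toy GRM covering samples s1..s5
grm_ids <- paste0("s", 1:5)
sgrm_toy <- sparseMatrix(i = 1:5, j = 1:5, x = 1,
                         dims = c(5, 5), dimnames = list(grm_ids, grm_ids))

# Hypothetical toy phenotype data covering samples s3..s7
phenotype_toy <- data.frame(V1  = paste0("s", 3:7),
                            age = c(50, 61, 47, 58, 66),
                            stringsAsFactors = FALSE)

# Phenotype IDs that are absent from the GRM; any entry here would
# trigger the "does not include all individuals" error.
missing_ids <- setdiff(as.character(phenotype_toy$V1), rownames(sgrm_toy))
print(missing_ids)

# One option: keep only phenotype rows whose ID is present in the GRM.
keep <- as.character(phenotype_toy$V1) %in% rownames(sgrm_toy)
phenotype_ok <- phenotype_toy[keep, , drop = FALSE]
```

If `missing_ids` is empty, the GRM is the same as or a superset of the phenotype samples, which is the situation described in (2).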

Hope this helps, and please feel free to close this issue if it has been resolved.

Best, Xihao

zmli0831 commented 7 months ago

Thank you very much for your reply. After your reminder, the code works!

xihaoli commented 7 months ago

Hi @zmli0831,

Thank you so much for letting me know and great to hear. I shall close this issue for now. Please feel free to open a new issue if you have any other questions.

Best, Xihao