ndimou closed this 10 months ago
I guess there are two things going on here:
bed_autoSVD() (if you're using this function) automatically handles the removal of LD to capture only population structure, so the two decompositions would not use the same set of variants.
In conclusion, you should really use bed_autoSVD() over PLINK :')
Thanks Florian. I removed variants in LD and forced bed_autoSVD() not to do any further pruning, to make sure the number of variants is the same in plink and bigsnpr. However, the difference I get is quite substantial, e.g. 0.012 in plink versus 16.51 in bigsnpr for a given PC/sample. Is there a transformation I need to apply to the ".eigenvec" file from plink so that it mirrors what you get from the "predict" option in bigsnpr?
Thanks!
Ah, you're talking about that difference..
I guess what you get from the ".eigenvec" file corresponds to obj.svd$u.
PC scores are actually UD (not just U) in the UDVt decomposition, and it is what is reported when you use predict().
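To make the U versus UD distinction concrete, here is a minimal base-R sketch. It uses a made-up random matrix rather than real genotypes, and the names X, U, scores and k are illustrative only; nothing here calls bigsnpr or PLINK. It only assumes the standard SVD, X = U D V^T.

```r
# Toy data: 50 samples x 10 variables of random noise (NOT real genotypes).
set.seed(1)
X <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)
X <- scale(X, center = TRUE, scale = FALSE)  # column-center before the SVD

k <- 3                        # number of PCs kept in this toy example
s <- svd(X, nu = k, nv = k)   # X = U D V^T; s$d holds all singular values

U <- s$u                               # unit-norm left singular vectors
scores <- sweep(U, 2, s$d[1:k], "*")   # PC scores: column j of U times d[j]

# Each column of U has sum of squares 1, so its entries are small.
print(colSums(U^2))
# The same columns after multiplying by D are on a much larger scale.
print(apply(abs(U), 2, max))
print(apply(abs(scores), 2, max))

# UD is also the projection of the data onto the loadings V.
stopifnot(isTRUE(all.equal(scores, X %*% s$v, check.attributes = FALSE)))
```

If obj.svd exposes the singular values as obj.svd$d (an assumption worth checking on your own object), the same sweep() call on obj.svd$u should presumably reproduce what predict() returns.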
Thanks. I checked, and the ".eigenvec" file is "similar" (at least on the same scale) to obj.svd$u. Going back to my original question: which should I use as a covariate in my GWAS? I see that eigenvec values were used in previous GWAS.
You should use autoSVD to make sure there is no LD left in PCs.
LD is accounted for. Let me put it another way: would you use obj.svd$u or PC_init <- predict(obj.svd_init) as a covariate in the GWAS?
I don't think it makes a difference to use U or UD as covariates (because the scale of covariates does not matter in an unpenalized regression).
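A quick way to convince yourself of this is a small simulation in base R. This is only a sketch on simulated data: the variable names (pc, pc_ud, snp, y) and the scaling factor d are made up for illustration, and lm() stands in for whatever unpenalized regression your GWAS software runs.

```r
set.seed(42)
n <- 200

pc  <- rnorm(n)                            # stand-in for one column of U
snp <- rbinom(n, 2, prob = plogis(pc))     # genotype correlated with the PC
y   <- 0.3 * snp + 1.5 * pc + rnorm(n)     # simulated phenotype

d     <- 50          # hypothetical singular value
pc_ud <- d * pc      # the same PC on the "UD" scale

fit_u  <- lm(y ~ snp + pc)       # covariate on the U scale
fit_ud <- lm(y ~ snp + pc_ud)    # covariate on the UD scale

# Estimate, standard error, t value and p-value for the SNP are identical.
print(coef(summary(fit_u))["snp", ])
print(coef(summary(fit_ud))["snp", ])

# Only the covariate's own coefficient changes, by exactly the factor d.
print(coef(fit_u)["pc"] / coef(fit_ud)["pc_ud"])
```

With several PCs, each column is rescaled by its own singular value, but the covariates still span the same space, so the SNP effect is unchanged for the same reason. This would not hold for a penalized model, where the penalty depends on the scale of each coefficient.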
Great! Thank you for your help.
Hello,
My objective is to calculate PCs that I can use to adjust my GWAS. I computed the first 20 PCs in two ways: following all the steps available at https://privefl.github.io/bigsnpr/articles/bedpca.html (using the predict option), and using the --pca option in plink (the eigenvec values). I get completely different estimates. I see that people simply use eigenvec as adjustment factors, and I was wondering which is the way to go?
Thank you,
Niki