Help understanding the output of LDpred2

kathrynfreeman commented 2 years ago

Hey Florian,

I am a new PhD student. My lab previously used PRSice2, which I would prefer to move away from since LDpred2 is a much better method. I was able to follow your tutorial using the sample data, but I am genuinely confused about the final output. My lab always generated scores from a given set of summary statistics (e.g. a bipolar disorder GWAS), and each individual ended up with a score tied to their ID number, which was easy to export and merge with our phenotypic data.

I have been confused about the output of LDpred2 for weeks now and really need to figure out how to use it soon so I can continue with this method moving forward instead of PRSice2. I understand this is not an issue with bigsnpr and am sure you are very busy but I would be SO thankful for any tips you could provide me with.

Thank you in advance, Kate

privefl commented 2 years ago

Can you be a bit more precise about what you mean by "confused about the final output"?

You have best_beta_grid and beta_auto as the effect sizes. You multiply the genotype matrix G by these to get the corresponding pred, which is your final vector of PGS (for individuals in G).
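As a rough illustration of that multiplication, here is a minimal base R sketch. The small matrix G, the vector beta_auto, and all their values are made up for the example. With a real bigSNP object, G is a file-backed matrix, and the LDpred2 tutorial computes the product with big_prodVec() from bigstatsr rather than %*%.

```r
# Toy stand-ins (assumptions for illustration, not real data):
# 5 individuals x 4 variants, genotypes coded as 0/1/2 allele counts.
set.seed(1)
G <- matrix(sample(0:2, 20, replace = TRUE), nrow = 5, ncol = 4)

# One effect size per variant, playing the role of beta_auto or best_beta_grid.
beta_auto <- c(0.10, -0.05, 0.00, 0.20)

# The score of each individual is the weighted sum of their allele counts.
pred <- drop(G %*% beta_auto)

# pred has one value per row (individual) of G, in the same order as the rows.
stopifnot(length(pred) == nrow(G))
```

The point to take away is that pred carries no IDs of its own; its only link to individuals is the row order of G.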

kathrynfreeman commented 2 years ago

Sorry for the lack of clarity. When I inspect any of the pred vectors generated, I get the polygenic risk scores in one column and a row index running from 1 through 153 in another.

However, within obj.bigSNP there are sample.IDs that I need attached to the pred scores for analysis. I am confused about how to link the final pred score list to meaningful sample IDs.

privefl commented 2 years ago

These are in the same order as in the bigSNP object, that's all. So you can just attach the sample IDs to the pred.
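A minimal sketch of that step in base R. Everything here is a toy stand-in: obj.bigSNP is mocked as a plain list whose fam data frame has the family.ID and sample.ID columns that a bigSNP object carries, while pred, pheno, the column names PGS and bmi, and all values are made up for the example.

```r
# Mock of the relevant part of a bigSNP object (assumption: toy values).
obj.bigSNP <- list(fam = data.frame(
  family.ID = c("F1", "F2", "F3"),
  sample.ID = c("ID_A", "ID_B", "ID_C"),
  stringsAsFactors = FALSE
))

# Scores in the same row order as the genotype matrix (toy values).
pred <- c(0.12, -0.40, 0.31)

# Because the order matches, the IDs can be bound to the scores directly.
pgs <- data.frame(
  family.ID = obj.bigSNP$fam$family.ID,
  sample.ID = obj.bigSNP$fam$sample.ID,
  PGS = pred
)

# Hypothetical phenotype table, deliberately in a different order.
pheno <- data.frame(sample.ID = c("ID_C", "ID_A", "ID_B"),
                    bmi = c(27.5, 22.1, 30.2))

# Merging by ID, rather than by position, keeps each score with its individual.
merged <- merge(pgs, pheno, by = "sample.ID")

# Export as a tab-separated file for use outside R.
out_file <- tempfile(fileext = ".txt")
write.table(merged, out_file, quote = FALSE, row.names = FALSE, sep = "\t")
```

One caveat: if pred was computed on a subset of rows (for example by passing ind.row = ind.test when computing the product), the IDs need the same subset, e.g. obj.bigSNP$fam$sample.ID[ind.test], otherwise the lengths and the order will not match.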

kathrynfreeman commented 2 years ago

Perfect! Thank you for your help!

privefl / bigsnpr

Help understanding the output of LDpred2 #362