privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

bed_projectPCA pre processing #503

Closed by RoniHaas 3 months ago

RoniHaas commented 3 months ago

Hello,

Is there a guideline for recommended preprocessing steps before running bed_projectPCA (e.g. a MAF threshold and LD pruning)?

Thank you!

privefl commented 3 months ago

Hi, there are parameters of the function for this I guess. Have a look at the tutorial on this.

RoniHaas commented 3 months ago

> Hi, there are parameters of the function for this I guess.

Thank you for your reply! Sorry if I missed it, but are there parameters of the function to control for LD?

> Have a look at the tutorial on this.

I have looked at the tutorial and found helpful explanations such as "Why clumping should be preferred over pruning", but also parts where pruning is recommended: "Always use both pruning and removing of long-range LD regions". I therefore wasn't sure what is best when specifically using bed_projectPCA, and I did not find a tutorial on this function. If there is one, I would appreciate it if you could point me to it. Thanks!

privefl commented 3 months ago

RoniHaas commented 3 months ago

> There are also two R scripts from the paper using this function here. Would it be useful to add a link to these scripts in the documentation of the function?

Thank you for this suggestion! I don't think it is necessary. I found how the function is used in one of the files, and the usage is pretty basic, with default settings. For me, it would have been more helpful if more description were added to the function documentation itself. Although, as you mentioned, it is possible to understand more by reading the bed_autoSVD() function via ....
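
For anyone landing on this thread with the same question, here is a minimal sketch of what passing LD and allele-count settings through `...` might look like. It assumes that the extra arguments of bed_projectPCA() are forwarded to bed_autoSVD(), and that bed_autoSVD() accepts `thr.r2` (clumping threshold) and `min.mac` (minimum minor allele count). The wrapper name, its argument names, the values shown, and the file paths are hypothetical and not taken from the package or the paper scripts; please check ?bed_projectPCA and ?bed_autoSVD for your installed version.

```r
# Sketch only: project_new_samples(), its argument names, and the paths in the
# usage note are hypothetical. thr.r2 and min.mac are assumed to be
# bed_autoSVD() arguments reached through the `...` of bed_projectPCA().
project_new_samples <- function(ref_bed_path, new_bed_path,
                                k = 10, thr_r2 = 0.2, min_mac = 10) {
  obj.bed.ref <- bigsnpr::bed(ref_bed_path)  # reference panel (.bed with .bim/.fam)
  obj.bed.new <- bigsnpr::bed(new_bed_path)  # samples to project onto the reference PCA

  bigsnpr::bed_projectPCA(
    obj.bed.ref, obj.bed.new,
    k = k,              # number of principal components
    thr.r2 = thr_r2,    # LD clumping threshold, forwarded to bed_autoSVD()
    min.mac = min_mac   # minor allele count filter, forwarded to bed_autoSVD()
  )
}
```

A call would then look like `proj <- project_new_samples("ref.bed", "new.bed", thr_r2 = 0.1)`, where both paths are placeholders. Using `bigsnpr::` inside the function keeps the definition loadable even before the package is attached.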