tshmak / lassosum

LASSO for GWAS with summary statistics
MIT License

Lassosum benchmarking #38

Open fmadani opened 2 years ago

fmadani commented 2 years ago

Hi @privefl , @tshmak

I am benchmarking the PRS/PGS methods covered in the PRS tutorial. Regarding lassosum, I have a few points of confusion and would appreciate it if you could clarify them:

1- In the paper (Polygenic scores via penalized regression on summary statistics) and in the introduction of the lassosum repository, it is mentioned that a 'reference panel' is used for its LD information. In the exercise provided in the tutorial, ref.bfile is literally the bed file of the target dataset (EUR.QC.bed in the Height case), whereas I expected ref.bfile to refer to a reference panel file such as HapMap3. Please clarify.

2- In the tutorial, the fam variable is computed with this code:

    fam <- fread(paste0(bfile, ".fam"))
    fam[,ID:=do.call(paste, c(.SD, sep=":")),.SDcols=c(1:2)]

but fam is never used afterwards. Please clarify.

3- Why is there no 'SNP matching' step like the one in LDpred-2?
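
For reference on question 2, here is a minimal sketch of what the quoted snippet does, run on a made-up two-row table instead of a real .fam file (the family and individual IDs below are invented for illustration). It adds an ID column that joins the first two columns, which in a PLINK .fam file are the family ID and the individual ID, with a colon:

```r
library(data.table)

# Toy stand-in for fread(paste0(bfile, ".fam")); fread names headerless
# columns V1..V6. All values here are hypothetical.
fam <- data.table(
  V1 = c("FAM1", "FAM2"),  # family ID (FID)
  V2 = c("IND1", "IND2"),  # individual ID (IID)
  V3 = 0, V4 = 0,          # paternal and maternal IDs
  V5 = c(1, 2),            # sex code
  V6 = -9                  # phenotype (missing)
)

# Same statement as in the tutorial: paste columns 1 and 2 together,
# separated by ":", and store the result in a new column called ID.
fam[, ID := do.call(paste, c(.SD, sep = ":")), .SDcols = c(1:2)]

print(fam$ID)
# [1] "FAM1:IND1" "FAM2:IND2"
```

An ID of this form is a convenient key for joining per-sample tables, which may be why the tutorial builds it, though that is a guess rather than something stated in the thread.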

Thank you, Farshad

tshmak commented 2 years ago
  1. In theory it's best to use an LD panel that matches the population of the summary stats. However, such a panel is often not available or doesn't have a large enough sample size. Hence the test (target) bfile is often used as the LD panel.
  2. Please ask Sam
  3. There is. Please see my other reply.

fmadani commented 2 years ago

Thank you, Timothy. Two last quick questions:

1- Why is HapMap3 not used instead of the test (target) bfile to generate the LD panel?

2- I would like to understand the algorithm used in lassosum more deeply. I drew a diagram based on the lassosum code:

[Image: PRS-Lassosum drawio]

I'd like to drill down into the "lassosum modelling" box and figure out the main steps taken in it. Would you please let me know what the main steps of the lassosum algorithm are? Any reference or source to study would be appreciated. (FYI: I read your published paper, but it mostly explains the math. I need to know the general steps taken in the lassosum library. I reviewed the code, but I couldn't work out the steps.)

Regards,

tshmak commented 2 years ago
  1. I just want to clarify that the tutorial was written by Sam and not me. In principle I don't recommend any reference panel over another, and it's the responsibility of the user to decide what to use. The reason I used 1000 Genomes in my paper was that it was the best public domain reference at the time.
  2. I am not sure I understand your diagram completely, but in lassosum, filtering is done together with the summary stats, i.e. test.bfile, ref.bfile, and summary stats together.
  3. I'm not sure exactly what you want. However, I think there's no way to understand the algorithm other than reading the source code.
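
Point 2 says the summary stats, ref.bfile, and test.bfile are filtered together. As a rough illustration of that idea only (this is not lassosum's actual code, and the SNP IDs and variable names are made up), keeping the variants shared by all three inputs could look like this in base R:

```r
# Hypothetical SNP IDs from each of the three inputs.
ss_snps   <- c("rs1", "rs2", "rs3", "rs4")  # summary statistics
ref_snps  <- c("rs2", "rs3", "rs4", "rs5")  # reference (LD) panel
test_snps <- c("rs1", "rs3", "rs4", "rs6")  # test (target) data

# Intersect the three sets pairwise, left to right, so that only
# variants present in every input remain.
common <- Reduce(intersect, list(ss_snps, ref_snps, test_snps))

print(common)
# [1] "rs3" "rs4"
```

This sketch matches on ID alone; a real pipeline would presumably also have to reconcile positions and alleles, which is left out here.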

Best, Tim