lukfor / pgs-calc

Applying polygenic scores (PGS) on imputed genotypes
MIT License

How to obtain risk scores for UKBB WES data? #8

Closed: Hoeze closed this issue 2 years ago

Hoeze commented 2 years ago

Hi, I am trying to obtain risk scores for different traits for every individual in the UK Biobank. Our whole exome sequencing (WES) data is split into several files per chromosome, like this:

   - /s/raw/ukbiobank/WES_200K/ukb23156_c10_b0_v1.vcf.gz
   - /s/raw/ukbiobank/WES_200K/ukb23156_c10_b10_v1.vcf.gz
   - /s/raw/ukbiobank/WES_200K/ukb23156_c10_b11_v1.vcf.gz
   - /s/raw/ukbiobank/WES_200K/ukb23156_c10_b12_v1.vcf.gz
   - /s/raw/ukbiobank/WES_200K/ukb23156_c10_b13_v1.vcf.gz
[...]

I would run your tool like this:

   ./pgs-calc apply --ref PGS000018 --out PGS000018.scores.txt /s/raw/ukbiobank/WES_200K/*.vcf.gz --report-html PGS000018.html

lukfor commented 2 years ago

Hi,

> Is this command correct?

Since you are using whole exome sequencing data, you need to set --genotypes GT. By default, pgs-calc uses the dosages (DS) to calculate the score.
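Combined with the command from the question, the adjusted call might look like the sketch below. The wrapper name `run_pgs_wes` and the `PGS_CALC` variable are hypothetical conveniences and not part of pgs-calc; the subcommand and flags are only the ones already mentioned in this thread.

```shell
# Hypothetical wrapper: applies PGS000018 using hard-called genotypes (GT)
# instead of the default dosages (DS).
# PGS_CALC is an assumed override for the path to the pgs-calc launcher.
run_pgs_wes() {
  "${PGS_CALC:-./pgs-calc}" apply \
    --ref PGS000018 \
    --genotypes GT \
    --out PGS000018.scores.txt \
    --report-html PGS000018.html \
    "$@"   # the VCF files to score
}
```

It would then be called with the WES files as arguments, for example `run_pgs_wes /s/raw/ukbiobank/WES_200K/*.vcf.gz`.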

> In your README, you write that you need one file per chromosome. Will this also work with multiple files per chromosome as I have?

Yes, pgs-calc also works with multiple files per chromosome. It fails only when a single file contains more than one chromosome.
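If you want to verify this before starting a long run, a rough pre-check could look like the sketch below. The function names `vcf_chromosomes` and `check_single_chromosome` are made up for illustration, and the check assumes standard gzip-compatible VCF files with the chromosome in the first column; it is not something pgs-calc provides.

```shell
# Print the distinct chromosome names found in one gzip-compressed VCF.
vcf_chromosomes() {
  gzip -dc "$1" | grep -v '^#' | cut -f1 | sort -u
}

# Return non-zero if any given file holds records from more than one chromosome.
check_single_chromosome() {
  status=0
  for f in "$@"; do
    n=$(vcf_chromosomes "$f" | wc -l | tr -d ' ')
    if [ "$n" -gt 1 ]; then
      echo "multiple chromosomes in $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

This decompresses every file completely, so it will be slow on data of this size. If the files are tabix-indexed, `tabix -l FILE` lists the chromosome names from the index instead.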

> Will there be large differences compared to imputed genotypes?

I don't think so. In the HTML report you can see the coverage of each score (i.e. the number of variants found in your data).

Hoeze commented 2 years ago

Thanks a lot for your answer, this was very helpful!