nkurniansyah / Hypertension_PRS


Can't build PRS using PRSice and summary statistics #1

Closed andrewmagis closed 8 months ago

andrewmagis commented 9 months ago

Hi - thanks for making these data and code available. I have installed PRSice and cloned the repository. Using the code snippet that you posted here, I tried to run PRSice. However, it seems the summary statistics files do not have a PValue column, which PRSice requires. Here is my command and its output:

Rscript ./PRSice.R \
    --dir ./PRS_Output \
    --prsice ./PRSice_mac \
    --base ./Summary_Statistics_for_PRS_construction/2022-04-20_HTN_PAN-UKBB.txt \
    --target ./Genotype \
    --thread 2 \
    --chr-id Chromosome \
    --bp Position \
    --A1 Allele1 \
    --A2 Allele2 \
    --pvalue PValue \
    --bar-levels 0.3 \
    --stat BETA \
    --all-score T \
    --out ./out_prs \
    --no-clump T --print-snp T \
    --ignore-fid T \
    --no-regress T \
    --fastscore T \
    --model add \
    --no-full T \
    --chr-id c:l:a:b

PRSice 2.3.5 (2021-09-20)
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF. PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data. GigaScience 8, no. 7 (July 1, 2019)
2023-11-25 22:16:23
./PRSice_mac \
    --a1 Allele1 \
    --a2 Allele2 \
    --all-score \
    --bar-levels 0.3,1 \
    --base ./Summary_Statistics_for_PRS_construction/2022-04-20_HTN_PAN-UKBB.txt \
    --beta \
    --binary-target F \
    --bp Position \
    --chr-id Chromosome \
    --interval 5e-05 \
    --lower 5e-08 \
    --no-clump \
    --num-auto 22 \
    --out ./out_prs \
    --pvalue PValue \
    --seed 3002820798 \
    --stat BETA \
    --target ./Genotype \
    --thread 2 \
    --upper 0.5

Error: PValue not found in base file
Error: Column for the P-value must be provided!

Error: PValue not found in base file
Error: Column for the P-value must be provided!

Error: Execution halted
zsh: command not found: --print-snp

tamartsi commented 9 months ago

@nkurniansyah can we fix that? this is an oversight on our part... we need to add the mock p-value column to use PRSice.

andrewmagis commented 9 months ago

Thanks - one other question about the Genotype input. It seems to require Plink files, but none are provided in the repository. Can these be provided as well?

I am trying to reproduce your PGS based on the instructions in this repository, but it would be easier to just get access to the one you created for the paper, if that can be made available in the PGS Catalog (https://www.pgscatalog.org/) or via direct communication.

nkurniansyah commented 9 months ago

Hi There,

Thank you for using our HTN-PRS. Because we have already completed the clumping and thresholding steps, you only need the variant weights from the summary statistics file for the analysis. I have added a mock p-value column to our summary statistics so that PRSice accepts the file. With this addition, you should be able to generate the PRS without errors.
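For readers working from a copy of the file that predates this change, here is a minimal sketch of how a placeholder p-value column could be appended. It is not the maintainers' script. It assumes the file is whitespace-delimited with a header row; the function name `add_mock_pvalue`, the constant `1e-10`, and the sample row are all hypothetical. Any constant at or below the `--bar-levels` threshold should keep every variant when clumping is disabled.

```python
def add_mock_pvalue(lines, column="PValue", value="1e-10"):
    """Append a constant p-value column to whitespace-delimited summary statistics.

    `lines` is an iterable of text lines; the first non-blank line is the header.
    `value` is an arbitrary placeholder (an assumption, not the value used in
    the repository). Output rows are tab-delimited.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    if column in header:
        # Column already present: leave the data alone, only normalize delimiters.
        return ["\t".join(row) for row in rows]
    out = ["\t".join(header + [column])]
    out.extend("\t".join(row + [value]) for row in body)
    return out


if __name__ == "__main__":
    # Made-up example row; the real file has its own variant IDs and weights.
    sample = [
        "Chromosome snpID Position Allele1 Allele2 BETA",
        "1 variant_1 12345 A G 0.0123",
    ]
    for row in add_mock_pvalue(sample):
        print(row)
```

The function is idempotent on its own output: if the `PValue` column already exists, as in the updated file, the rows are returned unchanged.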

We are currently in the process of submitting this file to the PGS Catalog and are awaiting their response.

Regarding the genotype data, we cannot provide it, as it comes from TOPMed and MGB biobank, which include confidential patient IDs that cannot be shared.

Please let me know if you encounter any errors.

andrewmagis commented 9 months ago

Thanks, that is good to know that the PGS is being submitted to PGS Catalog.

When I run the command using the updated files, I get an error that Genotype.fam is missing. Do I need to obtain the raw genotype files from TOPMed and MGB before I can build the PGS?

./PRSice_mac \
    --a1 Allele1 \
    --a2 Allele2 \
    --all-score \
    --bar-levels 1 \
    --base ./Summary_Statistics_for_PRS_construction/2023-12-04_HTN_PAN-UKBB.txt \
    --beta \
    --binary-target F \
    --bp Position \
    --chr Chromosome \
    --chr-id c:l:a:b \
    --fastscore \
    --ignore-fid \
    --model add \
    --no-clump \
    --no-full \
    --no-regress \
    --num-auto 22 \
    --out ./out_prs \
    --print-snp \
    --pvalue PValue \
    --seed 1335787 \
    --stat BETA \
    --target ./Genotype \
    --thread 2

Initializing Genotype file: ./Genotype (bed)

Start processing 2023-12-04_HTN_PAN-UKBB ==================================================

Base file: ./Summary_Statistics_for_PRS_construction/2023-12-04_HTN_PAN-UKBB.txt
Header of file is:
Chromosome snpID Position Allele1 Allele2 BETA PValue

Reading 100.00%
234228 variant(s) observed in base file, with:
234228 total variant(s) included from base file

Loading Genotype info from target ==================================================

Error: Cannot open file: ./Genotype.fam

Error: Execution halted

nkurniansyah commented 9 months ago

You need to provide genotype data to calculate PRS.
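The log above shows PRSice treating `./Genotype` as a PLINK binary prefix ("(bed)"), so it looks for `Genotype.bed`, `Genotype.bim`, and `Genotype.fam`. As a rough sketch, a check like the following could confirm that all three files exist before launching PRSice. The helper name `missing_plink_files` is made up for illustration and is not part of PRSice or PLINK.

```python
from pathlib import Path

# The three files that make up a PLINK binary fileset.
PLINK_EXTENSIONS = (".bed", ".bim", ".fam")


def missing_plink_files(prefix):
    """Return the fileset members that do not exist for the given prefix."""
    return [
        prefix + ext
        for ext in PLINK_EXTENSIONS
        if not Path(prefix + ext).is_file()
    ]


if __name__ == "__main__":
    missing = missing_plink_files("./Genotype")
    if missing:
        print("Missing genotype files:", ", ".join(missing))
    else:
        print("PLINK fileset looks complete.")
```

This only checks that the files are present. It does not validate their contents or confirm that the variant IDs match the summary statistics.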

If you want to use resources like TOPMed and MGB, you need to follow their specific application processes; we provide the study-specific accessions in the data availability section of the paper.

If you don't have access to any genotype data, an alternative is to utilize freely available genotype data, such as that from the 1000 Genomes Project. However, it's important to note that the sample sizes in these free datasets are smaller.

andrewmagis commented 9 months ago

Oh, I understand. So the PRSice command is not doing any additional filtering or consolidation of the summary statistics to produce the final PRS; it is computing the PRS from the genotype data? Is there a PRSice command that outputs the final, consolidated summary statistics so I can compute the PRS myself?

tamartsi commented 8 months ago

Hi @andrewmagis, it seems like your question wasn't resolved. It is correct that the PRSice command does not do any additional filtering/consolidation. It is just useful to use a PRS software that checks alleles, etc. You can also use Plink. Also I hope it would help to know that the PRS are available on the PGS catalog: https://www.pgscatalog.org/publication/PGP000531/ and we included a single PRS (https://www.pgscatalog.org/score/PGS004236) that accounts for the summation of three separate PRSs.
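To make the "checks alleles" point concrete, here is a simplified sketch of what scoring amounts to once the weights are final: a weighted sum of effect-allele dosages, with a check that the alleles in the genotype data match those in the weight file. It is an illustration, not PRSice's or PLINK's implementation. It assumes `Allele1` is the effect allele (it is passed as `--A1` in the commands above), it ignores strand flips and missing genotypes, and the function name and data layout are hypothetical. Tools also differ in whether they report a sum or an average.

```python
def score_individual(weights, dosages):
    """Weighted sum of effect-allele dosages for one individual.

    weights: dict variant_id -> (effect_allele, other_allele, beta)
    dosages: dict variant_id -> (counted_allele, other_allele, dosage),
             where dosage is the number of copies (0 to 2) of counted_allele.
    Returns (score, number_of_variants_used).
    """
    score = 0.0
    used = 0
    for variant_id, (effect, other, beta) in weights.items():
        if variant_id not in dosages:
            continue  # variant absent from the genotype data
        counted, alt, dosage = dosages[variant_id]
        if (counted, alt) == (effect, other):
            score += beta * dosage
        elif (counted, alt) == (other, effect):
            # Alleles are swapped: convert to copies of the effect allele.
            score += beta * (2 - dosage)
        else:
            continue  # alleles do not match: skip rather than guess
        used += 1
    return score, used


if __name__ == "__main__":
    # Made-up variants and weights, for illustration only.
    weights = {"v1": ("A", "G", 0.5), "v2": ("C", "T", -0.2)}
    dosages = {"v1": ("A", "G", 2), "v2": ("T", "C", 1)}
    print(score_individual(weights, dosages))
```

Skipping mismatched variants instead of guessing is the conservative choice here; dedicated PRS software handles more cases, such as strand-ambiguous variants.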

andrewmagis commented 8 months ago

Thanks for making me aware of your deposition in the PGS catalog. I'll take a look now. Appreciate it!

samreenzafer commented 6 months ago

> Hi @andrewmagis, it seems like your question wasn't resolved. It is correct that the PRSice command does not do any additional filtering/consolidation. It is just useful to use a PRS software that checks alleles, etc. You can also use Plink. Also I hope it would help to know that the PRS are available on the PGS catalog: https://www.pgscatalog.org/publication/PGP000531/ and we included a single PRS (https://www.pgscatalog.org/score/PGS004236) that accounts for the summation of three separate PRSs.

Could you please add this information to your main GitHub page too? It would be very helpful. I just happened to go through the "issues" and found this.

Thank you.

tamartsi commented 6 months ago

Thanks @samreenzafer for this very appropriate suggestion -- done!