Closed andrewmagis closed 8 months ago
@nkurniansyah can we fix that? this is an oversight on our part... we need to add the mock p-value column to use PRSice.
Thanks - one other question about the Genotype input. It seems to require Plink files, but none are provided in the repository. Can these be provided as well?
I am trying to reproduce your PGS based on the instructions in this repository, but it would be easier to get access to the one you created for the paper, if that can be made available in the PGS Catalog (https://www.pgscatalog.org/) or via direct communication.
Hi There,
Thank you for using our HTN-PRS. As we have already completed the clumping and thresholding steps, you only need to use a weight for the analysis. I have included a mock p-value in our summary statistics to facilitate this. With this addition, you should be able to generate PRS without facing any errors.
We are currently in the process of submitting this file to the PGS catalog and are awaiting their response.
Regarding the genotype data, we cannot provide it, as it comes from TOPMed and MGB biobank, which include confidential patient IDs that cannot be shared.
Please let me know if you encounter any errors.
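For readers whose copy of the summary statistics still lacks the column, here is a minimal sketch of how a mock p-value could be appended. It assumes a tab-delimited file with a header row; the function name `add_mock_pvalue` and the default value of 0.5 are illustrative assumptions, not the values used by the authors. Any value below the `--bar-levels 1` threshold should keep every variant in the score.

```shell
# Hypothetical helper: append a constant mock "PValue" column to a
# tab-delimited summary-statistics file that has a header row.
# Usage: add_mock_pvalue INPUT OUTPUT [VALUE]
add_mock_pvalue() {
  # VALUE defaults to 0.5 (an assumption; any p-value under the threshold works)
  awk -v p="${3:-0.5}" '
    BEGIN { FS = OFS = "\t" }
    NR == 1 { print $0, "PValue"; next }   # extend the header line
    { print $0, p }                        # same mock value for every variant
  ' "$1" > "$2"
}
```

The output file can then be passed to `--base`, with `--pvalue PValue` naming the new column as in the command in this thread.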
Thanks, that is good to know that the PGS is being submitted to PGS Catalog.
When I run the command using the updated files, I get an error that Genotype.fam is missing. Do I need to obtain the raw genotype files from TOPMed and MGB before I can build the PGS?
./PRSice_mac \
    --a1 Allele1 \
    --a2 Allele2 \
    --all-score \
    --bar-levels 1 \
    --base ./Summary_Statistics_for_PRS_construction/2023-12-04_HTN_PAN-UKBB.txt \
    --beta \
    --binary-target F \
    --bp Position \
    --chr Chromosome \
    --chr-id c:l:a:b \
    --fastscore \
    --ignore-fid \
    --model add \
    --no-clump \
    --no-full \
    --no-regress \
    --num-auto 22 \
    --out ./out_prs \
    --print-snp \
    --pvalue PValue \
    --seed 1335787 \
    --stat BETA \
    --target ./Genotype \
    --thread 2
Initializing Genotype file: ./Genotype (bed)
Start processing 2023-12-04_HTN_PAN-UKBB ==================================================
Base file: ./Summary_Statistics_for_PRS_construction/2023-12-04_HTN_PAN-UKBB.txt Header of file is: Chromosome snpID Position Allele1 Allele2 BETA PValue
Reading 100.00% 234228 variant(s) observed in base file, with: 234228 total variant(s) included from base file
Loading Genotype info from target ==================================================
Error: Cannot open file: ./Genotype.fam
Error: Execution halted
You need to provide genotype data to calculate PRS.
If you want to use resources like TOPMed and MGB, you need to follow the specific application process; we provide study-specific accession in the paper under data availability.
If you don't have access to any genotype data, an alternative is to utilize freely available genotype data, such as that from the 1000 Genomes Project. However, it's important to note that the sample sizes in these free datasets are smaller.
Oh, I understand - so the PRSice command is not doing any additional filtering/consolidation of the summary statistics to produce the final PRS - it is computing the PRS using the genotype data? Is there a command to use PRSice to output the final, consolidated summary statistics so I can compute the PRS myself?
Hi @andrewmagis, it seems like your question wasn't resolved. It is correct that the PRSice command does not do any additional filtering/consolidation. It is just useful to use a PRS software that checks alleles, etc. You can also use Plink. Also I hope it would help to know that the PRS are available on the PGS catalog: https://www.pgscatalog.org/publication/PGP000531/ and we included a single PRS (https://www.pgscatalog.org/score/PGS004236) that accounts for the summation of three separate PRSs.
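To illustrate what "computing the PRS using the genotype data" amounts to, here is a rough sketch of the weighted sum itself: for each sample, add up BETA times the number of copies of the effect allele across all variants. The dosage matrix format (first column `snpID`, then one column per sample holding 0/1/2 counts of Allele1) and the function name `score_prs` are assumptions made for illustration; real genotype data would be in Plink format, and this sketch does none of the allele checking that PRSice or Plink perform.

```shell
# Hypothetical helper: compute a per-sample weighted sum of effect-allele counts.
# Usage: score_prs WEIGHTS DOSAGES
#   WEIGHTS: tab-delimited, header row containing "snpID" and "BETA" columns
#   DOSAGES: tab-delimited (assumed format), header "snpID<TAB>sample1<TAB>...",
#            each row holding per-sample counts (0, 1 or 2) of the effect allele
score_prs() {
  awk '
    BEGIN { FS = OFS = "\t" }
    # First file (weights): find the columns by name, then store BETA per variant
    FNR == NR {
      if (FNR == 1) { for (i = 1; i <= NF; i++) col[$i] = i; next }
      beta[$(col["snpID"])] = $(col["BETA"])
      next
    }
    # Second file (dosages): the header row gives the sample names
    FNR == 1 { n = NF; for (i = 2; i <= NF; i++) sample[i] = $i; next }
    # Variant rows: accumulate BETA * dosage for variants that have a weight
    ($1 in beta) { for (i = 2; i <= n; i++) score[i] += beta[$1] * $i }
    END { for (i = 2; i <= n; i++) printf "%s\t%.4f\n", sample[i], score[i] }
  ' "$1" "$2"
}
```

Variants missing from the weights file are skipped silently, so it is worth checking how many variants actually matched before trusting the scores.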
Thanks for making me aware of your deposition in the PGS catalog. I'll take a look now. Appreciate it!
Could you please update this information on your main GitHub page too? It would be very helpful. I just happened to go through the "issues" and found this.
Thank You.
Thanks @samreenzafer for this very appropriate suggestion -- done!
Hi - thanks for making these data and code available. I have installed PRSice and cloned the repository. Using the code snippet that you posted here, I tried to run PRSice. However, it seems the summary statistics files do not have a PValue column, and this is required by PRSice. Here is my command: