Closed zichengwang98 closed 2 years ago
Hi @daveipart,
Thanks for the bug report. I pushed a fix for this (69edf1f)
I'm afraid I don't have the bandwidth to implement support for plink2 native files. I'm happy to accept a PR if anyone would like to do that.
Re the mixture weights: zero simply means that you're better off not using the PRSs at all to predict the test set phenotypes (because the code uses non-negative least squares estimates). This can happen for two reasons:
You can easily test the second option by flipping the sign of the PRS (multiply by -1). If it doesn't solve the problem, then I would diagnose deeply how well each PRS can individually predict the phenotypes in the test set...
I hope it's clear, happy to elaborate more if not!
Hi all,
We are also seeing the same issue with both the mixing weights as 0. Our 2 input files are betas from polyfun-susie + betas from a GWAS. We have previously shown that PRS trained based on the betas from the GWAS explain ~11% of phenotypic variance in the target dataset using PRS-CS. Both beta files and the target file are on the same genome build, and have the A1 and A2 alleles the same way round. Are there any file formatting criteria that we could be missing here?
Is the polyfun_output.agg.txt.gz file available to share so we can try the toy example?
Thank you!
@niamham PolyPred uses non-negative least-squares by default, in order to not allow negative mixing weights. This helps ensure that the R^2 of the combined PRSs cannot be lower than the R^2 of the best PRS. However, this can lead to an estimate of 0 when the PRS obtain R^2=0 on the target phenotype.
To help diagnose why this happens in your case, I modified the code of polypred.py
to allow negative mixing weights. To test this, please git pull the latest version of the code, and then run polypred.py
and provide the new argument --allow-neg-mixweights
(all the other arguments should remain the same as before). This should hopefully provide non-zero estimates, though I suspect that these estimates will be close to zero.
If you still get unreasonable PRS, I suggest that you try running the plink command that polypred.py
runs directly on your input input files and see if you then get reasonable estimates. Please let me know if you'd like to me explain which command to run (you can see this command in the output of polypred.py
).
BTW I also found and fixed a bug related to PRS coefficients that are defined with respect to the opposite allele, but I doubt this is related to the zero estimates that you're seeing.
Hope this helps, please let me know if not!
Hi Omer,
Thanks! I tried the latest version of PolyPred with --allow-neg-mixweights
. I first tested to build PRS using betas generated by SBayesR and PolyFun, separately. Both PRS are positively correlated with the actual phenotype in the target cohort. When I run PolyPred to combine the betas in a similar training population, the intercept is positive and both weights are negative(log attached). However, the PolyPred PRS is still positively correlated with the phenotype in the target cohort, even though the weights are negative. I expect the correlation to be flipped. Is there an explanation of why this might happen?
Best, Zicheng
mix_weight pheno_SBayesR_LD_1M.snpRes -20.874761716399405 pheno_finemapping.agg.txt.gz -2.4749174701116337 intercept 43.49627611093517
@zichengwang98 PolyPred automatically flips the sign of all PRS coefficient, if they're negatively correlated with the target phenotype (this usually happens because the effect allele can be flipped across different methods). So this explains the results that you're seeing.
I'm still not 100% sure why you're getting zero coefficients when not providing the flag --allow-neg-mixweights
. The flipping of the sign should guarantee that a negatively correlated PRS would still be used by PolyPred...
I'm happy to take a look myself if you can send me a reproducible example. I realize you probably can't share the entire data, but perhaps it's fine to share just the (anonymized) phenotypes and PRS coefficients? If not, maybe you can create a synthetic reproducible example and send to me?
Hi Omer,
If the signs of the PRS coefficients are automatically flipped, then the weights should be always positive, with or without --allow-neg-mixweights flag right? I guess the problem occurred since the sign is not automatically flipped.
I double-checked the betas of PolyFun, SBayesR, and PolyPred(the effect alleles and betas are properly aligned). It turns out that they are also positively correlated with each other. They all worked well in the target population.
Also, what's your latest email address? Previously I tried to reach out to you using your HSPH email but it failed.
Zicheng
@zichengwang98 there are several things going on here:
The way I tested whether to flip betas for a specific PRS was to test the univariate correlation between that PRS and the phenotype. This may be misleading if the PRS has mean different from zero. So it might be the case that a beta is negative because of that. To address this case, I now modified the test to also estimate an intercept (essentially performing a univariate linear regression). Can you please to git pull the latest version and rerun?
Another possible reason for negative betas could be because the two PRSs are strongly correlated with each other, which could lead to numerical issues (but this doesn't seem to be a problem in your case)
Sorry for these many tweaks --- I admit this part of PolyPred isn't as robust as I would have liked, because I never ran into these issues when I worked on the data --- everything was just calibrated perfectly :)
I hope my latest fix addresses your issues. If it doesn't --- please feel free to send me something to look at to omer.we@gmail.com
Hi Omer,
I tested with the latest version of PolyPred and now I can get positive weights. Thanks!
Best, Zicheng
I'm trying to combine the betas using PolyPred but all the weights and betas are zero.
The summary statistics of the phenotype are based on UKB data, thus I directly used the provided LD matrices for PolyFun. The betas from SBayesR(377K SNPs) and PolyFun(9+ million SNPs) look normal and are not zeros. Last, I combined the betas using PolyPred with UKB African population, but the weights and betas are all zeros. I tested PolyPred with both plink and plink2(with genotyped and imputed data) and could not solve this issue. There's no errors/warnings when I run the script with plink.
Here's the plink2 script I have used as an example. For plink 1.9 I changed to .bed files and --plink-exe argument, and the results are the same.
python ../bin/polyfun/polypred.py \ --combine-betas \ --betas pheno_ukb_SBayesR_LD_1M.snpRes,pheno_finemapping.agg.txt.gz \ --pheno ../data/African_pheno.txt \ --output-prefix output/pheno_polypred \ --threads 2 \ --memory 24 \ --plink2-exe ../bin/plink2 \ ukb_imp_chr*.pgen
mixweights:
pheno_ukb_SBayesR_LD_1M.snpRes 0.0 pheno_finemapping.agg.txt.gz 0.0 intercept 29.490740257670343
Also, when I run PolyPred with plink2, some errors were returned by the polypred.py script that I would like to check with you:
1) Line 373:
returned an error for not finding the path of plink2, even if the plink2 executable already exists.
I changed the highlighted variable to args.plink2_exe get rid of this error.
2) When using plink2 with plink2 files(.pgen,.pvar,.psam) without .bim and .fam, line 70-73 failed and returned an error. Do you have an option for users who wish to use pfiles instead of bpfiles?
3) Line 102 also caused an error in plink2, since it specifies “--bpfile”. It might be nice if users could use plink2 with pfiles without converting them to .bim and .fam.
After I made changes to those lines, I could run the script with plink2 smoothly without an error.
Please feel free to let me know if you have any questions. I appreciate your help!