Closed mkoromina closed 2 years ago
Hi @mkoromina, looks like the results are missing a MAF column. Did you run the fine-mapping results on a file that was processed by munge_sumstats.py
?
The simplest direct solution is to omit the --adjust-beta-freq
flag from the invocation of aggregate_finemapper_results.py
, but this would mean that the aggregated results would represent per-normalized-allele effects instead of per-allele effects. If you want per-allele effects, you can take the aggregated results file and divide BETA_MEAN and BETA_SD of each SNP by sqrt(2*f*(1-f)
, where f
is the MAF of that SNP...
If you have the computational capacity, maybe it's better to rerun the munge_sumstats.py
script and make sure that it gets a MAF column as input. But then you'll have to rerun all your fine-mapping jobs...
Sorry there's no simpler solution, I hope you can find an easy way to solve this! Please let me know if you'd like me to explain more!
Hi @omerwe,
Many thanks for this! I used the munge_sumstats.py
script to munge my sumstats, however, I used the ridge_constrained files to run the genome-wide fine-mapping jobs. Could it be because of that?
Moreover, I ommited the--adjust-beta-freq
flag and although the results' aggregation ran with no errors, this led to all betas equaling with 0 after running step 3 (i.e., linearly combining the effect sizes).
Do you know why could this be? Could it be an issue with the headers of the files (i.e variable names)? Many thanks once again!
@mkoromina your approach seems perfectly fine. The question is why the MAF column is missing. I guess the munge_sumstats.py
script couldn't find it? Did you run it with --min-maf 0
? Otherwise it should have complained...
all betas=0 likely mean that the phenotype in the small training set from the target population is very very different from the phenotype in the training sets used for running the other methods...
How accurately can you predict if you take the original betas (from the other method and/or from PolyFun) and try to predict the phenotypes of the small training set of the target population? If I'm right, you shouldn't be able to predict them at all... If you can't, then that's the source of the problem. If not, then we'll need to dig deeper :)
Hi @omerwe,
Many thanks once again!
Truth is that there is not a column named as 'MAF', since the sumstats file follows the standard ricopili format and the names of some of the columns differ. I checked the .log
file and there is a warning about the MAF so I guess this answers why there is no MAF column in the output.
In terms of all betas=0 (hence all PRS=0), I think what you suggested makes perfect sense. Is there though another case example that I could use and apply the pipeline?
Thanking you very much in advance!
@mkoromina you can check the example provided with PolyFun (as explained in the wiki page). I'm not sure if that's what you meant?
Hi @omerwe , I will send you an email (if that's okay) to better outline that! Thanks a lot anyways!
HI @omerwe,
I have been trying to merge the results from my genome-wide fine-mapping results but I stumbled upon the following issue. I ran:
However, this is the full error message that I get:
Do you know what could be potential wrong? Many thanks once again!