omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License
85 stars 21 forks

direction of beta remain the same despite allele switch #134

Closed bnj50 closed 1 year ago

bnj50 commented 1 year ago

Hi, I have my sumstats file (named myfile.txt) with this header:

CHR SNP BP A1 A2 BETA SE p_value
1 rs3094315 752566 A G -.00830888 .02015612 .6801735

After running this script:

bash-4.2$ python /usr/local/polyfun/1.0.0/extract_snpvar.py --sumstats myfile.txt --out ./myfile-out.txt --allow-missing

the output, named myfile-out.txt, looks like this:

CHR BP SNP A1 A2 BETA SE p_value SNPVAR
1 752566 rs3094315 G A -8.3089e-03 2.0156e-02 6.8017e-01 6.2623e-08

But as you can see, the alleles have been switched while the direction of the original beta is the same as before, so I cannot use these betas for downstream analyses... In other words, the A1 orientation in the output applies only to SNPVAR... am I correct?

thanks
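
For reference, the sign convention behind this question can be sketched as follows. This is a generic illustration, not PolyFun's code: the function name orient_beta is hypothetical, and it assumes that both files report alleles on the same strand, so that a swap of A1 and A2 is the only difference to handle. Under that assumption, an effect size reported relative to one allele changes sign when it is re-expressed relative to the other allele.

```python
def orient_beta(beta, a1, a2, target_a1, target_a2):
    """Return beta expressed relative to target_a1.

    Hypothetical helper for illustration only. Assumes both allele pairs
    are on the same strand; strand flips (A<->T, C<->G) are not handled.
    Returns None when the two allele pairs do not describe the same SNP.
    """
    if (a1, a2) == (target_a1, target_a2):
        return beta   # same orientation: keep the sign
    if (a1, a2) == (target_a2, target_a1):
        return -beta  # alleles swapped: flip the sign
    return None       # alleles do not match


# The row quoted above: A/G in the input, G/A in the output.
flipped = orient_beta(-0.00830888, "A", "G", "G", "A")
print(flipped)
```

Applied to the row above, the input beta of -.00830888 (A1 = A) would become positive once expressed relative to A1 = G, which is why an unchanged sign next to swapped alleles looks inconsistent.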

omerwe commented 1 year ago

Hi, the problem is that extract_snpvar.py requires files that were preprocessed by munge_polyfun_sumstats.py (with a Z-score column). I now modified the code of extract_snpvar.py so that it explicitly fails if it doesn't find a column called "Z". If you git pull and then run again, you should get an error message...

Can you please first process the sumstats using munge_polyfun_sumstats.py, and then run extract_snpvar.py on the output file?

bnj50 commented 1 year ago

Hi, many sumstats files only have beta (SE), odds ratio, and P values, but no Z-score. Can you modify the code to accept beta? It is not easy to regenerate Z-scores when you have millions of markers.

omerwe commented 1 year ago

Did you try using munge_polyfun_sumstats.py?

bnj50 commented 1 year ago

Yes, I did, but I can not read munge format. Does the program accept beta instead of a Z-score? In that case, would I just need to change the header name from Beta to Z?

thanks
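
On the renaming question, a Z-score under the usual Wald definition is the effect size divided by its standard error (Z = BETA / SE), so a BETA column relabelled as Z would not carry the same values. The sketch below only illustrates that arithmetic on a whitespace-delimited file with the BETA and SE column names from the header quoted in this thread. The function name add_z_column is hypothetical, it assumes every SE is non-zero, and it is not a substitute for munge_polyfun_sumstats.py.

```python
import csv
import io


def add_z_column(in_handle, out_handle, beta_col="BETA", se_col="SE"):
    """Stream a whitespace-delimited sumstats file and append Z = BETA / SE.

    Hypothetical helper for illustration only. Rows are processed one at a
    time, so memory use stays flat even with millions of markers.
    Assumes every SE value is non-zero.
    """
    header = in_handle.readline().split()
    beta_idx = header.index(beta_col)
    se_idx = header.index(se_col)
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header + ["Z"])
    for line in in_handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        z = float(fields[beta_idx]) / float(fields[se_idx])
        writer.writerow(fields + [f"{z:.6g}"])


# Demo on the single row quoted in this thread.
demo = io.StringIO(
    "CHR SNP BP A1 A2 BETA SE p_value\n"
    "1 rs3094315 752566 A G -.00830888 .02015612 .6801735\n"
)
result = io.StringIO()
add_z_column(demo, result)
print(result.getvalue())
```

With real files, the two StringIO objects would be replaced by handles from open(), for example reading myfile.txt and writing to a new file.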

omerwe commented 1 year ago

What do you mean by "i can not read munge format"? Why do you need to read it? Do you get any error message?

bnj50 commented 1 year ago

Hi, I meant that I cannot open the munge file with TextPad or Notepad to read what exactly the alleles are... Also, please comment on using the beta column instead of the Z-score... Thanks

omerwe commented 1 year ago

You don't need to look at the file, but in case you want to look at it for some reason you can run (from python):

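import pandas as pd
# <sumstats_file> is the path to the file written by munge_polyfun_sumstats.py
df = pd.read_parquet("<sumstats_file>")
print(df.head())  # show the first rows, including the allele columns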

Then you'll have a dataframe with the file.

In any case, I'm not sure why you don't want to use the output of munge_polyfun_sumstats.py as is... You can just go ahead and use it in PolyFun...