omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

Handling missing SNPs (polyfun.py) #9

Closed: bschilder closed this issue 5 years ago

bschilder commented 5 years ago

polyfun.py

When df_snpvar contains SNPs that aren't in df_sumstats, the script raises an error and stops. I added a few lines (lines 634-639) that remove the extra SNPs from df_snpvar and let the script continue:

if df_snpvar.shape[0] < df_sumstats.shape[0]:
    # raise ValueError('not all SNPs in the sumstats file are also in the annotations file')
    # BMS edit: remove extra SNPs instead of stopping.
    logging.info('Not all SNPs in the sumstats file are also in the annotations file.')
    snp_filt = df_snpvar['SNP'].isin(df_sumstats['SNP'])
    # Count the rows being dropped (False in the mask), not the rows being kept.
    logging.info('Removing %d extra SNPs...' % (~snp_filt).sum())
    df_snpvar = df_snpvar.loc[snp_filt, :]
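For reference, here is a minimal, self-contained sketch of the same filtering step in pandas, assuming (as in the snippet) that both tables identify variants by a `SNP` column. The helper name `drop_extra_snps`, the `SNPVAR` and `Z` columns, and the toy rsIDs are hypothetical illustrations, not PolyFun's actual code. One detail worth noting: the number of removed rows is the count of `False` entries in the boolean mask, `(~mask).sum()`, whereas `mask.sum()` counts the rows that are kept.

```python
import logging

import pandas as pd


def drop_extra_snps(df_snpvar: pd.DataFrame, df_sumstats: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only rows of df_snpvar whose SNP also appears in df_sumstats."""
    keep = df_snpvar["SNP"].isin(df_sumstats["SNP"])
    n_removed = int((~keep).sum())  # rows being dropped, not rows being kept
    if n_removed > 0:
        logging.info("Removing %d SNPs not found in the sumstats file...", n_removed)
    return df_snpvar.loc[keep].reset_index(drop=True)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Toy data (hypothetical): rs3 exists only in the annotations-derived table.
    snpvar = pd.DataFrame({"SNP": ["rs1", "rs2", "rs3"], "SNPVAR": [0.1, 0.2, 0.3]})
    sumstats = pd.DataFrame({"SNP": ["rs1", "rs2"], "Z": [1.5, -0.7]})
    filtered = drop_extra_snps(snpvar, sumstats)
    print(filtered["SNP"].tolist())  # ['rs1', 'rs2']
```

Resetting the index after filtering is a design choice in this sketch; drop the `reset_index` call if downstream code relies on the original row labels.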

In my case, this edit removed 34872 SNPs (using the sample annotations downloaded with the repo and the Nalls et al. GWAS summary statistics).

omerwe commented 5 years ago

Thanks! I added a --allow-missing flag to polyfun.py which does the same thing when invoked.
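An opt-in flag like this can be sketched with `argparse`: by default the mismatch raises an error, and only when the flag is passed are the extra SNPs dropped. This is an illustrative sketch, not PolyFun's actual implementation; apart from the `--allow-missing` flag name, the function names `parse_args` and `filter_snpvar` and the toy data are hypothetical, and the filtering direction mirrors the edit described in the issue.

```python
import argparse
import logging

import pandas as pd


def filter_snpvar(df_snpvar, df_sumstats, allow_missing=False):
    """Hypothetical check: fail on mismatched SNPs unless allow_missing is set."""
    keep = df_snpvar["SNP"].isin(df_sumstats["SNP"])
    n_extra = int((~keep).sum())
    if n_extra == 0:
        return df_snpvar
    if not allow_missing:
        # Default behavior: stop, as the original script did.
        raise ValueError(
            "%d SNPs are not shared between the sumstats and annotations "
            "(rerun with --allow-missing to drop them)" % n_extra
        )
    logging.warning("Dropping %d SNPs not found in the sumstats file", n_extra)
    return df_snpvar.loc[keep].reset_index(drop=True)


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # store_true: the flag defaults to False and becomes True when passed.
    # argparse exposes "--allow-missing" as args.allow_missing.
    parser.add_argument(
        "--allow-missing",
        action="store_true",
        help="drop mismatched SNPs instead of raising an error",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    args = parse_args(["--allow-missing"])  # simulate passing the flag
    snpvar = pd.DataFrame({"SNP": ["rs1", "rs2", "rs3"]})
    sumstats = pd.DataFrame({"SNP": ["rs1", "rs2"]})
    result = filter_snpvar(snpvar, sumstats, allow_missing=args.allow_missing)
    print(result["SNP"].tolist())  # ['rs1', 'rs2']
```

Keeping the strict check as the default means a user who has not thought about the mismatch still gets an explicit error, while the flag makes the lossy behavior a deliberate choice.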