omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

Detect edge case of OR==0 #44

Closed jdblischak closed 3 years ago

jdblischak commented 3 years ago

Thanks for creating and maintaining this really useful software. This PR detects the edge case where the odds ratio (OR) column contains a zero. Without this check, the odds ratios in such a file are not log transformed, so the Z scores can never receive a negative sign because no value in the OR column is negative. The updated function instead throws an error to alert the user to the presence of the zero value(s).
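To illustrate the sign problem, here is a minimal sketch using made-up odds ratios (the values and variable names are hypothetical and are not taken from PolyFun). It only compares the sign of the raw odds ratios with the sign of their logarithms:

```python
import numpy as np

# Hypothetical odds ratios: protective (<1), null (=1), risk-increasing (>1), and a zero
odds_ratios = np.array([0.25, 1.0, 1.5, 0.0])

# Signs of the raw odds ratios: never negative, so the effect direction is lost
raw_signs = np.sign(odds_ratios)

# Signs after log transforming the strictly positive values:
# OR < 1 becomes negative, OR == 1 becomes zero, OR > 1 stays positive
positive = odds_ratios > 0
log_signs = np.sign(np.log(odds_ratios[positive]))

print(raw_signs)  # no negative entries
print(log_signs)  # direction of effect is recovered
```

The zero is excluded from the log transform here because log(0) is undefined (NumPy returns -inf with a warning), which is why raising an error is a reasonable way to handle it.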

Here is example code to demonstrate how the updated function behaves:

import numpy as np
import pandas as pd
import logging

def convert_odds_ratio_to_log(df_sumstats):
    # No OR column -> nothing to convert
    if 'OR' not in df_sumstats.columns:
        return df_sumstats

    # If there are negative values, assume it already contains log-odds
    if np.any(df_sumstats['OR']<0):
        return df_sumstats

    # Edge case: No negative values, but contains zero(s)
    if np.any(df_sumstats['OR']==0):
        raise ValueError('The input file includes SNPs with an odds ratio (OR) of 0. Please remove these variant(s).')

    # No negative values and no zeros -> log transform (modifies df_sumstats in place).
    # Checking for zeros first means every path either returns the data frame or raises.
    logging.info('Converting OR column to log-odds')
    df_sumstats['OR'] = np.log(df_sumstats['OR'])
    return df_sumstats

# No OR -> no change
df = pd.DataFrame({'SNP': ['a', 'b', 'c'], 'BETA': [0.25, -1, 1.5]})
convert_odds_ratio_to_log(df)

# Negative OR -> no change
df = pd.DataFrame({'SNP': ['a', 'b', 'c'], 'OR': [0.25, -1, 1.5]})
convert_odds_ratio_to_log(df)

# Positive OR -> log transform
df = pd.DataFrame({'SNP': ['a', 'b', 'c'], 'OR': [0.25, 1, 1.5]})
convert_odds_ratio_to_log(df)

# OR==0 -> error
df = pd.DataFrame({'SNP': ['a', 'b', 'c'], 'OR': [0.25, 0, 1.5]})
convert_odds_ratio_to_log(df)
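
The error message asks the user to remove the offending variants. A minimal sketch of one way to do that with pandas before calling the function, assuming that dropping those SNPs is acceptable for the analysis (the names `is_zero` and `df_clean` are hypothetical):

```python
import pandas as pd

# Same toy input as the OR==0 example: SNP 'b' has an odds ratio of 0
df = pd.DataFrame({'SNP': ['a', 'b', 'c'], 'OR': [0.25, 0, 1.5]})

# Flag the rows with OR == 0 and report how many will be dropped
is_zero = df['OR'] == 0
print(f'Removing {is_zero.sum()} SNP(s) with OR == 0')

# Keep only the remaining rows; the result contains strictly positive ORs
df_clean = df.loc[~is_zero].reset_index(drop=True)
print(df_clean)
```

The filtered data frame can then be passed to convert_odds_ratio_to_log without triggering the error.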

Related Issue: #21

omerwe commented 3 years ago

Good catch, thank you!