Odds-Ratio - Negative values

nsankar commented 3 years ago

Hi, hope you are doing great. I used the GPS classifier on sensor anomaly data, where X holds the covariates (continuous numeric variables), T is a continuous treatment variable (the specific sensor reading that probably caused the anomaly), and Y (the outcome) is the binary anomaly label (1 = anomaly, 0 = normal). I then called the gps.estimate_log_odds prediction API.

I get negative values (odds ratios) for some of the treatment values. Below is one example. (image #1)

I believe negative values are incorrect? Am I missing something?

Also, how should I interpret odds-ratio values that have an arbitrary min/max range across the treatment values predicted with gps.estimate_log_odds? (Please see image #2 below for an example.)

Thank you in advance.

ronikobrosly commented 3 years ago

Hi @nsankar ! You’re absolutely correct, there should never be negative odds ratios. Hmmm. Would you be willing to send me a sample of your data so I can do some debugging?

nsankar commented 3 years ago

@ronikobrosly How can I reach you by email?

ronikobrosly commented 3 years ago

@nsankar My email is roni.kobrosly@gmail.com

ronikobrosly commented 3 years ago

Hi @nsankar, I think I might understand the issue. Did you use gps.estimate_log_odds to generate image #1? If so, what you generated was an array of log-odds, which can range from -∞ to ∞. So the negative results you observed are perfectly possible.

If you're looking to generate odds ratios using the lowest treatment value as a reference (the preferred way to use this GPS_Classifier), you should use the calculate_CDRC method.
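To make the point about sign concrete, here is a minimal sketch in plain Python that does not call causal-curve at all. The helper names `log_odds_to_odds` and `log_odds_to_probability` are hypothetical, and the input values are made up for illustration. It only shows the standard relationship between log-odds, odds, and probability: a negative log-odds value simply means the probability is below 0.5.

```python
import math


def log_odds_to_odds(log_odds: float) -> float:
    """Odds are the exponential of the log-odds, so they are always positive."""
    return math.exp(log_odds)


def log_odds_to_probability(log_odds: float) -> float:
    """Invert the logit: p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up log-odds values, not output from any real model.
for value in (-2.0, 0.0, 2.0):
    print(value, log_odds_to_odds(value), log_odds_to_probability(value))
```

A log-odds of 0 corresponds to a probability of exactly 0.5; negative values fall below it and positive values above it.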

So your workflow would look something like this:

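from causal_curve import GPS_Classifier

gps = GPS_Classifier()
gps.fit(T = df['t'], X = df['x'], y = df['y'])
# 0.95 is the confidence level for the interval around the curve
gps_results = gps.calculate_CDRC(0.95)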

Here gps_results will contain a column of the odds ratios. As mentioned here, the odds ratios generated with this function give you a sense of the odds of a treatment value causing the highest outcome class to occur, relative to the lowest treatment value. So if you want to see the causal effect of a treatment value of 20.0 and the lowest treatment value happens to be 10.0, the odds ratio at 20.0 will represent:

odds of higher outcome class occurring at treatment = 20.0 / odds of higher outcome class occurring at treatment = 10.0

If the odds ratio here is 1.0, a treatment value of 20.0 has no effect beyond that of a treatment value of 10.0. If the odds ratio is 5, the odds of the higher outcome class at a treatment value of 20.0 are 5 times the odds at 10.0. So the method gives treatment effects relative to the lowest treatment value. These odds ratios are always bounded between 0 and ∞. They will never be negative.

Now, gps.estimate_log_odds produces something different. It is not relative to any treatment value. It simply gives you the log odds of the higher outcome class occurring at a provided treatment value. Again, these values can range from -∞ to ∞ and are more difficult to interpret.
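As a rough illustration of how to read such a ratio, the sketch below applies an odds ratio to a reference probability. Everything here is an assumption for illustration: the helpers `odds` and `apply_odds_ratio` are hypothetical, the reference probability of 0.10 is invented, and none of it uses causal-curve. It shows that an odds ratio of 5 multiplies the odds by 5, which is not the same as multiplying the probability by 5.

```python
def odds(probability: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1.0 - probability)


def apply_odds_ratio(p_reference: float, odds_ratio: float) -> float:
    """Return the probability implied by scaling the reference odds."""
    new_odds = odds(p_reference) * odds_ratio
    return new_odds / (1.0 + new_odds)


# Invented probability of the higher outcome class at the lowest treatment value.
p_reference = 0.10

# An odds ratio of 1.0 leaves the probability unchanged.
print(apply_odds_ratio(p_reference, 1.0))

# An odds ratio of 5.0 gives 5/14 (about 0.357), not 0.50.
print(apply_odds_ratio(p_reference, 5.0))
```

The gap between "5 times the odds" and "5 times the probability" grows as the reference probability moves away from zero.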
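If you already have log-odds at several treatment values, one way to relate them to odds ratios is the identity OR = exp(log-odds at t minus log-odds at the reference). The sketch below uses that identity with the lowest treatment value as the reference. It is not a description of how calculate_CDRC works internally; the function `odds_ratios_vs_lowest` is hypothetical and the numbers are made up.

```python
import math
from typing import List, Sequence


def odds_ratios_vs_lowest(
    treatments: Sequence[float], log_odds: Sequence[float]
) -> List[float]:
    """Odds ratio at each treatment value relative to the lowest treatment value."""
    if len(treatments) != len(log_odds) or not treatments:
        raise ValueError("inputs must be non-empty and of equal length")
    # Index of the lowest treatment value, which serves as the reference.
    ref_index = min(range(len(treatments)), key=lambda i: treatments[i])
    ref_log_odds = log_odds[ref_index]
    # A ratio of odds equals the exponential of the difference in log-odds.
    return [math.exp(value - ref_log_odds) for value in log_odds]


# Made-up treatment grid and log-odds, not output from any real model.
treatments = [10.0, 15.0, 20.0]
log_odds = [-1.5, -0.8, 0.1]

ratios = odds_ratios_vs_lowest(treatments, log_odds)
print(ratios)
```

The reference treatment always gets an odds ratio of 1.0, and every ratio is positive even when the underlying log-odds are negative.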

Does this help? Or did I miss the point?

nsankar commented 3 years ago

@ronikobrosly Noted. Yes, I used the gps.estimate_log_odds method to predict and to plot the image. I get your point. I will go through the gps.calculate_CDRC function and try it. This really helps. Thanks for the insights.

ronikobrosly commented 3 years ago

Great! Feel free to close the issue if that’s it, or let me know if you have any other questions.

ronikobrosly / causal-curve

Odds-Ratio - Negative values #41