py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

SingleTreePolicyInterpreter (causal forest) Follow-Up #692

Open titubs opened 1 year ago

titubs commented 1 year ago

Hi Keith, @kbattocchi

I have a follow-up question about the SingleTreePolicyInterpreter (causal forest) in general. My questions are:

  1. If I have 3 Y-variables (e.g. sales, retention, session duration), would SingleTreePolicyInterpreter consider all 3 of them when building the policy tree and when calculating the [value - cost] in each leaf, plus the overall "average policy gain" displayed at the top of the tree? Or do I need to do this 3 times, once for each Y?
  2. For the parameter "sample_treatment_costs", I tried using a numpy array because the cost would be different for each user (based on their LTV). Is that possible? When I tried it, it was expecting a double scalar. Is there any workaround, or must this parameter be a constant?
  3. If my two treatments are "discount" and "customer support", how do I know, using the SingleTreePolicyInterpreter, for which users to:
- increase discounts? (for insensitive users)
- decrease discounts? (for sensitive users)
- increase customer support? (for non-savvy users)
- decrease customer support? (for savvy users)

Currently, my tree outputs only "discount" and "customer support", but not the direction (increase or decrease). How could I get that? Do I need to work directly with the individual CATEs of each subject, which drifts away from the SingleTreePolicyInterpreter approach?

kbattocchi commented 1 year ago

The interpreter will consider all 3 outcomes; however, the target is just the simple average of them, which may not be appropriate if they don't have commensurable scales. Thus, you might want to either rescale the outputs yourself or create separate interpreters for each outcome, depending on what you're actually trying to learn.
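One way to do the rescaling is to standardize each outcome column before fitting, so that the simple average is taken over comparable units. This is a minimal sketch using only NumPy; the helper name `standardize_outcomes`, the synthetic data, and the column meanings are hypothetical, and z-scoring is just one possible choice of rescaling.

```python
import numpy as np

def standardize_outcomes(Y):
    """Rescale each outcome column to zero mean and unit variance.

    Returns the scaled array plus the per-column mean and standard
    deviation, so effects can be mapped back to the original units.
    """
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)
    std = Y.std(axis=0)
    # Guard against constant columns to avoid division by zero.
    std = np.where(std == 0, 1.0, std)
    return (Y - mean) / std, mean, std

# Hypothetical outcomes on very different scales.
rng = np.random.default_rng(0)
Y = np.column_stack([
    rng.normal(500.0, 100.0, size=1000),  # sales
    rng.uniform(0.0, 1.0, size=1000),     # retention
    rng.normal(30.0, 10.0, size=1000),    # session duration (minutes)
])

Y_scaled, y_mean, y_std = standardize_outcomes(Y)
```

After this step, effects estimated on `Y_scaled` are in units of standard deviations of each outcome, so any treatment costs would need to be expressed on that same scale.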

You should be able to pass individual treatment costs - if you have two continuous treatments then these should be an n-by-2 array of marginal costs of each treatment per individual. If this isn't working, please provide a small repro.
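As a rough illustration of the shape described above, the sketch below builds an n-by-2 cost array from per-user LTV. The helper `build_treatment_costs`, the 5% discount rate, and the flat support cost are hypothetical values chosen only for the example; how costs should actually be derived depends on your data.

```python
import numpy as np

def build_treatment_costs(ltv, discount_rate, support_cost):
    """Return an (n, 2) array of per-user marginal treatment costs.

    Column 0: cost of the discount, assumed proportional to each user's LTV.
    Column 1: cost of customer support, assumed identical for every user.
    """
    ltv = np.asarray(ltv, dtype=float)
    discount_cost = discount_rate * ltv
    support = np.full_like(ltv, support_cost)
    return np.column_stack([discount_cost, support])

# Hypothetical LTV values for three users.
ltv = np.array([100.0, 250.0, 40.0])
costs = build_treatment_costs(ltv, discount_rate=0.05, support_cost=2.0)

# The array would then be passed as the sample_treatment_costs argument,
# with one row per row of X and one column per treatment.
```

Because each leaf reports [value - cost], the costs presumably need to be in the same units as the outcome for the comparison to make sense.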

With multiple treatments, the policy interpreter is telling you which treatment (if any) has the biggest positive effect on the (average) outcome - the direction is always assumed to be positive.
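The rule described above can be sketched in plain NumPy: for each user, pick the treatment with the largest effect net of cost, or no treatment if none is positive. This illustrates the stated rule only and is not the library's internal implementation; the function `recommend` and the effect and cost numbers are hypothetical.

```python
import numpy as np

def recommend(effects, costs, names):
    """Pick, per user, the treatment with the largest positive net effect.

    effects, costs: (n, k) arrays of per-user treatment effects and costs.
    names: list of k treatment names.
    Returns a list of n labels; "none" when no net effect is positive.
    """
    net = np.asarray(effects, dtype=float) - np.asarray(costs, dtype=float)
    best = net.argmax(axis=1)
    best_net = net[np.arange(len(net)), best]
    return [names[j] if v > 0 else "none" for j, v in zip(best, best_net)]

# Hypothetical per-user effects of increasing each treatment.
effects = np.array([
    [3.0, 1.0],    # both help, discount helps more
    [-2.0, 0.5],   # discount hurts, support helps a little
    [-1.0, -4.0],  # both hurt
])
costs = np.array([
    [1.0, 0.5],
    [0.0, 0.1],
    [0.0, 0.0],
])

labels = recommend(effects, costs, ["discount", "customer support"])
```

In this toy data the third user has negative effects for both treatments, which is the kind of case where decreasing a treatment might help. A rule that only looks for positive effects would report "none" there, so the sign of the individual effect estimates would have to be inspected separately.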

fobembe commented 1 year ago

@kbattocchi, @titubs

Could you please explain how you generated the sample_treatment_costs parameter for SingleTreePolicyInterpreter? I would be glad to have your feedback. I am working on a project that implements a 5% discount for consumers. Would 0.05 be my sample_treatment_costs value?