py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Causal Analysis Issue #754

Closed Leo-T-Zang closed 1 year ago

Leo-T-Zang commented 1 year ago

Hi,

When using the Causal Analysis function, I encounter the following warnings:

Function delayed is deprecated; The function `delayed` has been moved from `sklearn.utils.fixes` to `sklearn.utils.parallel`. This import path will be removed in 1.5.
`sklearn.utils.parallel.delayed` should be used with `sklearn.utils.parallel.Parallel` to make it possible to propagate the scikit-learn configuration of the current thread to the joblib workers.
`sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
Function delayed is deprecated; The function `delayed` has been moved from `sklearn.utils.fixes` to `sklearn.utils.parallel`. This import path will be removed in 1.5.
`sklearn.utils.parallel.delayed` should be used with `sklearn.utils.parallel.Parallel` to make it possible to propagate the scikit-learn configuration of the current thread to the joblib workers.
`sparse` was renamed to `sparse_output` in version 1.2 and will be removed in 1.4. `sparse_output` is ignored unless you leave `sparse` to its default value.
Trying to unpickle estimator OneHotEncoder from version 1.1.3 when using version 1.2.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
Leo-T-Zang commented 1 year ago

Follow-up questions:

  1. What model or algorithm does the Causal Analysis function use to compute the causal effect?
  2. When the feature is numerical, how is the causal effect computed? My understanding is that causal effects are computed for categorical features: for example, flipping 0 to 1 or 1 to 0 lets you predict the counterfactual situation. So what is the counterfactual for a continuous feature, and how is it computed?
kbattocchi commented 1 year ago

For your initial question, it would be helpful if you could include the output of pip list as well as a cut down repro of the issue. I think it's probably safe to ignore those warnings for now as they mostly relate to how the internal structure of scikit-learn will change in future updates.
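
If the deprecation notices are too noisy in the meantime, one option is to silence them only around the call that triggers them. The sketch below uses just the standard-library `warnings` module. It assumes the scikit-learn deprecation messages are raised as `FutureWarning`; `run_quietly` and `noisy_step` are hypothetical names used for illustration and are not part of EconML or scikit-learn.

```python
import warnings


def run_quietly(func, *args, **kwargs):
    """Call func with FutureWarning suppressed only for the duration of the call."""
    with warnings.catch_warnings():
        # Assumption: the deprecation notices are emitted as FutureWarning.
        warnings.simplefilter("ignore", category=FutureWarning)
        return func(*args, **kwargs)


def noisy_step():
    # Hypothetical stand-in for a library call that emits a deprecation notice.
    warnings.warn("`sparse` was renamed to `sparse_output`", FutureWarning)
    return "done"


result = run_quietly(noisy_step)
```

The filter is deliberately narrow. The "Trying to unpickle estimator" message is, as far as I know, a `UserWarning` in that scikit-learn version, so it would still be shown; that seems desirable, since a version mismatch between the pickled and the installed estimator can matter.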

For your followups:

  1. The CausalAnalysis class uses our DML models to compute the causal effect (either LinearDML or CausalForestDML, depending on the setting of heterogeneity_model).
  2. Our models assume a linear effect, so the effect on the output of moving the treatment from 1 to 5 (a total increase of 4 units in the treatment) will be twice the effect of moving it from 0 to 2 (a total increase of 2 units). Our class computes the (constant) slope of that relationship, given the other features.
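
To make the linearity point concrete, here is a minimal sketch of the residual-on-residual ("partialling-out") idea behind DML, written in plain Python. It is not EconML's implementation: it uses simple linear nuisance models, a single control `w`, simulated data, and no cross-fitting, and every name in it (`TRUE_SLOPE`, `residuals`, `effect`, and so on) is an assumption made for illustration.

```python
import random


def ols_slope_intercept(x, y):
    """Closed-form simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """Part of y that is not explained by a linear fit on x."""
    slope, intercept = ols_slope_intercept(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Simulated data (illustrative only): w confounds both treatment t and outcome y.
rng = random.Random(0)
TRUE_SLOPE = 2.0
w = [rng.gauss(0, 1) for _ in range(2000)]
t = [0.5 * wi + rng.gauss(0, 1) for wi in w]
y = [TRUE_SLOPE * ti + 1.5 * wi + rng.gauss(0, 1) for ti, wi in zip(t, w)]

# Remove the influence of w from both t and y, then regress residual on residual.
t_res = residuals(w, t)
y_res = residuals(w, y)
theta_hat, _ = ols_slope_intercept(t_res, y_res)


def effect(t0, t1):
    """Under a linear effect, only the size of the change in treatment matters."""
    return theta_hat * (t1 - t0)


print("estimated slope:", theta_hat)
print("effect of 0 -> 2:", effect(0, 2))
print("effect of 1 -> 5:", effect(1, 5))
```

Because the estimated slope is constant, `effect(1, 5)` is exactly twice `effect(0, 2)`. For a continuous treatment the "counterfactual" is therefore just a shifted treatment value scaled by that slope.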
Leo-T-Zang commented 1 year ago

Thank you for your explanation. It helps a lot!

I have successfully run part of the example code. However, I ran into another issue when plotting the heterogeneity tree:

ca.plot_heterogeneity_tree(
    x_test,
    "age_m",
    max_depth=2,
    min_impurity_decrease=1e-6,
    min_samples_leaf=5
)

The error is:

[/usr/local/lib/python3.9/dist-packages/sklearn/utils/validation.py] in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    869             # If input is scalar raise error
    870             if array.ndim == 0:
--> 871                 raise ValueError(
    872                     "Expected 2D array, got scalar array instead:\narray={}.\n"
    873                     "Reshape your data either using array.reshape(-1, 1) if "

ValueError: Expected 2D array, got scalar array instead:

array=nan.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Do you have any idea of how to fix this problem? Thanks in advance.

kbattocchi commented 1 year ago

Sorry for the slow response. Could you provide a full stack trace (and ideally a self-contained repro)? It's possible we have a bug here but it's hard to know without understanding more of your context.

Leo-T-Zang commented 1 year ago

The problem is solved. Thank you.