Sum constraints for one-hot-encoded data

christophM commented 4 years ago

Is it possible with this implementation to impose a sum constraint on one-hot encoded feature columns, so that only one of the columns can be set to 1?

E.g., if you have features x1 x2 x3 = (0,0,1), then (1,0,1) should not be returned as a counterfactual.
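To make the condition concrete, here is a minimal sketch of the check a returned counterfactual would have to pass. The helper name `satisfies_onehot` and the `groups` argument are hypothetical and not part of actionable-recourse.

```python
def satisfies_onehot(x, groups):
    """Return True if every group of column indices sums to exactly 1 in x.

    `groups` is a list of lists of column indices, one inner list per
    one-hot encoded feature (a hypothetical format, chosen for illustration).
    """
    return all(sum(x[j] for j in group) == 1 for group in groups)


groups = [[0, 1, 2]]  # x1, x2, x3 encode a single categorical feature

print(satisfies_onehot([0, 0, 1], groups))  # True: the original point
print(satisfies_onehot([1, 0, 1], groups))  # False: two categories active at once
```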

I didn't find it in the code, but it's mentioned in the paper: "Customizing the Action Space. Users can easily customize the set of feasible actions by adding logical constraints to the IP. These constraints can be used when, for example, a classifier uses dummy variables to encode a categorical attribute (i.e., a one-hot encoding)." Did I overlook it?

Thanks!

ustunb commented 4 years ago

Hi!

It's not possible yet, but it's on our to-do list.

The functionality itself is easy to implement, but we're dragging our feet because we haven't agreed on the right API for users to specify these constraints.

One thing that could help us here is a sample script showing how you would have wanted to specify the variables that are subject to the "one-hot encoding" constraint.

Alternatively, if there is a standard way to work with one-hot encodings in scikit-learn, we'd appreciate knowing that as well.
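For reference, scikit-learn's fitted `OneHotEncoder` exposes a `categories_` attribute (one array of category values per encoded feature), and with the default settings each feature becomes a contiguous block of output columns. Under that assumption, the column groups could be derived roughly as below. This sketch is plain Python so it runs without scikit-learn, and `onehot_groups` and `offset` are hypothetical names.

```python
def onehot_groups(categories, offset=0):
    """Map per-feature category lists to contiguous column-index groups.

    `categories` mimics the shape of OneHotEncoder.categories_: one sequence
    of category values per encoded feature. `offset` is the index of the
    first one-hot column in the full feature matrix. Assumes no categories
    are dropped and the blocks are laid out back to back.
    """
    groups = []
    start = offset
    for cats in categories:
        width = len(cats)
        groups.append(list(range(start, start + width)))
        start += width
    return groups


# Illustrative categories only, not taken from any real dataset.
categories = [["red", "green", "blue"], ["s", "m", "l", "xl"]]
print(onehot_groups(categories))  # [[0, 1, 2], [3, 4, 5, 6]]
```

If the encoder sits inside a larger pipeline, the column layout may differ, so the groups would need to be checked against the actual output columns.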

Cheers, Berk

-- Berk Ustun, PhD, Postdoctoral Fellow, Center for Research on Computation for Society, Harvard John A. Paulson School of Engineering and Applied Sciences, https://www.berkustun.com

christophM commented 4 years ago

Assuming columns 2, 3, 4 belong to one one-hot encoded feature (where the sum over these three columns is always 1 per row) and columns 8, 9, 10, 11 belong to another, then maybe something like this:

A = action_set.ActionSet(X, onehot = [[2,3,4],[8,9,10,11]])
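As a rough illustration of what such an `onehot` argument would mean for the feasible set, the sketch below enumerates every point in which each group has exactly one active column, leaving all other columns unchanged. It does not use actionable-recourse. The function `feasible_onehot_points` and the sample vector `x` are made up for this example, and a real implementation would presumably add the constraints to the IP rather than enumerate.

```python
from itertools import product


def feasible_onehot_points(x, onehot):
    """Yield copies of x in which each one-hot group has exactly one 1.

    `onehot` is a list of column-index groups, in the format proposed above.
    Columns outside the groups are left untouched.
    """
    # product(*onehot) picks one "hot" column index per group.
    for choice in product(*onehot):
        x_new = list(x)
        for group, hot in zip(onehot, choice):
            for j in group:
                x_new[j] = 1 if j == hot else 0
        yield x_new


# Made-up row with 12 columns; columns 2-4 and 8-11 are one-hot groups.
x = [5, 1, 0, 0, 1, 7, 3, 2, 1, 0, 0, 0]
onehot = [[2, 3, 4], [8, 9, 10, 11]]

points = list(feasible_onehot_points(x, onehot))
print(len(points))  # 12, i.e. 3 choices for the first group times 4 for the second
```

The original point is among the enumerated points, and a vector such as one with columns 2 and 4 both set to 1 never appears.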

prateeky2806 commented 3 years ago

Hi, I just wanted to check whether this issue has been solved. I also want to run this model on categorical data with multiple categories, which can easily be one-hot encoded. It would be really helpful to know whether this can be done or not.

ustunb / actionable-recourse

Sum constraints for one-hot-encoded data #14