Fix RegressionEstimator categorical one-hot encoding consistency bug.

This PR fixes the bug by switching categorical encoding in RegressionEstimator from pandas get_dummies() to sklearn OneHotEncoder.

Encoder objects are created during RegressionEstimator.fit() and persist until the next fit(). This allows them to be re-applied to encode new data, either via additional calls to CausalModel.estimate_effect(..., fit_estimator=False, ...) or via the do() operator.

In the earlier implementation, common cause, effect modifier and potentially treatment values could be encoded inconsistently between fit() and later inference, depending on the order in which particular values were encountered in the new data.
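A minimal illustration of how this kind of mismatch can arise, using made-up data and a hypothetical `region` column (not taken from the DoWhy code base). Because get_dummies() derives its columns from whatever values are present in the frame it is given, encoding the fit-time data and the new data separately can yield different columns, and a different dropped baseline category:

```python
import pandas as pd

# Hypothetical data: the "new" frame lacks the "north" category seen at fit time.
train = pd.DataFrame({"region": ["north", "south", "west", "south"]})
new = pd.DataFrame({"region": ["south", "west"]})

# Each call infers its categories independently from the data it receives.
train_encoded = pd.get_dummies(train, drop_first=True)
new_encoded = pd.get_dummies(new, drop_first=True)

print(list(train_encoded.columns))  # ['region_south', 'region_west']
print(list(new_encoded.columns))    # ['region_west']
```

In the second frame, "south" silently becomes the dropped baseline, so regression coefficients learned on the first encoding no longer line up with the new columns.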

To fix this, a utility function is added that patches sklearn OneHotEncoder to behave like pandas get_dummies, along with a convenience member function of RegressionEstimator called _encode() that makes each use a one-line change.

It is also now possible to change drop_first from True (the current and original default) to False, which allows all regression coefficients to be inspected when interpreting model behaviour.
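For readers unfamiliar with the option, the effect of drop_first can be illustrated with pandas get_dummies(), whose parameter of the same name it mirrors. The data below is made up, and how RegressionEstimator exposes the setting is not shown here:

```python
import pandas as pd

data = pd.DataFrame({"region": ["north", "south", "west"]})

# drop_first=True omits the first category, which acts as the baseline.
reduced = pd.get_dummies(data, drop_first=True)

# drop_first=False keeps one column per category.
full = pd.get_dummies(data, drop_first=False)

print(list(reduced.columns))  # ['region_south', 'region_west']
print(list(full.columns))     # ['region_north', 'region_south', 'region_west']
```

With drop_first=False every category has its own column, and so its own coefficient, at the cost of the columns being collinear with an intercept term.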


Fix RegressionEstimator categorical one-hot encoding consistency bug. #1109