Closed: Zethson closed this issue 2 years ago.
This is caused by https://github.com/theislab/ehrapy/blob/development/ehrapy/preprocessing/_data_imputation.py#L337 (I guess it was introduced in the last PR).
It should be `elif isinstance(var_names, (dict, type(None))):` instead.
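A minimal sketch of why the suggested check works. `var_names_branch` and `bare_none_rejected` are hypothetical names, and the list/dict branches only mimic the shape of the check at the linked line; this is not ehrapy's code. In Python, `None` is a value rather than a type, so matching it in `isinstance()` requires `type(None)` (i.e. `NoneType`):

```python
from typing import Optional, Union

# Hypothetical stand-in for the var_names dispatch; the actual branches in
# ehrapy/preprocessing/_data_imputation.py may differ.
VarNames = Optional[Union[dict, list]]


def var_names_branch(var_names: VarNames) -> str:
    """Return which branch a given var_names value would take."""
    if isinstance(var_names, list):
        return "list"
    # None is not a class, so it has to be written as type(None) inside the
    # isinstance() tuple for a None default to land in this branch.
    elif isinstance(var_names, (dict, type(None))):
        return "dict-or-none"
    raise TypeError(f"unsupported var_names type: {type(var_names).__name__}")


def bare_none_rejected() -> bool:
    """Check that isinstance() refuses a bare None inside its type tuple."""
    try:
        # dict does not match None, so isinstance() moves on to the bare None
        # entry and raises TypeError because None is not a type.
        isinstance(None, (dict, None))
    except TypeError:
        return True
    return False


print(var_names_branch(None))                                                # dict-or-none
print(var_names_branch({"numerical": ["age"], "non_numerical": ["sex"]}))    # dict-or-none
print(var_names_branch(["age"]))                                             # list
print(bare_none_rejected())                                                  # True
```

Note that `isinstance()` checks the tuple entries in order and stops at the first match, so a tuple containing a bare `None` only fails for inputs that don't match an earlier entry.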
EDIT: Going through the code, I don't think it makes sense to pass a list as `var_names` to `miss_forest_impute`. Currently, a list is just passed to the `num_imputer`, which can lead to problems if some of the passed columns are non-numerical. That's why I implemented it using a `dict` instead. IMO, support for passing a list should be removed.
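A rough sketch of the dict-based routing, assuming `var_names` has the keys `"numerical"` and `"non_numerical"`. To keep it dependency-free, mean and most-frequent filling stand in for the random-forest models that MissForest-style imputation actually uses, and `impute_by_kind`, `mean_fill` and `mode_fill` are hypothetical helpers, not ehrapy functions:

```python
import statistics
from collections import Counter
from typing import Dict, List, Optional

Column = List[Optional[object]]  # None marks a missing value


def mean_fill(values: Column) -> Column:
    """Stand-in numerical imputer: replace None with the mean of observed values."""
    fill = statistics.mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def mode_fill(values: Column) -> Column:
    """Stand-in categorical imputer: replace None with the most frequent value."""
    fill = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def impute_by_kind(table: Dict[str, Column], var_names: Dict[str, List[str]]) -> Dict[str, Column]:
    """Route each column to the imputer for its kind; a flat list is rejected."""
    if not isinstance(var_names, dict):
        raise TypeError("var_names must be a dict with 'numerical' and 'non_numerical' keys")
    out = dict(table)  # shallow copy so the input table is left untouched
    for name in var_names.get("numerical", []):
        out[name] = mean_fill(table[name])
    for name in var_names.get("non_numerical", []):
        out[name] = mode_fill(table[name])
    return out


table = {"age": [30, None, 50], "sex": ["f", None, "f"]}
result = impute_by_kind(table, {"numerical": ["age"], "non_numerical": ["sex"]})
print(result)

# Treating every column as numerical (what sending a flat list only to the
# numerical imputer amounts to) fails on the string column.
try:
    mean_fill(table["sex"])
except TypeError as err:
    print("numerical imputer rejects categorical data:", err)
```

The dict makes the numerical/non-numerical split explicit at the call site, so each column can only reach an imputer that understands its values.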
EDIT 2: We should also always impute first and encode afterwards, since encoding first loses the missing values in at least the non-numerical columns (at least in `X`).
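A toy illustration of the ordering problem, assuming a naive one-hot encoder that writes a missing category as an all-zero row. `one_hot`, `mode_fill` and `count_missing` are hypothetical helpers, and ehrapy's own encoders may represent missing values differently; the point is only that once a gap is encoded away, the imputer can no longer see it:

```python
from collections import Counter
from typing import List, Optional


def one_hot(values: List[Optional[str]], categories: List[str]) -> List[List[int]]:
    """Naive one-hot encoder: a None entry becomes all zeros, not a missing marker."""
    return [[1 if v == c else 0 for c in categories] for v in values]


def mode_fill(values: List[Optional[str]]) -> List[str]:
    """Replace None with the most frequent observed category."""
    fill = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def count_missing(rows: List[List[Optional[int]]]) -> int:
    """Count entries an imputer would still recognise as missing."""
    return sum(1 for row in rows for x in row if x is None)


sex = ["f", None, "m", "f"]
categories = ["f", "m"]

# Encode first: the missing entry turns into [0, 0], so nothing is left to impute.
encoded_first = one_hot(sex, categories)
# Impute first: the gap is filled while it is still visible, then encoded.
imputed_then_encoded = one_hot(mode_fill(sex), categories)

print(encoded_first)                  # [[1, 0], [0, 0], [0, 1], [1, 0]]
print(count_missing(encoded_first))   # 0
print(imputed_then_encoded)           # [[1, 0], [1, 0], [0, 1], [1, 0]]
```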
Describe the bug
Imputation is not performed.
To Reproduce
Steps to reproduce the behavior:
It crashes with a message saying that the data contains NaNs.
Why is this not caught by the test?