kearnz / autoimpute

Python package for Imputation Methods
MIT License

mice example with mixed data types #63

Closed chanshing closed 3 years ago

chanshing commented 3 years ago

Does the MICE imputer handle mixed categorical & continuous types automatically? I can't make it work and I can't find an example in the tutorials.

kearnz commented 3 years ago

Hi @chanshing - MICE (and autoimpute) generally work with categorical variables, although support for numerical variables is much more mature. What exactly are you trying to do? Impute a categorical variable? Pass it as a predictor for other columns? If you can, please post a traceback, error logs, or some more detail.

kearnz commented 3 years ago

Closing this as the discussion has ended. Feel free to reopen if you still have further questions!

kauttoj commented 2 years ago

Hi. I have the same question. Could you take a look at my simple example and tell me where it goes wrong? What is the proper way to use MICE with mixed data?

import numpy as np
import pandas as pd
from autoimpute.imputations import MiceImputer

# create some dummy data
data = pd.DataFrame({'A': np.random.rand(20), 'B': np.random.choice(['#', '%', '¤'], size=20), 'C': np.random.rand(20)})
data['A'] = data['A'].astype(float)
data['C'] = data['C'].astype(float)
data['B'] = data['B'].astype('category')

# add some missing elements
data.iloc[5, 0] = np.nan
data.iloc[10, 0] = np.nan
data.iloc[10, 1] = np.nan
data.iloc[11, 1] = np.nan
data.iloc[12, 2] = np.nan
data.iloc[15, 2] = np.nan

# try with defaults
mice = MiceImputer(return_list=True)
new_data = mice.fit_transform(data) # FAILED

# try with some specified strategy
strategy = {'A': 'default predictive', 'C': 'default predictive', 'B': 'multinomial logistic'}
mice = MiceImputer(strategy=strategy, return_list=True)
new_data = mice.fit_transform(data) # FAILED

Both fail with "ValueError: Unable to convert array of bytes/strings into decimal numbers with dtype='numeric'"
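
The error text suggests that the string labels in column B reach a step that expects a purely numeric array. One possible workaround, sketched here with pandas only, is to replace each non-numeric column with float category codes (keeping missing entries as NaN) before imputation. The helper name `encode_categoricals` is hypothetical and not part of autoimpute, and the thread does not confirm that this resolves the failure in `MiceImputer`.

```python
import numpy as np
import pandas as pd


def encode_categoricals(df):
    """Return a numeric copy of df and a code -> label lookup per encoded column.

    Hypothetical helper for illustration; not part of autoimpute.
    """
    encoded = df.copy()
    lookups = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue  # numeric columns are left untouched
        cat = df[col].astype("category")
        codes = cat.cat.codes.astype(float)  # pandas marks missing values with code -1
        encoded[col] = codes.where(codes >= 0)  # turn -1 back into NaN
        lookups[col] = dict(enumerate(cat.cat.categories))
    return encoded, lookups


# Deterministic dummy data with the same column layout as the example above
data = pd.DataFrame({
    "A": np.linspace(0.0, 1.0, 20),
    "B": pd.Categorical(["#", "%", "¤"] * 6 + ["#", "%"]),
    "C": np.linspace(1.0, 2.0, 20),
})
data.iloc[10, 1] = np.nan
data.iloc[11, 1] = np.nan

encoded, lookups = encode_categoricals(data)
```

Here `encoded` contains only float columns and `lookups["B"]` maps each code back to its original label, so imputed codes could be decoded afterwards. Note that integer codes impose an arbitrary order on the categories when the column is used as a predictor; one-hot encoding may be a better fit in that role.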