Closed by khanwa 2 years ago
Hi, would you like to try this script instead?
import numpy as np
import pandas as pd
from MissForest import MissForest
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
# Read our toy dataset
fish = pd.read_csv('Fish.csv')
# Set missing values
fish.iloc[1, 0] = np.nan
fish.iloc[155, 0] = np.nan
fish.iloc[1, 2] = np.nan
fish.iloc[155, 2] = np.nan
# Instantiate our imputer
mf = MissForest()
fish = mf.impute(x=fish, classifier=RandomForestClassifier(), regressor=RandomForestRegressor())
print(fish)
It seems you are reassigning mfe to the result of mfe.impute(data, rfc, rfr), and the order of the classifier and regressor arguments is wrong.
mfe = mfe.impute(data, rfc, rfr)
Actually, it is the same. https://colab.research.google.com/drive/1olzHObF3eSYk5fYf0-3tsJBUlGD_VuGx?usp=sharing
Could you send me your data? Thank you.
Thank you so much. Here it is.
I fixed the bug and tried it with the data you provided. It works fine so far.
from missforest.miss_forest import MissForest
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
# Read the training and test sets and separate the labels
data_train = pd.read_csv('cancer_train_1.csv')
train_label = data_train.iloc[:, -1:]
data_train.drop('class', axis=1, inplace=True)
data_testt = pd.read_csv('cancer_test_10_1.csv')
testt_label = data_testt.iloc[:, -1:]
data_testt.drop('class', axis=1, inplace=True)
# Combine train and test so they are imputed together
label_all = pd.concat([train_label, testt_label], ignore_index=True)
data = pd.concat([data_train, data_testt], ignore_index=True)
# Missing values per column before imputation
print(data.isnull().sum())
# Instantiate our imputer
mf = MissForest()
data = mf.fit_transform(X=data)
# Missing values per column after imputation
print(data.isnull().sum())
a    28
b    17
c    32
d    32
e    31
f    43
g    35
h    30
i    22
dtype: int64
a    0
b    0
c    0
d    0
e    0
f    0
g    0
h    0
i    0
dtype: int64
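For anyone who wants the before/after counts side by side rather than as two separate printouts, here is a minimal sketch. The helper name missing_report and the toy frame are made up for illustration, and the median fill is only a stand-in so the snippet runs without MissForest installed; it is not what MissForest does internally.

```python
import numpy as np
import pandas as pd


def missing_report(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Return the per-column missing-value counts before and after imputation."""
    return pd.DataFrame({
        "before": before.isnull().sum(),
        "after": after.isnull().sum(),
    })


# Made-up data standing in for the real frame.
raw = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 2.0, 4.0],
})

# Stand-in imputation (column medians), NOT MissForest.
filled = raw.fillna(raw.median())

report = missing_report(raw, filled)
print(report)
```

With a real imputer, the filled frame would simply be whatever the imputer returns, and an "after" column of zeros confirms nothing was left unfilled.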
Thank you very much.
Thank you for sharing the implementation with us. I am getting an error
ValueError: at least one array or dtype is required
when I run mfe = mfe.impute(data, rfc, rfr)
. It works fine when I read fish = pd.read_csv('Fish.csv'), but when I read some other file it gives the error, although my DataFrame looks fine: "[699 rows x 10 columns]", type "DataFrame". Could you please check?
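A DataFrame can have the right shape and type and still trip up an imputer, so it can help to look at the column dtypes and missing counts before calling it. The sketch below uses only pandas calls. The helper name summarize_frame and the toy data are made up for illustration. The thread does not say what caused the ValueError, so treating an all-missing column or an absent column kind as a suspect is an assumption, not a diagnosis.

```python
import numpy as np
import pandas as pd


def summarize_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and missing-value information."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isnull().sum(),
        "all_missing": df.isnull().all(),
    })


# Made-up frame standing in for the file that fails.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", None, "y"],
    "c": [np.nan, np.nan, np.nan],
})

summary = summarize_frame(toy)
print(summary)

# Split columns by kind; an empty list means that kind is absent from the data.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()
print("numeric:", numeric_cols)
print("non-numeric:", other_cols)
```

Comparing this summary for Fish.csv and for the failing file should show quickly whether the two differ in column kinds or contain a column with no observed values at all.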