Closed nnagururu closed 6 months ago
Hi, I'm facing the same problem. I think the error occurs because the transform method looks up the dataframe row indices at which the training set has a missing value for a given feature. When it is then applied to the test set dataframe, the method cannot find those indices there, since they belong to the training set dataframe.
I figured out what is happening, at least in my case. Whenever the transform method is called, the _get_missingrows(x) method is also called. The latter populates the _missing_row dictionary, in which each key is a feature and each value is the list of row indexes of the input df where that feature has a missing value.
When transform is called to impute a test or validation set after _fittransform has been applied to a training set, the _missing_row dictionary is updated rather than overwritten, so the training-set indexes are still in it.
I think the solution may be simply to insert:
self._missing_row = {}
at the beginning of the _get_missingrows method's definition.
I tried it and it seems to work.
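To make the mechanism concrete, here is a minimal sketch of the stale-state problem and of the proposed reset. It is not the library's code: ToyImputer, its reset_on_call switch, the fill value, and the dict-of-rows data layout are all assumptions made up for illustration. Only the _missing_row attribute name comes from this thread.

```python
# Toy stand-in for the imputer; this is NOT the library's implementation.
class ToyImputer:
    def __init__(self, reset_on_call):
        # Hypothetical switch so both behaviours can be compared.
        self.reset_on_call = reset_on_call
        self._missing_row = {}

    def _get_missing_rows(self, rows):
        if self.reset_on_call:
            # The proposed fix: start from an empty mapping on every call.
            self._missing_row = {}
        for idx, row in rows.items():
            for feature, value in row.items():
                if value is None:
                    self._missing_row.setdefault(feature, []).append(idx)

    def transform(self, rows, fill=0.0):
        self._get_missing_rows(rows)
        out = {idx: dict(row) for idx, row in rows.items()}
        for feature, indexes in self._missing_row.items():
            for idx in indexes:
                # Raises KeyError when idx is a leftover training index.
                out[idx][feature] = fill
        return out


# Row index -> {feature: value}; None marks a missing value.
train = {0: {"a": 1.0}, 1: {"a": None}, 2: {"a": 3.0}}
test = {10: {"a": None}, 11: {"a": 2.0}}

buggy = ToyImputer(reset_on_call=False)
buggy.transform(train)
try:
    buggy.transform(test)
    stale_index_error = False
except KeyError:
    stale_index_error = True

fixed = ToyImputer(reset_on_call=True)
fixed.transform(train)
test_imputed = fixed.transform(test)
```

Without the reset, the second call still carries training index 1 and fails on the test rows. With the reset, only the test indexes are recorded.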
@Sep905 you are absolutely correct, thank you for figuring that out! I'll let you open the pull request for that change, and receive the glory.
But in the meantime, I realized that we can use Python's unenforced access to private attributes as a short-term workaround for this issue:
imputer = MissForest()
imputer.fit(X_train)
train_imputed = imputer.transform(X_train)
# reset the _missing_row attribute of the imputer object
imputer._missing_row = {}
test_imputed = imputer.transform(X_test)
This worked for me without having to modify the library code.
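If the workaround is needed in several places, it could be wrapped in a small helper. This is only a sketch: transform_with_reset is a hypothetical name, not part of the library, and _StubImputer is a made-up stand-in so the snippet runs on its own. The only thing taken from this thread is the _missing_row attribute.

```python
def transform_with_reset(imputer, X):
    """Clear the imputer's cached missing-row map, then transform X."""
    # `_missing_row` is the private attribute named in this thread. If a
    # later release renames or removes it, skip the reset rather than
    # create a stray attribute on the object.
    if hasattr(imputer, "_missing_row"):
        imputer._missing_row = {}
    return imputer.transform(X)


class _StubImputer:
    """Hypothetical stand-in that reports the state transform() sees."""

    def __init__(self):
        # Pretend this state was left over from imputing the training set.
        self._missing_row = {"a": [1]}

    def transform(self, X):
        return dict(self._missing_row), X


state_seen, returned = transform_with_reset(_StubImputer(), "X_test")
```

With a real imputer the call would be transform_with_reset(imputer, X_test). Because it relies on a private attribute, the helper is only a stopgap until the fix lands in the library.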
Closes #39
Hi, I'm getting the following error and have been unable to debug it.
Thanks in advance!