Open stimpfli opened 3 years ago
@stimpfli
Thanks for the detailed write-up! It makes sense to me, and I'm familiar with that source. We originally had placeholder values in the earlier days of the package. Let me check my original code, but I'll leave this issue open and get it on the roadmap. I don't have timelines at the moment, but I'm looking to tackle a number of issues in the coming weeks, so it should be soon.
Regarding your second question, the `MiceImputer` is a subclass of the `MultipleImputer`, so it uses the same `fit` and `fit_transform` methods but overrides the `transform` method itself in order to perform `k` column updates. In performing `k` column updates, it calls the underlying imputer's `transform` method `k` times. That `transform` method ensures `fit` has already been called, or it throws an error. Therefore, that `transform` function call you reference will only work if the underlying imputer has already been fit.
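To make that control flow concrete, here is a rough sketch in plain Python. It is not the autoimpute source: `ToySingleImputer`, `ToyMultipleImputer`, `ToyMiceImputer`, and `NotFittedError` are hypothetical stand-ins, and the mean fill is only a placeholder strategy. It shows a subclass that inherits `fit` and `fit_transform`, overrides `transform` to call the underlying imputer's `transform` `k` times, and relies on that underlying method to raise if `fit` was never called.

```python
class NotFittedError(RuntimeError):
    """Raised when transform is called before fit (hypothetical name)."""


class ToySingleImputer:
    """Hypothetical underlying imputer: fills None with the column mean."""

    def __init__(self):
        self.means_ = None  # populated by fit

    def fit(self, rows):
        n_cols = len(rows[0])
        self.means_ = []
        for j in range(n_cols):
            observed = [r[j] for r in rows if r[j] is not None]
            self.means_.append(sum(observed) / len(observed))
        return self

    def transform(self, rows):
        # Guard: refuse to run unless fit has already been called.
        if self.means_ is None:
            raise NotFittedError("call fit before transform")
        return [
            [self.means_[j] if v is None else v for j, v in enumerate(r)]
            for r in rows
        ]


class ToyMultipleImputer:
    """Hypothetical parent class that owns fit and fit_transform."""

    def __init__(self, imputer=None):
        self.imputer = imputer or ToySingleImputer()

    def fit(self, rows):
        self.imputer.fit(rows)
        return self

    def transform(self, rows):
        return self.imputer.transform(rows)

    def fit_transform(self, rows):
        return self.fit(rows).transform(rows)


class ToyMiceImputer(ToyMultipleImputer):
    """Inherits fit and fit_transform; overrides transform only."""

    def __init__(self, k=3, imputer=None):
        super().__init__(imputer)
        self.k = k
        self.calls = 0  # counts calls to the underlying transform

    def transform(self, rows):
        out = rows
        for _ in range(self.k):
            # Each pass goes through the underlying imputer's guard.
            out = self.imputer.transform(out)
            self.calls += 1
        return out
```

In this toy version the passes after the first are no-ops, since a mean fill leaves nothing to update; the point is only the order of calls and where the fit check lives.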
Let me know if you have any other questions!
It seems to me that your MiceImputer only uses complete cases to train the SingleImputers, but from what I have read about MICE imputation, that should not be the case.
As a consequence, when there is a missing value in each row, a `ValueError` is thrown, similar to the one described in #65. In my opinion, this happens because no sample survives this line: https://github.com/kearnz/autoimpute/blob/a214e7ad2c664cd6c57843934ebf159067d6261f/autoimpute/imputations/dataframe/single_imputer.py#L196 which returns an empty `x_` and `y_`. This could be solved by selecting all samples with an observed y and filling all missing values in `x_` with a mean/mode or random imputer, instead of selecting only complete cases in `listwise_delete`.
I do not think this issue is related to the number of columns used (as in #65), as I could replicate it by slightly modifying the titanic dataset. Here is the code illustrating the issue. I deliberately introduced missing values in each row, but no row or column is completely missing, so imputation should be usable.
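Separately from that reproduction, the mechanism can be sketched without autoimpute at all. The snippet below is plain Python on a made-up four-row dataset, not the library's code: the function `listwise_delete` here is a simplified stand-in that only shares its name with the library's function, and `observed_target_with_filled_predictors` is a hypothetical name for the alternative suggested above. It assumes missing values are represented as `None` and uses a column mean as the fill.

```python
from statistics import mean


def listwise_delete(X, y):
    """Keep only rows where the target and every predictor are observed."""
    keep = [
        i for i, (row, target) in enumerate(zip(X, y))
        if target is not None and all(v is not None for v in row)
    ]
    return [X[i] for i in keep], [y[i] for i in keep]


def observed_target_with_filled_predictors(X, y):
    """Keep every row whose target is observed; mean-fill missing predictors."""
    n_cols = len(X[0])
    # Column means use every observed predictor value, whatever the target is.
    col_means = [
        mean(r[j] for r in X if r[j] is not None) for j in range(n_cols)
    ]
    keep = [i for i, target in enumerate(y) if target is not None]
    X_fit = [
        [col_means[j] if v is None else v for j, v in enumerate(X[i])]
        for i in keep
    ]
    return X_fit, [y[i] for i in keep]


# Every row has exactly one missing value; no row or column is fully missing.
X = [[None, 2.0], [1.0, None], [3.0, 4.0], [5.0, 6.0]]
y = [10.0, 20.0, None, None]

x_complete, y_complete = listwise_delete(X, y)
x_filled, y_filled = observed_target_with_filled_predictors(X, y)
```

With this data, complete-case selection leaves nothing to train on, while the second approach keeps the two rows with an observed target.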
Finally, could you tell me more about what this line is doing: https://github.com/kearnz/autoimpute/blob/a214e7ad2c664cd6c57843934ebf159067d6261f/autoimpute/imputations/dataframe/mice_imputer.py#L136 since no `fit` is called before the `transform`?
I hope there are enough details for you to help me solve this issue. Thank you for the nice work on the module.
Victor