kearnz / autoimpute

Python package for Imputation Methods
MIT License
237 stars 19 forks source link

Implemented SeriesImputer #46

Closed gjdv closed 4 years ago

gjdv commented 4 years ago

Hi,

I have implemented the SeriesImputer to solve issue 43. It turned out to be easiest to make it a subclass of MultipleImputer. I needed to make minor modifications to SingleImputer, which also solve issue 44. The added arguments default properly for use beyond SeriesImputer. Here is some code you can use for testing:

import numpy as np
import pandas as pd
from autoimpute.imputations import MultipleImputer, SeriesImputer

n = 50
dat = pd.DataFrame({'A': np.random.uniform(0, 1, n), 'B': np.random.uniform(0, 1, n), 'C': np.random.uniform(0, 1, n)})
dat['B'][dat['B'] < 0.25] = np.nan
dat['C'][dat['C'] < 0.7] = np.nan

# imp = MultipleImputer(strategy='stochastic', return_list=True)
# dat_imp = imp.fit_transform(dat)

imp = SeriesImputer(strategy='stochastic', return_list=True)
dat_imp = imp.fit_transform(dat)

Kind regards,

Gert-Jan

kearnz commented 4 years ago

Hi - thanks for working on this, appreciate your effort. I went through it, and it looks very good. All the tests passed, so it looks like changes to the SingleImputer are good to go. Excited that we'll get this in there.

I left a couple comments - let me know your thoughts. Once we resolve those, we'll merge the pull request.

gjdv commented 4 years ago

Hi, Thanks for your reaction. The name SeriesImputer I took from your issue-description, but I like MiceImputer better. Adding some reference to MICE in the class description might also be a good idea. I also agree with your other suggestions. I noticed myself that I forgot to add to the comments/documentation the added argument imp_ixs to SingleImputer.fit and .transform. I'm not too familiar, process-wise, with following up on such comments in a pull request. How do we go from here? Do you make these changes, or should I do that in my fork?

kearnz commented 4 years ago

Great - let's go with MiceImputer. For documentation, add any notes you think are necessary to the actual classes themselves. You can follow the format that I've used for methods in other classes. The format is important because the docs are autogenerated, but feel free to write any content within the format. Once everything is merged, I'll regenerate the documentation, and I'll make amendments to the README, bump the version, etc.

Regarding process, normally I'd suggest using git commit --amend, as these are pretty straightforward changes to a non-merged PR. That being said, I'm going to re-create the dev branch, so I'm going to close this PR. What you can do is make the changes we discussed here, then issue a new PR to the fresh dev branch, and we'll be good to go.

kearnz commented 4 years ago

New dev branch pushed. Did this to clean some stuff up locally, as I was dealing with some conflicts. Anyway, should be good to to issue a new pull request with changes, and we'll integrate from there. I'm sure there's a better way I could do this process wise too...