Closed: orydatadudes closed this issue 3 years ago
`autoimpute` has several imputation methods available. You pass an imputation method as a strategy, and behind the scenes each imputation method maps to an Imputer class that implements said method. For example, the `PMMImputer` implements `pmm`. You can pass imputation methods per column, which gives you the flexibility to impute values in each column as you please.
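To make the strategy-to-class idea concrete, here is a minimal pure-Python sketch of a per-column strategy lookup. It is not autoimpute's code: `MeanImputer`, `ModeImputer`, `STRATEGIES`, and `impute_columns` are hypothetical names made up for illustration, and the sample data is invented.

```python
from collections import Counter
from statistics import mean


# Hypothetical stand-ins for imputer classes; NOT autoimpute's classes.
class MeanImputer:
    def impute(self, values):
        observed = [v for v in values if v is not None]
        fill = mean(observed)
        return [fill if v is None else v for v in values]


class ModeImputer:
    def impute(self, values):
        observed = [v for v in values if v is not None]
        fill = Counter(observed).most_common(1)[0][0]
        return [fill if v is None else v for v in values]


# Illustrative registry: strategy name -> class that implements it.
STRATEGIES = {"mean": MeanImputer, "mode": ModeImputer}


def impute_columns(data, strategy):
    """Impute each column named in `strategy`; copy other columns unchanged."""
    result = {}
    for column, values in data.items():
        if column in strategy:
            imputer = STRATEGIES[strategy[column]]()
            result[column] = imputer.impute(values)
        else:
            result[column] = list(values)
    return result


# Invented sample data; None marks a missing value.
data = {"value": [10.0, None, 30.0], "color": ["red", None, "red"]}
imputed = impute_columns(data, {"value": "mean", "color": "mode"})
```

Each column is handled by whichever class its strategy string points to, which is the same shape as passing a `strategy` dict keyed by column name.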
Now we also offer default methods for users who would like to move quickly or don't know which method to use. `default time` is one of those, and it maps to the `DefaultTimeSeriesImputer`:
https://github.com/kearnz/autoimpute/blob/d6eb7c34ceea93145905519d1e0b0767f1c6cfbd/autoimpute/imputations/series/default.py#L281
The `DefaultTimeSeriesImputer` determines the data type of the feature for you and picks an imputation method: mode for categorical variables, and interpolation for numerical variables.
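The type-based choice described above can be sketched in plain Python. This is only an illustration of the idea (mode for categorical, interpolation for numerical), not the library's implementation; `default_time_impute`, `interpolate_linear`, and `fill_mode` are hypothetical helpers, and the sketch assumes linear interpolation and leaves leading or trailing gaps unfilled.

```python
from collections import Counter
from numbers import Number


def interpolate_linear(values):
    """Fill None gaps that lie between two observed values, linearly."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def fill_mode(values):
    """Fill None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]


def default_time_impute(values):
    """Pick a method from the data type: interpolate numbers, else use the mode."""
    observed = [v for v in values if v is not None]
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in observed
    )
    return interpolate_linear(values) if is_numeric else fill_mode(values)


years = default_time_impute([2001, None, None, 2004])    # numeric column
labels = default_time_impute(["a", None, "a", "b"])      # categorical column
```

The caller never names a method here; the dispatch happens on the column's contents, which is what makes a "default" strategy convenient when you don't know which method to pick.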
In your case, you are imputing per column. `year` uses `default time`, which ends up using interpolation. `value` uses `pmm`, which you specified explicitly.
Note that the time series methods in `autoimpute` are somewhat less developed than the other, cross-sectional methods. Time series imputation is an active area we'd like to work on more, as it could be improved within this package.
Let me know if this answers your questions! Will close this issue later if nothing else.
I read the documentation, but some things are still not clear enough for me. Assuming the following input data:

       value  quantity  temperature
    0   1573     13597           51
    1   3996       513          nan
    2    nan     11589           50

    imp = MiceImputer(
        n=3,
        strategy={"Year": "default time", "value": "pmm"},
        visit="left-to-right",
        return_list=True,
    )
    imp.fit_transform(df)

So will two different algorithms be used? One to predict value and another one for temperature? Thanks