kearnz / autoimpute

Python package for Imputation Methods
MIT License

how to use DefaultTimeSeriesImputer #58

Closed: orydatadudes closed this issue 3 years ago

orydatadudes commented 3 years ago

I read the documentation, but some things are still not clear to me. Assume the following input data:

       value  quantity  temperature
    0   1573     13597           51
    1   3996       513          nan
    2    nan     11589           50

  1. Does this imputation assume the data is sorted by a time index or column?
  2. How can I know which algorithm is used, for example to predict the temperature value?
  3. Let's assume I use MiceImputer:

imp = MiceImputer(
    n=3,
    strategy={"Year": "default time", "value": "pmm"},
    visit="left-to-right",
    return_list=True,
)

imp.fit_transform(df)

So will two different algorithms be used, one to predict value and another for temperature? Thanks.

kearnz commented 3 years ago

autoimpute has several imputation methods available. You pass an imputation method as a strategy, and behind the scenes each imputation method maps to an Imputer class that implements said method. For example, the PMMImputer implements pmm. You can pass imputation methods per column, which gives you the flexibility to impute values in each column as you please.
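
To make the per-column idea concrete, here is a minimal pure-Python sketch of a strategy dictionary being resolved column by column. It is an illustration only, not autoimpute's internals: the `STRATEGIES` registry and the `impute_columns` helper are hypothetical names, and mean and median fills stand in for the real imputers (pmm is not reimplemented here).

```python
from statistics import mean, median

# Hypothetical registry for illustration: strategy name -> function that
# computes one fill value from the observed (non-missing) entries.
STRATEGIES = {
    "mean": lambda observed: mean(observed),
    "median": lambda observed: median(observed),
}


def impute_columns(columns, strategy):
    """Fill None entries in each column with the strategy named for that column."""
    result = {}
    for name, values in columns.items():
        method = strategy.get(name)
        if method is None:
            # No strategy given for this column: copy it unchanged.
            result[name] = list(values)
            continue
        observed = [v for v in values if v is not None]
        fill = STRATEGIES[method](observed)
        result[name] = [fill if v is None else v for v in values]
    return result


# The data from the question, with None marking missing values.
data = {
    "value": [1573, 3996, None],
    "quantity": [13597, 513, 11589],
    "temperature": [51, None, 50],
}
filled = impute_columns(data, {"value": "mean", "temperature": "median"})
print(filled)
```

Each column is handled by whichever method its key names, which is the same shape as the strategy dictionary passed to MiceImputer in the question.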

Now we also offer default methods for users who would like to move quickly or don't know which method to use. default time is one of those, and it maps to the DefaultTimeSeriesImputer https://github.com/kearnz/autoimpute/blob/d6eb7c34ceea93145905519d1e0b0767f1c6cfbd/autoimpute/imputations/series/default.py#L281

The DefaultTimeSeriesImputer determines the data type of the feature for you and picks an imputation method - mode for categorical variables, and interpolation for numerical variables.
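
The dispatch described above can be sketched roughly as follows. This is a pure-Python approximation under stated assumptions, not the library's code: `default_time_fill`, `interpolate_linear`, and `fill_mode` are hypothetical helpers, missing values are represented as `None`, and plain linear interpolation is assumed for the numerical case.

```python
from collections import Counter


def interpolate_linear(values):
    """Fill None gaps linearly between the nearest observed neighbours.

    Gaps before the first or after the last observed value stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def fill_mode(values):
    """Fill None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if v is None else v for v in values]


def default_time_fill(values):
    """Pick interpolation for numerical data and the mode otherwise."""
    observed = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in observed):
        return interpolate_linear(values)
    return fill_mode(values)


temperature = [51, None, 50]
season = ["winter", None, "winter", "spring"]  # made-up categorical column

print(default_time_fill(temperature))
print(default_time_fill(season))
```

Note that interpolation in this sketch depends on row order, so it only makes sense if the rows are already arranged in time order. It also leaves a trailing gap, such as the missing last entry of value in the question's data, unfilled.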

In your case, you are imputing per column. Year uses default time, which ends up using interpolation. value uses pmm, which you specified explicitly.

Note that the time series methods in autoimpute are somewhat less developed than the cross-sectional methods. Time series imputation is an area we'd like to work on more, as it could be improved within this package.

Let me know if this answers your questions! Will close this issue later if nothing else.