invenia / Impute.jl

Imputation methods for missing data in Julia
https://invenia.github.io/Impute.jl/latest/

An attempt at rewriting simple imputation methods as iterators #60

Closed rofinn closed 4 years ago

rofinn commented 4 years ago

I'll be immediately closing this PR, but I figured I'd document why rewriting even the simple imputation methods as iterators didn't seem reasonable in the end.

  1. Many methods require looking forward or behind (e.g., locf, nocb, interpolation) which isn't guaranteed to work consistently for all iterators.
  2. Generalizing univariate iterators to multivariate datasets is challenging to do in a performant way (e.g., iterating at different rates for each variable may result in multiple copies of an observation).
  3. Preserving type information may be challenging when splitting and combining observations.
  4. In some cases we need to pass significant information around in each iteration state, particularly if we're nesting many iterators.
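
To make points 1 and 4 concrete, here is a minimal sketch of what a lazy LOCF wrapper might look like. `LOCFIterator` is a hypothetical name used only for illustration; it is not part of Impute.jl and not the code from this PR. It assumes the wrapped iterator follows the standard `Base.iterate` protocol.

```julia
# Hypothetical lazy LOCF wrapper (illustration only, not Impute.jl API).
struct LOCFIterator{I}
    itr::I
end

# We don't assume the wrapped iterator knows its length.
Base.IteratorSize(::Type{<:LOCFIterator}) = Base.SizeUnknown()

# First step: nothing has been observed yet, so a leading `missing`
# can only be passed through unchanged.
function Base.iterate(it::LOCFIterator)
    next = iterate(it.itr)
    next === nothing && return nothing
    x, s = next
    return x, (s, x)
end

# Later steps: the state carries both the inner iterator's state and the
# last value we emitted, which is the extra bookkeeping from point 4.
function Base.iterate(it::LOCFIterator, state)
    s, prev = state
    next = iterate(it.itr, s)
    next === nothing && return nothing
    x, s2 = next
    filled = ismissing(x) ? prev : x
    return filled, (s2, filled)
end

locf_result = collect(LOCFIterator([1, missing, missing, 4, missing]))
leading_result = collect(LOCFIterator([missing, 2, missing]))
```

LOCF only needs to look behind, so it fits in a single pass, but the leading `missing` in the second call survives because filling it would require the next observation. NOCB or interpolation would need that lookahead for every gap, which means buffering an unknown number of elements or requiring a reversible or indexable source. Note also that without an `eltype` definition the wrapper reports `Any`, which hints at the type-preservation concern in point 3.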

Overall, I think a better approach moving forward will be to provide a handful of methods/utilities on a small selection of types that can be applied out of the box at the cost of making multiple passes. We can also focus on making the Imputor API more extensible to help get folks up and running if they think they can impute all of their data in a single pass.
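
As a rough sketch of the multi-pass alternative, the following works on plain vectors and simply chains two passes. `fill_locf` and `fill_nocb` are hypothetical helper names for illustration and are not the Impute.jl API.

```julia
# Hypothetical helpers operating on indexable vectors (not Impute.jl API).

# Forward pass: carry the last observation forward into each gap.
function fill_locf(v::AbstractVector)
    out = copy(v)  # leave the input untouched
    for i in (firstindex(out) + 1):lastindex(out)
        if ismissing(out[i])
            out[i] = out[i - 1]
        end
    end
    return out
end

# Backward pass: carry the next observation backward into each gap.
function fill_nocb(v::AbstractVector)
    out = copy(v)
    for i in (lastindex(out) - 1):-1:firstindex(out)
        if ismissing(out[i])
            out[i] = out[i + 1]
        end
    end
    return out
end

v = [missing, 1, missing, missing, 4, missing]

forward_only = fill_locf(v)            # leading missing remains
two_pass = fill_nocb(fill_locf(v))     # second pass fills the leading gap
```

Because the data is indexable, looking ahead or behind is trivial and the element type of the container is preserved; the cost is one full traversal and one copy per pass.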