nickrobinson251 closed this 3 years ago
Ah, I didn't realise it was a goal of the package to impute non-missing values. If that's not documented, perhaps it'd be worth adding somewhere?
Feel free to close this :)
It's only really documented for the `Context` type and isn't currently used in any examples. I'll leave this open till that's added. If we move in the direction of having an `Impute.Iterators` module, then the behaviour of `Impute.Iterators.drop` and `skipmissing` should become almost identical in the base case. Might be a good thing to test against, though :)
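A minimal sketch of such a test, assuming a hypothetical `drop_iter` that stands in for `Impute.Iterators.drop` (which doesn't exist yet in this discussion) and takes the missingness predicate as a keyword defaulting to `ismissing`:

```julia
# Hypothetical stand-in for Impute.Iterators.drop (not Impute's API): lazily
# skip every element for which the missingness predicate returns true.
drop_iter(data; ismissingfn = ismissing) = Iterators.filter(!ismissingfn, data)

data = [1, missing, 3, missing, 5]

# Base case: with the default predicate it should agree with skipmissing.
@assert collect(drop_iter(data)) == collect(skipmissing(data))

# With a different predicate, the two are expected to diverge.
collect(drop_iter([1.0, NaN, 3.0]; ismissingfn = isnan))  # [1.0, 3.0]
```

Taking the predicate as a keyword keeps the default call identical to `skipmissing`, so the equality check above only has to hold for the base case.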
> `skipmissing` wouldn't work if we change the missingness function

The more I think about this, the less I like it.
Julia gives us `missing`, which is a monumentally useful thing, and I think it is reasonable to build for that case. If users need to `replace(X, 999999 => missing)`, then they can do that before imputing.
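For example, a sentinel-coded vector can be converted with Base's `replace` before any imputation runs. The fill step here uses `coalesce` with the mean of the observed values purely as a stand-in for an imputor; it is an assumption for illustration, not Impute's API.

```julia
using Statistics: mean

# Raw data where 999999 marks a missing reading (sentinel value).
raw = [1.0, 999999.0, 3.0, 999999.0, 5.0]

# Convert sentinels to `missing`; the element type widens to Union{Missing, Float64}.
# `replace` compares with `isequal`, so the Int 999999 matches 999999.0.
cleaned = replace(raw, 999999 => missing)

# Stand-in for an imputation step (not Impute's API): fill each missing value
# with the mean of the observed values, here (1 + 3 + 5) / 3 == 3.0.
filled = coalesce.(cleaned, mean(skipmissing(cleaned)))
# [1.0, 3.0, 3.0, 3.0, 5.0]
```

Once the data uses `missing`, anything downstream (including `skipmissing`) can rely on a single notion of missingness.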
Agreed. That's why I'd like to move the current behaviour to an iterators module and default to using a multipass approach with an `Impute.Dataset` type. I'll note that most of these design decisions were made when `Missing` and `Nullable` both existed, which is less relevant now that Julia provides `missing` by default.
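A multipass approach can be sketched without the package: one pass builds a mask from an arbitrary missingness predicate, a second computes a statistic over the unmasked values, and a third substitutes. `masked_substitute` is a hypothetical helper written for illustration; it is not `Impute.Dataset` or `Impute.substitute`, whose actual signatures may differ.

```julia
using Statistics: mean

# Hypothetical multipass imputation (not Impute's API). The caller supplies the
# missingness predicate, so NaN or a sentinel is handled the same way as
# `missing`. Assumes the predicate returns a Bool for every element.
function masked_substitute(data::AbstractVector, ismissingfn; statistic = mean)
    # Pass 1: build a mask of the entries treated as missing.
    mask = map(ismissingfn, data)
    # Pass 2: compute the replacement value from the unmasked entries only.
    value = statistic(data[.!mask])
    # Pass 3: copy the data, substituting `value` wherever the mask is set.
    T = promote_type(eltype(data), typeof(value))
    return T[m ? value : x for (x, m) in zip(data, mask)]
end

masked_substitute([1.0, NaN, 3.0], isnan)            # [1.0, 2.0, 3.0]
masked_substitute([2, 999999, 4], x -> x == 999999)  # [2.0, 3.0, 4.0]
```

The typed comprehension `T[...]` keeps the result concretely typed when an integer column receives a floating-point mean.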
A couple of notes on how I think this should work in the `Impute.Iters` API:
For `fill`, it probably shouldn't be applying a function over all of the non-missing data in the iterator interface; instead, it should use something like an `OnlineStat` if a single pass is the desired behaviour. If you're willing to do multiple passes, then just manually create an `Impute.Dataset` type with a custom mask. That's exactly what's happening in the new `Impute.substitute` call introduced in #69:
https://github.com/invenia/Impute.jl/blob/master/src/imputors/substitute.jl#L51
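To make the single-pass constraint concrete: a fill that sees each element once can only use a statistic of the values observed up to that point. This sketch keeps a hand-rolled running mean in place of an `OnlineStat` (OnlineStats.jl's `Mean` would play the same role); `running_mean_fill` is a hypothetical name, not Impute's API, and leading missing values stay missing because nothing has been observed yet.

```julia
# Hypothetical single-pass fill (not Impute's API): each missing value is
# replaced by the mean of the non-missing values seen before it. Values that
# appear before any observation stay `missing`.
function running_mean_fill(data)
    out = Union{Missing, Float64}[]
    total, n = 0.0, 0  # running sum and count of observed values
    for x in data
        if ismissing(x)
            push!(out, n == 0 ? missing : total / n)
        else
            total += x
            n += 1
            push!(out, x)
        end
    end
    return out
end

running_mean_fill([missing, 2, missing, 4, missing])
# [missing, 2.0, 2.0, 4.0, 3.0]
```

A multipass mean fill over the same input would put 3.0 in every gap, including the first one.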
`skipmissing` wouldn't work if we change the missingness function (e.g., `isnan`, `x -> x == 999999`). Many other datasets use different sentinel values.
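A quick illustration with Base Julia only: `skipmissing` only ever skips `missing`, so NaN and sentinel values pass straight through, while `Iterators.filter` with the same predicate used for imputation does skip them. `isgap` is a hypothetical helper for this example.

```julia
values_nan = [1.0, NaN, missing, 4.0]
values_sentinel = [3, 999999, 5]

# skipmissing drops only `missing`; NaN and sentinels survive.
collect(skipmissing(values_nan))        # [1.0, NaN, 4.0]
collect(skipmissing(values_sentinel))   # [3, 999999, 5]

# A custom missingness predicate handles both cases.
# `isnan(missing)` returns `missing`, so check `ismissing` first.
isgap(x) = ismissing(x) || isnan(x)
collect(Iterators.filter(!isgap, values_nan))                 # [1.0, 4.0]
collect(Iterators.filter(x -> x != 999999, values_sentinel))  # [3, 5]
```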