invenia / Impute.jl

Imputation methods for missing data in julia
https://invenia.github.io/Impute.jl/latest/

`fill` with `value=f::Function` isn't comfortable with entirely `missing` data #51

Closed nickrobinson251 closed 3 years ago

nickrobinson251 commented 5 years ago

Here are some common functions we may want to use with `fill` but which (arguably) behave awkwardly when all the data is missing. This stems from `fill` using `drop` and subsequently calling the function on an empty array:

```julia
julia> using Impute, Statistics

julia> x = [missing 2.0; missing 4.0]
2×2 Array{Union{Missing, Float64},2}:
 missing  2.0
 missing  4.0

julia> Impute.fill(x, value=mean, dims=1)
2×2 Array{Union{Missing, Float64},2}:
 NaN  2.0
 NaN  4.0
```

```julia
julia> x = [missing 2; missing 4];

julia> Impute.fill(x, value=mean, dims=1)
ERROR: InexactError: Int64(NaN)
```

```julia
julia> x = [missing 2.0; missing 4.0];

julia> Impute.fill(x, value=median, dims=1)
ERROR: ArgumentError: median of an empty array is undefined, Union{Missing, Float64}[]
```

```julia
julia> x = [missing 2.0; missing 4.0];

julia> Impute.fill(x, value=middle, dims=1)
ERROR: ArgumentError: collection must be non-empty
```

(Impute v0.3, tested on Julia v1.2)
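
The same behaviour can probably be reproduced without Impute at all, which suggests the root cause is the reduction over an empty array rather than anything specific to `fill`. A minimal sketch, assuming the dropped column ends up as an empty float vector (the names `col`, `kept`, `empty_mean`, `int_error` and `median_error` are only for illustration):

```julia
using Statistics

# One all-missing column, with the missing values removed before reducing.
col = Union{Missing, Float64}[missing, missing]
kept = collect(skipmissing(col))        # empty Vector{Float64}

# mean of an empty float vector is 0.0 / 0, i.e. NaN.
empty_mean = mean(kept)

# NaN cannot be converted to an integer, which matches the InexactError
# seen with the integer matrix.
int_error = try
    convert(Int, empty_mean)
catch err
    err
end

# median refuses to reduce an empty array at all.
median_error = try
    median(kept)
catch err
    err
end
```

So the three outcomes (a silent `NaN`, an `InexactError`, an `ArgumentError`) appear to depend only on how each function treats empty input.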

Apologies for not having a solution here... but some questions we might want to think about

rofinn commented 5 years ago

I'd be okay with retaining the `missing` in this case. One thing I've been considering is that if we split out an Iterators interface then we'd likely want to use an `OnlineStat` instead. Unfortunately, this would just use the default value (e.g., `value(Mean())` -> `0.0`), which also might not be what we want. I guess in that case we could check `nobs(value)` before returning.
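
Roughly what I have in mind for retaining the `missing`, as a sketch only. `fill_or_keep` is a hypothetical helper, not something in Impute, and the `all(ismissing, col)` check stands in for looking at the number of observations before using the computed value:

```julia
using Statistics

# Hypothetical helper, not part of Impute.jl: fill the missing entries of a
# vector with f(observed values), but return the vector untouched when
# there are no observed values to compute f from.
function fill_or_keep(f, col::AbstractVector)
    all(ismissing, col) && return col       # nothing observed: keep missings
    replacement = f(collect(skipmissing(col)))
    return [ismissing(x) ? replacement : x for x in col]
end

partly_filled = fill_or_keep(mean, [1.0, missing, 3.0])   # missing entry filled
kept_missing  = fill_or_keep(median, [missing, missing])  # returned unchanged
```

With a guard like this, `median` and `middle` would never see an empty array, and `mean` would not produce a `NaN` fill value.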

rofinn commented 3 years ago

Alright, #69 fixed this by having `impute(data::AbstractArray{Missing}, imp)` just return the input data, because most imputation methods are going to need at least some non-missing values to work properly.
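
For anyone reading later, the idea is plain method dispatch on the element type. A simplified sketch with stand-in names (`MeanFill` and `my_impute` are hypothetical and are not the actual code from #69):

```julia
using Statistics

# Hypothetical stand-ins for illustration; these are not Impute.jl's types.
struct MeanFill end

# General method: replace missing entries with the mean of the observed ones.
function my_impute(data::AbstractArray, ::MeanFill)
    m = mean(skipmissing(data))
    return [ismissing(x) ? m : x for x in data]
end

# More specific method: the element type is exactly Missing, so nothing is
# observed and the input is returned as-is.
my_impute(data::AbstractArray{Missing}, ::MeanFill) = data

all_missing = [missing, missing]                    # Vector{Missing}
unchanged = my_impute(all_missing, MeanFill())      # hits the specific method
filled = my_impute([1.0, missing, 3.0], MeanFill()) # hits the general method
```

Note that in this sketch the short-circuit only applies when the element type is exactly `Missing`. An array typed `Union{Missing, Float64}` that happens to contain only `missing` values would still go through the general method.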