invenia / Impute.jl

Imputation methods for missing data in Julia
https://invenia.github.io/Impute.jl/latest/

Add LimitedLOCF #116

Closed: morris25 closed this 3 years ago

morris25 commented 3 years ago

Adds a limited version of LOCF (last observation carried forward) that only fills gaps no longer than a given size.

julia> M = [1.0 2.0 missing missing missing 6.0; 1.1 missing missing 4.4 5.5 6.6]
2×6 Array{Union{Missing, Float64},2}:
 1.0  2.0       missing   missing   missing  6.0
 1.1   missing  missing  4.4       5.5       6.6

julia> impute(M, LimitedLOCF(2); dims=:rows)
2×6 Array{Union{Missing, Float64},2}:
 1.0  2.0   missing   missing   missing  6.0
 1.1  1.1  1.1       4.4       5.5       6.6
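As a rough illustration of the semantics in the example above (not the code in this PR), here is a minimal sketch on a single vector. It assumes a hypothetical helper limited_locf(v, max_gap_size): each run of missing values is filled with the last observed value only if the whole run is no longer than max_gap_size. Longer runs, and any missings before the first observation, are left as they are.

```julia
# Hypothetical helper, NOT the Impute.jl API: carry the last observed value
# forward, but only across runs of `missing` no longer than `max_gap_size`.
function limited_locf(v::AbstractVector, max_gap_size::Integer)
    out = collect(v)                  # copy so the caller's data is not mutated
    n = length(out)
    i = 1
    while i <= n
        if ismissing(out[i])
            # Find the last index `j` of this run of missings.
            j = i
            while j < n && ismissing(out[j + 1])
                j += 1
            end
            gap = j - i + 1
            # Fill only if there is a previous observation and the gap is short enough.
            if i > 1 && gap <= max_gap_size
                out[i:j] .= out[i - 1]
            end
            i = j + 1
        else
            i += 1
        end
    end
    return out
end

M = [1.0 2.0 missing missing missing 6.0; 1.1 missing missing 4.4 5.5 6.6]
limited_locf(M[1, :], 2)   # gap of 3 > 2, so the row is left unchanged
limited_locf(M[2, :], 2)   # gap of 2 <= 2, so it is filled with 1.1
```

Applied to the rows of M with max_gap_size = 2, this reproduces the output shown: the 3-long gap in the first row is untouched, while the 2-long gap in the second row is filled with 1.1.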

The code this comes from has a number of broken tests that expect slightly different behaviour. It is unclear whether that behaviour was intended but never implemented, or whether it just reflects differences between versions. In those tests, up to max_gap_size values of each gap are filled and any remaining values are left missing.

This is what the alternative behaviour would look like.

julia> impute(M, LimitedLOCF(2); dims=:rows)
2×6 Array{Union{Missing, Float64},2}:
 1.0  2.0  2.0  2.0  missing  6.0
 1.1  1.1  1.1  4.4  5.5      6.6
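For comparison, here is a minimal sketch of the alternative behaviour expected by the broken tests. It again uses a hypothetical helper (capped_locf) rather than anything in Impute.jl: every gap is filled from the last observation, but only for its first max_gap_size entries.

```julia
# Hypothetical helper, NOT the Impute.jl API: fill each gap from the last
# observation, but only its first `max_gap_size` entries; the rest stay missing.
function capped_locf(v::AbstractVector, max_gap_size::Integer)
    out = collect(v)          # copy so the caller's data is not mutated
    last_obs = missing        # most recent observed value, if any
    gap_len = 0               # length of the current run of missings so far
    for i in eachindex(out)
        if ismissing(out[i])
            gap_len += 1
            if !ismissing(last_obs) && gap_len <= max_gap_size
                out[i] = last_obs
            end
        else
            last_obs = out[i]
            gap_len = 0
        end
    end
    return out
end

M = [1.0 2.0 missing missing missing 6.0; 1.1 missing missing 4.4 5.5 6.6]
capped_locf(M[1, :], 2)   # first two entries of the 3-long gap get 2.0
capped_locf(M[2, :], 2)   # the 2-long gap is filled completely with 1.1
```

Because it tracks the last observed value and the length of the current gap in a single pass, this version never needs to look ahead to find where a gap ends.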

rofinn commented 3 years ago

I think this should be implemented as a limit keyword argument on the existing LOCF, NOCB and Interpolate methods.
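Here is one way the limit keyword suggestion could look, sketched with hypothetical fill_forward and fill_backward functions (not Impute.jl's LOCF/NOCB API). A single forward-fill routine takes limit, with nothing meaning unlimited, and skips any gap longer than it. NOCB is obtained by running the same routine on the reversed vector, so both methods share one implementation of the limit. Whether limit should skip long gaps or partially fill them is the open question in this thread; this sketch assumes skipping. Interpolate would need its own handling and is not shown.

```julia
# Hypothetical sketch, NOT the Impute.jl API: a `limit` keyword shared by
# forward fill (LOCF) and backward fill (NOCB). Gaps longer than `limit` are
# left untouched; `limit = nothing` means no limit.
function fill_forward(v::AbstractVector; limit::Union{Nothing,Integer}=nothing)
    out = collect(v)
    n = length(out)
    i = 1
    while i <= n
        if ismissing(out[i])
            j = i
            while j < n && ismissing(out[j + 1])   # end of this gap
                j += 1
            end
            too_long = limit !== nothing && (j - i + 1) > limit
            if i > 1 && !too_long
                out[i:j] .= out[i - 1]
            end
            i = j + 1
        else
            i += 1
        end
    end
    return out
end

# NOCB is LOCF on the reversed series, so it inherits the same `limit` logic.
fill_backward(v::AbstractVector; limit=nothing) = reverse(fill_forward(reverse(v); limit=limit))

v1 = [1.0, 2.0, missing, missing, missing, 6.0]
v2 = [1.1, missing, missing, 4.4, 5.5, 6.6]
fill_forward(v1)             # no limit: the 3-long gap is filled with 2.0
fill_forward(v1; limit=2)    # gap longer than 2: left missing
fill_backward(v2; limit=2)   # 2-long gap filled backwards with 4.4
```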

https://github.com/invenia/Impute.jl/issues/110

oxinabox commented 3 years ago

The desired behaviour of the original implementation (which was written a long time ago in MATLAB) is to not impute at all if the gap is longer than the limit (e.g. 1 hour). The alternative behaviour is to impute only up to the limit number of entries, so the first hour of an n-hour gap would be imputed.

I agree that the alternative makes sense, but tests would have to be run to confirm that it doesn't cause regressions in the final results of our systems that use this.