Open rofinn opened 3 years ago
I think proposed changes 3-6 shouldn't be breaking, so I'll bump the remainder of this issue to the 1.0 release.
Following https://github.com/invenia/AxisSets.jl/pull/44#discussion_r614292489:
Ideally we would standardise how to handle `dims` and `AxisSets.Pattern`s across Impute.jl, FeatureTransforms.jl, and AxisSets.jl (in which the former two are both supported).
I think the main difference in handling is that FeatureTransforms.jl supports `dims=:`, which means apply a transform element-wise over an array.
We should also use a consistent convention for what e.g. `dims=1` means (outdated but relevant issue: https://github.com/invenia/FeatureTransforms.jl/issues/18).
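To make the two `dims` cases concrete, here is a minimal sketch using only Base Julia. `apply_sketch` is a hypothetical helper and not part of Impute.jl, FeatureTransforms.jl, or AxisSets.jl. It assumes the Base `mapslices` convention for an integer `dims`, which is only one of the conventions this issue could settle on.

```julia
# Hypothetical helper for illustration only; not an existing package API.
# `dims=:` applies `f` to every element of `A`.
# An integer `dims` applies `f` to each slice taken along that dimension,
# following the Base `mapslices` convention (an assumption, not a decision).
apply_sketch(f, A::AbstractArray; dims=:) =
    dims isa Colon ? map(f, A) : mapslices(f, A; dims=dims)

A = [1.0 2.0; 3.0 4.0]

# Element-wise: `f` receives one scalar at a time.
elementwise = apply_sketch(x -> x + 1, A)

# dims=1: `f` receives each column as a vector (here, centring each column).
per_column = apply_sketch(v -> v .- sum(v) / length(v), A; dims=1)
```

Under this convention `dims=1` means "operate on slices that run along the first dimension", i.e. on columns of a matrix. The opposite reading ("apply to each row") is equally plausible, which is why a shared convention matters.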
Overview
The current implementation has some nice features for handling iterative data and provides early exit conditions. Unfortunately, these features are harder to maintain as we need to handle more use cases and different data structures. A couple of examples of this include:
- `AbstractContext` type isn't entirely intuitive.
- `:rows`/`:cols` for processing a columntable or a rowtable? I suppose we could expect folks to explicitly pass in a `columntable` or `rowtable`, but that seems a little unfriendly from a usability standpoint.
Proposed Changes
Drop `AbstractContext` and maybe replace it with some or all of the below:
- `Impute.replace!`: which will handle the `allowmissing` call and could support replacing values in multiple columns at once.
- `Impute.assert`?: if you want to throw an error if some missing data threshold is reached [trivial in most cases]
- `Impute.mask`?: will just give you a binary mask over your input data [trivial in most cases]
Add an `Impute.filter` option which will filter observations based on some threshold. Filtering along `dims` would probably be more general. This is also probably more general than `dropobs` and `dropvars`?
Out of scope
Success Conditions
`replace`, `filter`)
Failure Conditions
Trade-offs
Related Issues & PRs
50
51
60