Open MeganFantes opened 5 years ago
Updated idea:
Do not implement a third case; instead, change the first case:
1) bins entered: use the Laplace mechanism and check the impute parameter; always add an NA bucket if impute = FALSE
2) bins not entered: use the stability mechanism
We need to update the histogram vignette to make sure impute is used in all contexts.
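A rough sketch of what the revised first case could look like, for discussion only. The names rlaplace_sketch and noisy_histogram_sketch are hypothetical and not part of the library, the uniform resampling only stands in for whatever fillMissing() actually does, and the default sensitivity of 2 is an assumption (change-one-record neighbors):

```r
# Hypothetical helper: Laplace(0, scale) noise via inverse-CDF sampling,
# since base R has no built-in Laplace generator.
rlaplace_sketch <- function(n, scale) {
  u <- runif(n, min = -0.5, max = 0.5)
  -scale * sign(u) * log(1 - 2 * abs(u))
}

# Hypothetical sketch of case 1: the user supplies bins, Laplace noise is added.
# impute = TRUE  -> out-of-bin values are replaced by a draw from bins
#                   (a stand-in for the library's real imputation).
# impute = FALSE -> out-of-bin values are set to NA and an NA bucket is added.
noisy_histogram_sketch <- function(x, bins, epsilon, impute = FALSE,
                                   sensitivity = 2) {
  x <- as.character(x)
  out_of_bins <- is.na(x) | !(x %in% bins)
  if (impute) {
    x[out_of_bins] <- sample(bins, sum(out_of_bins), replace = TRUE)
  } else {
    x[out_of_bins] <- NA
  }
  # True count per user-supplied bin; the result is named by bin.
  counts <- vapply(bins, function(b) sum(x == b, na.rm = TRUE), numeric(1))
  if (!impute) {
    counts <- c(counts, "NA" = sum(is.na(x)))
  }
  # Add independent Laplace noise to every bucket, including the NA bucket.
  counts + rlaplace_sketch(length(counts), scale = sensitivity / epsilon)
}
```

For example, noisy_histogram_sketch(c("a", "b", "c", NA), bins = c("a", "b"), epsilon = 1) returns three noisy counts named a, b, and NA, while the same call with impute = TRUE returns only a and b.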
Ira and I discussed this at length, and we decided this issue should be tabled for now.
Given the way the library is structured now, where there are export() statements in the statistics to call the mechanisms, there is no logical way to set a local attribute in a subclass and then check for its existence.
We plan to do a major restructuring of the library so that the mechanisms and statistics are completely separate entities; at that point it will be feasible to set impute as an attribute of only the histogram statistic.
When the library is restructured, we can revisit the issue of conditioning the call to fillMissing() on impute for the histogram statistic.
Right now there are 2 cases when making a histogram for a categorical variable:
1) The user enters a list of bins, and the Laplace mechanism is used
2) The user does NOT enter a list of bins, and the stability mechanism is used
We want to implement a third case:
3) The user enters a list of bins, but the list is a subset of the full list of levels the variable takes. So we add an NA bin to the list of bins, set all levels that were not entered in the list of bins to NA, and then use the stability mechanism
In implementing this third case, we will use the existing histogramCategoricalBins function in utilities-histogram.R
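To make the recoding step of the third case concrete, here is a minimal sketch. The function recode_to_bins_sketch is hypothetical; it is not histogramCategoricalBins and does not reflect that function's actual signature or behavior. It only shows the "set unlisted levels to NA and add an NA bin" idea; the stability mechanism would then be applied to the result:

```r
# Hypothetical sketch: map every level the user did not list to NA,
# and append an NA bin to the user's bins.
recode_to_bins_sketch <- function(x, bins) {
  x <- as.character(x)
  # NA %in% bins is FALSE, so values that were already NA stay NA.
  x[!(x %in% bins)] <- NA
  list(data = x, bins = c(bins, NA))
}

# Example: "c" and "d" were not listed, so they fall into the NA bin.
recoded <- recode_to_bins_sketch(c("a", "b", "c", "d", NA), bins = c("a", "b"))
true_counts <- table(recoded$data, useNA = "always")
```

Here true_counts holds the non-private counts for a, b, and NA; nothing should be released until noise has been added.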