inpowell / modulartabler

MIT License
3 stars 0 forks source link

Resolve error on determine_cell_suppression() when no totals are within nullspace #9

Closed PeterM74 closed 2 months ago

PeterM74 commented 2 months ago

If a user creates a MappingTable that doesn't contain a total category and the user (accidentally or programatically) calls determine_cell_suppression on the aggregate table with the (empty) nullspace matrix, the code errors with an uninformative error from the solver:

am_map <- BaseMappingTable$new(tibble(am = 0:1, Transmission = c("Auto", "Man")),
                               "am", "Transmission")
gear_map <- BaseMappingTable$new(tibble(gear = 3:5, Gears = c("3", "4", "5")),
                                 "gear", "Gears")
testMM <- MultiMappingTable$new(am_map, gear_map)

count_aggregate(testMM, mtcars) %>% mutate(supp = determine_cell_suppression(n, testMM$nullspace))
Error in `mutate()`:
ℹ In argument: `supp = determine_cell_suppression(n, testMM$nullspace)`.
Caused by error in `L_constraint()`:
! number of columns of 'L' and length 'names' must be equal.
Run `rlang::last_trace()` to see where the error occurred.

Obviously there is no real need to call determine_cell_suppression on an aggregate table without totals as there is no need to find secondary suppression targets and the user could just calculate the primary suppression cells themselves. However I believe the error message is not informative to a user who mistakenly calls it in these instances (and embarrassingly that was me today 🤣). I propose either:

  1. When secondary_suppression is called with an empty nullspace matrix, it first checks if the matrix is empty and if so, returns the suppress object immediately (without informing the user)
  2. Does above but outputs a message to the user to inform them that secondary suppression can't be applied to a Mapping with no totals.
  3. Throws an informative error to the user

I am partial to option 1 as I don't think informing the user solves anything (and may generate more questions) and there's no measurable performance hit for calling a function that exits immediately. Having said that, I think any of these three options would be suitable (or happy to hear alternatives). I'll submit a PR with option 1 as an example and then we can substitute any of the alternative options.