Open benthestatistician opened 4 years ago
The use case I had in mind is the one described in comments to #190:
- User fits
psmA
using datadat
; thendistA = match_on(psmA, data=dat)
. At this pointdat
has informative row names.- User then applies
dplyr::mutate()
to add columns todat
, overwritingdat
with the result of this. (From the user's perspective, it's the same data set, just with some new columns added on.)- User now fits
psmB
, again using datadat
;distB = match_on(psmB, data=dat)
.- When user does
fullmatch(psmA + psmB)
, problem is declared to be infeasible.What's happened here is that at step 2, mutate silently dropped the informative row names from step 1, replacing them w/
as.character(1L:nrow(dat))
; then at step 4, the ISMpsmA + psmB
forbids all matches, due to incompatible dimnames.
dat
of step 1 is a data frame, whereas the dat
of step 3 is a tibble: that user now gets an informative error. The user can fix this error by casting dat
as a tibble before ever giving it to match_on()
, so that both ISMs carry the same (non-informative) row and column names.But that's still annoying. This user would have a much easier time if we simply let them proceed, with a warning. Given the rules for ISM arithmetic that #190 puts into place, that's what would happen if at stage 3 we produced an ISM without row or column names, as I propose in this issue.
Tibbles typically have non-informative row-names. Tidyverse processes that generate tibbles may silently replace informative rownames with non-informative ones. Since we place meaning in row names, problems may ensue.
190 addresses the worst of the tibble-trouble, but we can certainly move further toward accommodating tidyverse users. Perhaps it would be helpful if
match_on()
returned an ISM without row names when given a tibble in itsdat=
slot. Since they're no informative anyway. OTOH, maybe nameless ISMs/DenseMatrix-es would generate too many warnings and trouble with other functions to be useful.Discuss.