Closed: eribul closed this issue 4 years ago
Review: Thank you for noticing! The message should now be the same as for categorize().
I agree that such a feature is relevant. The problem, however, is that unit data is matched to code data based on the index variable, and I cannot perform such matching based on the date column. That would be a non-equi join, which some data.table operations allow but merge, which is currently used, does not. Although this would be possible after some refactoring of internal functions, I think it is currently better to perform such operations using standard functionality outside the package, such as x %>% group_by(y) %>% codify(...) for dplyr or x[, codify(...), by = y] with data.table.
Still, think about whether we can solve it!
But no ...
If there are duplicate names in the data passed to codify(), it returns a data.table error that does not help in fixing the problem. (categorize() catches this with "Non-unique ids!", but codify() does not.)
More importantly, aren't there use cases for categorize() where there are multiple events for the same patient, with different dates? Examples could include adverse events after starting multiple lines of therapy, or comorbidities before multiple diagnoses. In those cases, doesn't it make sense to return one row for each event, even if there are several for a patient? Should the check only error out when there are duplicate name/date pairs?
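The difference between the two checks can be sketched in base R. This is only an illustration on made-up data, not how categorize() or codify() implement the check; the events data frame and its name and date columns are hypothetical.

```r
# Hypothetical unit data: one row per event, so patient "a" appears twice
# with different dates, while patient "b" has an exact name/date repeat.
events <- data.frame(
  name = c("a", "a", "b", "b"),
  date = as.Date(c("2020-01-01", "2020-06-01", "2020-03-01", "2020-03-01"))
)

# Check on names only: any repeated name is flagged.
dup_name <- duplicated(events$name)

# Check on name/date pairs: only an exact repeat of both columns is flagged.
dup_pair <- duplicated(events[, c("name", "date")])

dup_name  # FALSE  TRUE FALSE  TRUE
dup_pair  # FALSE FALSE FALSE  TRUE
```

With a pair-based check, only the second row for patient "b" would trigger an error, and the two events for patient "a" would pass.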