ropensci / coder

Classification of Cases into Deterministic Categories
https://docs.ropensci.org/coder/

tibbles and data.tables #121

Closed eribul closed 3 years ago

eribul commented 4 years ago

Your examples, such as ex_people, are tibbles, but when categorize() or codify() is passed a tibble, it returns a data.table. This would be surprising behavior for people using these packages within a tidyverse workflow. I think data.table is a terrific package, but there's no reason to surprise users with a data type they may not be accustomed to. (And the fact that the example datasets are tibbles rather than data.frames or data.tables adds to the inconsistency a bit.)

I recommend ending the functions with something like

# Where data was the argument passed in, and ret is what's about to be returned
if (tibble::is_tibble(data)) {
  ret <- tibble::as_tibble(ret)
}

This would mean that it returns a data.table when it's passed a data.frame or data.table, but a tibble if and only if it's passed a tibble. Admittedly, this requires adding an import for tibble (which is perhaps why it wasn't done), but since tibble is imported by 800 CRAN packages (including dplyr and ggplot2, each depended on by ~2000 packages), it's a fairly low-impact dependency. This also doesn't strike me as a utility package that will frequently be installed in production systems; it's a scientific package that would typically be used alongside other data analysis tools. I think there are some useful thoughts on tibble dependencies here.
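To make the proposed behavior concrete, here is a minimal sketch of the same check wrapped in a helper. The name match_input_class() and the small stand-in table are hypothetical and only for illustration; they are not part of coder, and the real functions may build their result differently. It assumes the tibble and data.table packages are installed.

```r
# Hypothetical helper (not part of coder): return a tibble only when the
# input was a tibble, otherwise return the data.table result unchanged.
match_input_class <- function(ret, data) {
  if (tibble::is_tibble(data)) {
    # Input was a tibble, so hand a tibble back
    tibble::as_tibble(ret)
  } else {
    # data.frame or data.table input: keep the data.table as it is
    ret
  }
}

# Stand-in for the data.table that categorize() or codify() would build
ret <- data.table::data.table(id = 1:3, code = c("A", "B", "C"))

out_tbl <- match_input_class(ret, tibble::tibble(id = 1:3))  # tibble input
out_df  <- match_input_class(ret, data.frame(id = 1:3))      # data.frame input

class(out_tbl)
class(out_df)
```

With this shape, the conversion sits in one place at the end of each function, so callers who pass a data.frame or data.table see no change in behavior.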

eribul commented 4 years ago

Review:

I have made the following changes:

eribul commented 4 years ago