Closed eribul closed 3 years ago
I could in fact strip the `regex_` prefix from the classcodes objects. One reason the prefixes are there is that the attributes are based on those names during the construction of the objects. It might be reasonable to remove them after those attributes are set, but before exporting the final object.
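A minimal sketch of that idea in base R: record the attributes from the prefixed names first, then strip the prefixes before the object is exported. The toy table, its column names, and the `indices` attribute are assumptions for illustration only, not the package's actual internals.

```r
# Toy stand-in for a classcodes-like table (hypothetical columns and codes)
cc <- data.frame(
  group       = c("myocardial infarction", "diabetes"),
  regex_icd10 = c("^I21", "^E1[0-4]"),
  index_score = c(1, 2),
  stringsAsFactors = FALSE
)

# Step 1: derive the attributes while the prefixes are still present
attr(cc, "regexprs") <- sub("^regex_", "", grep("^regex_", names(cc), value = TRUE))
attr(cc, "indices")  <- sub("^index_", "", grep("^index_", names(cc), value = TRUE))

# Step 2: strip the (reg|ind)ex_ prefixes before exporting the object
names(cc) <- sub("^(reg|ind)ex_", "", names(cc))

names(cc)
attr(cc, "regexprs")
```

The attributes keep track of which columns hold regular expressions and which hold index weights, so the prefixes are no longer needed in the column names themselves.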
Review: Good point! I have made several changes:
- `classcodes` objects no longer have the column prefixes `(reg|ind)ex_`.
- A new `print.classcodes()` method gives a better default display of classcodes, where regex and indices are identified by a heading and not by column name prefixes.
- `categorize()` has a new argument `check.names` (same as in `data.frame`/`data.table`). This argument is `TRUE` by default, making the column names syntactically correct (using dots instead of spaces). The original names (possibly with spaces) are retained with `check.names = FALSE`, which might sometimes be useful.

The reason for the long names implied by `tech_names` is that `categorize` is sometimes used multiple times, for example to enhance a data set with both comorbidity and adverse events. Grouping such variable names by common and descriptive prefixes might then be useful.
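To illustrate what `check.names` does, here is a small sketch using only base R's `data.frame()`, whose argument of the same name the new one is modelled on. The condition columns and the `charlson.` prefix are hypothetical, and the snippet does not call `categorize()` itself.

```r
# Hypothetical wide output with one column per condition (names contain spaces)
original <- data.frame(
  id = 1:3,
  `myocardial infarction` = c(TRUE, FALSE, TRUE),
  `peptic ulcer disease`  = c(FALSE, FALSE, TRUE),
  check.names = FALSE
)
names(original)  # names keep their spaces

# check.names = TRUE runs the names through make.names(): spaces become dots
checked <- data.frame(original, check.names = TRUE)
names(checked)

# A common, descriptive prefix (hypothetical here) makes it easy to select
# all columns that stem from one classification
names(checked)[-1] <- paste0("charlson.", names(checked)[-1])
charlson_cols <- grep("^charlson\\.", names(checked), value = TRUE)
charlson_cols
```

With a shared prefix, columns from several classifications can live in the same data set and still be selected as a group with a single pattern.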
This is also related to #130
Calling categorize() on a table returns columns with spaces in their names. This isn't well set up for additional analysis, since it makes it difficult to do any kind of programming with them, including using data.table to filter for one diagnosis or to aggregate the percentage of patients (perhaps within each group) that have a condition. It's nice for displaying the names in a table, but is it a common use case to display individual patients in a table (as opposed to aggregated statistics)?
It seems like the tech_names argument is designed to fix this, but it leaves prefixes like charlsonregex on every column name, which will need to be removed for meaningful downstream analysis. How about removing the charlsonregex, or at least the regex, in these cases? (Indeed, is there a reason that the charlson classcodes object itself has to have the regex prefixes? It already has an attribute regexprs that includes those column names). Besides which, perhaps consider leaving tech_names to default to TRUE for the reasons described above.
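As a rough sketch of the downstream analysis described above, the snippet below strips a common prefix from the column names and then computes the percentage of patients with a condition within each group. The data, the grouping variable, and the exact prefix `charlson_regex_` are assumptions made for illustration. Base R is used rather than data.table to keep the example dependency-free.

```r
# Hypothetical categorized output with prefixed technical names
out <- data.frame(
  id  = 1:6,
  sex = c("f", "f", "f", "m", "m", "m"),
  charlson_regex_diabetes = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  charlson_regex_dementia = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Remove the shared prefix so the columns are easy to refer to
names(out) <- sub("^charlson_regex_", "", names(out))

# Percentage of patients with diabetes within each group
pct_diabetes <- 100 * tapply(out$diabetes, out$sex, mean)
pct_diabetes
```

With syntactic, unprefixed names the columns can be referenced directly (`out$diabetes`), without backticks or string manipulation at each step.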