Open jackwasey opened 5 years ago
R won't mark a string as UTF-8 if it contains only ASCII characters. Since all ICD codes are ASCII, we can assume each distinct code has exactly one entry in R's global character cache, and thus we can use the memory pointer to identify any ICD code.
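String comparisons are slow. R interns strings in a global cache, a bit like an internal 'factor' mechanism, so each unique string has only one memory address. We can exploit this to speed up string processing by comparing pointers instead of characters (assuming the same encoding for all strings!)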
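To make the idea concrete, here is a rough sketch in standalone C++. It is a toy model of a string cache, not R's actual implementation: `StringCache`, `intern` and `count_matches` are hypothetical names invented for illustration. In real package code the pointers would presumably be the `CHARSXP` values returned by `STRING_ELT`, and the trick only holds if all strings share one encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for a global string cache (hypothetical, not R's API):
// every distinct string is stored exactly once.
class StringCache {
public:
    // Return the address of the single cached copy of `s`.
    // Pointers to unordered_set elements stay valid across rehashing.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }

private:
    std::unordered_set<std::string> pool_;
};

// Count how many codes equal `target` by comparing addresses only;
// no characters are read in the loop.
std::size_t count_matches(const std::vector<const std::string*>& codes,
                          const std::string* target) {
    assert(target != nullptr);
    std::size_t n = 0;
    for (const std::string* p : codes) {
        if (p == target) {
            ++n;
        }
    }
    return n;
}
```

Interning `"410.01"` twice yields the same pointer, so an equality test costs one pointer comparison regardless of string length. The caveat from above still applies: if the same text could be cached under two encodings, the pointers would differ even though the codes match.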