r-lib / downlit

Syntax Highlighting and Automatic Linking
https://downlit.r-lib.org

Provide option for RStudio default 'textmate' theme compatible highlighting #57

Closed jjallaire closed 4 years ago

jjallaire commented 4 years ago

Working towards creating an option for RStudio textmate-theme-compatible highlighting. I'm initially creating the theme in the distill package, but hopefully we can move it into downlit if it's considered more generally useful.

I started with the built-in haddock theme (which is pretty similar to textmate) and made the following changes:

https://github.com/rstudio/distill/compare/02b241083b8ca5cda90954c6c37e9f11bf830b2c...13fb0f6b34e9d04df0bd24a02980e29105a8f68d#diff-f088084fe658ee281215b486b2f18dab

Note that I left in all of the tokens not produced by downlit in case the pandoc highlighter produces them for other languages. Also, haddock defined no special handling for Float, so I added that (with DecVal using the same color as Float). Not sure if there are other tokens that should be treated the same way?

Here are the remaining differences between RStudio's token handling and downlit's:

1) Parentheses are given their own token type. While pandoc doesn't have a parentheses token type, in the default RStudio theme we highlight parentheses as operators, so having a mode in downlit where parens are treated as operators would allow us to emulate this.

2) NULL is treated as a constant. Here are all of the tokens we currently highlight as constants:

var builtinConstants = lang.arrayToMap([
   "NULL", "NA", "TRUE", "FALSE", "T", "F", "Inf",
   "NaN", "NA_integer_", "NA_real_", "NA_character_",
   "NA_complex_"
]);

3) We define a set of "special functions" which we treat as language keywords. You may object to this, but I think that the line between "language keyword" and "function" in R is pretty slippery, and many language extensions (e.g. various class definition syntaxes) are functions. Suffice it to say that the line drawn by the R tokenizer isn't a clean conceptual one :-) Anyway, here's our list of both keywords and special functions:

var keywords = lang.arrayToMap([
   "function", "if", "else", "in", "break", "next", "repeat", "for", "while"
]);

var specialFunctions = lang.arrayToMap([
   "return", "switch", "try", "tryCatch", "stop",
   "warning", "require", "library", "attach", "detach",
   "source", "setMethod", "setGeneric", "setGroupGeneric",
   "setClass", "setRefClass", "R6Class", "UseMethod", "NextMethod"
]);
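
Taken together, the three lists above amount to a lookup from an identifier to a highlight class. Here is a minimal sketch of that lookup, assuming a local stand-in for `lang.arrayToMap` and a hypothetical `classifyIdentifier` helper with made-up class names ("keyword", "constant", "identifier"). It is not the actual RStudio or downlit code, just an illustration of the convention described:

```javascript
// Stand-in for lang.arrayToMap (assumption): maps each array element to 1.
// Object.create(null) avoids false hits on inherited keys such as "constructor".
function arrayToMap(arr) {
  var map = Object.create(null);
  for (var i = 0; i < arr.length; i++) {
    map[arr[i]] = 1;
  }
  return map;
}

var builtinConstants = arrayToMap([
  "NULL", "NA", "TRUE", "FALSE", "T", "F", "Inf",
  "NaN", "NA_integer_", "NA_real_", "NA_character_",
  "NA_complex_"
]);

var keywords = arrayToMap([
  "function", "if", "else", "in", "break", "next", "repeat", "for", "while"
]);

var specialFunctions = arrayToMap([
  "return", "switch", "try", "tryCatch", "stop",
  "warning", "require", "library", "attach", "detach",
  "source", "setMethod", "setGeneric", "setGroupGeneric",
  "setClass", "setRefClass", "R6Class", "UseMethod", "NextMethod"
]);

// Hypothetical helper: special functions share the keyword class,
// constants get their own class, everything else is a plain identifier.
function classifyIdentifier(name) {
  if (keywords[name] || specialFunctions[name]) return "keyword";
  if (builtinConstants[name]) return "constant";
  return "identifier";
}

console.log(classifyIdentifier("library")); // "keyword"
console.log(classifyIdentifier("NULL"));    // "constant"
console.log(classifyIdentifier("mean"));    // "identifier"
```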

Some of these decisions you might find questionable (this was Joe and me ~ 8 yrs ago trying to sort this out w/o much context as users). That said, I'm not aware of anyone ever complaining about our highlighting not making sense to them.

If you don't want to adopt any of these conventions in downlit, perhaps there could be a parameter added to highlight() to do RStudio-compatible tokenizing?
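
To make the idea of an opt-in mode concrete, here is a rough sketch of a post-processing step that remaps token types only when a flag is set. Everything in it is hypothetical: the `retokenize` function, the `rstudioCompatible` and `specialFunctions` options, the `{ text, type }` token shape, and the type names are illustrative assumptions, not downlit's or pandoc's actual API.

```javascript
// Hypothetical token shape: { text: string, type: string }.
// Type names ("function", "operator", ...) are illustrative only.
function retokenize(tokens, options) {
  options = options || {};
  // Default behaviour: leave the tokenizer's output untouched.
  if (!options.rstudioCompatible) return tokens;

  var special = options.specialFunctions || [];
  return tokens.map(function (tok) {
    // Never reinterpret the contents of strings or comments.
    if (tok.type === "string" || tok.type === "comment") return tok;
    // 1) Parentheses are highlighted as operators.
    if (tok.text === "(" || tok.text === ")") {
      return { text: tok.text, type: "operator" };
    }
    // 2) NULL is highlighted as a constant.
    if (tok.text === "NULL") {
      return { text: tok.text, type: "constant" };
    }
    // 3) Listed "special functions" are highlighted as keywords.
    if (tok.type === "function" && special.indexOf(tok.text) !== -1) {
      return { text: tok.text, type: "keyword" };
    }
    return tok;
  });
}

// Usage: tokens for `library(NULL)` under the assumed token shape.
var input = [
  { text: "library", type: "function" },
  { text: "(", type: "paren" },
  { text: "NULL", type: "identifier" },
  { text: ")", type: "paren" }
];
var remapped = retokenize(input, {
  rstudioCompatible: true,
  specialFunctions: ["library", "require", "return"]
});
console.log(remapped.map(function (t) { return t.type; }).join(" "));
// keyword operator constant operator
```

Keeping the remapping behind a flag means the default output stays as it is today, and only callers that ask for the RStudio conventions get them.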