tidyverse / reprex

Render bits of R code for sharing, e.g., on GitHub or StackOverflow.
https://reprex.tidyverse.org
Other
741 stars 80 forks source link

Improve html preview #470

Closed jennybc closed 2 months ago

jennybc commented 2 months ago

Fixes #468

Several changes to reprex's html preview:

  1. Gain an awareness of light vs. dark mode, as reported by rstudioapi::getThemeInfo(), with light mode as the fall back.
  2. Vendor the GitHub CSS into reprex, using https://github.com/sindresorhus/github-markdown-css which is more up-to-date than the rmarkdown package cd1a508d57498c389f2fb3eb1a608fdf9329f311.
  3. Fixup the GitHub CSS to so that it works for our application. Most importantly for #468, we now set both the color and background-color for <body>, plus a few other tweaks 11eca0dd4c2d39317ec559d5ed809e6eac151c78.
  4. Use a custom syntax highlighting theme that emulates what GitHub does, but shoves it into the framework that Pandoc uses (the KDE Syntax-Highlighting framework) c5747caebc01cf72d5a75f7e8ca80ddfce62fe09. Today, GitHub highlights R using a proprietary tool called PrettyLights. There's an open source project https://github.com/wooorm/starry-night that aims to emulate that, so that's where I got the colors from. More on this below.
  5. Use rmarkdown::pandoc_convert() (for md-to-html conversion) in a more canonical way, which allows us to eliminate the use of a template.
  6. Use a custom syntax definition for R that tweaks some peculiar choices in the definitive source = the KDE Syntax-Highlighting project 5679f011712eda932fba8da43c7c87c712db4356. More on that below.

468 could be fixed just by making sure to specify both the color and background-color for <body>. But while I was working on this, I noticed various weird things about how R code is highlighted both on GitHub and by Pandoc! So I dug into that, which explains my forays into a custom theme and syntax definition. Hopefully this will lead to some improvements outside of reprex for syntax highlighting of R.

GitHub side quest

I said that PrettyLights is the current highlighter for R. According to https://github.com/wooorm/starry-night:

GitHub has slowly started to move towards TreeLights, which is based on TreeSitter, and also closed source. If TreeLights includes a language (currently: C, C#, CSS, CodeQL, EJS, Elixir, ERB, Gleam, Go, HTML, Java, JS, Nix, PHP, Python, RegEx, Ruby, Rust, TLA, TS), [then TreeLights is used instead of PrettyLights]

The textmate grammar that Github + PrettyLights uses for R has some shortcomings, such as not properly tokenizing / classing function calls. This is related to a separate problem @DavisVaughan has noticed, which is that click-on-a-reference doesn't pull up the symbols pane (whereas click-on-a-definition does). Davis poked at this a bit and now he's got a ticket into the right system to see if his recent tree-sitter + R successes at GitHub could possibly lead to R becoming one of the languages handled with TreeLights. I think this would lead to better syntax highlighting of R, among other benefits.

KDE Syntax-Highlighting side quest

Pandoc will automatically highlight syntax in fenced code blocks that are marked with a language name. The Haskell library skylighting is used for highlighting.

Source: https://pandoc.org/chunkedhtml-demo/13-syntax-highlighting.html

If you are not satisfied with the built-in highlighting, ..., you can use the --syntax-definition option to load a KDE-style XML syntax definition file. Before writing your own, have a look at KDE’s repository of syntax definitions.

Indeed I was not satisfied with the built-in highlighting, so I did have a look at KDE's repository of syntax definitions. I think the R syntax definition used by Pandoc has some strange, suboptimal choices. Here it is in various places, in increasing order of importance:

I engaged @cderv in some discussions around this XML and he's in agreement that we could improve it. And he has made a successful merge request before.

I opened an issue to discuss with maintainers: https://invent.kde.org/frameworks/syntax-highlighting/-/issues/32

Here are a few easy tweaks in this PR: 5679f011712eda932fba8da43c7c87c712db4356

jennybc commented 2 months ago

Positron dark theme, which is the most massive improvement in this PR:

We're trying to match (or do better than!) what GitHub does -- notice that mean() is not highlighted as a function call, but it should be.

github-dark

With this PR, Positron still uses a light theme, but still a big improvement. Notice that we syntax highlight mean() as a function call.

pr-positron-dark

Before this PR is very bad.

cran-positron-dark
jennybc commented 2 months ago

RStudio dark theme, which is also indicative of where Positron will be once it provides info re: dark vs light theme.

We're trying to match (or do better than!) what GitHub does -- notice that mean() is not highlighted as a function call, but it should be.

github-dark

With this PR, RStudio newly honors a dark theme. Notice that we syntax highlight mean() as a function call.

pr-rstudio-dark

Before this PR, RStudio looked fine, but used a light theme unconditionally.

cran-rstudio-dark
jennybc commented 2 months ago

Positron light theme

We're trying to match (or do better than!) what GitHub does -- notice that mean() is not highlighted as a function call, but it should be.

github-light

With this PR. Notice that we syntax highlight mean() as a function call.

pr-positron-light

Before this PR. You can still see there is a problem with the background color (i.e. the code chunk does not stand out against the background), but it's less of an eyesore than the dark mode case.

cran-positron-light
jennybc commented 2 months ago

RStudio light theme

We're trying to match (or do better than!) what GitHub does -- notice that mean() is not highlighted as a function call, but it should be.

github-light

With this PR. Notice that we syntax highlight mean() as a function call.

pr-rstudio-light

Before this PR.

cran-rstudio-light
jennybc commented 2 months ago

@hadley @cderv I'm happy to get feedback, but I mostly tagged you here for awareness. A minimal fix for the target issue here in reprex is relatively small. The main value of the PR might be identifying the changes we should try to make elsewhere: the KDE syntax highlighting definition file for R (affects all R highlighted by Pandoc) and how GitHub approaches highlighting for R.

cderv commented 2 months ago

how GitHub approaches highlighting for R

Thanks for sharing all this ! I think this will be useful to improve Github Preview format in R Markdown and Quarto.

DavisVaughan commented 2 months ago

FWIW re GitHub updating to (presumably) use TreeLights for R:

I just heard back from an engineer on the the team and they said that the work to fix the syntax highlighting is currently not prioritized. Someone from that team will reach out when they upgrade the highlighting, if they need help verifying the changes. The team has an internal issue open about your request.

So probably not in the cards anytime soon, but at least its on their radar now

jennybc commented 2 months ago

Thanks for the update @DavisVaughan. Given that forecast, maybe we should consider upgrading the textmate grammar for R? That might dovetail with making some upgrades to the KDE syntax definition (the Pandoc side of the house).

The vendored grammar appears to live here: https://github.com/github-linguist/linguist/tree/f0aebbe90d3b9b9bae5b7258f99370def1ee489f/vendor

with the true grammar source being regarded as this: https://github.com/textmate/r.tmbundle

I see both @hadley and Jim Hester have contributed there at some point in the past, so it's possible it's already as good as it gets (?). But I get better highlighting of R in VS Code than on GitHub and I think that's also using a textmate grammar, so that gives me hope.

Updated: I see that the true grammar source for some languages smells associated with VS Code or VS Code extensions. So that might be another low effort improvement, i.e. convince GitHub to consult a differen source for the R grammar.

jennybc commented 2 months ago

WOW, that was a long time ago

Screenshot 2024-09-10 at 8 32 30 AM