r-lib / downlit

Syntax Highlighting and Automatic Linking
https://downlit.r-lib.org

Support R native pipe #126

Closed cderv closed 2 years ago

cderv commented 2 years ago

While working on https://github.com/rstudio/rmarkdown/issues/2196 and following https://github.com/rstudio/rmarkdown/issues/1881#issuecomment-1015736535 I think downlit needs to be adjusted to support the parsing of |> and => pipe operators.

I compared the result with the magrittr pipe:

``` r
downlit::highlight("1:3 |> mean()", downlit::classes_pandoc())
#> [1] "<span class='fl'>1</span><span class='op'>:</span><span class='fl'>3</span> |&gt; <span class='fu'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='op'>(</span><span class='op'>)</span>"
library(magrittr)
downlit::highlight("1:3 %>% mean()", downlit::classes_pandoc())
#> [1] "<span class='fl'>1</span><span class='op'>:</span><span class='fl'>3</span> <span class='op'>%&gt;%</span> <span class='fu'><a href='https://rdrr.io/r/base/mean.html'>mean</a></span><span class='op'>(</span><span class='op'>)</span>"
```
Some notes as I was looking into this:

We probably just need to support the new token types `PIPE` and `PIPEBIND` by adding them to the infix category in downlit.

``` r
Sys.setenv("_R_USE_PIPEBIND_" = TRUE)
downlit:::parse_data("mtcars |> data => lm(mpg ~ cyl, data = data)")
#> $text
#> [1] "mtcars |> data => lm(mpg ~ cyl, data = data)"
#>
#> $expr
#> expression(mtcars |> data => lm(mpg ~ cyl, data = data))
#>
#> $data
#>    line1 col1 line2 col2 id parent                token terminal   text
#> 31     1    1     1   44 31      0                 expr    FALSE
#> 1      1    1     1    6  1      3               SYMBOL     TRUE mtcars
#> 3      1    1     1    6  3     31                 expr    FALSE
#> 2      1    8     1    9  2     31                 PIPE     TRUE     |>
#> 30     1   11     1   44 30     31                 expr    FALSE
#> 4      1   11     1   14  4      6               SYMBOL     TRUE   data
#> 6      1   11     1   14  6     30                 expr    FALSE
#> 5      1   16     1   17  5     30             PIPEBIND     TRUE     =>
#> 28     1   19     1   44 28     30                 expr    FALSE
#> 7      1   19     1   20  7      9 SYMBOL_FUNCTION_CALL     TRUE     lm
#> 9      1   19     1   20  9     28                 expr    FALSE
#> 8      1   21     1   21  8     28                  '('     TRUE      (
#> 16     1   22     1   30 16     28                 expr    FALSE
#> 10     1   22     1   24 10     12               SYMBOL     TRUE    mpg
#> 12     1   22     1   24 12     16                 expr    FALSE
#> 11     1   26     1   26 11     16                  '~'     TRUE      ~
#> 13     1   28     1   30 13     15               SYMBOL     TRUE    cyl
#> 15     1   28     1   30 15     16                 expr    FALSE
#> 14     1   31     1   31 14     28                  ','     TRUE      ,
#> 20     1   33     1   36 20     28           SYMBOL_SUB     TRUE   data
#> 21     1   38     1   38 21     28               EQ_SUB     TRUE      =
#> 22     1   40     1   43 22     24               SYMBOL     TRUE   data
#> 24     1   40     1   43 24     28                 expr    FALSE
#> 23     1   44     1   44 23     28                  ')'     TRUE      )
```

The anonymous function syntax `\(args) expr` does not seem to get a specific token type:

``` r
downlit:::parse_data(text = "\\(x) x+1")
#> $text
#> [1] "\\(x) x+1"
#>
#> $expr
#> expression(\(x) x+1)
#>
#> $data
#>    line1 col1 line2 col2 id parent          token terminal text
#> 14     1    1     1    8 14      0           expr    FALSE
#> 1      1    1     1    1  1     14         '\\\\'     TRUE   \\
#> 2      1    2     1    2  2     14            '('     TRUE    (
#> 3      1    3     1    3  3     14 SYMBOL_FORMALS     TRUE    x
#> 4      1    4     1    4  4     14            ')'     TRUE    )
#> 12     1    6     1    8 12     14           expr    FALSE
#> 6      1    6     1    6  6      8         SYMBOL     TRUE    x
#> 8      1    6     1    6  8     12           expr    FALSE
#> 7      1    7     1    7  7     12            '+'     TRUE    +
#> 9      1    8     1    8  9     10      NUM_CONST     TRUE    1
#> 10     1    8     1    8 10     12           expr    FALSE
```

I can try a PR if you want.

This would offer better support for R highlighting in rmarkdown now that downlit is supported. After all, the KDE syntax file for R is not the best for parsing R syntax.

For reference, we currently have a patch in rmarkdown to support those operators in HTML documents when Pandoc highlighting is used. We will also provide an r.xml for Pandoc, but support is limited in terms of Pandoc version.

hadley commented 2 years ago

A PR would be great. I think it should be straightforward: you just need to add `PIPE` to the appropriate place in `token_type()`.
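
As a rough illustration of the idea, here is a minimal sketch that uses only base R's parser. It assumes R >= 4.1 (so that `|>` parses), and `classify_tokens()` is a hypothetical helper written for this example, not downlit's actual `token_type()`. The class names `op`, `fl` and `fu` are taken from the `classes_pandoc()` output shown above, and the list of infix tokens is deliberately incomplete.

``` r
# Hypothetical helper (NOT downlit's token_type()): assigns a highlighting
# class to each terminal token, treating the native pipe as an infix operator.
classify_tokens <- function(code) {
  # keep.source = TRUE is needed for getParseData() in non-interactive runs
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  tokens <- tokens[tokens$terminal, c("token", "text")]

  # Assumed, partial set of infix tokens; PIPE and PIPEBIND are the new ones
  infix <- c("SPECIAL", "PIPE", "PIPEBIND", "'+'", "'-'", "'*'", "'/'", "':'")

  tokens$class <- NA_character_
  tokens$class[tokens$token %in% infix] <- "op"
  tokens$class[tokens$token == "NUM_CONST"] <- "fl"
  tokens$class[tokens$token == "SYMBOL_FUNCTION_CALL"] <- "fu"
  rownames(tokens) <- NULL
  tokens
}

res <- classify_tokens("1:3 |> mean()")
res[res$text == "|>", ]
```

With `PIPE` in the infix set, the `|>` row gets the same `op` class that `%>%` (token `SPECIAL`) already receives, which is the behaviour requested in this issue.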