yihui / knitr

A general-purpose tool for dynamic report generation in R
https://yihui.org/knitr/

[spin] Better support for `roxygen2` style comments #2317

Open kylebutts opened 5 months ago

Currently, knitr::spin() and roxygen2-style documentation do not work together: every line starting with #' is converted to markdown, including roxygen tags. For example:

text = c(
  "# %% ",
  "#' Say hello to a user",
  "#' ",
  "#' @param name The person who you are greeting",
  "#' @export",
  "say_hello <- function(name = '') {",
  "  print(paste0('Hello ', name))",
  "}",
  "say_hello()"
)

knitr::spin(
  text = text,
  format = 'qmd', knit = FALSE
) |> xfun::raw_string()
#> 
#> ```{r}
#> ```
#> 
#> Say hello to a user
#> 
#> @param name The person who you are greeting
#> @export
#> 
#> ```{r}
#> say_hello <- function(name = '') {
#>   print(paste0('Hello ', name))
#> }
#> say_hello()
#> ```

I was wondering if roxygen2's tools could help separate roxygen comments from markdown comments. roxygen2 can clearly identify roxygen comments: it finds the expressions in the file and looks at the comments above each expression:

file = tempfile(fileext = ".R")
xfun::write_utf8(text, file)
roxygen2::parse_file(file)
#> [1] "Hello "
#> [[1]]
#> <roxy_block> [file47d62ac2036a.R:6]
#>   $tag
#>     [line:  2] @title 'Say hello to a user' {parsed}
#>     [line:  4] @param 'name The person who you are greeting' {parsed}
#>     [line:  5] @export '' {parsed}
#>     [line:  6] @usage '<generated>' {parsed}
#>     [line:  6] @.formals '<generated>' {parsed}
#>     [line:  6] @backref '<generated>' {parsed}
#>   $call   say_hello <- function(name = "") { ...
#>   $object <function> 
#>     $topic say_hello
#>     $alias say_hello

Using roxygen2 directly is potentially problematic, since roxygen2 itself imports knitr. However, the necessary logic can be extracted using just base R (plus xfun, which knitr already depends on):

path = file
lines <- xfun::read_utf8(path)
calls <- parse(
  text = lines,
  keep.source = TRUE,
  srcfile = srcfilecopy(path, lines, isFile = TRUE)
)

# For each expression, find the preceding comments (including any blank lines)
comments <- function(refs) {
  if (length(refs) == 0) {
    return(list())
  }

  srcfile <- attr(refs[[1]], "srcfile")

  # first_line, first_byte, last_line, last_byte
  com <- vector("list", length(refs))
  for (i in seq_along(refs)) {
    # Comments begin after last line of last block, and this block is included
    # so that it can be parsed for additional comments
    if (i == 1) {
      first_byte <- 1
      first_line <- 1
    } else {
      first_byte <- refs[[i - 1]][4] + 1
      first_line <- refs[[i - 1]][3]
    }

    last_line <- refs[[i]][3]
    last_byte <- refs[[i]][4]

    lloc <- c(first_line, first_byte, last_line, last_byte)
    com[[i]] <- srcref(srcfile, lloc)
  }

  com
}

srcrefs <- utils::getSrcref(calls)
comment_refs <- comments(srcrefs)
comment_refs
#> [[1]]
#> # %% 
#> #' Say hello to a user
#> #' 
#> #' @param name The person who you are greeting
#> #' @export
#> say_hello <- function(name = '') {
#>   print(paste0('Hello ', name))
#> }
#> 
#> [[2]]
#> 
#> say_hello()

A simple (but imperfect) heuristic: if a line starting with #' @ appears in the comments above an expression, assume the whole comment is roxygen documentation and don't convert it to markdown. Users who want just a title would need to use the @title tag explicitly. Users who want a line of markdown to begin with @ could perhaps escape it as @@, like in roxygen2?
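To make the heuristic concrete, here is a minimal base R sketch. The helper name `is_roxygen_block()` is hypothetical (it is not part of knitr or roxygen2), and the rule that a tag is a single @ followed by a letter is my assumption about how the check might be written, not how roxygen2 itself tokenizes tags.

```r
# Hypothetical helper, not part of knitr or roxygen2: decide whether the
# #' comments in a block of source lines look like roxygen documentation.
is_roxygen_block <- function(lines) {
  # Keep only the #' comment lines
  roxy <- grep("^[[:space:]]*#'", lines, value = TRUE)
  # Strip the #' prefix and one optional following space
  body <- sub("^[[:space:]]*#' ?", "", roxy)
  # Assumption: a tag is a single @ followed by a letter, so a line that
  # starts with @@ (an escaped literal @) does not count as a tag
  any(grepl("^[[:space:]]*@[A-Za-z]", body))
}

doc_block <- c(
  "#' Say hello to a user",
  "#' ",
  "#' @param name The person who you are greeting",
  "#' @export",
  "say_hello <- function(name = '') {",
  "  print(paste0('Hello ', name))",
  "}"
)

md_block <- c(
  "#' Some *markdown* text",
  "#' @@ is a literal at sign here",
  "1 + 1"
)

is_roxygen_block(doc_block)  # TRUE: contains @param and @export
is_roxygen_block(md_block)   # FALSE: only an escaped @@
```

With the comment_refs list from above, the same check could presumably be applied per expression by passing as.character() of each srcref to the helper. A block with only a title and no tags would still be treated as markdown, which is the imperfection mentioned above.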

Is there interest in this as a PR? It adds complexity, but IMO it would prevent roxygen2/spin clashes in almost all cases.