highlight() trims long strings

moodymudskipper commented 1 year ago

This causes issues in {styler}: https://github.com/r-lib/styler/issues/216

x <- paste0('"', strrep("-", 1000), '"')
prettycode::highlight(x)
#> [1] "[1000 chars quoted with '\"']"

The culprit is the call data <- getParseData(parsed, includeText = NA); we'd need includeText = TRUE and then a bit of wrangling.

If we want to keep the default behaviour, could we add an argument to opt out?

{prettycode} is sometimes used where faithfully representing code is important, so I hope it makes sense to fix it here.

For context, such long strings can be found in the output of sessionInfo().

gaborcsardi commented 1 year ago

tibble::as_tibble(getParseData(parse(text = paste0("fun('", strrep("-", 10), "')")), includeText=TRUE))
#> # A tibble: 7 × 9
#>   line1  col1 line2  col2    id parent token                terminal text       
#>   <int> <int> <int> <int> <int>  <int> <chr>                <lgl>    <chr>      
#> 1     1     1     1    17    10      0 expr                 FALSE    fun('-----…
#> 2     1     1     1     3     1      3 SYMBOL_FUNCTION_CALL TRUE     fun        
#> 3     1     1     1     3     3     10 expr                 FALSE    fun        
#> 4     1     4     1     4     2     10 '('                  TRUE     (          
#> 5     1     5     1    16     4      6 STR_CONST            TRUE     '---------…
#> 6     1     5     1    16     6     10 expr                 FALSE    '---------…
#> 7     1    17     1    17     5     10 ')'                  TRUE     )
tibble::as_tibble(getParseData(parse(text = paste0("fun('", strrep("-", 10), "')")), includeText=TRUE))$text[5:6]
#> [1] "'----------'" "'----------'"
tibble::as_tibble(getParseData(parse(text = paste0("fun('", strrep("-", 1000), "')")), includeText=TRUE))
#> # A tibble: 7 × 9
#>   line1  col1 line2  col2    id parent token                terminal text       
#>   <int> <int> <int> <int> <int>  <int> <chr>                <lgl>    <chr>      
#> 1     1     1     1  1007    10      0 expr                 FALSE    fun('-----…
#> 2     1     1     1     3     1      3 SYMBOL_FUNCTION_CALL TRUE     fun        
#> 3     1     1     1     3     3     10 expr                 FALSE    fun        
#> 4     1     4     1     4     2     10 '('                  TRUE     (          
#> 5     1     5     1  1006     4      6 STR_CONST            TRUE     [1000 char…
#> 6     1     5     1  1006     6     10 expr                 FALSE    '---------…
#> 7     1  1007     1  1007     5     10 ')'                  TRUE     )

Created on 2022-11-06 with reprex v2.0.2

gaborcsardi commented 1 year ago

I wonder if it is the same with raw strings. Probably.

moodymudskipper commented 1 year ago

Yes, it is, and the same happens with long symbols too, though I guess it's pretty rare to have a 1000-character symbol.

This seems to do it:

# same as `getParseData(, includeText = NA)` but making sure strings and symbols are not trimmed
get_parse_data <- function(x) {
  # include text so we don't lose long strings and symbols
  data <- getParseData(x, includeText = TRUE)
  # fetch indices of potentially trimmed text
  ind <- which(data$token %in% c("STR_CONST", "SYMBOL")) 
  # replace with untrimmed
  data$text[ind] <- data$text[ind + 1]
  # remove text for non terminal tokens, as `getParseData(, includeText = NA)` would
  data$text[!data$terminal] <- ""
  data
}

Would you like a PR?

gaborcsardi commented 1 year ago

> Would you like a PR?

Yes, please. But please use the id and parent columns for the mapping; it is not guaranteed that the parent is in the next row.
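
For illustration, here is a minimal sketch of what a mapping based on the id and parent columns could look like. It is not the code from any actual PR. It assumes that a trimmed STR_CONST or SYMBOL token can be restored from its parent expr only when that parent has no other children, because then both rows cover the same source range. The single-child guard and the reuse of the name get_parse_data are choices made for this sketch.

```r
# Sketch: like `getParseData(, includeText = NA)`, but long strings and
# symbols are restored from their parent `expr`, located via `id`/`parent`.
get_parse_data <- function(x) {
  # include text for non-terminals so the untrimmed source text is available
  data <- utils::getParseData(x, includeText = TRUE)

  # for every row, count how many rows share its parent
  n_siblings <- ave(seq_along(data$parent), data$parent, FUN = length)

  # candidate tokens: strings and symbols that are the only child of their parent
  cand <- which(data$token %in% c("STR_CONST", "SYMBOL") & n_siblings == 1L)

  # look up each parent's row by id instead of assuming it is the next row
  parent_row <- match(data$parent[cand], data$id)
  ok <- !is.na(parent_row)
  data$text[cand[ok]] <- data$text[parent_row[ok]]

  # drop text for non-terminal tokens, as `includeText = NA` would
  data$text[!data$terminal] <- ""
  data
}

# usage: keep.source = TRUE is needed so that parse data is recorded
parsed <- parse(text = paste0("fun('", strrep("-", 1000), "')"), keep.source = TRUE)
pd <- get_parse_data(parsed)
nchar(pd$text[pd$token == "STR_CONST"])
```

Tokens whose parent has other children are left exactly as getParseData() returned them, so a trimmed token in such a position would still need separate handling.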

r-lib / prettycode

highlight() trims long strings #20