[Question] Custom Styleguide Help

rossdrucker commented 1 year ago

Hi,

I'm working on trying to build out a custom style guide that's based upon the tidyverse style guide. I read through the vignette, but don't know that I understand what to do well enough to follow and adapt it for my own use case.

As I said, the style guide I'm trying to define in {styler} is really, at its core, implementing a few changes to the tidyverse style guide. The changes I'd like to implement are:

Arguments across multiple lines separated by commas should have the comma at the start of the following line, not the end of the preceding line
Code sections should have 5 blank lines between them, and the whole line should be 80 characters total
Two blank lines should be at the end of the file

So, for example, this code:


# Start section --------------------

ex_df = data.frame(
  x = 1:10,
  y = letters[1:10],
  z = LETTERS[1:10]
)

# New section -------------------

ex_df |> writeRDS("~/Documents/ex_df.RDS")

should be reformatted as this code:


# Start section ----------------------------------------------------------------

ex_df = data.frame(
  x = 1:10
  , y = letters[1:10]
  , z = LETTERS[1:10]
)

# New section ------------------------------------------------------------------

ex_df |> writeRDS("~/Documents/ex_df.RDS")

I follow in the vignette how to generate the AST and generally what the columns are intended to show, but I'm a bit lost with the rest of the implementation and how to define my own transforms for these few cases. Any help would be greatly appreciated!

lorenzwalthert commented 1 year ago

Hi. Thanks. If you feel lost, oleander Pont Ottke exact parts in the docs where you can’t follow. Did you look at the implementation of custom style guides that are mentioned in https://styler.r-lib.org/articles/distribute_custom_style_guides.html?

lorenzwalthert commented 1 year ago

and the whole line should be 80 characters total

this is a long standing feature request and not easy. #247.

rossdrucker commented 1 year ago

Thanks for the quick reply, I did look through those examples. I think there's a few places I'm confused, and admittedly it's more likely on me than on your documentation.

Specifically around the formatting around the comma rules, I think the logic I want to implement is to go one token up from where the comma is found and set newline to 0 and lag_newlines to 1. My confusion comes with how to do that within the AST. I'm using this as the code example I'm looking to style:

code <- "data.frame(
  x = 1:10
  , y = letters[1:10],
  z = 11:20,
  col1 = LETTERS[1:10]
)
"

and I follow what's here to generate the AST via

> pd <- styler:::compute_parse_data_nested(code) %>% 
  styler:::pre_visit_one(default_style_guide_attributes)

> pd$child[[1]] %>% as_tibble()
# A tibble: 18 × 18
   pos_id token      terminal text          short token_before       token…¹ style…² block is_ca…³ inter…⁴ child  newli…⁵ lag_n…⁶ spaces multi…⁷ inden…⁸ indent
    <int> <chr>      <lgl>    <chr>         <chr> <chr>              <chr>   <lgl>   <lgl> <lgl>   <lgl>   <list>   <int>   <int>  <int>   <int> <lgl>    <int>
 1      3 expr       FALSE    data.frame    data. NA                  NA     FALSE   NA    FALSE   FALSE   <df>         0       0      0      NA NA           0
 2      4 '('        TRUE     (             (     SYMBOL_FUNCTION_C… "SYMBO… FALSE   NA    FALSE   FALSE   <NULL>       1       0      2       0 NA           0
 3      5 SYMBOL_SUB TRUE     x             x     '('                "EQ_SU… FALSE   NA    FALSE   FALSE   <NULL>       0       1      1       0 NA           0
 4      6 EQ_SUB     TRUE     =             =     SYMBOL_SUB         "NUM_C… FALSE   NA    FALSE   FALSE   <NULL>       0       0      1       0 NA           0
 5      7 expr       FALSE    1:10          1:10  NA                  NA     FALSE   NA    FALSE   FALSE   <df>         1       0      2      NA NA           0
 6     13 ','        TRUE     ,             ,     NUM_CONST          "SYMBO… FALSE   NA    FALSE   FALSE   <NULL>       0       1      1       0 NA           0
 7     14 SYMBOL_SUB TRUE     y             y     ','                "EQ_SU… FALSE   NA    FALSE   FALSE   <NULL>       0       0      1       0 NA           0
 8     15 EQ_SUB     TRUE     =             =     SYMBOL_SUB         "SYMBO… FALSE   NA    FALSE   FALSE   <NULL>       0       0      1       0 NA           0
 9     16 expr       FALSE    letters[1:10] lette NA                  NA     FALSE   NA    FALSE   FALSE   <df>         0       0      0      NA NA           0
10     27 ','        TRUE     ,             ,     ']'                "SYMBO… FALSE   NA    FALSE   FALSE   <NULL>       1       0      2       0 NA           0
11     28 SYMBOL_SUB TRUE     z             z     ','                "EQ_SU… FALSE   NA    FALSE   FALSE   <NULL>       0       1      1       0 NA           0
12     29 EQ_SUB     TRUE     =             =     SYMBOL_SUB         "NUM_C… FALSE   NA    FALSE   FALSE   <NULL>       0       0      1       0 NA           0
13     30 expr       FALSE    11:20         11:20 NA                  NA     FALSE   NA    FALSE   FALSE   <df>         0       0      0      NA NA           0
14     36 ','        TRUE     ,             ,     NUM_CONST          "SYMBO… FALSE   NA    FALSE   FALSE   <NULL>       1       0      2       0 NA           0
15     37 SYMBOL_SUB TRUE     col1          col1  ','                "EQ_SU… FALSE   NA    FALSE   FALSE   <NULL>       0       1      1       0 NA           0
16     38 EQ_SUB     TRUE     =             =     SYMBOL_SUB         "SYMBO… FALSE   NA    FALSE   FALSE   <NULL>       0       0      1       0 NA           0
17     39 expr       FALSE    LETTERS[1:10] LETTE NA                  NA     FALSE   NA    FALSE   FALSE   <df>         1       0      0      NA NA           0
18     50 ')'        TRUE     )             )     ']'                ""      FALSE   NA    FALSE   FALSE   <NULL>       0       1      0       0 NA           0
# … with abbreviated variable names ¹token_after, ²stylerignore, ³is_cached, ⁴internal, ⁵newlines, ⁶lag_newlines, ⁷multi_line, ⁸indention_ref_pos_id

How do I set up the transformers to implement the logic on the AST? My pseudocode logic is:

1) Find anywhere there's a comma 2) If token_before != "NUM_CONST", set token_before = "NUM_CONST 3) If newline == 1, set newline == 0 4) If lag_newline == 0, set lag_newline == 1

This is all based on pos_id == 13 in the table above, which does follow the format I'm looking for. But I'm not sure how to do this in the transformer if there are multiple children I need to address

lorenzwalthert commented 1 year ago

Thanks for the additional context. I think you are on the right track.

But I'm not sure how to do this in the transformer if there are multiple children I need to address

You won't ever have more than one function call in one nest. Since the function you write is applied with the visitor (either pre or post), this will be handled for you. All you need to do is to apply the logic you outlined in your transformer. I'd just the browser() inside that transformer function so you can develop as you debug.

Here's what I came up with for the part where you want to always set the line breaks before a comma to 1:

library(styler)
cache_deactivate()

code <- "data.frame(
x = 1:10
, y = letters[1:10],
z = 11:20,
col1 = LETTERS[1:10]
)"

always_break_line_before_column <- function(pd) {
  # only jump into nests when there is a comma
  if (any(pd$text == ',')) browser()
  # only modify lag_newlines, not newlines
  # lag newlines are the lines before that token
  is_comma <- pd$token == "','"
  pd$lag_newlines[is_comma] <- 1L
  pd
}

sg <- create_style_guide(
  line_break = list(always_break_line_before_column)
)
style_text(code, transformers = sg)

Now you can write another rule that sets the line breaks after a comma to 0 (ensure the next token after the comma is not a comment, otherwise your approach will create invalid code). You can also combine two rules in one transformer, this will be more efficient computationally.

You can change the line breaks in your input code and see if it does what you want. Note that we don't define formatting from scratch, i.e. if you don't have any rule, the output is the input. The rules only change the current formatting, so if you want to preserve formatting, you don't need to write a rule for that.

rossdrucker commented 1 year ago

Awesome, thank you! I've updated what you have to be as follows:

library(styler)
cache_deactivate()

code <- "data.frame(
x = 1:10
, y = letters[1:10],
z = 11:20,
col1 = LETTERS[1:10]
)"

always_break_line_before_comma <- function(pd) {
  # find commas in the tokens
  is_comma <- pd$token == "','"

  # make it so a new line precedes it
  pd$lag_newlines[is_comma] <- 1L

  # add a space after the comma
  pd$spaces[is_comma] <- 1L

  # enforce that commas be at the start of new lines
  is_follows_comma <- pd$token_before == "','"
  pd$lag_newlines[is_follows_comma] <- 0L

  # add a space after a comma when adjusting the following line
  pd$spaces[is_follows_comma] <- 1L

  # return the reformatted code
  pd
}

sg <- create_style_guide(
  line_break = list(always_break_line_before_column)
)
style_text(code, transformers = sg)

This gets me to 95% of the functionality that I think I'm after here (and I think I'm actually understanding it better too!). What I'm not sure of how to do now is to properly add the indentation. I've tried playing around with increasing pd$indent where is_comma or is_follows_comma is TRUE, but that doesn't seem to be the right approach. I've also tried passing tidyverse_reindention() to style_text() but that isn't working correctly either, and I don't think I want to change the token to be indented. I also know that this doesn't account for comments, but I haven't worked on that piece yet.

Mind helping me figure out what I'm missing indentation-wise?

lorenzwalthert commented 1 year ago

Great. Yeah basically you are building a style guide from scratch here that has no other rules than the ones you explicitly define. But to get what you want, you could just take the tidyverse style guide and amend the rule about like breaks around commas. That would then also solve your indention issue. And you won’t have to bother about spaces around comma etc because there are rules that already do that in tidyverse_style.

lorenzwalthert commented 1 year ago

From https://styler.r-lib.org/articles/remove_rules.html on how to remove rules:

sg <- function(...) {
  transformers <- tidyverse_style(...)
  transformers$line_breaks$set_line_break_around_comma_and_or <- NULL # removes the rule on line break around commas

  transformers
}

style_text("string = 'hi there'", style = sg)

note that this rule also affects |, ||, & and &&. Maybe you also want these to behave like commas? If not, re-define the rule without commas, i.e. take the definition from styler source code and drop "','" from the vector ops:

set_line_break_around_comma_and_or <- function(pd, strict) {
  ops <- c("','", "AND", "OR", "AND2", "OR2")
  comma_with_line_break_that_can_be_removed_before <-
    (pd$token %in% ops) &
      (pd$lag_newlines > 0L) &
      (pd$token_before != "COMMENT") &
      !(lag(pd$token) %in% subset_token_opening)

  pd$lag_newlines[comma_with_line_break_that_can_be_removed_before] <- 0L
  pd$lag_newlines[lag(comma_with_line_break_that_can_be_removed_before)] <- 1L

  comma_with_line_break_that_can_be_moved_two_tokens_left <- which(
    (pd$token == "EQ_SUB") &
      (pd$lag_newlines > 0L) &
      (pd$token_before != "COMMENT") &
      !(lag(pd$token) %in% subset_token_opening)
  )

  pd$lag_newlines[comma_with_line_break_that_can_be_moved_two_tokens_left] <- 0L
  token_before <- map_int(
    comma_with_line_break_that_can_be_moved_two_tokens_left,
    previous_non_comment,
    pd = pd
  )
  pd$lag_newlines[token_before] <- 1L
  pd
}

sg <- function(...) {
  transformers <- tidyverse_style(...)
  transformers$line_breaks$set_line_break_around_comma_and_or <- set_line_break_around_comma_and_or # don't include commas
  transformers$line_breaks$always_break_line_before_comma <- always_break_line_before_comma

  transformers
}

The only function used in this body from the styler internals is lag(), so either use dplyr::lag or styler:::lag() in that place. Also prefix map_int() to purrr::map_int to not require {purrr} to be loaded.

For the don't remove line break if there is a comma before, just see the rules in R/rules-line-breaks.R, they all have this check that the relevant token before is not a COMMA.

rossdrucker commented 1 year ago

@lorenzwalthert this has been incredibly helpful, I can't thank you enough! I do still have one more outstanding question: how am I able to generate the AST for an entire file? I follow how to use style_text() to update the style of a specified text, but I'm interested in implementing a few rules to apply to places in the file (e.g. adding 2 blank lines at the end). Is it just reading in the file as a string, then following the same steps to generate the AST for that string?

lorenzwalthert commented 1 year ago

@lorenzwalthert this has been incredibly helpful, I can't thank you enough!

You can always do a one-time or recurring sponsoring on my GitHub Sposor. 😄

lorenzwalthert commented 1 year ago

Adding two blank lines at the end of the file is a bit more complicated. I assume you want it only for the Addin and the file-based APIs like style_pkg()?

lorenzwalthert commented 1 year ago

I can't undestand why you want any of these rules 😄

But if you want, you can change transform_utf8_one() -> write_utf8(new, path) to write_utf8(new, c(path, "")) to append an extra blank line. For that, you need to create and install and distribute your own fork of {styler} though, this can't be specified over transformers.

rossdrucker commented 1 year ago

Excellent, I think I've got it. Thanks again!

r-lib / styler

[Question] Custom Styleguide Help #1142