ropensci / vcr

Record and replay HTTP requests
https://docs.ropensci.org/vcr
Other
77 stars 12 forks source link

check_cassette_names fails when function argument name included #168

Open alex-gable opened 4 years ago

alex-gable commented 4 years ago

I've created a repo to reproduce the below scenario.

Session Info ```r r$> devtools::session_info() ─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── setting value version R version 4.0.0 (2020-04-24) os macOS Catalina 10.15.4 system x86_64, darwin17.0 ui vscode language (EN) collate en_US.UTF-8 ctype en_US.UTF-8 tz America/Los_Angeles date 2020-05-26 ─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ! package * version date lib source assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0) backports 1.1.7 2020-05-13 [1] CRAN (R 4.0.0) base64enc 0.1-3 2015-07-28 [1] CRAN (R 4.0.0) callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0) R catfact * 0.1.1 [?] cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0) crayon 1.3.4.9000 2020-05-15 [1] Github (r-lib/crayon@dcf6d44) crul 0.9.0 2019-11-06 [1] CRAN (R 4.0.0) curl 4.3 2020-05-16 [1] Github (jeroen/curl@3e38a0a) desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0) devtools 2.3.0 2020-04-10 [1] CRAN (R 4.0.0) digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0) ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0) fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0) fauxpas 0.5.0 2020-04-13 [1] CRAN (R 4.0.0) fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0) glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.0) httpcode 0.3.0 2020-04-10 [1] CRAN (R 4.0.0) httr 1.4.1 2019-08-05 [1] CRAN (R 4.0.0) jsonlite 1.6.1 2020-02-02 [1] CRAN (R 4.0.0) lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.0) magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0) memoise 1.1.0.9000 2020-05-16 [1] Github (hadley/memoise@4aefd9f) pkgbuild 1.0.8 2020-05-07 [1] CRAN (R 4.0.0) pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0) praise 1.0.0 2015-08-11 [1] CRAN (R 4.0.0) prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0) processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0) ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.0) R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0) Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0) remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0) rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0) rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0) rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.0) sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0) testthat * 2.3.2 2020-03-02 [1] CRAN (R 4.0.0) triebeard 0.3.0 2016-08-04 [1] CRAN (R 4.0.0) urltools 1.7.3 2019-04-14 [1] CRAN (R 4.0.0) usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0) vcr * 0.5.4 2020-03-31 [1] CRAN (R 4.0.0) webmockr 0.6.2 2020-03-24 [1] CRAN (R 4.0.0) whisker 0.4 2019-08-28 [1] CRAN (R 4.0.0) withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0) yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0) [1] /Users/alexgable/.R/library [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library R ── Package was removed from disk. r$> devtools::test() Loading catfact Does this run check cassettes for duplicates?: FALSE Testing catfact ✔ | OK F W S | Context ✔ | 11 | cat_fact [0.3 s] ✔ | 11 | cat_fact [0.3 s] ══ Results ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ Duration: 0.6 s OK: 22 Failed: 0 Warnings: 0 Skipped: 0 r$> devtools::test() Loading catfact Does this run check cassettes for duplicates?: TRUE Error: you should not have duplicated cassette names: name (found in test-cat_fact_fail.R) ```

check_cassette_names fails when the name = {cassette name} form is used to call the function. debugging shows that the issue is in the cassette_names function.

    cassette_names <- function(x) {
        browser()
        tmp <- parse(x, keep.source = TRUE)
        df <- utils::getParseData(tmp)
        row.names(df) = NULL
        z <- as.numeric(row.names(df[df$text == "use_cassette", 
            ])) + 2
        gsub("\"", "", df[z, "text"])
    }

Debugging the internals led me to discover that the "magic number" "2" being added to the "use_cassette" row number is dependent on allowing strictly a "(" between the function call and the name of the cassette. Looking at df as it gets subsetted and assigned to z, you'd need to add "4" in the named-argument instance. This situation becomes even more dire if the function arguments are out of order in addition to being named.

Inspecting the df created by calling getParseData on my erroring file, I found that the SYMBOL_FUNCTION_CALL for use_cassettes gives us a scope of line1 == 5 or line1 == 18 to find our "name" argument. I believe you'll find promising results by selecting the token == "STR_CONST in those subsets.

    line1 col1 line2 col2  id parent                token terminal                                      text
...
22      5    3     8    4  80    126                 expr    FALSE                                          
23      5    3     5   14  32     34 SYMBOL_FUNCTION_CALL     TRUE                              use_cassette
24      5    3     5   14  34     80                 expr    FALSE                                          
25      5   15     5   15  33     80                  '('     TRUE                                         (
26      5   16     5   19  35     80           SYMBOL_SUB     TRUE                                      name
27      5   20     5   20  36     80               EQ_SUB     TRUE                                         =
28      5   21     5   39  37     39            STR_CONST     TRUE                       "scrape_events_htm"
29      5   21     5   39  39     80                 expr    FALSE                                          
30      5   40     5   40  38     80                  ','     TRUE                                         ,
31      5   42     8    3  75     80                 expr    FALSE                                          
32      5   42     5   42  43     75                  '{'     TRUE                                         {
33      6    5     7   70  70     75                 expr    FALSE                                          
34      6    5     6   18  45     47               SYMBOL     TRUE                            events_htm_out
35      6    5     6   18  47     70                 expr    FALSE                                          
36      6   20     6   21  46     70          LEFT_ASSIGN     TRUE                                        <-
37      6   23     7   70  68     70                 expr    FALSE                                          
38      6   23     6   42  48     50 SYMBOL_FUNCTION_CALL     TRUE                      sc_scrape_events_htm
39      6   23     6   42  50     68                 expr    FALSE                                          
40      6   43     6   43  49     68                  '('     TRUE                                         (
41      6   44     6   50  51     68           SYMBOL_SUB     TRUE                                   game_id
42      6   52     6   52  52     68               EQ_SUB     TRUE                                         =
43      6   54     6   65  53     55               SYMBOL     TRUE                              test_game_id
44      6   54     6   65  55     68                 expr    FALSE                                          
45      6   66     6   66  54     68                  ','     TRUE                                         ,
46      7   44     7   52  60     68           SYMBOL_SUB     TRUE                                 season_id
47      7   54     7   54  61     68               EQ_SUB     TRUE                                         =
48      7   56     7   69  62     64               SYMBOL     TRUE                            test_season_id
49      7   56     7   69  64     68                 expr    FALSE                                          
50      7   70     7   70  63     68                  ')'     TRUE                                         )
51      8    3     8    3  73     75                  '}'     TRUE                                         }
52      8    4     8    4  76     80                  ')'     TRUE                                         )
...
91     18    3    20    4 198    244                 expr    FALSE                                          
92     18    3    18   14 159    161 SYMBOL_FUNCTION_CALL     TRUE                              use_cassette
93     18    3    18   14 161    198                 expr    FALSE                                          
94     18   15    18   15 160    198                  '('     TRUE                                         (
95     18   16    18   19 162    198           SYMBOL_SUB     TRUE                                      name
96     18   20    18   20 163    198               EQ_SUB     TRUE                                         =
97     18   21    18   39 164    166            STR_CONST     TRUE                       "scrape_events_api"
98     18   21    18   39 166    198                 expr    FALSE                                          
99     18   40    18   40 165    198                  ','     TRUE                                         ,
100    18   42    20    3 193    198                 expr    FALSE                                          
101    18   42    18   42 170    193                  '{'     TRUE                                         {
...

Looking at the lobstr::ast() for various ways to tell vcr which cassette to use, it's hard to see how all cases can be accommodated and parsed easily.

Example Abstract Syntax Trees ```r r$> lobstr::ast(expression(testthat::test_that("cat_fact works", { vcr::use_cassette(name = "cat_fact", { aa <- cat_fact() expect_is(aa, "list") expect_named(aa, c('fact', 'length')) expect_is(aa$fact, 'character') expect_type(aa$length, 'integer') }) })) ) █─expression └─█─█─`::` │ ├─testthat │ └─test_that ├─"cat_fact works" └─█─`{` └─█─█─`::` │ ├─vcr │ └─use_cassette ├─name = "cat_fact" └─█─`{` ├─█─`<-` │ ├─aa │ └─█─cat_fact ├─█─expect_is │ ├─aa │ └─"list" ├─█─expect_named │ ├─aa │ └─█─c │ ├─"fact" │ └─"length" ├─█─expect_is │ ├─█─`$` │ │ ├─aa │ │ └─fact │ └─"character" └─█─expect_type ├─█─`$` │ ├─aa │ └─length └─"integer" r$> lobstr::ast(expression(testthat::test_that("cat_fact works", { vcr::use_cassette("cat_fact", { aa <- cat_fact() expect_is(aa, "list") expect_named(aa, c('fact', 'length')) expect_is(aa$fact, 'character') expect_type(aa$length, 'integer') }) })) ) █─expression └─█─█─`::` │ ├─testthat │ └─test_that ├─"cat_fact works" └─█─`{` └─█─█─`::` │ ├─vcr │ └─use_cassette ├─"cat_fact" └─█─`{` ├─█─`<-` │ ├─aa │ └─█─cat_fact ├─█─expect_is │ ├─aa │ └─"list" ├─█─expect_named │ ├─aa │ └─█─c │ ├─"fact" │ └─"length" ├─█─expect_is │ ├─█─`$` │ │ ├─aa │ │ └─fact │ └─"character" └─█─expect_type ├─█─`$` │ ├─aa │ └─length └─"integer" r$> lobstr::ast(expression(testthat::test_that("cat_fact works", { use_cassette("cat_fact", { aa <- cat_fact() expect_is(aa, "list") expect_named(aa, c('fact', 'length')) expect_is(aa$fact, 'character') expect_type(aa$length, 'integer') }) })) ) █─expression └─█─█─`::` │ ├─testthat │ └─test_that ├─"cat_fact works" └─█─`{` └─█─use_cassette ├─"cat_fact" └─█─`{` ├─█─`<-` │ ├─aa │ └─█─cat_fact ├─█─expect_is │ ├─aa │ └─"list" ├─█─expect_named │ ├─aa │ └─█─c │ ├─"fact" │ └─"length" ├─█─expect_is │ ├─█─`$` │ │ ├─aa │ │ └─fact │ └─"character" └─█─expect_type ├─█─`$` │ ├─aa │ └─length └─"integer" r$> lobstr::ast(expression(testthat::test_that("cat_fact works", { use_cassette(name = "cat_fact", { aa <- cat_fact() expect_is(aa, "list") expect_named(aa, c('fact', 'length')) expect_is(aa$fact, 'character') expect_type(aa$length, 'integer') }) })) ) █─expression └─█─█─`::` │ ├─testthat │ └─test_that ├─"cat_fact works" └─█─`{` └─█─use_cassette ├─name = "cat_fact" └─█─`{` ├─█─`<-` │ ├─aa │ └─█─cat_fact ├─█─expect_is │ ├─aa │ └─"list" ├─█─expect_named │ ├─aa │ └─█─c │ ├─"fact" │ └─"length" ├─█─expect_is │ ├─█─`$` │ │ ├─aa │ │ └─fact │ └─"character" └─█─expect_type ├─█─`$` │ ├─aa │ └─length └─"integer" ```
sckott commented 4 years ago

thanks for the issue @alex-gable and for the thorough dive into the problem.

I admit the solution in check_cassette_names() isn't great. Partly we're limited because R doesn't have a great way to parse/manipulate/inspect ASTs.

I'll have a look and see if there's a solution that will be more general

sckott commented 4 years ago

it gets more complex too:

name <- "foo_bar"
use_cassette(name, {
    ... 
})
sckott commented 4 years ago

I started working on this, but its complicated. we need to account for:

tried breaking up each test file by test_that blocks first, but that's still complicated, and not sure if that would lead to problems as well

my local branch cassette-names

alex-gable commented 4 years ago

it occurred to me that this functionality might be implemented in a package like r-lib/styler

apologize for not giving a more thorough inspection of how exactly that package could help. but something like a syntax highlighter or language server that returns a more rich syntax tokenization of R code.

sckott commented 4 years ago

Good idea to look at styler. I should look to see if there's anything in there that may help

sckott commented 3 years ago

Sorry for delay on this, getting a new version out soon, then can look at this again