ropensci / vcr

Record and replay HTTP requests
https://docs.ropensci.org/vcr

When writing cassettes, text/csv response objects are parsed #234

Closed steffilazerte closed 3 years ago

steffilazerte commented 3 years ago

I'm trying to use vcr to run tests offline for weathercan. My package often deals with oddly formatted data which is first downloaded, then fixed, THEN parsed.

For some reason, when creating cassettes, vcr parses these text/csv response objects. This produces warnings and messages, which in turn trigger a bunch of test failures (the tests expect silent output).

If I re-run the tests (using the cassettes rather than creating them), the errors go away, but it's a pain to do this (and to remember to!) every time I update the fixtures.

Perhaps hide parsing messages when creating cassettes?

It took me a while to get the reprex figured out exactly: library(vcr) is key (for some reason!). I cannot replicate these results if I use vcr::use_cassette().

Problematic parsing messages:

```r
library(vcr)
use_cassette("steffistest", httr::GET("https://climate.weather.gc.ca/climate_normals/bulk_data_e.html?format=csv&lang=e&prov=mb&yr=1981&stnID=3471&climateID=5010480&submit=Download%20Data"))
#> CrulAdapter enabled!
#> HttrAdapter enabled!
#> net connect allowed
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   `Climate Normals 1981-2010 Station Data` = col_character()
#> )
#> Warning: 111 parsing failures.
#> row col  expected     actual         file
#>   2  -- 1 columns 8 columns  <raw vector>
#>   3  -- 1 columns 8 columns  <raw vector>
#>  11  -- 1 columns 15 columns <raw vector>
#>  13  -- 1 columns 15 columns <raw vector>
#>  14  -- 1 columns 15 columns <raw vector>
#> ... ... ......... .......... ............
#> See problems(...) for more details.
#> ejecting cassette: steffistest
#> CrulAdapter disabled!
#> HttrAdapter disabled!
#> <vcr - Cassette> steffistest
#>   Record method: once
#>   Serialize with: yaml
#>   Persist with: FileSystem
#>   Re-record interval (s): 
#>   Clean outdated interactions?: FALSE
#>   update_content_length_header: FALSE
#>   allow_playback_repeats: FALSE
#>   allow_unused_http_interactions: 
#>   exclusive: 
#>   preserve_exact_body_bytes: FALSE
```

Expected results:

```r
vcr::use_cassette("steffistest", httr::GET("https://climate.weather.gc.ca/climate_normals/bulk_data_e.html?format=csv&lang=e&prov=mb&yr=1981&stnID=3471&climateID=5010480&submit=Download%20Data")); devtools::session_info()
#> CrulAdapter enabled!
#> HttrAdapter enabled!
#> net connect allowed
#> ejecting cassette: steffistest
#> CrulAdapter disabled!
#> HttrAdapter disabled!
#> <vcr - Cassette> steffistest
#>   Record method: once
#>   Serialize with: yaml
#>   Persist with: FileSystem
#>   Re-record interval (s): 
#>   Clean outdated interactions?: FALSE
#>   update_content_length_header: FALSE
#>   allow_playback_repeats: FALSE
#>   allow_unused_http_interactions: 
#>   exclusive: 
#>   preserve_exact_body_bytes: FALSE
```
Session Info

```r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.0.3 (2020-10-10)
#>  os       Ubuntu 20.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_CA:en
#>  collate  en_CA.UTF-8
#>  ctype    en_CA.UTF-8
#>  tz       America/Winnipeg
#>  date     2021-04-14
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  backports     1.2.1      2020-12-09 [1] CRAN (R 4.0.3)
#>  base64enc     0.1-3      2015-07-28 [1] CRAN (R 4.0.2)
#>  cachem        1.0.4      2021-02-13 [1] CRAN (R 4.0.3)
#>  callr         3.6.0      2021-03-28 [1] CRAN (R 4.0.3)
#>  cli           2.4.0      2021-04-05 [1] CRAN (R 4.0.3)
#>  crayon        1.4.1      2021-02-08 [1] CRAN (R 4.0.3)
#>  crul          1.1.0      2021-02-15 [1] CRAN (R 4.0.3)
#>  curl          4.3        2019-12-02 [1] CRAN (R 4.0.2)
#>  debugme       1.1.0      2017-10-22 [1] CRAN (R 4.0.3)
#>  desc          1.3.0      2021-03-05 [1] CRAN (R 4.0.3)
#>  devtools      2.4.0      2021-04-07 [1] CRAN (R 4.0.3)
#>  digest        0.6.27     2020-10-24 [1] CRAN (R 4.0.3)
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.2)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.2)
#>  fansi         0.4.2      2021-01-15 [1] CRAN (R 4.0.3)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.0.3)
#>  fauxpas       0.5.0      2020-04-13 [1] CRAN (R 4.0.2)
#>  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)
#>  glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8        2019-03-20 [1] CRAN (R 4.0.2)
#>  hms           1.0.0      2021-01-13 [1] CRAN (R 4.0.3)
#>  htmltools     0.5.1.9000 2021-03-22 [1] Github (rstudio/htmltools@10d6287)
#>  httpcode      0.3.0      2020-04-10 [1] CRAN (R 4.0.2)
#>  httr          1.4.2      2020-07-20 [1] CRAN (R 4.0.2)
#>  jsonlite      1.7.2      2020-12-09 [1] CRAN (R 4.0.3)
#>  knitr         1.31       2021-01-27 [1] CRAN (R 4.0.3)
#>  lifecycle     1.0.0      2021-02-15 [1] CRAN (R 4.0.3)
#>  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.0.3)
#>  memoise       2.0.0      2021-01-26 [1] CRAN (R 4.0.3)
#>  pillar        1.5.1      2021-03-05 [1] CRAN (R 4.0.3)
#>  pkgbuild      1.2.0      2020-12-15 [1] CRAN (R 4.0.3)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.2)
#>  pkgload       1.2.1      2021-04-06 [1] CRAN (R 4.0.3)
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.2)
#>  processx      3.5.1      2021-04-04 [1] CRAN (R 4.0.3)
#>  ps            1.6.0      2021-02-28 [1] CRAN (R 4.0.3)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.2)
#>  R6            2.5.0      2020-10-28 [1] CRAN (R 4.0.3)
#>  Rcpp          1.0.6      2021-01-15 [1] CRAN (R 4.0.3)
#>  readr         1.4.0      2020-10-05 [1] CRAN (R 4.0.2)
#>  remotes       2.3.0      2021-04-01 [1] CRAN (R 4.0.3)
#>  reprex        1.0.0      2021-01-27 [1] CRAN (R 4.0.3)
#>  rlang         0.4.10     2020-12-30 [1] CRAN (R 4.0.3)
#>  rmarkdown     2.7        2021-02-19 [1] CRAN (R 4.0.3)
#>  rprojroot     2.0.2      2020-11-15 [1] CRAN (R 4.0.3)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.0.3)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.2)
#>  stringi       1.5.3      2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.2)
#>  styler        1.3.2      2020-02-23 [1] CRAN (R 4.0.2)
#>  testthat      3.0.2      2021-02-14 [1] CRAN (R 4.0.3)
#>  tibble        3.1.0      2021-02-25 [1] CRAN (R 4.0.3)
#>  triebeard     0.3.0      2016-08-04 [1] CRAN (R 4.0.2)
#>  urltools      1.7.3      2019-04-14 [1] CRAN (R 4.0.2)
#>  usethis       2.0.1      2021-02-10 [1] CRAN (R 4.0.3)
#>  utf8          1.2.1      2021-03-12 [1] CRAN (R 4.0.3)
#>  vcr         * 0.6.0      2020-12-12 [1] CRAN (R 4.0.3)
#>  vctrs         0.3.7      2021-03-29 [1] CRAN (R 4.0.3)
#>  webmockr      0.8.0      2021-03-14 [1] CRAN (R 4.0.3)
#>  whisker       0.4        2019-08-28 [1] CRAN (R 4.0.2)
#>  withr         2.4.1      2021-01-26 [1] CRAN (R 4.0.3)
#>  xfun          0.22       2021-03-11 [1] CRAN (R 4.0.3)
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.2)
#> 
#> [1] /home/steffi/R/x86_64-pc-linux-gnu-library/4.0
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
```

Created on 2021-04-14 by the reprex package (v1.0.0)
sckott commented 3 years ago

Thanks @steffilazerte! Will have a look in about a week and a half.

sckott commented 3 years ago

> I cannot replicate these results if I use vcr::use_cassette()

yep, running library(vcr) is required first

sckott commented 3 years ago

That warning comes from the readr package. To record the response body, vcr needs to parse the HTTP response; for httr, this is done here: https://github.com/ropensci/vcr/blob/master/R/request_handler-httr.R#L61. I could suppress any warnings in the content() call, but I don't think that's a good idea. I think users can suppress warnings themselves in their tests as needed.

Alternatively, if you write the data to disk in your GET request, vcr does not parse the response, but only records the file path in the cassette. Would that work better for you?

steffilazerte commented 3 years ago

Do you mean that I should change my code to save the GET response to disk, or that I could change vcr options to save it to disk? I don't want to change my code that way, because it isn't necessary and would only create an intermediate file.

I can suppress warnings in the tests, of course, but then I suppress all warnings, not just potential parsing warnings, which is less than ideal.
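One way around silencing every warning is to muffle only the conditions whose text matches a pattern. Below is a minimal base-R sketch; `suppress_matching_conditions()` is a hypothetical helper (not part of vcr, httr or testthat), and the pattern strings are assumptions taken from the readr 1.4.0 output in the reprex ("111 parsing failures.", "Column specification"), so other readr versions may word these differently.

```r
# Hypothetical helper (not part of vcr, httr or testthat): evaluate `expr`,
# silencing only the warnings and messages whose text matches `pattern`.
# Non-matching conditions are not muffled and keep propagating as usual.
suppress_matching_conditions <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      # Muffle e.g. readr's "111 parsing failures." warning
      if (grepl(pattern, conditionMessage(w))) invokeRestart("muffleWarning")
    },
    message = function(m) {
      # Muffle e.g. readr's "Column specification" message
      if (grepl(pattern, conditionMessage(m))) invokeRestart("muffleMessage")
    }
  )
}
```

In a test, the reprex call could then be wrapped as `suppress_matching_conditions(use_cassette("steffistest", httr::GET(url)), pattern = "parsing failure|Column specification")`, where `url` stands for the climate-data URL from the reprex. Any unrelated warning still escapes the helper, so expectations such as testthat's `expect_silent()` keep catching it.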

Does vcr really have to parse the results as csv? Could they not be parsed as text or raw? I definitely don't have a good understanding of the internal workings, so that may not make any sense!

If there isn't anything to be done, that's totally fine. I'm sure I can come up with a workaround, thanks!

sckott commented 3 years ago

> Do you mean that I would change my code to save the GET request to disk, or that I could change vcr options to save it to disk?

I didn't mean to suggest you change your code, just explaining how it works.

> Does vcr really have to parse the results as csv?

On that line we use content(), which under the hood parses different data types differently. My package crul does not do that; instead, you parse the data yourself from text, so it's a different approach. I could potentially use httr::content(x, as = "text") on that line, but I'm not sure whether that would work; it would take some looking into.
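A minimal offline sketch of the difference described above, assuming httr is installed. With no `as`, httr::content() picks a parser from the Content-Type header (for text/csv that is readr, the source of the warnings in the reprex), whereas `as = "raw"` and `as = "text"` only return or decode the bytes. To avoid a network request, the sketch builds a bare `response` object by hand; that list layout (`url`, `headers`, `content`) mirrors httr's internals and is an assumption, not a supported constructor. Whether vcr itself could switch to `as = "text"` is still the open question from the comment.

```r
library(httr)

# Stand-in for a downloaded CSV response. ASSUMPTION: httr "response"
# objects are lists with `url`, `headers` and `content` fields; this is
# httr's internal layout, used here only to avoid a network request.
csv_body <- "Climate Normals 1981-2010 Station Data\na,b,c\n1,2,3\n"
fake_response <- structure(
  list(
    url = "https://example.com/data.csv",
    headers = list(`Content-Type` = "text/csv"),
    content = charToRaw(csv_body)
  ),
  class = "response"
)

# as = "raw": the bytes exactly as received; nothing is parsed
body_raw <- content(fake_response, as = "raw")

# as = "text": bytes decoded to a single string; readr is never called,
# so no column-specification messages or parsing-failure warnings
body_text <- content(fake_response, as = "text", encoding = "UTF-8")

# With no `as`, content() would choose a parser from the text/csv type and
# hand the bytes to readr, which is what produced the reprex warnings:
# parsed <- content(fake_response)
```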

steffilazerte commented 3 years ago

I see, it looks like an automatic thing with httr! The only thing I can think of would be passing an argument through vcr to specify `as` in httr::content(), but that's probably more trouble than it's worth. This is easier to address on my end; thanks for looking into it! Feel free to close the issue :grin:

sckott commented 3 years ago

Thanks for opening the issue. Agreed; I think in this case it's more appropriate for the user to handle it.