ropensci / vcr

Record HTTP calls and replay them
https://docs.ropensci.org/vcr
Other
77 stars 12 forks source link

Slow loading of test fixtures with large bodies #66

Open amoeba opened 6 years ago

amoeba commented 6 years ago

This issue is related to #51. Over there, @sckott patched issues with vcr and dependent packages relating to large response bodies. After those fixes, I noticed that my fixtures appear to take 3-5 seconds to load which seems slow. I turned off my network connection and ran the tests to verify the tests weren't just making a network request connection and the tests run and still take just as long without a network connection.

This error can be reproduced by checking out https://github.com/amoeba/dataonecrul/commit/05f52cf0e8be84af9e82f264115c1a2c9797f15a and running devtools::test().

Here's what I see in my Test pane:

==> devtools::test()

Loading dataonecrul
Loading required package: testthat
Loading required namespace: jsonlite
Testing dataonecrul
✔ | OK F W S | Context
✔ |  2       | getObject [3.3 s]
✔ |  1       | SystemMetadata

══ Results ════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 3.4 s

OK:       3
Failed:   0
Warnings: 0
Skipped:  0

The fixture for getObject is ~ 10MiB in size. I simple yaml::yaml.load_file("tests/fixtures/vcr_cassettes/get_system_metadata.yml") is nearly instantaneous so maybe it's not directly a YAML parsing issue.

Session Info ```r > devtools::session_info() Session info ---------------------------------------------------------------------------------------------- setting value version R version 3.5.0 (2018-04-23) system x86_64, darwin15.6.0 ui RStudio (1.1.447) language (EN) collate en_US.UTF-8 tz America/Juneau date 2018-04-30 Packages -------------------------------------------------------------------------------------------------- package * version date source base * 3.5.0 2018-04-24 local base64enc 0.1-3 2015-07-28 CRAN (R 3.5.0) compiler 3.5.0 2018-04-24 local curl 3.2 2018-03-28 CRAN (R 3.5.0) datasets * 3.5.0 2018-04-24 local devtools 1.13.5 2018-02-18 CRAN (R 3.5.0) digest 0.6.15 2018-01-28 CRAN (R 3.5.0) git2r 0.21.0 2018-01-04 CRAN (R 3.5.0) graphics * 3.5.0 2018-04-24 local grDevices * 3.5.0 2018-04-24 local httr 1.3.1 2017-08-20 CRAN (R 3.5.0) jsonlite 1.5 2017-06-01 CRAN (R 3.5.0) knitr 1.20 2018-02-20 CRAN (R 3.5.0) lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.0) magrittr 1.5 2014-11-22 CRAN (R 3.5.0) memoise 1.1.0 2017-04-21 CRAN (R 3.5.0) methods * 3.5.0 2018-04-24 local R6 2.2.2 2017-06-17 CRAN (R 3.5.0) Rcpp 0.12.16 2018-03-13 CRAN (R 3.5.0) rstudioapi 0.7 2017-09-07 CRAN (R 3.5.0) stats * 3.5.0 2018-04-24 local tools 3.5.0 2018-04-24 local triebeard 0.3.0 2016-08-04 cran (@0.3.0) urltools 1.7.0 2018-01-20 cran (@1.7.0) utils * 3.5.0 2018-04-24 local vcr * 0.0.8.9534 2018-04-30 Github (ropensci/vcr@ba32bc9) webmockr * 0.2.2.9119 2018-04-30 Github (ropensci/webmockr@34fae0c) withr 2.1.2 2018-03-15 CRAN (R 3.5.0) yaml 2.1.18 2018-03-08 CRAN (R 3.5.0) ```
sckott commented 6 years ago

thanks, been having a look. yes, yaml loading is pretty fast, but after loading there's something(s) related to size of that response loaded in - i may not get to this for the first version https://github.com/ropensci/vcr/milestone/1 - since its more a performance issue (as far as we can tell so far 😬)

amoeba commented 6 years ago

No worries on when this gets in since I'm just kicking the tires. Looking at a profr trace of devtools::test() I see that the majority of time is taken within calls to what profr labels as self$deserialized_hash which is essentially a call to self$serializer$deserialize_path() which is essentially a call to yaml::yaml.load_file(self$path) (If I'm not mistaken, at least). And yaml::load_file is fast:

> microbenchmark({
+   some_var <- yaml::yaml.load_file("tests/fixtures/vcr_cassettes/get_object.yml")
+ }, times = 10)
Unit: milliseconds
                                                                                    expr      min
 {     some_var <- yaml::yaml.load_file("tests/fixtures/vcr_cassettes/get_object.yml") } 392.7561
       lq    mean   median       uq      max neval
 395.4149 410.628 398.3413 421.9784 458.8146    10

I assume R6 is faster than this so maybe something else is up?

I feel like vcr's functions should be relatively fast. Looking lower down on the trace, I wonder if testthat's use of eval() and related functions is to blame.

sckott commented 3 years ago

Note: Test this with json serializer to see if its an issue with yaml pkg itself