ropensci / vcr

Record and replay HTTP requests
https://docs.ropensci.org/vcr

use_cassette hangs indefinitely #271

Closed zachary-foster closed 2 months ago

zachary-foster commented 2 months ago

This example code hangs indefinitely the first time it is run and then errors the second time when the presumably incomplete cassette is used after forcing R to quit. Any ideas why this would happen?

library(vcr)
library(taxize)

# Works as expected
children(161994, "itis")

# Hangs indefinitely
use_cassette("deleteme", {
  children(161994, "itis")
})

# Clean up
file.remove("deleteme.yml")
Session Info

```r
─ Session info ──────────────────────────────────────────────────
 setting  value
 version  R version 4.4.1 (2024-06-14)
 os       Pop!_OS 22.04 LTS
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Los_Angeles
 date     2024-09-05
 rstudio  2024.04.2+764 Chocolate Cosmos (desktop)
 pandoc   NA

─ Packages ──────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 ape           5.8     2024-04-11 [1] CRAN (R 4.4.1)
 base64enc     0.1-3   2015-07-28 [1] CRAN (R 4.4.1)
 bold          1.3.0   2024-09-04 [1] Github (ropensci/bold@404fd11)
 brio          1.1.5   2024-04-24 [1] CRAN (R 4.4.1)
 cachem        1.1.0   2024-05-16 [1] CRAN (R 4.4.1)
 cli           3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
 codetools     0.2-19  2023-02-01 [4] CRAN (R 4.2.2)
 crayon        1.5.3   2024-06-20 [1] CRAN (R 4.4.1)
 crul          1.5.0   2024-07-19 [1] CRAN (R 4.4.1)
 curl          5.2.2   2024-08-26 [1] CRAN (R 4.4.1)
 data.table    1.16.0  2024-08-27 [1] CRAN (R 4.4.1)
 devtools      2.4.5   2022-10-11 [1] CRAN (R 4.4.1)
 digest        0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
 dplyr         1.1.4   2023-11-17 [1] CRAN (R 4.4.1)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.4.1)
 fansi         1.0.6   2023-12-08 [1] CRAN (R 4.4.1)
 fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
 fauxpas       0.5.2   2023-05-03 [1] CRAN (R 4.4.1)
 foreach       1.5.2   2022-02-02 [1] CRAN (R 4.4.1)
 fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.1)
 generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.1)
 glue          1.7.0   2024-01-09 [1] CRAN (R 4.4.1)
 htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
 htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.4.1)
 httpcode      0.3.0   2020-04-10 [1] CRAN (R 4.4.1)
 httpuv        1.6.15  2024-03-26 [1] CRAN (R 4.4.1)
 httr          1.4.7   2023-08-15 [1] CRAN (R 4.4.1)
 httr2         1.0.3   2024-08-22 [1] CRAN (R 4.4.1)
 iterators     1.0.14  2022-02-05 [1] CRAN (R 4.4.1)
 jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.4.1)
 later         1.3.2   2023-12-06 [1] CRAN (R 4.4.1)
 lattice       0.22-5  2023-10-24 [4] CRAN (R 4.3.1)
 lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.4.1)
 mime          0.12    2021-09-28 [1] CRAN (R 4.4.1)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.4.1)
 nlme          3.1-165 2024-06-06 [4] CRAN (R 4.4.0)
 pillar        1.9.0   2023-03-22 [1] CRAN (R 4.4.1)
 pkgbuild      1.4.4   2024-03-17 [1] CRAN (R 4.4.1)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.1)
 pkgload       1.4.0   2024-06-28 [1] CRAN (R 4.4.1)
 plyr          1.8.9   2023-10-02 [1] CRAN (R 4.4.1)
 profvis       0.3.8   2023-05-02 [1] CRAN (R 4.4.1)
 promises      1.3.0   2024-04-05 [1] CRAN (R 4.4.1)
 purrr         1.0.2   2023-08-10 [1] CRAN (R 4.4.1)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.1)
 rappdirs      0.3.3   2021-01-31 [1] CRAN (R 4.4.1)
 Rcpp          1.0.13  2024-07-17 [1] CRAN (R 4.4.1)
 remotes       2.5.0   2024-03-17 [1] CRAN (R 4.4.1)
 ritis         1.0.0   2021-02-02 [1] CRAN (R 4.4.1)
 rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
 rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.4.1)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.1)
 shiny         1.9.1   2024-08-01 [1] CRAN (R 4.4.1)
 solrium       1.2.0   2021-05-19 [1] CRAN (R 4.4.1)
 stringi       1.8.4   2024-05-06 [1] CRAN (R 4.4.1)
 stringr       1.5.1   2023-11-14 [1] CRAN (R 4.4.1)
 taxize      * 0.9.102 2024-09-05 [1] local
 testthat      3.2.1.1 2024-04-14 [1] CRAN (R 4.4.1)
 tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.1)
 tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.1)
 triebeard     0.4.1   2023-03-04 [1] CRAN (R 4.4.1)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.4.1)
 urltools      1.7.3   2019-04-14 [1] CRAN (R 4.4.1)
 usethis       3.0.0   2024-07-29 [1] CRAN (R 4.4.1)
 utf8          1.2.4   2023-10-22 [1] CRAN (R 4.4.1)
 vcr         * 1.6.0   2024-07-23 [1] CRAN (R 4.4.1)
 vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.1)
 webmockr      1.0.0   2024-07-23 [1] CRAN (R 4.4.1)
 whisker       0.4.1   2022-12-05 [1] CRAN (R 4.4.1)
 xml2          1.3.6   2023-12-04 [1] CRAN (R 4.4.1)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.4.1)
 yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.1)
 zoo           1.8-12  2023-04-13 [1] CRAN (R 4.4.1)

 [1] /home/fosterz/R/x86_64-pc-linux-gnu-library/4.4
 [2] /usr/local/lib/R/site-library
 [3] /usr/lib/R/site-library
 [4] /usr/lib/R/library

─────────────────────────────────────────────────────────────────
```

sckott commented 2 months ago

Thanks Zach. No ideas yet. I'll explore and get back to you. Is this happening with other fxns or just this one?

sckott commented 2 months ago

Pretty sure it's invalid text in the response body. I can record with vcr using serialize_with = "json", but then it fails on the 2nd run of use_cassette, and reading the recorded cassette directly shows why:

jsonlite::fromJSON("/var/folders/qt/fzq1m_bj2yb_7b2jz57s9q7c0000gp/T//RtmpnNpMFY/deleteme.json")
#> Error in parse_con(txt, bigint_as_char) :
#>   lexical error: invalid bytes in UTF8 string.
#>           ":\"623502\"},{\"author\":\"G\xfcnther, 1866\",\"class\":\"gov.
#>                      (right here) ------^
sckott commented 2 months ago

I'm guessing R hangs when we're using yaml because that pkg often just blows up when an error occurs rather than giving back a useful error message. Whereas jsonlite (used when we serialize the cassette in json) does give back useful information as above.

sckott commented 2 months ago

I guess we could check for valid UTF-8 text and either error if it isn't valid or convert it to UTF-8, or something similar. Not sure yet.
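
A minimal sketch of that check-and-convert idea in base R, for illustration only. It is not vcr's code: `to_utf8()` is a hypothetical helper, and the source encoding is assumed to be Latin-1, which is consistent with the `\xfc` byte in "G\xfcnther" above but is not confirmed anywhere in this thread.

```r
# Bytes of "G\xfcnther" as they appear in the error above.
# 0xfc is "ü" in Latin-1 but is not valid on its own in UTF-8.
body <- rawToChar(as.raw(c(0x47, 0xfc, 0x6e, 0x74, 0x68, 0x65, 0x72)))

# validUTF8() is base R; it reports whether each string is valid UTF-8.
is_valid_before <- validUTF8(body)

# Hypothetical helper: re-encode only the elements that fail the check.
# ASSUMPTION: invalid elements are Latin-1; a real fix would need to
# take the encoding from the response headers or fail with an error.
to_utf8 <- function(x, from = "latin1") {
  bad <- !validUTF8(x)
  x[bad] <- iconv(x[bad], from = from, to = "UTF-8")
  x
}

fixed <- to_utf8(body)
is_valid_after <- validUTF8(fixed)
```

After conversion the string holds the two-byte UTF-8 sequence for "ü", so a serializer that requires UTF-8 should be able to write and re-read it. The trade-off is that the cassette no longer stores the exact bytes the server sent.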

zachary-foster commented 2 months ago

Hi Scott! It's only happening with this function so far. Thanks for looking into it!

sckott commented 2 months ago

Thanks, I'll experiment and see what works

sckott commented 2 months ago

@zachary-foster I could change something in vcr to handle this kind of error, but there's no easy fix, and I haven't seen this problem much. Would you be open to adjusting the params a bit, like this:

use_cassette("deleteme", {
  children(161994, "itis")
}, preserve_exact_body_bytes = TRUE)

That should store the response body base64-encoded and decode it from base64 when the cassette is read, which avoids encoding issues.
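
To illustrate why base64 sidesteps the problem, here is a small round trip using `jsonlite::base64_enc()` and `jsonlite::base64_dec()`. This is a sketch of the general idea only, not a description of how vcr implements `preserve_exact_body_bytes` internally; the example bytes are the "G\xfcnther" fragment from the error above.

```r
library(jsonlite)

# Raw response bytes containing 0xfc, which is not valid UTF-8.
body_bytes <- as.raw(c(0x47, 0xfc, 0x6e, 0x74, 0x68, 0x65, 0x72))

# Base64 output uses only ASCII characters, so it is always valid UTF-8
# and safe to write into a YAML or JSON cassette.
encoded <- jsonlite::base64_enc(body_bytes)
encoded_is_valid <- validUTF8(encoded)

# Decoding returns the original bytes unchanged.
decoded <- jsonlite::base64_dec(encoded)
round_trip_ok <- identical(decoded, body_bytes)
```

The cost is that the stored body is no longer human-readable in the cassette file, which is the usual reason not to enable this by default.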

zachary-foster commented 2 months ago

That sounds fine to me. Thanks for the workaround!

sckott commented 2 months ago

Great, thanks for the issue