tidyverse / rvest

Simple web scraping for R
https://rvest.tidyverse.org

Conflict with the knitr cache #369

Closed: prosoitos closed this 10 months ago

prosoitos commented 1 year ago

It is well known that xml2 objects use external pointers that are not serialized by saveRDS() (see, for instance, this Stack Overflow answer, or issues #344 and #264).

This makes it impossible to use rvest with the knitr cache: when R tries to reuse the cache files, the external pointers that the cached objects refer to no longer exist, so it fails with the error message:

external pointer is not valid
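
For anyone who wants to reproduce this outside of knitr, here is a minimal sketch. It assumes that a plain saveRDS()/readRDS() round trip is a fair stand-in for what the cache does, and it parses an inline HTML string (made up for the example) so no network access is needed:

```r
library(xml2)

# Parse a small inline document; the string is only an illustration
page <- read_html("<html><body><p>Hello</p></body></html>")

# Round-trip through an .rds file (assumed stand-in for the cache)
path <- tempfile(fileext = ".rds")
saveRDS(page, path)
restored <- readRDS(path)

# The restored object no longer points at a live document,
# so using it is expected to fail; capture the message instead of stopping
result <- tryCatch(
  xml_text(xml_find_first(restored, "//p")),
  error = function(e) conditionMessage(e)
)
print(result)
```

The object itself loads without complaint; the failure only shows up once something tries to use the pointer.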

Quarto uses the knitr cache and I ran into the problem in that context (see quarto-dev/quarto-cli#4249).

I guess it wouldn't be easy to create a fix for this without rethinking the way the knitr cache works, but maybe rvest could output an informative message telling the user to disable the Quarto/knitr cache when it is running in that context.

This wouldn't be a very satisfactory solution but it would already help.

Thanks!

hadley commented 10 months ago

Unfortunately I don't see anything obvious that we can do about this.
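
One possible user-side workaround, sketched here under the assumption that re-parsing the page is acceptable: cache only plain R objects (for example the HTML as a character string, or the extracted data) and rebuild the xml2 document after loading. The inline HTML and the saveRDS()/readRDS() round trip are again stand-ins for a real page and the real cache:

```r
library(xml2)

# Parse a small inline document; the string is only an illustration
page <- read_html("<html><body><p>Hello</p></body></html>")

# Convert the document to an ordinary character string before storing it;
# character vectors serialize without any external pointer
html_string <- as.character(page)

path <- tempfile(fileext = ".rds")
saveRDS(html_string, path)

# Re-parse after loading, which creates a fresh, valid document
restored <- read_html(readRDS(path))
text <- xml_text(xml_find_first(restored, "//p"))
print(text)
```

In a knitr or Quarto document this would mean that the cached chunk ends with a string or a data frame, and any chunk that works on the parsed document either runs uncached or re-parses the string first.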