rstudio / pins-r

Pin, Discover and Share Resources
https://pins.rstudio.com
Other
301 stars 62 forks source link

Finally rip out these Connect user and content caches #793

Closed juliasilge closed 9 months ago

juliasilge commented 9 months ago

In #667 we moved from caches saved to disk to environment caches, but this still causes problems that are very confusing and hard to reason about, as @ryjohnson09 demonstrates in #789. I would like to propose we just throw this whole idea away; my opinion is that these Connect user and content caches are not gaining us much (especially now that they are only environment caches) and when things go wrong, it's pretty horrible.

hadley commented 9 months ago

Just to reassure me that this doesn't have a big performance impact, could you give a couple of benchmarks of reading/writing a pin before and after this change?

juliasilge commented 9 months ago

It's definitely slower. Here is with current CRAN:

bench::mark(
    pins = {
        library(pins)
        board <- board_connect()
        name <- pin_write(board, 1, pins:::random_pin_name())
        testthat::expect_equal(pin_read(board, name), 1)
        pin_delete(board, name)
    },
    iterations = 5
)
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-NfgrIuJ1Yf'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-aqX9EJ6vgk'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-Tobj1QG7JD'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-UsFthxGSFR'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-2DtQsuQjFY'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-OaKWLFT8vB'
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 pins          3.64s     3.8s     0.264    14.8MB    0.685

Created on 2023-10-04 with reprex v2.0.2

And here is with this PR:

bench::mark(
    pins = {
        library(pins)
        board <- board_connect()
        name <- pin_write(board, 1, pins:::random_pin_name())
        testthat::expect_equal(pin_read(board, name), 1)
        pin_delete(board, name)
    },
    iterations = 5
)
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-jJODh5a5wa'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-oe8m7ajoYw'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-sGcG0aUn9t'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-9pB1xPTZpm'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-RngIENxoHL'
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/test-veDVBVQXWv'
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 1 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 pins          4.01s    4.05s     0.245      15MB    0.686

Created on 2023-10-04 with reprex v2.0.2

In this example, we look up what content on Connect belongs to name three times (write, read, delete). With CRAN right now, we look that up one time and then after that look in the content cache, while with this PR we look it up every time and it's 5-10% slower. If we increased that a lot (say, write to the same pin a dozen times) the difference would increase more and more with each additional API call that we weren't doing before.

The argument for making the change in this PR would be that this gain in speed isn't worth it because it is difficult for us to safely accommodate when the local cache gets out of sync, and then it's super confusing to the user.

github-actions[bot] commented 8 months ago

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.