hadley / adv-r

Advanced R: a book
http://adv-r.hadley.nz

2.3 modify on copy (rules of copy seem to have changed) #1753

Open alcor2019 opened 1 year ago

alcor2019 commented 1 year ago

Hi, it seems that the rules for copy-on-modify have changed. Every time y is modified, it is copied to a new address, as you can see just below (cf. section 2.3.1, tracemem(), of the book):

> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.utf8  LC_CTYPE=French_France.utf8
[3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C
[5] LC_TIME=French_France.utf8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.2.2 tools_4.2.2

> x <- c(1, 2, 3)
> cat(tracemem(x), "\n")
<000001EE5EB32C58>
> y <- x
> cat(tracemem(y), "\n")
<000001EE5EB32C58>
> y[[3]] <- 4L
tracemem[0x000001ee5eb32c58 -> 0x000001ee6078a568]:
> y[[3]] <- 5L
tracemem[0x000001ee6078a568 -> 0x000001ee6078e178]:
> y[[3]] <- 4L
tracemem[0x000001ee6078e178 -> 0x000001ee6077fe38]:

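Whatever the number of copies tracemem() reports, copy-on-modify guarantees the outcome: changing y never changes x. A minimal base-R sketch of the part that does not depend on where the code runs (RStudio or a terminal):

```r
x <- c(1, 2, 3)
y <- x         # x and y are now bound to the same object
y[[3]] <- 4    # the object is shared, so R must copy it before modifying y
y[[3]] <- 5    # whether this change copies again depends on how many references
               # exist (RStudio's environment pane, for example, adds one)
print(x)       # x still holds the original values
print(y)
```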
jxu commented 1 year ago

I get the same results when just modifying x. This doesn't match section 2.5, Modify-in-place, which says v should stay bound to the same object.

> sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lobstr_1.1.2

loaded via a namespace (and not attached):
 [1] compiler_4.2.3  cli_3.6.1       tools_4.2.3     pillar_1.9.0    glue_1.6.2      rstudioapi_0.14
 [7] crayon_1.5.2    utf8_1.2.3      fansi_1.0.4     vctrs_0.6.1     lifecycle_1.0.3 rlang_1.1.0 
> v <- c(1, 2, 3)
> obj_addr(v)
[1] "0x1889ad7ffb8"
> v[[3]] <- 4
> obj_addr(v)
[1] "0x1889ad6dcf8"
HyacinthMeng commented 1 year ago


This part of the content may not be correct: https://advanced-r-solutions.rbind.io/names-and-values.html#modify-in-place


jean-baka commented 1 year ago

Hi everyone, section 2.3 actually gives a lot of information, but perhaps it doesn't put enough emphasis on the fact that copy-on-modify behaviour is tightly linked to the number of references to the object.

To troubleshoot the above examples: first and foremost, please run "bare R" rather than RStudio, which adds references to objects for the purposes of its GUI. See http://adv-r.had.co.nz/memory.html#modification in the 1st edition to read more about that. For me (R version 4.3.1, x86_64-pc-linux-gnu (64-bit), running Debian GNU/Linux), the following example (the same as @jxu's and @HyacinthMeng's above) works as expected, without a copy:

> v <- c(1,2,3)
> refs(v)
[1] 1
> address(v)
[1] "0x55fbab440118"
> v[[3]] <- 4
> address(v)
[1] "0x55fbab440118"
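refs() and address() above come from the pryr package used in the first edition. Base R's tracemem() also returns the address of the object it marks, so it can stand in for address() in this kind of check. A minimal sketch, assuming R was built with memory profiling (capabilities("profmem") reports this) and that it is run in a terminal session rather than RStudio; the names before and after are just for illustration:

```r
if (capabilities("profmem")) {
  v <- c(1, 2, 3)
  before <- tracemem(v)  # marks v and returns its address as a string
  v[[3]] <- 4            # v has a single reference, so R may modify it in place
  after <- tracemem(v)   # returns the address of whatever v is bound to now
  untracemem(v)
  cat("same object:", identical(before, after), "\n")
}
```

If a copy does happen, tracemem() also prints a "tracemem[old -> new]" line at the assignment.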

Another interesting thing I discovered is that apparently, when we run something like v <- 1:3, we create some sort of a promise, and not the "actual", "final" object, contrary to what happens when we create the vector as v <- c(1L, 2L, 3L):

> v <- 1:3
> refs(v)
[1] 65535
> v <- c(1L, 2L, 3L)
> refs(v)
[1] 1

This leads to an interesting behaviour: with the shorthand "1:3", the very first replacement seems to make a copy, while subsequent ones (even when extending the object, provided memory management allows enough room at that particular place) do not:

> v <- 1:3
> c(address(v), refs(v))
[1] "0x55fbabb3dfe0" "65535"
> v[2] <- 5L
> c(address(v), refs(v))
[1] "0x55fbab43e2e8" "1"
> v[1] <- 5L
> c(address(v), refs(v))
[1] "0x55fbab43e2e8" "1"

But sometimes, when you extend the vector, the memory management will go find enough room somewhere else:

> v[5] <- 10L
> c(address(v), refs(v))
[1] "0x55fbab440c38" "1"
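Assigning past the current end grows the vector and fills any skipped positions with NA. Growing needs a larger block of memory, which is why the address can change at this step. R 3.4.0 also started over-allocating slightly when assigning beyond the end (so-called growable vectors), so a later extension may fit without a new allocation; that is an internal detail, not a guarantee. The values themselves are predictable, as this sketch of the sequence above shows:

```r
v <- 1:3
v[2] <- 5L
v[1] <- 5L
v[5] <- 10L   # position 4 did not exist yet, so it is filled with NA
print(v)
length(v)
```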

Perhaps @hadley could add something about this in the book and close this issue? I also read that the way R deals with references to objects is undergoing some work, so perhaps what we say here may soon be outdated...?

hadley commented 1 year ago

You mean like this? 😄

When exploring copy-on-modify behaviour interactively, be aware that you’ll get different results inside of RStudio. That’s because the environment pane must make a reference to each object in order to display information about it. This distorts your interactive exploration but doesn’t affect code inside of functions, and so doesn’t affect performance during data analysis. For experimentation, I recommend either running R directly from the terminal, or using RMarkdown (like this book).
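One way to follow that advice without leaving RStudio is to run the experiment from the original report inside a function, since the extra references only distort interactive exploration, not code inside functions. A sketch, assuming R was built with memory profiling; count_copies() is a hypothetical helper that counts the tracemem() reports produced by three modifications of y. The book's description predicts that only the first modification copies, but the exact count is not asserted here:

```r
# Hypothetical helper: returns how many times tracemem() reported a copy
count_copies <- function() {
  x <- c(1, 2, 3)
  report <- capture.output({
    tracemem(x)
    y <- x        # x and y share one object
    y[[3]] <- 4   # the object is shared, so this must copy
    y[[3]] <- 5   # y is now the only reference to its object
    y[[3]] <- 6
    untracemem(y)
  })
  # each copy is reported on a line starting with "tracemem["
  sum(grepl("^tracemem\\[", report))
}

if (capabilities("profmem")) print(count_copies())
```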

jean-baka commented 1 year ago

Yes, thank you Hadley, I had read that, but perhaps the OP didn't... ;)

Still, IMHO there is something to explain about that number of references seemingly equal to 2^16 - 1 for a "promise"(?) like v <- 1:3, versus a neat refs(v) == 1 after v <- c(1L, 2L, 3L)...

hadley commented 1 year ago

@jean-baka there's a good reason that the second edition doesn't use refs(), which seems clearly buggy here. If you search for "ALTREP" on https://adv-r.hadley.nz/names-values.html#copy-on-modify, you can see why the : operator is a bit different.
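To see the ALTREP difference directly: since R 3.5.0, 1:3 returns a compact integer sequence that stores a description of the sequence rather than every value, while c(1L, 2L, 3L) stores all three values. A sketch using the internal debugging tool .Internal(inspect()), which is undocumented and whose output format may change between R versions; describe() is a hypothetical helper defined here:

```r
# Capture the (internal, unstable) description that inspect() prints for an object
describe <- function(x) capture.output(.Internal(inspect(x)))

v <- 1:3
w <- c(1L, 2L, 3L)
describe(v)   # a compact sequence should be described as "compact"
describe(w)   # an ordinary integer vector lists its values instead

v[2] <- 5L    # writing into the sequence should produce an ordinary vector
describe(v)
```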