hadley / adv-r

Advanced R: a book
http://adv-r.hadley.nz

Section 2.3.6 Exercise 2 claims that tracemem will show two copies, but it only shows one on 4.1.1. #1701

Open Sean1708 opened 3 years ago

Sean1708 commented 3 years ago

The exercise currently says:

1.  Explain why `tracemem()` shows two copies when you run this code.
    Hint: carefully look at the difference between this code and the code 
    shown earlier in the section.

    ```{r, results = FALSE}
    x <- c(1L, 2L, 3L)
    tracemem(x)

    x[[3]] <- 4
    ```

However, when I run exactly that code I see only one copy:

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.1
> x <- c(1L, 2L, 3L)
> tracemem(x)
[1] "<0000000017882FC0>"
> x[[3]] <- 4
tracemem[0x0000000017882fc0 -> 0x000000000c6b5118]:

I suspect that something has changed in R since this section was written, or maybe there is some Windows quirk. Either way, the section should probably be updated to reflect the fact that different versions of R might not copy twice.

Unless I've misunderstood what tracemem actually does, in which case I think the explanation in that section should be updated. Right now it says:

From then on, whenever that object is copied, `tracemem()` will print a message telling you which object was copied, its new address, and the sequence of calls that led to the copy:

To me, this suggests that there would be a different line printed each time a copy occurs but maybe that's not the case?
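One way to check this reading is to run the exercise code next to a variant that assigns an integer instead of a double. The following is a minimal sketch, not taken from the book: the variable `y` and the `can_trace` guard are illustrative additions, and the number of `tracemem` lines printed is deliberately not stated because it depends on the R version and front end.

```r
# tracemem() only works if R was built with memory profiling support.
can_trace <- capabilities("profmem")

# Exercise code: 4 is a double, so the integer vector has to be coerced.
x <- c(1L, 2L, 3L)
if (can_trace) tracemem(x)
x[[3]] <- 4
if (can_trace) untracemem(x)
typeof(x)

# Variant (an assumption for comparison): 4L is an integer, so no coercion.
y <- c(1L, 2L, 3L)
if (can_trace) tracemem(y)
y[[3]] <- 4L
if (can_trace) untracemem(y)
typeof(y)
```

Each `tracemem[... -> ...]` line printed between a `tracemem()` call and the matching `untracemem()` call corresponds to one reported copy of that object, so comparing the two halves shows how much of the copying is due to the type change.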

berg-michael commented 3 years ago

I think your interpretation of tracemem is correct, though I'm not certain. Regardless, there are a few instances in that chapter where the number of copies observed does not line up with the text.

I'm guessing this is related to this change, from the R release notes (under the section for 4.0.0):

Reference counting is now used instead of the NAMED mechanism for determining when objects can be safely mutated in base C code. This reduces the need for copying in some cases and should allow further optimizations in the future. It should help make the internal code easier to maintain.

So the changes alluded to in footnote 15 have in fact occurred, as far as I can tell.

jxu commented 1 year ago

I see two copies on Windows R 4.2.3

> sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lobstr_1.1.2

loaded via a namespace (and not attached):
 [1] compiler_4.2.3  cli_3.6.1       tools_4.2.3     pillar_1.9.0    glue_1.6.2      rstudioapi_0.14
 [7] crayon_1.5.2    utf8_1.2.3      fansi_1.0.4     vctrs_0.6.1     lifecycle_1.0.3 rlang_1.1.0    
> x <- c(1L, 2L, 3L)
> tracemem(x)
[1] "<0000018898D9F7A8>"
> x[[3]] <- 4
tracemem[0x0000018898d9f7a8 -> 0x0000018898da2df8]: 
tracemem[0x0000018898da2df8 -> 0x000001889ad39bc8]: 

berg-michael commented 1 year ago

@jxu are you using RStudio? I get the two copies when using RStudio but not in a standard interactive session.

When exploring copy-on-modify behaviour interactively, be aware that you’ll get different results inside of RStudio. That’s because the environment pane must make a reference to each object in order to display information about it. This distorts your interactive exploration but doesn’t affect code inside of functions, and so doesn’t affect performance during data analysis. For experimentation, I recommend either running R directly from the terminal, or using RMarkdown (like this book).
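Since the quoted passage says the extra reference does not affect code inside functions, one possible way to get comparable results across front ends is to run the experiment inside a function. This is only a sketch: `count_copies()` is a hypothetical helper, not something from the book or from `lobstr`, and it assumes that `tracemem` messages go to standard output so that `capture.output()` can collect them.

```r
# Hypothetical helper: counts the tracemem lines produced by one assignment.
# Returns NA if R was built without memory profiling support.
count_copies <- function(value) {
  if (!capabilities("profmem")) return(NA_integer_)

  # x lives only in this function's frame, not in the global environment.
  x <- c(1L, 2L, 3L)
  lines <- capture.output({
    tracemem(x)
    x[[3]] <- value
    untracemem(x)
  })

  # Assumption: each reported copy starts with "tracemem[".
  sum(grepl("^tracemem\\[", lines))
}

n_double  <- count_copies(4)   # value is a double, as in the exercise
n_integer <- count_copies(4L)  # value is an integer
print(c(double = n_double, integer = n_integer))
```

The counts are not stated here because they may vary with the R version. The sketch is meant for comparing the same call in RStudio and in a terminal session.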

jxu commented 1 year ago

Oh, good catch; you're right, I was using RStudio. In standard R I see one copy:

> x <- c(1L, 2L, 3L)
> tracemem(x)
[1] "<000001B68F5953F8>"
> x[[3]] <- 4
tracemem[0x000001b68f5953f8 -> 0x000001b691395fe8]: