hadley / adv-r

Advanced R: a book
http://adv-r.hadley.nz

Update example in Chapter 2.5.1 #1776

Open MatthiasLiew opened 1 year ago

MatthiasLiew commented 1 year ago

When running the example in 2.5.1, which demonstrates the excessive copies R makes when a data frame is modified in a loop, I could not replicate the results on my local machine: I kept getting two duplications per iteration instead of three. A search on Stack Overflow confirmed my local results, and I realised that the example in the book reflects R versions before 4.0.0, while I was on R 4.2.2. Even so, I could not make sense of the book's explanation of those results.
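For reference, here is a minimal sketch of the kind of loop in question, adapted from the 2.5.1 example (the smaller data size, the seed, and the `can_trace` guard are my own additions). The number of `tracemem` lines printed per iteration depends on the R version, so no expected output is shown.

```r
set.seed(1)
x <- data.frame(matrix(runif(5 * 1e3), ncol = 5))
medians <- vapply(x, median, numeric(1))

# tracemem() only works if R was built with memory profiling enabled
can_trace <- capabilities("profmem")
if (can_trace) cat(tracemem(x), "\n")

# Each iteration modifies one column of the data frame in place (syntactically);
# tracemem reports every duplication of x that this triggers
for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}

if (can_trace) untracemem(x)
```

Counting the `tracemem[...]` lines printed between iterations gives the number of duplications per iteration on a given R version.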

Section 3.4.4 of the R Language Definition provided a clue with the *tmp* variable, which is also discussed in Chapter 6.8.4 of this book. However, when I rewrote the example with an explicit assignment to *tmp* and a direct call to the [[<- function, only one copy per iteration was made. After inspecting the R source code, I believe the answer lies in the functions applydefine and SET_TEMPVARLOC_FROM_CAR. It appears that the internal C code assigns the *tmp* variable and, in doing so, duplicates the object that is about to be modified (the data frame x). That would account for the first copy.
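The rewritten loop I am describing looks roughly like the sketch below. It follows the expansion of subset assignment given in the R Language Definition, written out by hand at the R level. The data size, seed, and `can_trace` guard are illustrative assumptions, and this is only an approximation of what the interpreter does internally in C.

```r
set.seed(1)
x <- data.frame(matrix(runif(5 * 1e3), ncol = 5))
medians <- vapply(x, median, numeric(1))

# tracemem() only works if R was built with memory profiling enabled
can_trace <- capabilities("profmem")
if (can_trace) cat(tracemem(x), "\n")

for (i in seq_along(medians)) {
  # Hand-written equivalent of: x[[i]] <- x[[i]] - medians[[i]]
  `*tmp*` <- x
  x <- `[[<-`(`*tmp*`, i, value = x[[i]] - medians[[i]])
  rm("*tmp*")
}

if (can_trace) untracemem(x)
```

Comparing the `tracemem` output of this version with that of the ordinary `x[[i]] <- ...` loop is what suggested to me that the extra copy comes from how the internal code sets up *tmp*.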

The second copy is made after the internal generic function [[<- dispatches to [[<-.data.frame, which is a normal closure. The book states this accurately.
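The distinction between the two functions can be checked directly. This is a small sketch; the variable names are only for illustration.

```r
# `[[<-` itself is a primitive (internal generic) implemented in C
generic_is_primitive <- is.primitive(`[[<-`)

# The data frame method is an ordinary R function, i.e. a closure
method_type <- typeof(`[[<-.data.frame`)

print(generic_is_primitive)
print(method_type)
```

Because the method is a closure, its argument is treated like any other function argument, so modifying it inside the method makes a copy.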

The proposed changes are based on my own research. Please feel free to correct any mistakes I have made.

I assign the copyright of this contribution to Hadley Wickham.