When running the example in Section 2.5.1 that demonstrates the excessive copies R makes when a data frame is modified in a loop, I could not replicate the results on my local machine: I kept getting two duplications per iteration instead of three. A search on Stack Overflow confirmed my local results, and I realised that the example in the book was written for versions of R before 4.0.0, while I was on R 4.2.2. However, I still could not make sense of the explanation for those results.
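For reference, the loop in question can be reproduced with a sketch along these lines. It follows the book's example; the `set.seed()` call and the `capabilities("profmem")` guard are additions of mine for reproducibility and portability, and `traced` is just an illustrative name. Each `tracemem[...]` line printed inside the loop corresponds to one copy, and the number of lines per iteration depends on the R version.

```r
# Subtract the column medians from a data frame, one column at a time.
set.seed(1)  # assumption: added only to make the data reproducible
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

# tracemem() only works when R was built with memory profiling enabled.
traced <- capabilities("profmem")
if (traced) cat(tracemem(x), "\n")

for (i in seq_along(medians)) {
  # Every tracemem message printed during this assignment is one copy of x.
  x[[i]] <- x[[i]] - medians[[i]]
}

if (traced) untracemem(x)
```

Counting the messages per iteration on different R versions is the quickest way to check which behaviour a given installation shows.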
Section 3.4.4 of the R Language Definition provided a clue in the form of the `*tmp*` variable, which is also discussed in Section 6.8.4 of this book. However, when I rewrote the example with an explicit assignment to `*tmp*` and a direct call to the `[[<-` function, only one copy per iteration was made. After inspecting the R source code, I believe I found the answer in the functions `applydefine` and `SET_TEMPVARLOC_FROM_CAR`. It appears that `*tmp*` is assigned by the internal C code and that, in this case, a duplicate of the variable being assigned to (the data frame `x`) is made at that point. This would account for the first copy.
The second copy is made after the internal generic function `[[<-` dispatches to `[[<-.data.frame`, which is a normal closure. The book states this accurately.
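The explicit rewrite mentioned above can be sketched as follows. It spells out, in plain R, the expansion that Section 3.4.4 of the R Language Definition describes for `x[[i]] <- value`. This is only an approximation of what the interpreter does internally, which is exactly why its copy count can differ from that of the ordinary assignment. As before, the `set.seed()` call and the `capabilities("profmem")` guard are my own additions.

```r
set.seed(1)  # assumption: added only to make the data reproducible
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))

# tracemem() only works when R was built with memory profiling enabled.
traced <- capabilities("profmem")
if (traced) cat(tracemem(x), "\n")

for (i in seq_along(medians)) {
  # Hand-written equivalent of: x[[i]] <- x[[i]] - medians[[i]]
  `*tmp*` <- x
  x <- `[[<-`(`*tmp*`, i, value = x[[i]] - medians[[i]])
  rm(`*tmp*`)
}

if (traced) untracemem(x)
```

Comparing the `tracemem` output of this loop with that of the ordinary `x[[i]] <- ...` loop shows how many copies are attributable to the internal handling of `*tmp*` rather than to `[[<-.data.frame` itself.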
The proposed changes are based on my own research, and any mistakes are mine. Please feel free to correct me where I have got something wrong.
I assign the copyright of this contribution to Hadley Wickham.