swirldev / swirl

:cyclone: Learn R, in R.
http://swirlstats.com

Same expressions, evaluated in different contexts, can give different answers. #39

Closed WilCrofter closed 10 years ago

WilCrofter commented 10 years ago

This issue includes #7, #24, and #35, and hopefully explains them better.

The main problem is evaluating the same expression in two different contexts and expecting the same result both times. The results won't necessarily agree.

Examples:

Details. Using the debugger on Test Modules, module 1:

NOTE: An earlier draft of this post was in error. As a rule, R uses lexical scoping. "Lexical scoping looks up symbol values based on how functions were nested when they were created, not how they are nested when they are called." (Adv R Programming) I should have used parent.env() rather than parent.frame() earlier.
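To illustrate the distinction in the note above, here is a minimal sketch. The function names where_am_i and call_it are made up for illustration; only parent.env(), parent.frame(), environment() and exists() are base R.

```r
# Hypothetical helper: report both the lexical enclosure and the caller.
where_am_i <- function() {
  enclosure <- parent.env(environment())  # where this function was created
  caller <- parent.frame()                # where this function was called from
  list(enclosure = enclosure, caller = caller)
}

# Hypothetical caller with a variable that exists only in its own frame.
call_it <- function() {
  local_only <- "defined inside call_it"
  where_am_i()
}

res <- call_it()

# The enclosure is the environment in which where_am_i was defined.
identical(res$enclosure, environment(where_am_i))

# The caller's frame holds local_only; lexical lookup would never see it there.
exists("local_only", envir = res$caller, inherits = FALSE)
```

Symbol lookup follows the parent.env() chain, so a variable that lives only in the caller's frame is invisible to it unless the two environments happen to coincide.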

| First, create a vector of numbers using the c() function and store that vector in
| a variable of your choice.

x <- c(3, 5, 7, 11, 13)

Called from: eval(expr, envir, enclos)

Browse[1]> n

debug at /home/william/dev/r/swirlfancy/R/answerTests.R#78: eval(e$expr)

Browse[2]> ls()

[1] "e" "keyphrase"

Browse[2]> exists("x", environment(), inherits=FALSE)
[1] FALSE

Browse[2]> exists("x", environment(), inherits=TRUE)
[1] TRUE

Browse[2]> e0 <- environment()

Browse[2]> (e1 <- parent.env(e0))

Browse[2]> exists("x", e1, inherits=FALSE)
[1] FALSE

Browse[2]> (e2 <- parent.env(e1))
attr(,"name")
[1] "imports:swirlfancy"

Browse[2]> exists("x", e2, inherits=FALSE)
[1] FALSE

Browse[2]> (e3 <- parent.env(e2))

Browse[2]> exists("x", e3, inherits=FALSE)
[1] FALSE

Browse[2]> (e4 <- parent.env(e3))

Browse[2]> exists("x", e4, inherits=FALSE)
[1] TRUE

Browse[2]> x
[1] 3 5 7 11 13

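The manual walk up the enclosure chain in the debugger session can be automated. The following is only a sketch: find_binding_env is a hypothetical helper, and the three nested environments stand in for the real chain (function frame, package namespace, imports, and so on).

```r
# Hypothetical helper: follow parent.env() from `start` until an environment
# is found that binds `name` directly. Returns NULL if none does.
find_binding_env <- function(name, start) {
  env <- start
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(env)
    }
    env <- parent.env(env)
  }
  NULL
}

# Stand-in chain: inner -> middle -> outer, with the variable bound in outer.
outer_env <- new.env(parent = emptyenv())
middle_env <- new.env(parent = outer_env)
inner_env <- new.env(parent = middle_env)
assign("target_value", c(3, 5, 7, 11, 13), envir = outer_env)

found <- find_binding_env("target_value", inner_env)
identical(found, outer_env)
```

This is the same check the session performs by hand with exists(..., inherits=FALSE) at each level.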
WilCrofter commented 10 years ago

A strategy based on "snapshots" of the global environment seems to solve all of these issues (with one exotic caveat, discussed below). This strategy elaborates an idea implicit, or at least intimated, in Hadley's frndly.R code and commentary. It is implemented in a branch, ftr.snapshots, for post-release consideration.

We define a "snapshot" of the global environment as

ge <- as.list(globalenv())

Environments have reference semantics: all references point to the same underlying object. Hence, the state of an environment cannot be saved for later comparison merely by creating a second reference, because any change in the environment is visible through every reference. Lists, however, have copy-on-modify semantics. The snapshot above is a list containing a copy of each object in the global environment, so a subsequent change in the global environment will not change the list (with one exotic caveat).
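A small sketch of the difference between a second reference and a snapshot. It uses a scratch environment (workspace, an illustrative name) in place of globalenv() so that running it does not touch the real workspace.

```r
# Scratch environment standing in for the global environment.
workspace <- new.env()
assign("x", c(3, 5, 7), envir = workspace)

alias <- workspace              # a second reference to the same environment
snapshot <- as.list(workspace)  # a list holding the current values

# Change the variable after both were taken.
assign("x", c(1, 2), envir = workspace)

# The alias sees the change; the snapshot does not.
alias_value <- get("x", envir = alias)
snapshot_value <- snapshot$x
identical(alias_value, c(1, 2))
identical(snapshot_value, c(3, 5, 7))
```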

Thus, comparing snapshots just before and just after a user's response in the R console captures any variables created or changed due to that response. If and when the response is verified as correct, the new variable names and values are incorporated in a stored list of swirl's "official" history. The official list, in turn, is used to restore a correct state of the global environment after a user has returned from play, or upon resumption of an incomplete module.
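One way the before/after comparison might look, as a sketch only. snapshot_diff, workspace, official and restored are hypothetical names, not taken from the ftr.snapshots branch, and a scratch environment again stands in for globalenv().

```r
# Hypothetical helper: names of variables that are new in `after`
# or whose values differ from those in `before`.
snapshot_diff <- function(before, after) {
  is_new_or_changed <- vapply(
    names(after),
    function(nm) {
      !(nm %in% names(before)) || !identical(before[[nm]], after[[nm]])
    },
    logical(1)
  )
  names(after)[is_new_or_changed]
}

workspace <- new.env()
assign("x", c(3, 5, 7), envir = workspace)
assign("z", "untouched", envir = workspace)
before <- as.list(workspace)

# Simulated user response: one variable changed, one created.
assign("x", c(3, 5, 7, 11), envir = workspace)
assign("y", 42, envir = workspace)
after <- as.list(workspace)

changed <- snapshot_diff(before, after)

# Once the response is accepted, fold the changes into the "official" list.
official <- list()
official[changed] <- after[changed]

# Later, the official list can be used to rebuild a known-good state.
restored <- list2env(official, envir = new.env())
```

Here changed contains "x" and "y" but not "z", and restored holds the accepted values of both.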

Snapshots are also used to capture changes in the global environment due to module initialization. Any variables thus created or changed are incorporated in the official history.

This strategy averts the problems of evaluating expressions at different times or in different contexts. All items in official history reflect the most recent changes in the global environment due to actions within swirl.

The exotic exception, of course, would be the case of a module involving creation and manipulation of environments. If env1 were created in the global environment, a snapshot as defined above would contain only a reference to env1. If, say, env1$x were subsequently changed, the snapshot's reference would be affected by that change. The problem can be overcome, but entails what seems like overkill at the moment.
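The caveat can be reproduced in a few lines. This is a sketch with a scratch environment (workspace) standing in for globalenv(); env1 follows the naming in the paragraph above.

```r
workspace <- new.env()

# An environment stored as a variable, as a module about environments might do.
env1 <- new.env()
assign("x", 1, envir = env1)
assign("env1", env1, envir = workspace)

# The snapshot copies the binding, but the value is only a reference to env1.
snapshot <- as.list(workspace)

# Changing env1$x afterwards is visible through the snapshot.
assign("x", 99, envir = env1)
seen_through_snapshot <- get("x", envir = snapshot$env1)
seen_through_snapshot
```

The snapshot reports the new value because snapshot$env1 and env1 are the same object.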

ncarchedi commented 10 years ago

BRILLIANT!!! We should hop on this soon after release. You and Gina have a knack for elegant solutions to hard problems....

WilCrofter commented 10 years ago

As became clear during implementation and testing, the snapshot strategy did not cover all bases. Consequently we had to write a function which evaluated an expression in a new environment whose parent was globalenv(). This affected two answer tests and was used with snapshots to cover the case in which an expression entered by the user neither creates a new variable nor changes the value of a variable created earlier.

However, a branch which seems to work is available at reginaastri and WilCrofter (ftr.snapshots). We will not push it (and certainly not merge it) to swirldev just yet.

WilCrofter commented 10 years ago

In order to check answers to certain kinds of questions, the need to simulate a user's response seems unavoidable. This is problematic because, in responding, the user has normally changed the global environment. Thus, snapshots are necessary.

They are not sufficient, however. If a user is asked to create a variable using c() and enters x <- c(1, 2, 3), it may be the case that x already exists and has the value c(1, 2, 3). In that case the environment will not change, and a comparison of two snapshots will not detect that the user has answered correctly.

The only apparent means of detection would be to evaluate the user's expression in a new, "clean" environment whose parent is the global environment. The new variable would show up in the clean environment, since the assignment takes place there.
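A sketch of the case described above, where snapshots alone see nothing but a clean child environment does. eval_in_clean_env is a hypothetical helper, not the function from the branch, and workspace stands in for globalenv().

```r
# Hypothetical helper: evaluate `expr` in a fresh environment whose parent
# is `parent`, and return that fresh environment for inspection.
eval_in_clean_env <- function(expr, parent = globalenv()) {
  clean <- new.env(parent = parent)
  eval(expr, envir = clean)
  clean
}

# x already exists with the value the user is about to assign.
workspace <- new.env()
assign("x", c(1, 2, 3), envir = workspace)

before <- as.list(workspace)
eval(quote(x <- c(1, 2, 3)), envir = workspace)  # the user's response
after <- as.list(workspace)

# Snapshot comparison detects no change.
no_change <- identical(before, after)

# Re-evaluating in a clean child environment reveals the assignment.
clean <- eval_in_clean_env(quote(x <- c(1, 2, 3)), parent = workspace)
created <- ls(clean)
```

Because the assignment happens in the clean environment, created lists x even though no_change is TRUE.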

However, in other circumstances, the user's response will have changed the global environment. It would be best, then, to evaluate in a clean environment whose parent is equivalent to the previous global environment. Given a snapshot (e$snapshot, a list) of the previous environment, this is pretty easy to do:

pe <- as.environment(e$snapshot)
parent.env(pe) <- parent.env(globalenv())
newCleanEnv <- new.env(parent=pe)
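For illustration, here is how those three lines might be used to re-evaluate a response. The list e and the expression y <- x * 2 are made-up stand-ins for swirl's internal state and a user's input.

```r
# Hypothetical stand-in for swirl's state: a snapshot taken before the response.
e <- list(snapshot = list(x = c(3, 5, 7)))

# Rebuild the previous global environment from the snapshot, hook it up to
# the rest of the search path, and create a clean child of it.
pe <- as.environment(e$snapshot)
parent.env(pe) <- parent.env(globalenv())
newCleanEnv <- new.env(parent = pe)

# Re-evaluate the (made-up) user expression in the clean environment.
eval(quote(y <- x * 2), envir = newCleanEnv)

# The new variable lands in the clean environment, not in the rebuilt parent.
created <- ls(newCleanEnv)
parent_has_y <- exists("y", envir = pe, inherits = FALSE)
y_value <- get("y", envir = newCleanEnv)
```

The lookup of x succeeds through the rebuilt parent, while the assignment to y stays in newCleanEnv, where it can be inspected without disturbing anything else.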

WilCrofter commented 10 years ago

See branch "snapshots" at swirldev/swirl and associated post at swirl-coders.