Creating a more robust version of omnitest()

ncarchedi commented 10 years ago

This is a challenge to create more robust version of omnitest(). The goal is to standardize the user's expression AND the correct answer, then compare the two.

In my opinion, the expressions should be standardized in the following ways:

1) Use base::match.call() to specify all arguments by their full names. As the following example shows, this will help with missing argument names, ordering of arguments, and partial matching.

> match.call(sample, call('sample', s = 5, 1:10))
sample(x = 1:10, size = 5)
> str(sample)
function (x, size, replace = FALSE, prob = NULL)

For an example of implementation, I've created a match_call() custom test for use in many of the newer R Programming and Getting and Cleaning Data lessons.

2) Use formals() to tack on any unspecified arguments with their default values. This will help in a situation where the user specifies a default value for an argument that was omitted from the correct answer. @reginaastri first brought up this issue here.

> formals(sample)
$x

$size

$replace
[1] FALSE

$prob
NULL

3) Optionally, replace = with <- for assignment per this discussion. We should probably make use of formatR:::replace_assignment() as a model for this purpose. Not all instructors will want to allow = for assignment, so perhaps this new omnitest() could have an argument like replace_assign with a default value of getOption("swirl_replace_assign"). Instructors could then set options(swirl_replace_assign) = TRUE in initLesson.R if they want to allow = for assignment.

I may have missed something, so please chime in on this.

WilCrofter commented 10 years ago

Assuming the new version's default configuration would not require changes to any courses which use current omnitest, should the new version's default configuration:

have exactly the same pass/fail properties as the current?
have similar pass/fail properties but pass if an answer differs in some insignificant way from correct?
behave as in 1 or 2 based on a flag?
have more fine-grained options?

It may be worth noting that R Programming (and R Programming Alt) override omnitest with a custom version in 2 lessons: Vectors; and Matrices and Data Frames.

ncarchedi commented 10 years ago

My initial thought was that the new omnitest should be at least as liberal in every case as the current version is. In other words, it should accept every answer that the current version would, plus additional answers that cause the current version to "fail" due to argument permutation, partial matching, unnecessarily specifying arguments with their default values, etc. If we can achieve that, then it should be perfectly backward compatible.

This is the part where you tell me I'm dreaming....

WilCrofter commented 10 years ago

I've set up a new repo for early exploration, thinking it would be easier to test things non-interactively at first. I've made everyone collaborators, but I'd favor keeping the conversation here.

ncarchedi commented 10 years ago

Great, thank you so much.

WilCrofter commented 10 years ago

Recursive version match.calls which fills in named arguments with defaults and provides partial coverage of S3 methods is available, along with some light unit testing.

There's also an interactive demo to illustrate how callbacks process entered expressions.

ncarchedi commented 10 years ago

@WilCrofter This is seriously brilliant. I hadn't thought about implementing a recursive version of base::match.call. I pushed a few small changes so that the package passes R CMD check (old habits die hard). A few questions:

Can you think of any cases that might cause it to fail?
Can you think of a scenario where our existing omnitest would deem a user's answer to be correct, but where this new version would deem it incorrect? (As long as this new version is as or more "liberal" than the old, we avoid backward-compatibility issues.)
Does it make sense to bundle all answer tests together in a separate R package on which swirl depends? We could use a version requirement (e.g. swirl_tests >= 0.1.1) in swirl's DESCRIPTION file to make sure everyone stays current.

WilCrofter commented 10 years ago

@ncarchedi, thanks for the R CMD check tweaks!

Regarding the questions, I'm really very uncertain about what makes sense. Early days and all. That said:

There are probably lots. I intend to put examples in the unit tests. One example would be an S3 method whose first argument is a call which, if evaluated, would compute an object for which there is a non-default method, e.g., print(read.csv("somefile.csv")). In that case it would assume the default method incorrectly. It would not standardize order of arguments in the case of the commutative binary operators, * and +. I have no idea about S4 methods as yet.
I'm fairly confident that any two expressions omnitest considers equivalent would have identical rmatch_calls output.
Would we have anyone willing to maintain a second CRAN package?

ncarchedi commented 10 years ago

I'm happy to maintain the third package, since I'm assuming it would be fairly low maintenance and it's likely that no one else will be using it as a dependency. I'm just thinking about your (borrowed) mantra: "small pieces, loosely joined". Not saying it's the right thing to do, I'm just curious what you think.

WilCrofter commented 10 years ago

I wouldn't object, as long I wouldn't have to maintain it, but some high-level design might be in order first. For example, and without saying this is a good idea either, if courses were packages with swirl as a dependency, then courses could have other test packages as dependencies as well.

ncarchedi commented 10 years ago

if courses were packages with swirl as a dependency, then courses could have other test packages as dependencies as well.

Hmm, interesting idea. Sometimes I do wish we could make better use of the existing R/CRAN infrastructure (packages for content, DESCRIPTION files for course/lesson metadata, etc.) Let's keep this discussion going.

WilCrofter commented 10 years ago

Just pushed version of rmatch_calls which raises errors in cases it is likely to screw up. Still a work in progress.

WilCrofter commented 10 years ago

rmatch_calls now raises errors in situations it can't handle. Its limitations will be apparent from the errors it raised in R Programming when applied to the correct answer of every command question. (See below.) Results are similar for Regression Models, but R Programming is generally more familiar hence more illustrative.

The bottom line, as I see it, is that abstract syntax trees involve a lot of special cases requiring information which is not available in the syntax tree per se. For instance print(paste("a = ", 1)) will raise an error because print is an S3 method and paste("a = ", 1) (as an expression) is a call. But the proper S3 method would be based on the output of paste, which we know will be a character vector but a program crawling a syntax tree does not. Matching a call on the wrong class, or default, is tempting in this case but could in fact mislead.

S4 and reference methods, though rare, are two more cans of worms which rmatch_calls doesn't currently handle.

That said, a more robust omnitest could catch rmatch_calls errors and default to its usual thing when they occur. The result would probably be modestly more permissive than the current version and I can't think of a situation it would screw up.

At this point, I think some strategic discussion is in order. (I.e., I don't know what to do next.) I'll clean up the code which generated the examples below and push it to the omnitest repo for general perusal and comment. That should be done by dinner time Labor Day. Cheers.

R Programming :

    Basic_Building_Blocks: (None)

    Dates_and_Times:

         Illegal expression, as.POSIXlt(Sys.time()): The first argument, Sys.time(), to S3 method 'as.POSIXlt', is a call, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

         Illegal expression, str(unclass(t2)): The first argument, unclass(t2), to S3 method 'str', is a call, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

         Illegal expression, weekdays(d1): The first argument, d1, to S3 method 'weekdays', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

         Illegal expression, months(t1): The first argument, t1, to S3 method 'months', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

         Illegal expression, quarters(t2): The first argument, t2, to S3 method 'quarters', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

    lapply_and_sapply:

         Illegal expression, head(flags): The first argument, flags, to S3 method 'head', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

         Illegal expression, head(flag_colors): The first argument, flag_colors, to S3 method 'head', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

         Illegal expression, unique(c(3, 4, 5, 5, 5, 6, 6)): The first argument, c(3, 4, 5, 5, 5, 6, 6), to S3 method 'unique', is a call, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

    Looking_at_Data:

         Illegal expression, head(plants): The first argument, plants, to S3 method 'head', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

         Illegal expression, head(plants, 10): The first argument, plants, to S3 method 'head', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

         Illegal expression, tail(plants, 15): The first argument, plants, to S3 method 'tail', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

         Illegal expression, summary(plants): summary is a standardGeneric.

         Illegal expression, str(plants): The first argument, plants, to S3 method 'str', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

    Matrices_and_Data_Frames: (None)

    Missing_Values: (None)

    Sequences_of_Numbers:

         Illegal expression, seq(along = my_seq): The first argument, my_seq, to S3 method 'seq', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

    Simulation:

         Illegal expression, hist(cm): The first argument, cm, to S3 method 'hist', is a name, which (as an expression) is not atomic, hence its class can't be determined in an abstract syntax tree without additional information.

    Subsetting_Vectors: (None)

    vapply_and_tapply:

         Illegal expression ok(): ok is not a function.

    Vectors: (None)

ncarchedi commented 10 years ago

Thanks for working on this, Bill. I'm in thesis mode for the next few days, but hope to contribute some meaningful feedback once I'm out of the weeds.

WilCrofter commented 10 years ago

Most of rmatch_calls "Illegal expression" complaints, above, are because the class of the first argument of an S3 method can't be determined without evaluation. Simply allowing evaluation of such arguments would eliminate these complaints. I've intended to write in an evaluation option for quite a few days, and thought it was time to at least mention it. I should get to it fairly soon, but so far life has not cooperated.

WilCrofter commented 10 years ago

I've added an eval_for_class argument to rmatch_calls as per above remark. Its default value, FALSE, will cause it to behave as before. If TRUE, it will evaluate the first arguments of S3 methods as it encounters them, in order to determine their proper class, which generally enables it to find the actual function to use with match.calls(), e.g., as.POSIXlt.POSIXct for expression as.POSIX(Sys.time()).

Errors in the listing below are just due to variables which aren't defined. This wouldn't happen in a swirl context, and could be easily fixed in the present context just by loading appropriate libraries, sourcing init files, and evaluating expressions as they appear.

TODO: Fix undefined variables. Document. Incorporate in swirl branch.

Compare the following with the listing above.

R Programming :

    Basic_Building_Blocks: (None)

    Dates_and_Times:

         Error in eval(expr, envir, enclos) : object 't2' not found
         Error in eval(expr, envir, enclos) : object 'd1' not found
         Error in eval(expr, envir, enclos) : object 't1' not found
         Error in eval(expr, envir, enclos) : object 't2' not found
    lapply_and_sapply:

         Error in eval(expr, envir, enclos) : object 'flags' not found
         Error in eval(expr, envir, enclos) : object 'flag_colors' not found
    Looking_at_Data:

         Error in eval(expr, envir, enclos) : object 'plants' not found
         Error in eval(expr, envir, enclos) : object 'plants' not found
         Error in eval(expr, envir, enclos) : object 'plants' not found
         Error in eval(expr, envir, enclos) : object 'plants' not found
         Error in eval(expr, envir, enclos) : object 'plants' not found
    Matrices_and_Data_Frames: (None)

    Missing_Values: (None)

    Sequences_of_Numbers:

         Error in eval(expr, envir, enclos) : object 'my_seq' not found
    Simulation: (None)

    Subsetting_Vectors: (None)

    vapply_and_tapply:

         Illegal expression ok(): ok is not a function.

    Vectors: (None)

WilCrofter commented 10 years ago

rmatch_calls with eval_for_class=TRUE processes R Programming without error. Below is an example of its expansions of function and S3 method calls for Dates and Times.

At the cost of multiple evaluations of the same code, this should alleviate many common problems if integrated with omnitest. It will not help in cases of S4 methods or reference class methods, nor will it help with expressions such as 5+7 vs 7+5.

    Dates_and_Times:

         d1 <- Sys.Date()
         class(d1)
         unclass(d1)
         d1
         d2 <- as.Date(x = "1969-01-01")
         unclass(d2)
         t1 <- Sys.time()
         t1
         class(t1)
         unclass(t1)
         t2 <- as.POSIXlt(x = Sys.time())
         class(t2)
         t2
         unclass(t2)
         str(object = unclass(t2), max.level = NA, vec.len = strO$vec.len, digits.d = strO$digits.d, nchar.max = 128, give.attr = TRUE, give.head = TRUE, give.length = give.head, width = getOption("width"), nest.lev = 0, indent.str = paste(rep.int(" ", max(0, nest.lev + 1)), collapse = ".."), comp.str = "$ ", no.list = FALSE, envir = baseenv(), strict.width = strO$strict.width, formatNum = strO$formatNum, list.len = 99)
         t2$min
         weekdays(x = d1, abbreviate = FALSE)
         months(x = t1, abbreviate = FALSE)
         quarters(x = t2)
         t3 <- "October 17, 1986 08:24"
         t4 <- strptime(x = t3, format = "%B %d, %Y %H:%M")
         t4
         class(t4)
         Sys.time() > t1
         Sys.time() - t1
         difftime(time1 = Sys.time(), time2 = t1, units = "days")

swirldev / swirl

Creating a more robust version of omnitest() #196