Consider syntax for assigning intermediate value to symbol

renkun-ken commented 10 years ago

It is a common need to assign an intermediate result to a symbol in the current environment (often the global environment) for later use. This is clearly a kind of side effect, since it changes the current environment.

Currently, there is no convenient syntax for this; the only option is to call assign() manually, like

mtcars %>>%
  subset(mpg <= mean(mpg)) %>>%
  (~ assign("x", ., envir = .GlobalEnv)) %>>%
  plot

The code works, but it is only convenient for the global environment or another named environment. For a local environment, it does not work with parent.frame().

Consider a syntax, derived from the side-effect syntax, that performs the assignment.

renkun-ken commented 10 years ago

A draft syntax looks like

x %>>% (~ symbol)
x %>>% (~ f(.) ~ symbol)
x %>>% (~ x ~ f(x) ~ symbol)

It is best described as: start with ~ for a side effect and end with a symbol for the assignment.

An example is

mtcars %>>%
  subset(mpg <= mean(mpg)) %>>%
  (~ smtcars) %>>%
  (~ dim(.) ~ dim_mtcars) %>>%
  subset(select = c(mpg, wt, qsec)) %>>%
  lm(formula = mpg ~ .) %>>%
  summary %>>%
  (~ summ) %>>%
  (coefficients)

              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 17.0183914  5.1411954  3.310201 0.0047583237
wt          -2.9781345  0.6044032 -4.927397 0.0001823504
qsec         0.6033051  0.3053237  1.975953 0.0668509086

Inspect the environment after evaluating the code above.

> ls.str()
dim_mtcars :  int [1:2] 18 11
smtcars : 'data.frame': 18 obs. of  11 variables:
 $ mpg : num  18.7 18.1 14.3 19.2 17.8 16.4 17.3 15.2 10.4 10.4 ...
 $ cyl : num  8 6 8 6 6 8 8 8 8 8 ...
 $ disp: num  360 225 360 168 168 ...
 $ hp  : num  175 105 245 123 123 180 180 180 205 215 ...
 $ drat: num  3.15 2.76 3.21 3.92 3.92 3.07 3.07 3.07 2.93 3 ...
 $ wt  : num  3.44 3.46 3.57 3.44 3.44 ...
 $ qsec: num  17 20.2 15.8 18.3 18.9 ...
 $ vs  : num  0 1 0 1 1 0 0 0 0 0 ...
 $ am  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ gear: num  3 3 3 4 4 3 3 3 3 3 ...
 $ carb: num  2 1 4 4 4 3 3 3 4 4 ...
summ : List of 11
 $ call         : language lm(formula = mpg ~ ., data = .)
 $ terms        :Classes 'terms', 'formula' length 3 mpg ~ wt + qsec
 $ residuals    : Named num [1:18] 1.658 -0.813 -1.643 1.386 -0.376 ...
 $ coefficients : num [1:3, 1:4] 17.018 -2.978 0.603 5.141 0.604 ...
 $ aliased      : Named logi [1:3] FALSE FALSE FALSE
 $ sigma        : num 1.79
 $ df           : int [1:3] 3 15 3
 $ r.squared    : num 0.623
 $ adj.r.squared: num 0.573
 $ fstatistic   : Named num [1:3] 12.4 2 15
 $ cov.unscaled : num [1:3, 1:3] 8.217 -0.178 -0.437 -0.178 0.114 ...

renkun-ken commented 10 years ago

Given all the syntax of the form (~ ...), the operator ~ can be viewed in this context as a branching operator, which indicates that the expression that follows is evaluated as a side effect. It can either branch the left-hand side value to an expression (side-effect evaluation) or branch it to a symbol (assignment). After all, there is no point in evaluating a bare symbol for its side effect, because it has none. Therefore this syntax should not create additional confusion, nor does it take away anything that was possible without this feature.
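To make the branching rule concrete, here is a minimal base R sketch. The helper name branch() is hypothetical and exists only for illustration; it is not how pipeR implements %>>%. It only assumes the rule described above: a bare symbol after ~ means assignment, and anything else is evaluated for its side effect with . bound to the value.

```r
# Hypothetical helper illustrating the proposed rule; not pipeR's implementation.
branch <- function(x, expr, envir = parent.frame()) {
  force(envir)              # fix the calling environment before doing anything else
  expr <- substitute(expr)  # capture the expression without evaluating it

  # Accept only one-sided formulas of the form `~ rhs`
  stopifnot(is.call(expr),
            identical(expr[[1L]], as.name("~")),
            length(expr) == 2L)
  rhs <- expr[[2L]]

  if (is.symbol(rhs)) {
    # Branch to a symbol: assign the value in the calling environment
    assign(as.character(rhs), x, envir = envir)
  } else {
    # Branch to an expression: evaluate it with `.` bound to the value
    # and discard the result
    eval(rhs, list(. = x), envir)
  }

  x  # in both cases the input value is passed on unchanged
}

out <- branch(1:10, ~ cat("length:", length(.), "\n"))  # side-effect evaluation
out <- branch(out, ~ saved)                             # assignment to `saved`
```

In both calls the value returned is the original input, so the main pipeline would continue with it either way.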

renkun-ken commented 10 years ago

Consider the = syntax suggested by @yanlinlin82. See https://github.com/renkun-ken/pipeR/issues/38.

renkun-ken commented 10 years ago

The following code adopts the = syntax.

mtcars %>>%
  subset(mpg <= mean(mpg)) %>>%
  (~ smtcars) %>>%   # side-effect assign
  (~ dim_mtcars = dim(.)) %>>%   # side-effect assign
  subset(select = c(mpg, wt, qsec)) %>>%
  lm(formula = mpg ~ .) %>>%
  (sum_lm = summary(.)) %>>%   # eval and assign
  (coefficients)

timelyportfolio commented 10 years ago

Definitely prefer this. I think this is much clearer, more intuitive, and more readable.

renkun-ken commented 10 years ago

Think so too. Thanks @yanlinlin82 for the great suggestion. I'll implement it on the feature/assign branch soon and see how it works.

renkun-ken commented 10 years ago

The latest commit on feature/assign uses a symbolic call to perform the assignment, which allows the following usage:

> z <- list()
> 1:10 %>>% (~ z$a = length(.)) %>>% mean
[1] 5.5
> z
$a
[1] 10

That is, the assignment no longer calls assign() but builds a symbolic call to perform it, so the expression on the left-hand side of = does not need to be a symbol, and usages like names(a) = ... are allowed.
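The difference between the two approaches can be sketched in base R. The helper assign_by_call() below is hypothetical and is not taken from pipeR's source; it only illustrates the general technique of constructing an assignment call and evaluating it, which is what lets the target be something other than a plain symbol.

```r
# Hypothetical helper, for illustration only; not pipeR's actual internals.
# `target` is an unevaluated expression such as quote(z$a) or quote(names(a)).
assign_by_call <- function(target, value, envir = parent.frame()) {
  force(envir)
  # Build the call `target <- quote(value)`; wrapping the value in quote()
  # keeps it from being evaluated again if it happens to be a language object.
  assignment <- call("<-", target, call("quote", value))
  eval(assignment, envir)
  invisible(value)
}

z <- list()
assign_by_call(quote(z$a), 10L)                    # behaves like z$a <- 10L
a <- 1:3
assign_by_call(quote(names(a)), c("x", "y", "z"))  # behaves like names(a) <- ...

# By contrast, assign() treats its first argument as a literal variable name,
# so this creates a variable literally called "z$b" instead of modifying z.
assign("z$b", 10L)
```

Because the constructed call goes through R's ordinary assignment machinery, replacement forms such as z$a and names(a) are handled without any special casing.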

yanlinlin82 commented 10 years ago

That is more powerful!

renkun-ken commented 10 years ago

In v0.5, <- and -> will no longer be interpreted as lambda expressions and can be used to perform assignment in a pipeline, which makes the code even more readable in some cases.

renkun-ken / pipeR, issue #37