renkun-ken / pipeR

Multi-Paradigm Pipeline Implementation

Add support of question mark as binary operator #62

Closed renkun-ken closed 9 years ago

renkun-ken commented 9 years ago

The current implementation only supports the question mark as a unary operator. For example,

mtcars %>>%
  (? head(., 3)) %>>%
  lm(formula = mpg ~ wt + cyl) %>>%
  summary %>>%
  coef
? head(., 3)
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 39.686261  1.7149840 23.140893 3.043182e-20
wt          -3.190972  0.7569065 -4.215808 2.220200e-04
cyl         -1.507795  0.4146883 -3.635972 1.064282e-03

Add support for the question mark as a binary operator so that the user can customize the text printed for the question.

mtcars %>>%
  ("Sample data" ? head(., 3)) %>>%
  lm(formula = mpg ~ wt + cyl) %>>%
  summary %>>%
  coef
? Sample data 
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 39.686261  1.7149840 23.140893 3.043182e-20
wt          -3.190972  0.7569065 -4.215808 2.220200e-04
cyl         -1.507795  0.4146883 -3.635972 1.064282e-03

The question can be written like a mutated string.
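As a rough illustration of why both forms can coexist: R's parser already accepts `?` as either a unary or a binary operator, so the two can be told apart by the length of the quoted call. The helper `describe_question()` below is a hypothetical name used only for this sketch; it is not pipeR's actual implementation.

```r
# Hypothetical helper (not part of pipeR): split a quoted `?` call
# into the label to print and the expression to evaluate.
describe_question <- function(expr) {
  stopifnot(is.call(expr), identical(expr[[1L]], as.name("?")))
  if (length(expr) == 2L) {
    # Unary form `? value`: fall back to the expression text as the label.
    list(label = deparse(expr[[2L]]), value = expr[[2L]])
  } else {
    # Binary form `label ? value`: the left operand is the label.
    list(label = expr[[2L]], value = expr[[3L]])
  }
}

unary  <- describe_question(quote(? head(., 3)))
binary <- describe_question(quote("Sample data" ? head(., 3)))
```

In both cases `value` holds the same unevaluated `head(., 3)` call; only the label differs.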

yanlinlin82 commented 9 years ago

This seems a little redundant with the existing way of printing a string with cat():

mtcars %>>%
  "Sample data" %>>%
  (? head(., 3))

By the way, would it be possible to support it as a ternary operator?

mtcars %>>%
  (mpg >= 20 ? "faster" : "slower")

renkun-ken commented 9 years ago

@yanlinlin82 I thought about supporting the commonly used bool ? a : b pattern, but I still need to think it through and see whether it brings unintended problems.
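For anyone curious how such a pattern could be recognized at all, here is a minimal sketch. It relies on the fact that R's parser gives `:` higher precedence than `?`, so `cond ? a : b` is parsed as a binary `?` whose right operand is a `:` call. The function `parse_ternary()` is a hypothetical name for illustration and nothing here is part of pipeR.

```r
# Hypothetical sketch (not part of pipeR): recognize `cond ? yes : no`
# in a quoted expression and return its three parts, or NULL otherwise.
parse_ternary <- function(expr) {
  is_op <- function(e, op) is.call(e) && identical(e[[1L]], as.name(op))
  if (!is_op(expr, "?") || length(expr) != 3L || !is_op(expr[[3L]], ":")) {
    return(NULL)
  }
  branches <- expr[[3L]]
  list(condition = expr[[2L]], yes = branches[[2L]], no = branches[[3L]])
}

parts <- parse_ternary(quote(mpg >= 20 ? "faster" : "slower"))

# Evaluate the condition against the columns of mtcars, then pick a branch.
is_fast <- eval(parts$condition, mtcars)
labels  <- ifelse(is_fast, parts$yes, parts$no)
```

One possible source of the unintended problems mentioned above: a branch that itself contains `:` (a range such as `1:3`) would be ambiguous under this scheme.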

renkun-ken commented 9 years ago

The main reason to support ? as a binary operator is that in a long, complicated pipeline, the question expressions can be ambiguous if the user cannot label them with a string. For example,

pipeline({
  mtcars
  ? nrow(.)
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95))
  ? nrow(.)
  lm(formula = mpg ~ wt + cyl)
  summary
  ? .$r.squared
  coef
})
? nrow(.)
[1] 32
? nrow(.)
[1] 28
? .$r.squared
[1] 0.8252262
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 36.630834  1.6127431 22.713372 3.299463e-18
wt          -2.528175  0.7657771 -3.301450 2.894825e-03
cyl         -1.418216  0.3533452 -4.013684 4.783302e-04

The output answers the same question in different contexts, but the user may not be able to tell the answers apart. So a customizable label for each question can be useful here.

pipeline({
  mtcars
  "Total number of records" ? nrow(.)
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95))
  "Qualified number of records" ? nrow(.)
  lm(formula = mpg ~ wt + cyl)
  summary
  "R Squared" ? .$r.squared
  coef
})
? Total number of records 
[1] 32
? Qualified number of records 
[1] 28
? R Squared 
[1] 0.8252262
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 36.630834  1.6127431 22.713372 3.299463e-18
wt          -2.528175  0.7657771 -3.301450 2.894825e-03
cyl         -1.418216  0.3533452 -4.013684 4.783302e-04

The output is clearer on its own.
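The labelled-question behaviour can be approximated in base R without any pipeline syntax. This is only a sketch under stated assumptions: `ask()` is a hypothetical function, not pipeR's API, and it simply prints the label (or the expression text when no label is given), evaluates the expression with `.` bound to the input, and returns the input unchanged so it can be passed on.

```r
# Hypothetical stand-in for a labelled question step (not pipeR's API).
ask <- function(x, expr, label = NULL) {
  expr <- substitute(expr)                     # capture the question unevaluated
  if (is.null(label)) label <- deparse(expr)   # unary-style fallback label
  cat("?", label, "\n")
  # Evaluate with `.` bound to the input; other names resolve in the caller.
  print(eval(expr, list(. = x), parent.frame()))
  invisible(x)                                 # side effect only: pass x through
}

kept <- subset(mtcars, mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95))

ask(mtcars, nrow(.), "Total number of records")
ask(kept, nrow(.), "Qualified number of records")
ask(kept, nrow(.))  # no label: falls back to the expression text
```

Because the value is returned invisibly and unchanged, a step like this only adds a side effect and does not alter what flows to the next stage.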

renkun-ken commented 9 years ago

Whether to support question mark as ternary operator deserves another issue. I'll open one for that.

abresler commented 9 years ago

Love this idea
