renkun-ken / pipeR

Multi-Paradigm Pipeline Implementation

pipeR does not work well with qplot() #18

Closed yanlinlin82 closed 10 years ago

yanlinlin82 commented 10 years ago

Here is an example of the problem:

library(ggplot2)
library(pipeR)
mtcars %>>% qplot(mpg, wt, data = .)
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous
Error: Aesthetics must either be length one, or the same length as the dataProblems:.

However, "%>%" in "magrittr" works fine:

library(magrittr)
mtcars %>% qplot(mpg, wt, data = .)

renkun-ken commented 10 years ago

The problem is that pipeR has different rules of piping. Please note that your original code without piping is

qplot(mpg, wt, data = mtcars)

In this code, mtcars is not passed as the first argument of qplot(); it is passed as the data argument.

%>>% implements a set of rules to determine where the object should be piped to.

The rules are best described by the following demos:

x %>>% f            # f(x)
x %>>% f(...)       # f(x,...)
x %>>% { f(.) }     # f(x)
x %>>% ( f(.) )     # f(x)
x %>>% (i -> f(i))  # f(x)
x %>>% (i ~ f(i))   # f(x)

You need to be aware that if you supply a function name or call, the object will be piped to its first argument. If it is not what you want, you can enclose your function call or expression with () or {} so that . represents the piped object.

Therefore, the right code should be

mtcars %>>% ( qplot(mpg, wt, data = .) )
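The dispatch rules listed above can be sketched in base R with a toy operator. This is only an illustration of the idea, not pipeR's actual source: the operator name %p% and the helper n_args() are made up for this example, and lambda forms such as (i -> f(i)) are left out.

```r
# Toy pipe for illustration only; this is NOT pipeR's source code.
# It decides where the value goes purely from the syntax of the right-hand side.
`%p%` <- function(lhs, rhs) {
  expr <- substitute(rhs)   # capture the right-hand side unevaluated
  env <- parent.frame()
  enclosed <- is.call(expr) &&
    (identical(expr[[1L]], quote(`(`)) || identical(expr[[1L]], quote(`{`)))
  if (enclosed) {
    # x %p% ( f(.) ) or x %p% { f(.) }: evaluate as written, with . bound to x
    eval(expr, list(. = lhs), env)
  } else if (is.call(expr)) {
    # x %p% f(...): always insert x as the first argument
    call <- as.call(c(list(expr[[1L]], quote(.)), as.list(expr)[-1L]))
    eval(call, list(. = lhs), env)
  } else {
    # x %p% f: call f(x)
    eval(expr, env)(lhs)
  }
}

# Hypothetical helper that reports how many arguments it received.
n_args <- function(...) length(list(...))

1:3 %p% n_args          # n_args(1:3)      -> 1
1:3 %p% n_args(.)       # n_args(1:3, 1:3) -> 2
1:3 %p% (n_args(.))     # n_args(1:3)      -> 1
```

Under a rule of this shape, a plain call always receives the piped object as its first argument, even when . also appears among the arguments. Only the enclosed form leaves the call exactly as written.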

magrittr's %>% analyzes your expression heuristically and detects . symbols to determine whether to pipe to the first argument or only to the . symbol. I'm not sure I like this kind of heuristic design, because it can sometimes be ambiguous and unpredictable. I believe that heuristic design is good for interactive analysis, where it saves a lot of time, but it may not be the best solution for programming, because it adds unpredictability that is undesirable in code.

Suppose we define the following function f(x,y):

library(magrittr)
library(pipeR)

f <- function(x,y) {
  cat("x = ", x, "\n")
  cat("y = ", y, "\n")
}

Then we use magrittr to pipe:

> # only pipe to .
> 1:10 %>% f(.) 
x =  1 2 3 4 5 6 7 8 9 10 
Error in cat("y = ", y, "\n") : argument "y" is missing, with no default
> 
> # pipe to first argument
> 1:10 %>% f(1) 
x =  1 2 3 4 5 6 7 8 9 10 
y =  1 
> 
> # only pipe to .
> 1:10 %>% f(.,.) 
x =  1 2 3 4 5 6 7 8 9 10 
y =  1 2 3 4 5 6 7 8 9 10 
> 
> # pipe to first argument and .
> 1:10 %>% f(length(.)) 
x =  1 2 3 4 5 6 7 8 9 10 
y =  10 
> 
> # . is only nested: magrittr still pipes to the first argument, so f() gets three arguments and fails
> 1:10 %>% f(length(.),length(.)) 
Error in f(`1:10`, length(.), length(.)) : unused argument (length(.))
> 
> # . is only nested: magrittr still pipes to the first argument, so f() gets three arguments and fails
> 1:10 %>% f(c(.,1),c(1,.))
Error in f(`1:10`, c(., 1), c(1, .)) : unused argument (c(1, .))

It looks like %>% can sometimes be smart, but I don't feel that I have the power to control its behavior. Unless you are fully aware of the heuristic rules it uses to decide whether to pipe to the first argument or only to ., its behavior is not predictable.

That's why %>>% is designed around a small set of explicit, intuitive rules: when you write your code, you know exactly where and how the object will be piped.

renkun-ken commented 10 years ago

Another example may expose the problem more fully.

Consider the following function:

f <- function(x,y,z = "hello") {
  cat("x = ", x, "\n")
  cat("y = ", y, "\n")
  cat("z = ", z, "\n")
}

If I run the following code:

> 1:10 %>% f(.,2)

Can you confidently guess what is going to happen? Will 1:10 be piped to the first argument, as if f(1:10, 1:10, 2), or only to ., as if f(1:10,2)?

The answer is that it pipes only to .:

> 1:10 %>% f(.,2)
x =  1 2 3 4 5 6 7 8 9 10 
y =  2 
z =  hello 

as if f(1:10,2).

But see what happens here:

> 1:10 %>% f(1,length(.))
x =  1 2 3 4 5 6 7 8 9 10 
y =  1 
z =  10 

It pipes to the first argument as if f(1:10,1,length(1:10)).

More examples:

> 1:10 %>% f(1,(.))
x =  1 2 3 4 5 6 7 8 9 10 
y =  1 
z =  1 2 3 4 5 6 7 8 9 10 
> 1:10 %>% f(1,.)
x =  1 
y =  1 2 3 4 5 6 7 8 9 10 
z =  hello 

How quickly and confidently could you answer if I gave you more examples? Doesn't it take more effort to read or write such code and be sure what is going to happen?

What if you really want to pipe only to ., as if f(mean(1:10),length(1:10))? Instead, 1:10 is also piped to the first argument:

> 1:10 %>% f(mean(.),length(.))
x =  1 2 3 4 5 6 7 8 9 10 
y =  5.5 
z =  10 

None of the problems above exist in pipeR's %>>% because you have full control of how it works.
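The magrittr results shown in the examples above are all consistent with one rule: if . appears as a direct argument of the call, only . is replaced; if . appears only nested inside another call, or not at all, the object is also inserted as the first argument. A minimal base R sketch of such a rule follows. It is a toy written for this thread, not magrittr's source, and the operator name %h% is made up; f() here returns its arguments instead of printing them so the results can be inspected.

```r
# Toy heuristic pipe for illustration only; this is NOT magrittr's source code.
`%h%` <- function(lhs, rhs) {
  expr <- substitute(rhs)   # capture the right-hand side unevaluated
  env <- parent.frame()
  if (!is.call(expr)) {
    return(eval(expr, env)(lhs))   # x %h% f  ->  f(x)
  }
  args <- as.list(expr)[-1L]
  # Is . itself one of the arguments (as opposed to nested inside one)?
  dot_is_direct_arg <- any(vapply(args, identical, logical(1), quote(.)))
  if (!dot_is_direct_arg) {
    # No direct . : insert the piped value as the first argument
    expr <- as.call(c(list(expr[[1L]], quote(.)), args))
  }
  eval(expr, list(. = lhs), env)
}

# Same signature as f() above, but returning the arguments as a list.
f <- function(x, y, z = "hello") list(x = x, y = y, z = z)

str(1:10 %h% f(., 2))                # same as f(1:10, 2)
str(1:10 %h% f(1, length(.)))        # same as f(1:10, 1, length(1:10))
str(1:10 %h% f(1, .))                # same as f(1, 1:10)
str(1:10 %h% f(mean(.), length(.)))  # same as f(1:10, mean(1:10), length(1:10))
```

The point of the sketch is that the reader has to inspect every argument for a bare . before knowing where the object lands, which is the cost the examples above are meant to show.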

yanlinlin82 commented 10 years ago

Impressive! I finally see the fatal ambiguity in the "magrittr" package and how "pipeR" solves it. I think it would be better to add a little more to the "Usage" section of "%>>%" to explain the difference between "x %>>% f(...)" and "x %>>% ( f(.) )", which is an important key feature that should be clarified earlier.

renkun-ken commented 10 years ago

Yes, I would do something about it. Thanks. :)

smbache commented 10 years ago

Fair enough to have an opinion; we'll keep it in mind. I must admit I see this more as negative propaganda than constructive criticism. Maybe it's just your journalistic style.

renkun-ken commented 10 years ago

If too much ambiguous code is produced, it could be a fatal disaster. Letting people know that such potential exists is exactly what constructive criticism is.

yanlinlin82 commented 10 years ago

As a package user rather than a package developer, I may be able to offer a more neutral opinion. I think @renkun-ken has pointed out a potential source of confusion and provided a solution in his package. He could simply have left the problem hidden in the "magrittr" package until the last minute by not posting anything in the "magrittr" issue list. Since it should not be hard for "magrittr" to find another solution, I do not think his post is negative propaganda, as you said.

renkun-ken commented 10 years ago

Thanks @yanlinlin82. Let's keep cool and find what is best for the community.