Closed kohske closed 10 years ago
And here is another bad scenario. If anyone has .
in global env and try to use it inside pipe, the behavior would be unexpected.
> . <- "greater"
> 1:3 %>>% t.test(alternative = .)
Error in match.arg(alternative) :
'arg' must be NULL or a character vector
> .. <- "greater"
> 1:3 %>>% t.test(alternative = ..)
One Sample t-test
data: .
t = 3.4641, df = 2, p-value = 0.03709
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
0.3141455 Inf
sample estimates:
mean of x
2
One possible way to avoid this is using very very rare name (like GUID) instead of .
in pipe_first
:
eval(as.call(c(fun[1L],quote(tcwMCyDiaABizOhvCSpjHITWQRRgKafCmpeIbBvSXV),fun[-1L])),
envir=list(tcwMCyDiaABizOhvCSpjHITWQRRgKafCmpeIbBvSXV=x),enclos = envir)
Thanks, @kohske! It is always a big headache to use a unusual symbol like .
to represent something in an expression. If one does not read the documentation about the basic syntactic rules, such misuse can be hard to avoid.
pipeR's syntax for first-argument piping is quite simple: if a function (name or call) follows, left is always piped to the first argument on right. This rule is designed in particular to avoid symbol detection in the function arguments because:
1:10 %>>% sum(.)
means sum(1:10)
, one then expects that 1:10 %>>% sum(.^2)
also means sum((1:10)^2)
, so to what extent, the first argument piping should play? There's no place for first argument piping any more.> 1:3 %>>% sum(.)
[1] 12
If this should fail or raise warning, what about dplyr::do()
and rlist::list.sort()
, both use nonstandard evaluation and give .
their own interpretations?
dplyr::do()
interprets .
as a unit data frame (maybe grouped)
> library(dplyr)
> mtcars %>>% group_by(vs) %>>% do(data = ., model = lm(mpg ~ wt + cyl, data = .))
Source: local data frame [2 x 3]
Groups: <by row>
vs data model
1 0 <S3:tbl_df, tbl, data.frame> <S3:lm>
2 1 <S3:tbl_df, tbl, data.frame> <S3:lm>
rlist::list.sort()
and most other functions interpret .
as an element in the given list.
> library(rlist)
> mtcars$mpg %>>% list.sort(.)
[1] 10.4 10.4 13.3 14.3 14.7 15.0 15.2 15.2 15.5 15.8 16.4 17.3 17.8 18.1 18.7 19.2 19.2 19.7
[19] 21.0 21.0 21.4 21.4 21.5 22.8 22.8 24.4 26.0 27.3 30.4 30.4 32.4 33.9
> mtcars$mpg %>>% list.sort(desc(.))
[1] 33.9 32.4 30.4 30.4 27.3 26.0 24.4 22.8 22.8 21.5 21.4 21.4 21.0 21.0 19.7 19.2 19.2 18.7
[19] 18.1 17.8 17.3 16.4 15.8 15.5 15.2 15.2 15.0 14.7 14.3 13.3 10.4 10.4
If such use as x %>>% f(.)
causes warnings, there can be too many.
For the second comment, it is indeed an issue here. The feature of first-argument piping comes with .
piping together to facilitate potential use in multiple place within a function call. One walk around is to use lambda expression to avoid symbol clash, another is a function that I'm considering to support, I(expr)
which directly evaluates expr
in the parent environment so that your code may be written like
. <- "greater"
1:3 %>>% t.test(alternative = I(.))
I've actually considered the GUID
(or rare name) solution but it works poorly with functions that record their calling expressions such as plot()
, lm()
etc. If such messy names are used, your plot title will by default a messy name like this :)
Therefore I still need to think hard on how we could avoid potential misuse of features by users who do not read the documentation while at the same time ensure that the code works correctly if they are used correctly.
Anyway, thanks for your kind remind!
Just a quick comment.
I meant that 1:3 %>>% sum(.)
should be 6
. But at the moment 1:3 %>>% sum(.)
returns 12
.
And this is because the function is evaluated as sum(1:3, 1:3)
instead of sum(1:3)
.
With dplyr
, 1:3 %>% sum(.)
returns 6
, I've not yet inspect the internal of dplyr though.
For magrittr's %>%
, it works in that way because if %>%
detects any naked .
in the function call, it will switch to dot piping rather than first-argument piping.
Please try more with it such as
> 1:3 %>% sum(.)
[1] 6
> 1:3 %>% sum(.+1)
[1] 15
> 1:3 %>% c(.)
[1] 1 2 3
> 1:3 %>% c(.+1)
[1] 1 2 3 2 3 4
> 1:3 %>% c(., mean(.))
[1] 1 2 3 2
> 1:3 %>% c(min(.), max(.))
[1] 1 2 3 1 3
> 1:3 %>% c((.))
[1] 1 2 3 1 2 3
If many of the arguments of a function need to be specified, and a .
is hidden in a big mess of arguments, it can be hard to find out which mechanism is exactly used.
Maybe this issue is helpful: https://github.com/renkun-ken/pipeR/issues/18
Thanks for the reply!!
In my environment, magrittr 1.0.1, 1:3 %>% sum(.+1)
throws an error.
Which version of magrittr?
> 1:3 %>% c(.)
[1] 1 2 3
> 1:3 %>% mean(.)
[1] 2
> 1:3 %>% c((.))
Error in eval(expr, envir, enclos) : object '.' not found
> 1:3 %>% c(mean(.))
Error in mean(.) : object '.' not found
I was commenting with the dev
version of magrittr which tries to support nested .
, CRAN version and version imported by dplyr
simply does not support nested .
, which may be a good choice. But it brings a problem:
If users see sum(.)
works, they may expect sum(.+1)
to work too, because they don't see it "nested" while in fact it is sum(+(.,1))
.
Currently, I can only beg users to read the docs first and know the basic syntax rules before misuses happen.
Thanks, I found the same behavior with yours. So now magrittr has same problem on dev branch... This is absolutely unexpected behavior for users, and even worse, user is difficult to be aware of this behavior...
> 1:3 %>% c(.)
[1] 1 2 3
> 1:3 %>% c(.+1)
[1] 1 2 3 2 3 4
> 1:3 %>>% c(.)
[1] 1 2 3 1 2 3
> 1:3 %>>% c(.+1)
[1] 1 2 3 2 3 4
Yes, there are some prices to pay if you want any convenience. It's better that the price or consequence is predictable and easy to comprehend.
So I have to stress the rules like:
If you directly put a function there, it is ALWAYS first-argument piping!
A tip is Don't use .
when it is unnecessary.*
magrittr's author has to stress the rule like:
If you put a function there, it may be first-argument piping or dot piping: if a naked dot is detected in the argument, it will be dot piping, or otherwise it would still be first-argument piping.
A tip is Be careful when you use arithmetic operations because they make nested call too....
But this is too expensive ;-p Getting wrong results without knowing is much worse than getting nothing...
If I'll implement, I'd choose the old magrittr way. But anyway I try to find better solution.
Think the current best solution is to remember the syntax rules... :)
Or I have to shut down the feature that .
is supported in first-argument piping. In this way, there won't be problem any more. :p
The ultimate tip is that Write in the simplest form allowed...
If you are allowed to write 1:3 %>>% sum
and 1:3 %>>% sum()
, don't write 1:3 %>>% sum(.)
, I see that .
is written on purpose, adds no value, leads to unexpected results due to misuse.
An interesting problem with the title of this issue is that it is hard for the function to tell which usage is incorrect... due to a lack of knowledge of your purpose so it only can assume that you know the rules.
Think the current best solution is to remember the syntax rules... :)
Exactly :-)
Note that at the moment, using dot with first-argument piping always suffers from danger.
The list.sort
example, although it looks working correctly, the list.sort
is actually called as list.sort(mtcars$mpg, .)
.
:) yes, and list.sort()
does not work with %>%
in this simple case because %>%
interprets it as the user wants to put the data here, so list.sort(mtcars$mpg)
which does not work while list.sort()
interprets .
as the user wants to sort the numbers directly rather than after some transformation.
Non-standard evaluation allows different functions to have different interpretations of symbols and expressions. But when two interpretations clash, everything goes wrong. That's why pipeR tries to interpret as little as possible to avoid such interpretation clash.
Is there any way or possible solution to check such problems before running the commands/procedures in R?
@yanlinlin82, I'm thinking about it too. The problem is it is a problem for some functions, but it is not a problem for others. For sum()
this usage is obviously a misuse, but for rlist::list.sort()
and dplyr::do()
it may not be a misuse at all because these functions have their own interpretation of .
inside their function calls.
If we have to detect such seemingly problematic usages, the only way I come up with is to build a blacklist or white-list of functions to indicate whether such usage is correct or not. The idea is clearly very stupid because there are countless functions and even more combinations and possible nested calls.
If you have a smarter solution to detect such problem, please let me know, at least this can be a challenge. Clearly this is a tradeoff between features and people may have mixed feelings about it.
I guess every function should be responsible for such checking (if it has to), since their interpretations are different. Considering the huge cost of performance, a pre-running parse should be ideal, especially supporting from the basic mechanism (I do not know if such thing exists or not).
If no, my so far best try would be using some kind of global variable to mark if the checking has been done (for current calling), which, however, may be too ugly to implement and also will reduce a little performance. For the latter, my experience in C/assembly would guide me to use directly a function pointer instead of the flag, which could make the code uglier :)
I reopen this issue for further discussion.
There seems no more discussion on this.
On branch 0.5, I found this behavior.
Obviously this is incorrect usage of
.
, but anyway people will do this mistake very often. So it'd be better to invoke error.This is because, in
pipe_first
,is finally parsed as
Note that this behavior occurs when the functions take variable length of arguments.