Closed caijun closed 5 years ago
I'm not sure this is necessarily a bug. If you try this with other functions that expect vectors, a similar error is thrown:
dat %>% sum(.$generation)
#> Error in FUN(X[[i]], ...) :
#> only defined on a data frame with all numeric variables
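For anyone following along, here is a minimal sketch of why this fails, using a toy data frame as a stand-in for the real line list (the column values are made up for illustration). Because the dot only appears nested inside `.$generation`, magrittr still inserts the data frame as the first argument:

```r
library(magrittr)

# Toy stand-in for the line list; the character column is what trips sum().
dat <- data.frame(generation = c(1, 2, 3),
                  gender = c("f", "m", "f"),
                  stringsAsFactors = FALSE)

# The pipe expands this to sum(dat, dat$generation), so sum() receives the
# whole data frame (including the non-numeric column) and fails.
piped <- tryCatch(dat %>% sum(.$generation),
                  error = function(e) conditionMessage(e))

# Passing only the column works.
direct <- sum(dat$generation)
```

Here `piped` holds the error message and `direct` holds the sum of the column.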
In dplyr, the sum of a variable is obtained with summarise(). But I can't figure out a way to make incidence() support . in a pipe.
> dat %>%
+ summarise(sum.gen = sum(.$generation))
sum.gen
1 97515
In dplyr, to obtain the sum of a variable is by using summarise().
That was not the point.
But I can't figure out a way to make incidence() to support . in a pipe operator.
The point is that this is not an incidence()-specific problem. It's a problem with passing data through pipes.
You could also use dat %>% pull(date_of_onset) %>% incidence(), but obviously, this doesn't extend to the use of groups.
Yes, the following code also passes the variable (a vector) into incidence(). However, I would like incidence() to support . so that the pipeline could start with a data.frame and other variables contained in . could also be used, such as the groups.
dat$date_of_onset %>%
incidence()
Can you show an example of a non-tidyverse function that supports the . and outputs something other than a data frame?
Another way to deal with it is to construct incidence objects as rows in a data frame column:
dat %>%
summarise(i = list(incidence(date_of_onset, groups = gender))) %>%
as_tibble()
The problem is that the pipe operator keeps trying to insert the entire data frame as the first argument to the incidence() function. What we could do is to create an incidence.data.frame method that looks like this:
incidence.data.frame <- function(dat, x, ...) incidence(x[[1]], ...)
This seems to allow your example to work, and I think it would allow bare names to work, but I can't be certain:
inc <- dat %>%
incidence(.$date_of_onset)
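To illustrate the dispatch idea without depending on the incidence package, here is a rough sketch with a hypothetical generic, `count_dates()`, which is not part of any package. Note that this sketch forwards the whole column rather than `x[[1]]`; that is my assumption about the intent, not the proposed method itself.

```r
library(magrittr)

# Hypothetical generic standing in for incidence(); not part of any package.
count_dates <- function(x, ...) UseMethod("count_dates")

# Vector method: tabulate the values.
count_dates.default <- function(x, ...) table(x)

# Data frame method: ignore the piped-in data frame and forward the column
# that was supplied as the second argument.
count_dates.data.frame <- function(x, dates, ...) count_dates(dates, ...)

# Toy data, made up for illustration.
dat <- data.frame(date_of_onset = c("2014-04-07", "2014-04-07", "2014-04-08"),
                  stringsAsFactors = FALSE)

# Expands to count_dates(dat, dat$date_of_onset), which dispatches on the
# data frame and then re-dispatches on the character vector.
res <- dat %>% count_dates(.$date_of_onset)
```

The data frame method only exists to absorb the first argument that the pipe inserts; the real work still happens in the vector method.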
Yes, examples are from the magrittr help files
library(magrittr)
help("%>%")
iris %>% subset(., 1:nrow(.) %% 2 == 0)
Yes, examples are from the magrittr help files
I see the problem. These functions (subset and nrow) expect data frames, while incidence expects a vector. Moreover, using .$column is different from using . :
iris %>% subset(.$Species, 1:nrow(.) %% 2 == 0)
#> Error in subset.data.frame(., .$Species, 1:nrow(.)%%2 == 0) :
#> 'subset' must be logical
. represents the data.frame in the lhs, and .$ means to extract the variable in the data.frame represented by .
. represents the data.frame in the lhs, and .$ means to extract the variable in the data.frame represented by .
Again, not the point. I was trying to show that if you used the same construct you showed in your original example, you end up with an error because of the way magrittr works with these things.
For example, above, I modified the example to subset only the "Species" vector:
iris %>% subset(.$Species, 1:nrow(.) %% 2 == 0)
#> Error in subset.data.frame(., .$Species, 1:nrow(.)%%2 == 0) :
#> 'subset' must be logical
This is conceptually equivalent to this:
subset(iris$Species, 1:nrow(iris)%%2 == 0)
#> [1] setosa setosa setosa setosa setosa setosa
#> [7] setosa setosa setosa setosa setosa setosa
#> [13] setosa setosa setosa setosa setosa setosa
#> [19] setosa setosa setosa setosa setosa setosa
#> [25] setosa versicolor versicolor versicolor versicolor versicolor
#> [31] versicolor versicolor versicolor versicolor versicolor versicolor
#> [37] versicolor versicolor versicolor versicolor versicolor versicolor
#> [43] versicolor versicolor versicolor versicolor versicolor versicolor
#> [49] versicolor versicolor virginica virginica virginica virginica
#> [55] virginica virginica virginica virginica virginica virginica
#> [61] virginica virginica virginica virginica virginica virginica
#> [67] virginica virginica virginica virginica virginica virginica
#> [73] virginica virginica virginica
#> Levels: setosa versicolor virginica
Created on 2018-12-08 by the reprex package (v0.2.1)
The same exact thing happens in the original example you gave:
dat %>% incidence(.$date_of_onset) # Error
incidence(dat$date_of_onset) # Success
So, the problem isn't necessarily that incidence objects can't be constructed via pipe (you can do it if you use summarise and store the results in a list), but rather that the command isn't correctly formed.
Again, the direct construction of an incidence object from multiple columns with the .$column construct would work if there were an incidence.data.frame() method, but I'm wary of including something like that because I don't want to include rlang or dplyr as a dependency.
iris %>% subset(.$Species, 1:nrow(.) %% 2 == 0)
In this example, as you used ., subset.data.frame() is invoked, in which the first argument should be a data.frame, but you input .$Species, which is a vector. That is what produced the error.
subset(iris$Species, 1:nrow(iris)%%2 == 0)
In the second example, subset.default() is invoked and it works well with a vector input.
if there were an incidence.data.frame() method
Correct. But using incidence() with . can make the code compact and flexible.
If you want to make a PR to implement this, I'll consider it.
In this example, as you used ., subset.data.frame() is invoked, in which the first argument should be a data.frame, but you input .$Species that is a vector. Then it produced the error.
If you look at the error closely, this is not exactly the case. The error message shows:
Error in subset.data.frame(., .$Species, 1:nrow(.)%%2 == 0) :
'subset' must be logical
You can see the initial . even though I did not specify it. Besides, subset.default() would be invoked if .$Species were passed first.
Also see the help file of %>%:
Placing lhs as the first argument in rhs call
The default behavior of %>% when multiple arguments are required in the rhs call, is to place lhs as the first argument, i.e. x %>% f(y) is equivalent to f(x, y).
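A quick way to see this rule in action is a small helper that reports the class of every argument it receives. `arg_classes()` is a hypothetical function written only for this illustration:

```r
library(magrittr)

# Hypothetical helper: report the first class of each argument received.
arg_classes <- function(...) {
  vapply(list(...), function(a) class(a)[1], character(1))
}

# The dot is nested inside `$`, so the lhs is still inserted first:
# this is arg_classes(iris, iris$Species).
nested <- iris %>% arg_classes(.$Species)

# The dot is a direct argument, so the lhs is placed only where the dot is:
# this is arg_classes(1, iris).
bare <- iris %>% arg_classes(1, .)
```

`nested` shows a data frame followed by a factor, which matches the leading . in the error message, while `bare` shows that a dot used as an argument on its own suppresses the first-argument insertion.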
Therefore, iris %>% subset(.$Species, 1:nrow(.) %% 2 == 0) is equivalent to subset(., .$Species, 1:nrow(.) %% 2 == 0) or subset(iris, .$Species, 1:nrow(.) %% 2 == 0). In this case, .$Species is passed as the second argument, subset, which expects a logical vector. Actually, you can make this example work by specifying the arguments by name instead of by position, as follows.
> iris %>% subset(x = .$Species, subset = 1:nrow(.) %% 2 == 0)
[1] setosa setosa setosa setosa setosa setosa setosa setosa setosa
[10] setosa setosa setosa setosa setosa setosa setosa setosa setosa
[19] setosa setosa setosa setosa setosa setosa setosa versicolor versicolor
[28] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[37] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[46] versicolor versicolor versicolor versicolor versicolor virginica virginica virginica virginica
[55] virginica virginica virginica virginica virginica virginica virginica virginica virginica
[64] virginica virginica virginica virginica virginica virginica virginica virginica virginica
[73] virginica virginica virginica
Levels: setosa versicolor virginica
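Another option, sketched here as an aside, is to wrap the right-hand side in braces. The magrittr documentation describes braces as the way to overrule the first-argument rule, so the call inside them is evaluated exactly as written, with . bound to the lhs:

```r
library(magrittr)

# Braces stop %>% from inserting iris as the first argument, so subset()
# receives only the factor and the logical index.
piped <- iris %>% { subset(.$Species, 1:nrow(.) %% 2 == 0) }

# The same call written without the pipe.
direct <- subset(iris$Species, 1:nrow(iris) %% 2 == 0)
```

Both objects should hold the same 75 even-row species values.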
Right, again, we are off point. Make the PR and I'll consider it.
Hi @caijun,
This may be too little too late, especially with #104 under consideration, but I just discovered that you can create incidence objects through piping by using the with() function from base R:
dat <- outbreaks::ebola_sim$linelist
library(tidyverse)
library(incidence)
dat %>%
with(incidence(date_of_onset, group = gender))
#> <incidence object>
#> [5888 cases from days 2014-04-07 to 2015-04-30]
#> [2 groups: f, m]
#>
#> $counts: matrix with 389 rows and 2 columns
#> $n: 5888 cases in total
#> $dates: 389 dates marking the left-side of bins
#> $interval: 1 day
#> $timespan: 389 days
#> $cumulative: FALSE
Created on 2019-05-07 by the reprex package (v0.2.1)
@zkamvar Thanks for telling me the solution. By using the with() function, incidence objects can be created in the piping chain. However, this approach does not reflect the spirit of chaining pipes: every step in the chain should call a function and avoid nesting function calls as much as possible.
Hopefully with #104, we shouldn't need to worry too much about this pattern going further.