reconhub / incidence

☣:chart_with_upwards_trend::chart_with_downwards_trend:☣ Compute and visualise incidence
https://reconhub.github.io/incidence

Error in is.finite(x) : default method not implemented for type 'list' #86

Closed caijun closed 5 years ago

caijun commented 5 years ago
library(outbreaks)

dat <- ebola_sim$linelist
head(dat)
#>   case_id generation date_of_infection date_of_onset
#> 1  d1fafd          0              <NA>    2014-04-07
#> 2  53371b          1        2014-04-09    2014-04-15
#> 3  f5c3d8          1        2014-04-18    2014-04-21
#> 4  6c286a          2              <NA>    2014-04-27
#> 5  0f58c4          2        2014-04-22    2014-04-26
#> 6  49731d          0        2014-03-19    2014-04-25
#>   date_of_hospitalisation date_of_outcome outcome gender
#> 1              2014-04-17      2014-04-19    <NA>      f
#> 2              2014-04-20            <NA>    <NA>      m
#> 3              2014-04-25      2014-04-30 Recover      f
#> 4              2014-04-27      2014-05-07   Death      f
#> 5              2014-04-29      2014-05-17 Recover      f
#> 6              2014-05-02      2014-05-07    <NA>      f
#>             hospital
#> 1  Military Hospital
#> 2 Connaught Hospital
#> 3              other
#> 4               <NA>
#> 5              other
#> 6               <NA>

library(tidyverse)
library(incidence)
inc <- dat %>% 
  incidence(.$date_of_onset)
#> Error in is.finite(x): default method not implemented for type 'list'

Created on 2018-12-08 by the reprex package (v0.2.1)
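The error can be reproduced without incidence at all. The sketch below assumes only magrittr and base R; `arg_classes()` and `toy` are hypothetical names introduced for illustration. It shows that the pipe still inserts the data frame as the first argument when `.` appears only inside `.$onset`, and that `is.finite()` rejects a data frame with the message from the report.

```r
library(magrittr)

# Hypothetical helper: report the (first) class of each argument received.
arg_classes <- function(...) {
  vapply(list(...), function(a) class(a)[1], character(1))
}

toy <- data.frame(onset = as.Date("2014-04-07") + 0:2)

# `.` only appears nested inside `.$onset`, so magrittr still passes
# the whole data frame as the first argument.
received <- toy %>% arg_classes(.$onset)
print(received)

# A data frame is a list, so is.finite() has no method for it.
msg <- tryCatch(is.finite(toy), error = function(e) conditionMessage(e))
print(msg)
```

In other words, the piped call behaves like `incidence(dat, dat$date_of_onset)`, so the function receives a data frame where it expects a vector of dates.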

zkamvar commented 5 years ago

I'm not sure this is necessarily a bug.... If you try this with other functions that expect vectors, then a similar error is thrown:

dat %>% sum(.$generation)
#> Error in FUN(X[[i]], ...) :
#>  only defined on a data frame with all numeric variables
caijun commented 5 years ago

In dplyr, the sum of a variable is obtained with summarise(). But I can't figure out a way to make incidence() support . in a pipe.

> dat %>% 
+   summarise(sum.gen = sum(.$generation))
  sum.gen
1   97515
zkamvar commented 5 years ago

In dplyr, the sum of a variable is obtained with summarise().

That was not the point.

But I can't figure out a way to make incidence() support . in a pipe.

The point is that this is not an incidence()-specific problem. It's a problem with passing data through pipes.

zkamvar commented 5 years ago

You could also use dat %>% pull(date_of_onset) %>% incidence(), but obviously, this doesn't extend to the use of groups.

caijun commented 5 years ago

Yes, the following code also passes the variable (a vector) into incidence(). However, I would like incidence() to support ., so that the pipeline could start with a data.frame and other variables contained in ., such as the groups, could also be used.

dat$date_of_onset %>% 
  incidence()
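For reference, magrittr already offers two ways to start from a data frame and still hand vectors to a function: wrapping the right-hand side in braces, which suppresses first-argument insertion, and the exposition pipe `%$%`. The following is a minimal sketch under the assumption that both packages are installed; `toy` is a hypothetical data frame standing in for the linelist.

```r
library(magrittr)
library(incidence)

toy <- data.frame(
  onset  = as.Date("2014-04-07") + c(0, 0, 1, 3),
  gender = c("f", "m", "f", "m")
)

# Braces stop magrittr from inserting the data frame as the first argument,
# so `.` can be used freely inside the call.
inc_braces <- toy %>% { incidence(.$onset, groups = .$gender) }

# %$% exposes the columns of the left-hand side by name.
inc_expose <- toy %$% incidence(onset, groups = gender)

print(inc_braces$n)
```

Both forms keep the data frame at the head of the chain, which is what the request above asks for, without any change to incidence() itself.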
zkamvar commented 5 years ago

Can you show an example of a non-tidyverse function that supports the . and outputs something other than a data frame?

zkamvar commented 5 years ago

Another way to deal with it is to construct incidence objects as rows in a data frame column:

dat %>% 
  summarise(i = list(incidence(date_of_onset, groups = gender))) %>% 
  as_tibble()
zkamvar commented 5 years ago

The problem is that the pipe operator keeps trying to insert the entire data frame as the first argument to the incidence() function. What we could do is to create an incidence.data.frame method that looks like this:

incidence.data.frame <- function(dates, x, ...) incidence(x, ...)

This seems to allow your example to work, and I think it would allow the bare names to work, but I can't be certain:

inc <- dat %>% 
  incidence(.$date_of_onset)
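The dispatch behind such a method can be sketched in base R alone. `count_cases()` below is a hypothetical generic standing in for incidence(), not part of any package; the data-frame method simply ignores the data frame that the pipe inserts and forwards the whole vector passed as the second argument.

```r
# Hypothetical stand-in for the incidence() generic.
count_cases <- function(dates, ...) UseMethod("count_cases")

# Method for Date vectors: tabulate cases per day.
count_cases.Date <- function(dates, ...) table(format(dates))

# Method for data frames: drop the data frame, forward the vector `x`.
count_cases.data.frame <- function(dates, x, ...) count_cases(x, ...)

toy <- data.frame(onset = as.Date("2014-04-07") + c(0, 0, 1, 3))

# This is the call that `toy %>% count_cases(.$onset)` expands to.
via_df  <- count_cases(toy, toy$onset)
via_vec <- count_cases(toy$onset)
print(via_df)
```

Note that the method forwards `x` as a whole; indexing it with `[[1]]` would keep only the first date.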
caijun commented 5 years ago

Yes, examples are from the magrittr help files

library(magrittr)
help("%>%")

iris %>% subset(., 1:nrow(.) %% 2 == 0)
zkamvar commented 5 years ago

Yes, examples are from the magrittr help files

I see the problem. These functions (subset and nrow) expect data frames, incidence expects a vector. Moreover, using .$column is different than using .:

iris %>% subset(.$Species, 1:nrow(.) %% 2 == 0)
#> Error in subset.data.frame(., .$Species, 1:nrow(.)%%2 == 0) :
#>  'subset' must be logical
caijun commented 5 years ago

. represents the data.frame in the lhs, and .$ means to extract the variable in the data.frame represented by .

zkamvar commented 5 years ago

. represents the data.frame in the lhs, and .$ means to extract the variable in the data.frame represented by .

Again, not the point. I was trying to show that if you used the same construct you showed in your original example, you end up with an error because of the way magrittr works with these things.


For example, above, I modified the example to subset only the "Species" vector:

iris %>% subset(.$Species, 1:nrow(.) %% 2 == 0)
#> Error in subset.data.frame(., .$Species, 1:nrow(.)%%2 == 0) :
#>  'subset' must be logical

This is conceptually equivalent to this:

subset(iris$Species, 1:nrow(iris)%%2 == 0)
#>  [1] setosa     setosa     setosa     setosa     setosa     setosa    
#>  [7] setosa     setosa     setosa     setosa     setosa     setosa    
#> [13] setosa     setosa     setosa     setosa     setosa     setosa    
#> [19] setosa     setosa     setosa     setosa     setosa     setosa    
#> [25] setosa     versicolor versicolor versicolor versicolor versicolor
#> [31] versicolor versicolor versicolor versicolor versicolor versicolor
#> [37] versicolor versicolor versicolor versicolor versicolor versicolor
#> [43] versicolor versicolor versicolor versicolor versicolor versicolor
#> [49] versicolor versicolor virginica  virginica  virginica  virginica 
#> [55] virginica  virginica  virginica  virginica  virginica  virginica 
#> [61] virginica  virginica  virginica  virginica  virginica  virginica 
#> [67] virginica  virginica  virginica  virginica  virginica  virginica 
#> [73] virginica  virginica  virginica 
#> Levels: setosa versicolor virginica

Created on 2018-12-08 by the reprex package (v0.2.1)


The same exact thing happens in the original example you gave:

dat %>% incidence(.$date_of_onset) # Error
incidence(dat$date_of_onset) # Success

So, the problem isn't necessarily that incidence objects can't be constructed via pipe (you can do it if you use summarise and store the results in a list), but rather that the command isn't correctly formed.

zkamvar commented 5 years ago

Again, the direct construction of an incidence object from multiple columns with the .$column construct would work if there were an incidence.data.frame() method, but I'm wary of including something like that because I don't want to include rlang or dplyr as a dependency.

caijun commented 5 years ago

iris %>% subset(.$Species, 1:nrow(.) %% 2 == 0)

In this example, because you used ., subset.data.frame() is invoked, whose first argument should be a data.frame, but you passed .$Species, which is a vector. That produced the error.

subset(iris$Species, 1:nrow(iris)%%2 == 0)

In the second example, the subset.default() is invoked and it works well with a vector input.

caijun commented 5 years ago

if there were an incidence.data.frame() method

Correct. But using incidence() with . can make the code compact and flexible.

zkamvar commented 5 years ago

If you want to make a PR to implement this, I'll consider it.

zkamvar commented 5 years ago

In this example, because you used ., subset.data.frame() is invoked, whose first argument should be a data.frame, but you passed .$Species, which is a vector. That produced the error.

If you look at the error closely, this is not exactly the case. The error message shows:

Error in subset.data.frame(., .$Species, 1:nrow(.)%%2 == 0) :
  'subset' must be logical

You can see the initial . even though I did not specify it. Besides, subset.default() would be invoked if .$Species were passed first.

caijun commented 5 years ago

Also see the help file of %>%

Placing lhs as the first argument in rhs call: The default behavior of %>% when multiple arguments are required in the rhs call, is to place lhs as the first argument, i.e. x %>% f(y) is equivalent to f(x, y).

Therefore, iris %>% subset(.$Species, 1:nrow(.) %% 2 == 0) is equivalent to subset(., .$Species, 1:nrow(.) %% 2 == 0), i.e. subset(iris, iris$Species, 1:nrow(iris) %% 2 == 0). In this case, .$Species is matched to the second argument, subset, which expects a logical vector. You can actually make this example work by naming the arguments instead of relying on positional matching, as follows.

> iris %>% subset(x = .$Species, subset = 1:nrow(.) %% 2 == 0)
 [1] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa    
[10] setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa     setosa    
[19] setosa     setosa     setosa     setosa     setosa     setosa     setosa     versicolor versicolor
[28] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[37] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
[46] versicolor versicolor versicolor versicolor versicolor virginica  virginica  virginica  virginica 
[55] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica 
[64] virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica  virginica 
[73] virginica  virginica  virginica 
Levels: setosa versicolor virginica
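Why the named-argument version works can be checked in base R by writing out the call the pipe produces. In this sketch (variable names are illustrative), the data frame is still supplied first, but because `x` is matched by name, dispatch happens on the factor and the unnamed data frame falls into `...`, where subset.default() ignores it.

```r
keep <- seq_len(nrow(iris)) %% 2 == 0

# The call that the piped, named-argument version expands to:
# `iris` is unnamed, so it ends up in `...` rather than in `x`.
expanded <- subset(iris, x = iris$Species, subset = keep)

# The direct call on the vector, for comparison.
direct <- subset(iris$Species, keep)

print(length(expanded))
print(identical(expanded, direct))
```

The same reasoning explains the positional failure: without names, the data frame is matched to `x` and the factor to `subset`.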
zkamvar commented 5 years ago

Right, again, we are off point. Make the PR and I'll consider it.

zkamvar commented 5 years ago

Hi @caijun,

This may be too little too late, especially with #104 under consideration, but I just discovered that you can create incidence objects through piping by using the with() function from base R:

dat <- outbreaks::ebola_sim$linelist
library(tidyverse)
library(incidence)
dat %>% 
  with(incidence(date_of_onset, groups = gender))
#> <incidence object>
#> [5888 cases from days 2014-04-07 to 2015-04-30]
#> [2 groups: f, m]
#> 
#> $counts: matrix with 389 rows and 2 columns
#> $n: 5888 cases in total
#> $dates: 389 dates marking the left-side of bins
#> $interval: 1 day
#> $timespan: 389 days
#> $cumulative: FALSE

Created on 2019-05-07 by the reprex package (v0.2.1)

caijun commented 5 years ago

@zkamvar Thanks for telling me about this solution. With the with() function, incidence objects can be created within a pipe chain. However, this approach does not reflect the spirit of chaining pipes: every step in the chain should call a single function and avoid nested function calls as much as possible.

zkamvar commented 5 years ago

Hopefully with #104, we shouldn't need to worry too much about this pattern going further.