tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

Unsupported use of matrix error using dplyr #2498

mislav0207 closed this issue 7 years ago.

mislav0207 commented 7 years ago

Let's say I have a data frame like this:

df <- structure(list(
  subjecttaxnoid = c("22661187010", "10346575807", "22439110996",
                     "63510438612", "85267957976", "40178118040",
                     "51246665873", "66803849969", "45813719599",
                     "26979059418", "11240408751"),
  reportyear = c(2014L, 2014L, 2014L, 2008L, 2008L, 2008L,
                 2008L, 2013L, 2013L, 2013L, 2013L),
  b001 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  b002 = c(0, 3.43884233571018e-07, 7.24705810574303e-08,
           1.41222784374111e-07, 1.62917712565032e-05, 0,
           4.53310814208705e-07, 7.63856039195011e-06, 0, 0, 0)),
  .Names = c("subjecttaxnoid", "reportyear", "b001", "b002"),
  row.names = c(1L, 2L, 3L, 200000L, 200001L, 200002L, 200003L,
                40000L, 40001L, 40002L, 40003L),
  class = "data.frame")

and a vector that contains the names of two columns of df:

x <- c("b001", "b002")

I would like to use the elements of x in place of bare column names in dplyr:

my_list <- list()
for (i in seq_along(x)) {
  # Fails: top_n() expects a bare column name, not a string
  my_list[[i]] <- df %>% group_by(reportyear) %>% top_n(2, wt = x[i])
}

This returns an error:


 Error in eval(substitute(expr), envir, enclos) : 
  Unsupported use of matrix or array for column indexing

Could you please help with this issue?

hadley commented 7 years ago

Please create a reprex using the reprex package, as described in the issue template.

mislav0207 commented 7 years ago
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- structure(list(
  subjecttaxnoid = c("22661187010", "10346575807", "22439110996",
                     "63510438612", "85267957976", "40178118040",
                     "51246665873", "66803849969", "45813719599",
                     "26979059418", "11240408751"),
  reportyear = c(2014L, 2014L, 2014L, 2008L, 2008L, 2008L,
                 2008L, 2013L, 2013L, 2013L, 2013L),
  b001 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  b002 = c(0, 3.43884233571018e-07, 7.24705810574303e-08,
           1.41222784374111e-07, 1.62917712565032e-05, 0,
           4.53310814208705e-07, 7.63856039195011e-06, 0, 0, 0)),
  .Names = c("subjecttaxnoid", "reportyear", "b001", "b002"),
  row.names = c(1L, 2L, 3L, 200000L, 200001L, 200002L, 200003L,
                40000L, 40001L, 40002L, 40003L),
  class = "data.frame")

x <- c("b001", "b002")

my_list <- list()
for (i in seq_along(x)) {
  my_list[[i]] <- df %>% group_by(reportyear) %>% top_n(2, wt = x[i])
}
#> Error in eval(substitute(expr), envir, enclos): Unsupported use of matrix or array for column indexing
mislav0207 commented 7 years ago

I suppose this is what you mean by "creating a reprex". Sorry, I have never done this before :)

hadley commented 7 years ago

That's a great first step. The next step is to make the reprex as small as possible so I can understand it more easily. For example, you could make the data frame simpler and create it with data.frame().

mislav0207 commented 7 years ago
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
df <- data.frame(reportyear = c(2014L, 2014L, 2014L, 2013L, 2013L, 2013L),
                 b001 = 10:15,
                 b002 = 1:6)

x <- c("b001", "b002")

my_list <- list()
for (i in seq_along(x)) {
  my_list[[i]] <- df %>% group_by(reportyear) %>% top_n(2, wt = x[i])
}
#> Error in eval(substitute(expr), envir, enclos): Unsupported use of matrix or array for column indexing

Better?

zeehio commented 7 years ago

If I understand correctly, @mislav0207 wants to pass top_n() a column name as a character string, but top_n() currently expects a bare column name. I can work around the issue by creating a temporary column, though there must be more elegant solutions. Here is the workaround, replacing the for loop:

my_list <- lapply(x, function(col) {
  # Copy the named column into a temporary column that top_n() can use by bare name
  df$tempcol <- df[[col]]
  df %>% group_by(reportyear) %>% top_n(2, wt = tempcol) %>% select(-tempcol)
})

With the output:

> my_list
[[1]]
Source: local data frame [4 x 3]
Groups: reportyear [2]

  reportyear  b001  b002
       <int> <int> <int>
1       2014    11     2
2       2014    12     3
3       2013    14     5
4       2013    15     6

[[2]]
Source: local data frame [4 x 3]
Groups: reportyear [2]

  reportyear  b001  b002
       <int> <int> <int>
1       2014    11     2
2       2014    12     3
3       2013    14     5
4       2013    15     6
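Assuming a dplyr release with tidy evaluation support (0.7.0 or later, which postdates this thread), the `.data` pronoun can look up a column whose name is stored in a string, so the temporary column becomes unnecessary. A minimal sketch on the small reprex data:

```r
library(dplyr)

# Data from the minimal reprex above
df <- data.frame(reportyear = c(2014L, 2014L, 2014L, 2013L, 2013L, 2013L),
                 b001 = 10:15,
                 b002 = 1:6)
x <- c("b001", "b002")

# For each column name held as a string, keep the top 2 rows per year.
# .data[[col]] resolves the column by name inside the data mask,
# so df is never modified.
my_list <- lapply(x, function(col) {
  df %>%
    group_by(reportyear) %>%
    top_n(2, wt = .data[[col]])
})
```

Like the workaround above, `top_n()` filters rather than reorders, so each result keeps the rows in their original order.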
krlmlr commented 7 years ago

@lionel-: Is there a nicer way to do this in the new tidyeval framework?

lionel- commented 7 years ago

Yes, I've ported all these functions to tidyeval in a branch that is yet to be pushed.

You'll be able to do it like this:

my_list <- list()
for (i in seq_along(x)) {
  # sym() turns the string into a symbol; !! splices that symbol into the call
  my_list[[i]] <- df %>% group_by(reportyear) %>% top_n(2, wt = !!sym(x[i]))
}
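In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`. The sketch below assumes such a version and reuses the `!!` plus `sym()` idea from the tidyeval answer, calling `rlang::sym()` explicitly. `slice_max()` keeps ties by default (`with_ties = TRUE`), as `top_n()` does, but it returns rows grouped by `reportyear`, so the row order can differ from the `top_n()` output.

```r
library(dplyr)  # slice_max() requires dplyr >= 1.0.0

df <- data.frame(reportyear = c(2014L, 2014L, 2014L, 2013L, 2013L, 2013L),
                 b001 = 10:15,
                 b002 = 1:6)
x <- c("b001", "b002")

# rlang::sym() converts each string to a symbol; !! injects it as order_by.
my_list <- lapply(x, function(col) {
  df %>%
    group_by(reportyear) %>%
    slice_max(order_by = !!rlang::sym(col), n = 2)
})
```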