r-lib / lintr

Static Code Analysis for R
https://lintr.r-lib.org
Other
1.16k stars 184 forks source link

New linter to recommend `list2DF()` base R function #2596

Open Bisaloo opened 4 weeks ago

Bisaloo commented 4 weeks ago

As an alternative to do.call(cbind.data.frame, x).

x <- list(
  ind = seq_along(letters),
  let = letters
)

do.call(cbind.data.frame, x)
#>    ind let
#> 1    1   a
#> 2    2   b
#> 3    3   c
#> 4    4   d
#> 5    5   e
#> 6    6   f
#> 7    7   g
#> 8    8   h
#> 9    9   i
#> 10  10   j
#> 11  11   k
#> 12  12   l
#> 13  13   m
#> 14  14   n
#> 15  15   o
#> 16  16   p
#> 17  17   q
#> 18  18   r
#> 19  19   s
#> 20  20   t
#> 21  21   u
#> 22  22   v
#> 23  23   w
#> 24  24   x
#> 25  25   y
#> 26  26   z

list2DF(x)
#>    ind let
#> 1    1   a
#> 2    2   b
#> 3    3   c
#> 4    4   d
#> 5    5   e
#> 6    6   f
#> 7    7   g
#> 8    8   h
#> 9    9   i
#> 10  10   j
#> 11  11   k
#> 12  12   l
#> 13  13   m
#> 14  14   n
#> 15  15   o
#> 16  16   p
#> 17  17   q
#> 18  18   r
#> 19  19   s
#> 20  20   t
#> 21  21   u
#> 22  22   v
#> 23  23   w
#> 24  24   x
#> 25  25   y
#> 26  26   z

y <- list(
  var1 = runif(1e6),
  var2 = rnorm(1e6),
  var3 = rnorm(1e6),
  var4 = rnorm(1e6)
)

microbenchmark::microbenchmark(
  do.call(cbind.data.frame, y),
  list2DF(y),
  check = "identical"
)
#> Unit: microseconds
#>                          expr     min       lq      mean   median       uq
#>  do.call(cbind.data.frame, y) 125.398 131.2090 158.49506 139.6595 166.9540
#>                    list2DF(y)   6.227   7.1795  10.40463   8.3165  10.6615
#>      max neval
#>  344.463   100
#>   67.065   100

Created on 2024-06-07 with reprex v2.1.0

Some hits in the wild: https://github.com/search?q=org%3Acran+%2Fdo%5C.call%5C%28cbind%5C.data%5C.frame%2F&type=code

Are there some edge cases one needs to be aware of? Happy to submit a PR for this.

TimTaylor commented 4 weeks ago

Are there some edge cases one needs to be aware of?

list2DF() doesn't recycle vectors whereas cbind.data.frame() will. I think list2DF() did recycle when first introduced but then became strict in a subsequent release.

x <- list(ind = 1:2, let = letters[1:6])
try(list2DF(x))
#> Error in list2DF(x) : all variables should have the same length
do.call(cbind.data.frame, x)
#>   ind let
#> 1   1   a
#> 2   2   b
#> 3   1   c
#> 4   2   d
#> 5   1   e
#> 6   2   f

even length 1 not recycled

x <- list(ind = 1, let = letters[1:6])
try(list2DF(x))
#> Error in list2DF(x) : all variables should have the same length
do.call(cbind.data.frame, x)
#>   ind let
#> 1   1   a
#> 2   1   b
#> 3   1   c
#> 4   1   d
#> 5   1   e
#> 6   1   f
MichaelChirico commented 3 weeks ago

In that case it makes it tough to recommend list2DF() statically. OTOH, even just data.frame() (which does do recycling) seems preferable?

I do see a surprising number of call sites for do.call(cbind.data.frame:

https://github.com/search?q=lang%3AR+%2Fdo%5B.%5Dcall%5B%28%5D%5Cs*%5B%27%22%5D%3Fcbind%5B.%5Ddata%5B.%5Dframe%2F&type=code

So we could recommend just using data.frame() instead, and mentioning list2DF() as another alternative to consider if recycling is not needed?

AshesITR commented 2 weeks ago

Is there different behavior of data.frame() vs. cbind.data.frame() if the argument lists contain complex types such as data frames / matrices / lists?

MichaelChirico commented 2 weeks ago

Here's cbind.data.frame()

function (..., deparse.level = 1) 
data.frame(..., check.names = FALSE)
<bytecode: 0x55bca92f06a0>
<environment: namespace:base>

:upside_down_face: