r-lib / vctrs

Generic programming with typed R vectors
https://vctrs.r-lib.org

There is a limit on the number of rows in a table #1939

Open Hernan-Z opened 3 weeks ago

Hernan-Z commented 3 weeks ago

This started when I squared a column (writing it with * rather than ^2) inside a mutate(), on a column created with summarise() and n(), and one of the results was NA. From the warning, I learned that n() returns an integer (a class I don't see often) and, later, that some operations on integers return doubles. Out of curiosity, I wondered whether n() can count tables with 2^31 or more rows, but when I tried to create a table with 2^31 rows, I got this error message:

Error in if (mirn && nrows[i] > 0L) { : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In attributes(.Data) <- c(attributes(.Data), attrib) :
  NAs introduced by coercion to integer range

I didn't understand why this happened until I found that 2^31 - 1 rows worked. So I tried adding one more row by calling bind_rows() with a small table, got the error below asking me to report it, and here I am: I created a GitHub account to share this.

long_table = data.frame(a = 1:(2^31-1))
short_table = data.frame(a = 1)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
new_table = bind_rows(long_table,short_table)
#> Error in `vec_rbind()`:
#> ! Negative `n` in `compact_rep()`.
#> ℹ In file 'utils.c' at line 897.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.

Created on 2024-06-13 with reprex v2.1.0
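
The reprex sits exactly at R's integer limit: `.Machine$integer.max` is 2^31 - 1, so one more row pushes the combined row count out of integer range. Below is a minimal sketch of a guard that checks the total before binding, assuming the failure comes from row counts being handled as 32-bit integers (which the coercion warning suggests but this report does not confirm). `fits_integer_rows()` is a hypothetical helper, not part of dplyr or vctrs, and it takes plain row counts so it can run without allocating large tables.

```r
# Hypothetical helper (not part of dplyr or vctrs): checks whether the combined
# row count of several tables would still fit in R's 32-bit integer range.
fits_integer_rows <- function(...) {
  # Sum as double so the check itself cannot overflow.
  total <- sum(as.double(c(...)))
  total <= .Machine$integer.max
}

limit_is_2_31_minus_1 <- .Machine$integer.max == 2^31 - 1

# Row counts only; no large tables are created here.
ok_case  <- fits_integer_rows(2^31 - 2, 1)  # total is exactly 2^31 - 1
bad_case <- fits_integer_rows(2^31 - 1, 1)  # total is 2^31, one past the limit
```

With real tables, the counts would come from `nrow(long_table)` and `nrow(short_table)`.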

(By the way, I solved the NA value by multiplying by 1 before the first number, but I guess it's safer to convert the column created with n() to double right after creating it, or to use as.double(n()).)
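
The NA described in this report is consistent with integer overflow in base R arithmetic. A minimal sketch follows, assuming a count of 50000 stands in for the value returned by n(); the variable names are illustrative and not taken from the original code.

```r
# Stand-in for an integer count such as the one n() returns.
n_int <- 50000L

# integer * integer stays integer; 2.5e9 exceeds .Machine$integer.max,
# so the result is NA (R also emits a warning, suppressed here).
squared_int <- suppressWarnings(n_int * n_int)

# `^` returns a double even for integer input, so it does not overflow here.
squared_pow <- n_int^2

# Converting to double first, as suggested in the report, avoids the NA.
squared_dbl <- as.double(n_int) * n_int

# Multiplying by the double 1 first also promotes the whole product to double.
squared_one <- 1 * n_int * n_int
```

Converting once with as.double() right after the count is created keeps every later operation in double arithmetic, so it does not depend on the order of the operands.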