tidyverse / dplyr

dplyr: A grammar of data manipulation
https://dplyr.tidyverse.org/

filter may crash RStudio Session when combining rbind and rowwise #7057

Open Gastonia02 opened 1 month ago

Gastonia02 commented 1 month ago

I get an RStudio "Session aborted" when calling filter() on a data frame produced by calling rbind() on data frames that have rowwise grouping. Ungrouping before filtering stops the crash, but because the whole session goes down, the problem is confusing and hard to track down.

This code should reproduce the crash:

# install.packages(c("tidyr", "dplyr"))
library(tidyr)
library(dplyr)

test <- data.frame("col1" = c(1,2,3,4,5,85,74,5,32,6,8), "col" = c("A","B","B","B","A","C","A","C","B","A","A"))
test <- rowwise(test)
test <- rbind(test, test)
filter(test, col == "A")  #crashes

# R version 4.4.1 (2024-06-14)
# had same issue on 4.3
# dplyr 1.1.4

Thanks for your time !

philibe commented 1 month ago

It seems to me that test <- rowwise(test) followed by rbind(test, test) creates a sort of recursive data structure. oO ?!

dput(test)
structure(list(col1 = c(1, 2, 3, 4, 5, 85, 74, 5, 32, 6, 8, 1, 
2, 3, 4, 5, 85, 74, 5, 32, 6, 8), col = c("A", "B", "B", "B", 
"A", "C", "A", "C", "B", "A", "A", "A", "B", "B", "B", "A", "C", 
"A", "C", "B", "A", "A")), row.names = c(NA, -22L), groups = structure(list(
    .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
        10L, 11L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -11L), class = c("tbl_df", 
"tbl", "data.frame")), class = c("rowwise_df", "tbl_df", "tbl", 
"data.frame"))

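One way to see the mismatch in the dput output above is to compare the row count of the data with the row count of its groups attribute. This is only a sketch: it assumes dplyr 1.1.4 as in the report, reads the attribute directly with base R, and deliberately avoids calling filter(), which is the call reported to crash. The names n_rows and n_group_rows are illustrative.

```r
library(dplyr)

test <- data.frame(
  col1 = c(1, 2, 3, 4, 5, 85, 74, 5, 32, 6, 8),
  col  = c("A", "B", "B", "B", "A", "C", "A", "C", "B", "A", "A")
)
test <- rowwise(test)
# The result keeps the rowwise_df class and the old groups attribute
# (see the dput output above)
test <- rbind(test, test)

n_rows       <- nrow(test)                  # 22 in the dput output above
n_group_rows <- nrow(attr(test, "groups"))  # 11 in the dput output above

# TRUE means the grouping metadata no longer describes the data
n_rows != n_group_rows
```

With the values shown in the dput output, the data has 22 rows while the groups attribute still describes only 11.
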
Try that:

test1 <- data.frame("col1" = c(1,2,3,4,5,85,74,5,32,6,8), "col" = c("A","B","B","B","A","C","A","C","B","A","A"))
test2 <- rowwise(test1)
test0 <- rbind(test1, test2)

test0 has a normal structure:

dput(test0)
structure(list(col1 = c(1, 2, 3, 4, 5, 85, 74, 5, 32, 6, 8, 1, 
2, 3, 4, 5, 85, 74, 5, 32, 6, 8), col = c("A", "B", "B", "B", 
"A", "C", "A", "C", "B", "A", "A", "A", "B", "B", "B", "A", "C", 
"A", "C", "B", "A", "A")), row.names = c(NA, -22L), class = "data.frame")

And filter works:

dplyr::filter(test0, col == "A") 
   col1 col
1     1   A
2     5   A
3    74   A
4     6   A
5     8   A
6     1   A
7     5   A
8    74   A
9     6   A
10    8   A

PS: I'm a simple user (ie not from the tidyverse team).

Gastonia02 commented 1 month ago

My issue is that I sometimes use rbind on objects that are both grouped row-wise. This seems to create a recursive structure that makes RStudio crash.

This structure seems to be normal for dataframes grouped row-wise :

test1 <- data.frame("col1" = c(1,2,3,4,5,85,74,5,32,6,8), "col" = c("A","B","B","B","A","C","A","C","B","A","A"))
test01 <- rowwise(test1)

dput(test01)
structure(list(col1 = c(1, 2, 3, 4, 5, 85, 74, 5, 32, 6, 8), 
    col = c("A", "B", "B", "B", "A", "C", "A", "C", "B", "A", 
    "A")), class = c("rowwise_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -11L), groups = structure(list(.rows = structure(list(
    1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -11L), class = c("tbl_df", 
"tbl", "data.frame")))

But without rbind, filter does not crash:

dplyr::filter(test01, col == "A") 
# A tibble: 5 × 2
# Rowwise: 
   col1 col  
  <dbl> <chr>
1     1 A    
2     5 A    
3    74 A    
4     6 A    
5     8 A   
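
Since ungrouping is reported above to avoid the crash, one possible workaround is to drop the rowwise grouping before rbind() and reapply it afterwards. This is a sketch under that assumption, not a confirmed fix from the dplyr maintainers; ungrouping before rbind() rather than before filter() is a variation on what was reported, and the names df, a, b, combined and res are only illustrative.

```r
library(dplyr)

df <- data.frame(
  col1 = c(1, 2, 3, 4, 5, 85, 74, 5, 32, 6, 8),
  col  = c("A", "B", "B", "B", "A", "C", "A", "C", "B", "A", "A")
)
a <- rowwise(df)
b <- rowwise(df)

# Drop the rowwise grouping first so rbind() combines plain tibbles
combined <- rbind(ungroup(a), ungroup(b))

# Reapply rowwise grouping so the groups attribute is rebuilt
# for the combined data
combined <- rowwise(combined)

res <- filter(combined, col == "A")
nrow(res)  # 10
```

The idea is that rowwise() is only called on data whose row count is already final, so the grouping metadata and the data stay in sync.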

etiennebacher commented 1 month ago

This might be a duplicate of https://github.com/tidyverse/dplyr/issues/7024, which was transferred to https://github.com/r-lib/vctrs/issues/1935.