tidyverse / tibble

A modern re-imagining of the data frame
https://tibble.tidyverse.org/

Weird row names inconsistency #1571

Open moodymudskipper opened 7 months ago

moodymudskipper commented 7 months ago

I create two tibbles that are identical(), but they behave differently. This is freaking me out.

This is really hard to debug because rbind() doesn't call tibble explicitly. It's a non-generic base function that calls .Internal(rbind()), and in there `rownames<-` is called. That in turn calls `row.names<-`, which is generic, so we end up in the tibble package.

library(tibble)
tb1 <- tibble(
  e = structure(
    list(list(4), list(5)),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = 1L,
    names = c("f", "g")
  )
)
tb2 <- tibble(e = tibble::tibble(f = list(4), g = list(5)))
identical(tb1, tb2, attrib.as.set = FALSE)
#> [1] TRUE

attributes(tb1$e)
#> $class
#> [1] "tbl_df"     "tbl"        "data.frame"
#> 
#> $row.names
#> [1] 1
#> 
#> $names
#> [1] "f" "g"

attr(rbind(tb1)$e, "row.names")
#> Warning: Setting row names on a tibble is deprecated.
#> [1] "1"
attr(rbind(tb2)$e, "row.names")
#> [1] 1

options(warn=2)
rbind(tb1)
#> Error: (converted from warning) Setting row names on a tibble is deprecated.
traceback()
#> 10: doWithOneRestart(return(expr), restart)
#> 9: withOneRestart(expr, restarts[[1L]])
#> 8: withRestarts({
#>        .Internal(.signalCondition(cond, message, call))
#>        .Internal(.dfltWarn(message, call))
#>    }, muffleWarning = function() NULL)
#> 7: warning(cnd)
#> 6: warn("Setting row names on a tibble is deprecated.")
#> 5: `row.names<-.tbl_df`(`*tmp*`, value = value)
#> 4: `row.names<-`(`*tmp*`, value = value)
#> 3: `rownames<-`(`*tmp*`, value = `*vtmp*`)
#> 2: rbind(deparse.level, ...)
#> 1: rbind(tb1)
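
The traceback above suggests the route into tibble: base `rownames<-` hands data frames to the S3 generic `row.names<-`, which then dispatches on the class. Here is a minimal base-R sketch of that dispatch step only. The class `demo_df` and the counter `calls` are made up for illustration and stand in for `tbl_df` and tibble's deprecation warning; none of this is tibble's actual code.

```r
# Hypothetical class "demo_df" stands in for "tbl_df"; this is not tibble's code.
calls <- new.env()
calls$n <- 0L

# Register a method for the base S3 generic `row.names<-`.
registerS3method("row.names<-", "demo_df", function(x, value) {
  calls$n <- calls$n + 1L   # record that dispatch reached this method
  NextMethod()              # fall through to the data.frame method
})

x <- structure(
  list(a = 1),
  class = c("demo_df", "data.frame"),
  row.names = c(NA_integer_, -1L)
)

# `rownames<-` is not generic itself, but for data frames it calls `row.names<-`.
rownames(x) <- "r1"

calls$n               # how many times the demo_df method ran
attr(x, "row.names")  # row names as set by the data.frame method
```

This only shows why a call to `rownames<-` anywhere in base code can land in a package method. It does not show why rbind() makes that call for tb1 and not for tb2.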

Do we set some global handlers that, for some reason, get triggered in one case and not the other?

If we debug `rownames<-`, we can step into .Internal(rbind()) and see that:

debug(`rownames<-`)
rbind(tb1)
rbind(tb2)

krlmlr commented 7 months ago

Thanks. The following works, though; the difference is in the row.names attribute, which here is c(NA_integer_, -1L) rather than 1L:

library(tibble)
tb1 <- tibble(
  e = structure(
    list(list(4), list(5)),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA_integer_, -1L),
    names = c("f", "g")
  )
)
tb2 <- tibble(e = tibble::tibble(f = list(4), g = list(5)))
identical(tb1, tb2, attrib.as.set = FALSE)
#> [1] TRUE

attributes(tb1$e)
#> $class
#> [1] "tbl_df"     "tbl"        "data.frame"
#> 
#> $row.names
#> [1] 1
#> 
#> $names
#> [1] "f" "g"

attr(rbind(tb1)$e, "row.names")
#> [1] 1
attr(rbind(tb2)$e, "row.names")
#> [1] 1

Created on 2024-03-20 with reprex v2.1.0
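
Both reprexes print the same `$row.names` because `attr()` and `attributes()` expand the compact internal form `c(NA, -n)` to `1:n`. Below is a small sketch of one way to see what is actually stored, using base R's `.row_names_info()`. It uses plain data frames rather than tibbles as a simplifying assumption, and the names `df_explicit` and `df_compact` are made up for illustration.

```r
# Two one-row data frames that differ only in how row names are stored.
# Plain data frames are used here as an assumption; the issue uses tibbles.
df_explicit <- structure(
  list(f = 4),
  class = "data.frame",
  row.names = 1L                      # stored literally as the integer 1
)
df_compact <- structure(
  list(f = 4),
  class = "data.frame",
  row.names = c(NA_integer_, -1L)     # compact "automatic" form for one row
)

# attr() expands the compact form, so both look the same here.
attr(df_explicit, "row.names")
attr(df_compact, "row.names")

# type = 0L returns the attribute as stored internally.
.row_names_info(df_explicit, type = 0L)
.row_names_info(df_compact, type = 0L)

# The default type = 1L gives the row count, negated for automatic row names.
.row_names_info(df_explicit)  # 1
.row_names_info(df_compact)   # -1
```

The sign returned by `.row_names_info()` is a quick check for whether a data frame carries automatic row names or explicitly set ones, a distinction the printed attributes do not show.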