tidyverse / tibble

A modern re-imagining of the data frame
https://tibble.tidyverse.org/
Other
672 stars 130 forks source link

Accessing columns via .data is broken #686

Closed elinw closed 4 years ago

elinw commented 4 years ago

I am posting my analysis of the breakage in skimr since I think there are a number of packages which have similar issues.

You have the overall report here. https://github.com/tidyverse/tibble/blob/master/revdep/problems.md#skimr

We have an issue report here https://github.com/ropensci/skimr/issues/552

I have proposed a pull request for skimr here https://github.com/ropensci/skimr/pull/560/files that fixes the one issue in the code itself and some failing tests.

It would be helpful to know if you are firmly committed to all of the relevant changes. I'm sharing this so other people trying to fix breakage can find my analysis and solutions.

Here are the issues.

The code issue:

    ready_to_skim <- tibble::tibble(
      skim_type = unique(types),
      skimmers = purrr::map(combined_skimmers, mangle_names, names(base$funs)),
      skim_variable = split(selected, types)[.data$skim_type]

    )

In this case split(selected, types)[.data$skim_type] now returns NULL, triggering the lengths must be equal error. In this case I am replacing with skim_variable = split(selected, types)[unique(types)] (although other solutions are possible (just using skim_type) they add the complexity by using NSE).

In terms of failing tests:

  1. the biggest group of failures were all in cases like this

    expect_identical(input$skim_variable, "mpg")

    Which failed because input$skim now has names (input is a tibble). Applying unname() solved this.

  2. "Partitioning works in a round trip" Here is the log

input not equal to skimmed. Names: 15 string mismatches Component 1: 5 string mismatches Component 2: Modes: numeric, character Component 2: target is numeric, current is character Component 3: Mean relative difference: 1 Component 4: Modes: logical, numeric Component 4: target is logical, current is numeric Component 5: Modes: numeric, logical Component 5: target is numeric, current is logical ...

I was able to fix by changing expect_equal() to expect_identical() but I think it is worth you knowing that this was necessary.

  1. "knit_print appropriately falls back to tibble printing" This is the one I haven't been able to solve yet.

The printing of the new version of tibble no long has a * above the row number column (in the metadata). Since that test compares to an existing file I haven't come up with something that supports both the released and development versions of tibble.

krlmlr commented 4 years ago

Thank you for your analysis.

The upcoming release is a major release, a few breakages are unavoidable. We want to make the behavior of subsetting and subassignment as consistent as possible. The invariants article explains the reasoning and shows examples.

However, the problem here seems to be:

tibble::tibble(a = 1, b = .data$a)
#> Error in eval_tidy(xs[[j]], mask): object '.data' not found

library(rlang)
tibble::tibble(a = 1, b = .data$a)
#> Error: Expected a vector, not NULL

Created on 2020-01-01 by the reprex package (v0.3.0)

The CRAN version behaves as expected. This is a regression in tibble, I'll fix and rerun.

krlmlr commented 4 years ago

Thanks again. I'm not sure if this fixes all problems in skimr. Please let me know if anything else needs to change in tibble.

github-actions[bot] commented 3 years ago

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.