Closed thisisnic closed 10 months ago
Thanks so much for the detail here! 🙌
I think that what we want to do here is strip off the tibble classes when we read, for consistency with how everything else works in pins. Notice, for example, the tests that will fail with the new arrow release:
https://github.com/rstudio/pins-r/blob/main/tests/testthat/test-pin-read-write.R
It seems weird in the pins context to get back out a tibble when you stored a data.frame. I'm open to discussion here, though.
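For illustration only, here is a minimal sketch of what "strip off the tibble classes when we read" could look like. `strip_tibble_classes()` is a hypothetical helper, not an existing pins function, and the tibble-like object is built by hand so the sketch runs without arrow or tibble installed.

```r
# Hypothetical helper (not part of pins): drop tibble classes after a read,
# leaving a plain data.frame.
strip_tibble_classes <- function(x) {
  if (inherits(x, "tbl_df")) {
    x <- as.data.frame(x)
  }
  x
}

# Stand-in for what a reader might return. The class vector is set by hand
# to mimic a tibble; this is an assumption for the sake of the example.
read_result <- structure(
  data.frame(x = 1:3, y = c("a", "b", "c")),
  class = c("tbl_df", "tbl", "data.frame")
)

out <- strip_tibble_classes(read_result)
class(out)
```

Objects that are already plain data.frames pass through unchanged, so a helper like this could in principle be applied to every read path.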
I think from the other side too--if you write a tibble to csv, you'll currently get back a data.frame. I can't really say what R folks want to happen, but if a tibble feels like a "subclass" of a data.frame (e.g. liskov substitutable; w/e that means here, since I know the way they get printed is also important), I wonder if that might make it feel okay to return?
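To make the "subclass" point concrete, here is a small sketch of the two kinds of class check a test might use. The tibble-like object is again constructed by hand (an assumption, so that nothing beyond base R is needed).

```r
# Hand-built stand-in with the same class vector a tibble carries.
tbl_like <- structure(
  data.frame(x = 1:3),
  class = c("tbl_df", "tbl", "data.frame")
)

# A subclass-tolerant check still passes for a tibble-like object...
is_df <- inherits(tbl_like, "data.frame")

# ...while an exact class comparison does not.
is_exactly_df <- identical(class(tbl_like), "data.frame")
```

Code that only relies on `inherits()` would not notice the change, while anything comparing the class vector exactly would.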
I am realizing that if we strip off the tibble subclass when reading, then if someone stores a tibble as parquet, it will come back as a data.frame, which would definitely feel real bad. 😬
(I don't think it's reasonable to keep track of the classes here in pins metadata, like "this was originally a tibble", given how arrow now behaves.)
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
We're planning on releasing arrow 13.0.0 to CRAN in the next couple of weeks, and our revdepchecks flagged up some test failures with pins.
In short, a change in https://github.com/apache/arrow/pull/35173 constitutes a breaking change - we now return tibble objects instead of data.frame objects from our read_* functions. I was going to submit a PR, but I wasn't sure whether it'd be tests or code that would make sense to update.