Open DavisVaughan opened 4 years ago
Probably best to use fields
instead of vectorised
so it's consistent with new_rcrd()
.
Summary of the main ideas:
Explicit inheritance from one of the base types is indicative of the storage type. In particular, the storage cannot be proxied. Inheriting from "list"
when the object typeof()
isn't "list"
is an internal error.
Explicit inheritance also indicates the other attributes of that type: numeric for "integer"
, listness for "list"
, etc. In particular, vec_is_list()
is only true for objects that explictly inherit from "list"
.
We can still proxy lists so that they have vectorised attributes. But unlike record vectors, we don't allow proxied storage. We require the main data structure to be the list storage, the proxy just provides additional vectorised attributes.
As a consequence, [[
-based operations don't need to take the proxy. Only [
-based operations need the proxy (and probably need to restore afterwards). In practice, this means that we don't take the proxy when we work on elements of the list. When we work on the list itself, we take the proxy and use generic vctrs operations (because the proxy might be a data frame).
Regarding how to implement [[<-
and $<-
operations in vctrs we could do it in the following way.
For x[[1]] <- y
with x
a list, we just do vec_slice(x, 1) <- list(y)
:
list(y)
to x
. This allows the class of x
to initialise vectorised attributes for y
.vec_slice<-
as usualIt's going to be inefficient to update lists elements by elements, so it's always preferable to vectorise the assignment.
Regarding a helper class for record lists and record data frames, I think vctrs::new_rcrd()
should gain a data = NULL
argument. When supplied, the fields would be stored in an attribute.
[[
and [[<-
as outlined in https://github.com/r-lib/vctrs/issues/1208#issuecomment-669367413.
@lionel- and I discussed the concept of a list-rcrd in more detail. Essentially this is a list that has some vectorized attributes that can get sliced alongside the list. We determined that one possible implementation of this idea is (in my own words):
The list itself is used as the core data structure.
The special vectorized attributes are stored as an attribute of the list. The easiest thing to do would be to store them as a named list in a single attribute called something like
vctrs:::vectorized
.The list explicitly inherits from
"list"
, along with whatever extra class is used on top of that. Say itsvctrs_list_rcrd
.The
vec_proxy.vctrs_list_rcrd
method should return a data frame. The first column is a list-col containing the list. The remaining columns are the vectorized attributes.This would currently work in vctrs, except for the fact that
vec_is_list()
has the following check: Ifx
inherits explicitly from"list"
, thenvec_proxy(x)
must also be a list. This should currently fail here (although I think it doesn't because of a bug).I don't remember exactly why we have that restriction, but it seems reasonable that we might try to remove it to allow these list-rcrds to work.
One possible implementation follows, but as @lionel- mentioned to me, the important part is that this doesn't require any vctrs specific helper, and can live completely outside vctrs.
Created on 2020-08-05 by the reprex package (v0.3.0)