r-lib / vctrs

Generic programming with typed R vectors
https://vctrs.r-lib.org

what *is* a vector? #1955

Closed JosiahParry closed 1 month ago

JosiahParry commented 1 month ago

This is a somewhat philosophical question, but one with real consequences, so I apologize if it winds and curves!

TL;DR

The nb class is a list with attributes and no explicit list class.

This is how the following packages see it (values taken from the reprex below):

package   list   vector
rlang     yes    yes
base      yes    no
vctrs     no     no

Background

One thing that has been bothering me since 2021 is that the nb and listw classes from the spdep package cannot be easily integrated into the tidyverse.

The nb class object is a ragged array stored in a list. A list is a vector and thus can work with vctrs and the tidyverse in general. However, the nb class object does not have the list class explicitly added. There is disagreement across base R, rlang, and vctrs about what constitutes a vector and a list.

Motivation

The rcrd class from vctrs provides a nice opportunity to be able to embed the listw class into the tidyverse workflow in a much more seamless way than has been possible in the past.

I am quite interested in thinking through how I can make spatial statistics more accessible to the R ecosystem, and this is a big part of it. I have a package, sfdep, which provides tidyverse compatibility by splitting the two component lists, neighbours and weights, into two separate columns in a data frame. Ideally they would live in a single column, since two separate columns can get out of sync.

Question

What constitutes a list and a vector in vctrs and should there be agreement between rlang and vctrs as to what this is?

Additionally, do you all have guidance on how one can address this? FWIW, I am not the author or maintainer of {spdep}, and adding the list subclass is out of the question, as demonstrated in https://github.com/r-spatial/spdep/issues/59.

Reprex

library(spdep)
library(vctrs)

# create listw object 
nb <- cell2nb(10, 10)
listw <- nb2listw(nb)

# try and create a record
x <- new_rcrd(listw, class = "swm_rcrd")
#> Error in `df_list()`:
#> ! `neighbours` must be a vector, not a <nb> object.

# according to {rlang} the nb object is both a list and a vector
rlang::is_list(listw$neighbours)
#> [1] TRUE
rlang::is_vector(listw$neighbours)
#> [1] TRUE
# according to vctrs it is not a list
vctrs::obj_is_list(listw$neighbours)
#> [1] FALSE
# according to vctrs it is not a vector either
vctrs::obj_is_vector(listw$neighbours)
#> [1] FALSE

# base R says it is a list
typeof(listw$neighbours)
#> [1] "list"

# but base R also says it is not a vector
# is this because it is missing the explicit class??
is.vector(nb)
#> [1] FALSE

# a bare list, by contrast, _is_ a vector according to base R
is.vector(list())
#> [1] TRUE

# adding the explicit list class 
class(listw$neighbours) <- c("nb", "list")

# this works 
x <- new_rcrd(listw, class = "swm_rcrd")

format.swm_rcrd <- function(x, ...) {
  nbs <- field(x, "neighbours")
  card <- spdep::card(nbs)
  out <- paste("(", vapply(nbs, toString, character(1)), ")", sep = "")
  out[which(card == 0)] <- NA
  out
}

tibble::tibble(swm = x)
#> # A tibble: 100 × 1
#>            swm
#>     <swm_rcrd>
#>  1     (2, 11)
#>  2  (1, 3, 12)
#>  3  (2, 4, 13)
#>  4  (3, 5, 14)
#>  5  (4, 6, 15)
#>  6  (5, 7, 16)
#>  7  (6, 8, 17)
#>  8  (7, 9, 18)
#>  9 (8, 10, 19)
#> 10     (9, 20)
#> # ℹ 90 more rows

Created on 2024-10-22 with reprex v2.1.0

lionel- commented 1 month ago

There is the storage type and there is the semantic type (a combination of interface and semantics). rlang is about the storage type, vctrs is about semantics.

We've decided that S3 subclasses must explicitly inherit from a base vector/list class to be considered as such, even if they have vector/list storage. For instance, in the vctrs worldview an S3 model is a scalar and not a list, even though it has list storage.
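The storage-versus-semantics split can be seen with a fitted model. This is a minimal sketch, assuming rlang and vctrs are installed; the model formula and the built-in `mtcars` data are chosen only for illustration.

```r
library(vctrs)

# Fit a small model; `mtcars` ships with R's datasets package.
fit <- lm(mpg ~ wt, data = mtcars)

typeof(fit)           # "list": the storage type
class(fit)            # "lm": no explicit "list" class
rlang::is_list(fit)   # TRUE: rlang looks at storage only
obj_is_list(fit)      # FALSE: vctrs requires an explicit "list" class
obj_is_vector(fit)    # FALSE: vctrs treats the model as a scalar
```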

DavisVaughan commented 1 month ago

FWIW is.vector() is incredibly low level and is probably not a good thing to consider in this conversation:

is.vector(x) returns TRUE if x is a vector of the specified mode having no attributes other than names.
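A short base R sketch of that rule, using made-up values and a hypothetical "units" attribute, shows why is.vector() says little about whether an object behaves like a vector:

```r
x <- c(a = 1, b = 2)
is.vector(x)             # TRUE: names are the one attribute allowed

attr(x, "units") <- "m"  # "units" is an arbitrary attribute for illustration
is.vector(x)             # FALSE: any other attribute disqualifies it

is.vector(factor("a"))   # FALSE: factors carry class and levels attributes
is.vector(list())        # TRUE: a bare list is a vector of mode "list"
```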

DavisVaughan commented 1 month ago

?vctrs::obj_is_list() does a good job explaining the 2 rules that allow an object to be treated as a list in vctrs. x is a list if it is a bare list with no class, or if it is a list that explicitly inherits from "list".

As Lionel said, this distinction allows us to say that output from lm() is considered a scalar object rather than a vector-like list object, because its class is just "lm".

But a vctrs::list_of() is considered a vector-like list, because its class structure is c("vctrs_list_of", "vctrs_vctr", "list")


This rule about what an explicit "list" class means runs very deep. If you have a "list" class on your object, we are going to try and index into it with VECTOR_ELT() or VECTOR_PTR_RO() at the C level, so it sure better be backed by a VECSXP.
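The effect of an explicit "list" class can be checked directly. In this sketch the class name `bag` is hypothetical and exists only to contrast the two cases; the object must genuinely have list storage for the second case to be safe.

```r
library(vctrs)

# `bag` is a hypothetical S3 class used only for illustration.
implicit <- structure(list(1, 2), class = "bag")
explicit <- structure(list(1, 2), class = c("bag", "list"))

obj_is_list(list(1, 2))   # TRUE: bare list with no class
obj_is_list(implicit)     # FALSE: list storage, but no explicit "list" class
obj_is_list(explicit)     # TRUE: explicitly inherits from "list"

lo <- list_of(1, 2)
class(lo)                 # "vctrs_list_of" "vctrs_vctr" "list"
obj_is_list(lo)           # TRUE
```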

DavisVaughan commented 1 month ago

?vctrs::obj_is_vector() similarly does a good job of describing what makes an object a vector in vctrs https://vctrs.r-lib.org/reference/vector-checks.html#vectors-and-scalars

In particular, a good example here is the vctrs_rcrd type.
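A minimal sketch of what that looks like, assuming a hypothetical record class `point_rcrd` with made-up `lat` and `lon` fields: the object is stored as a list of fields, yet vctrs treats it as a vector with one element per record.

```r
library(vctrs)

# `point_rcrd`, `lat`, and `lon` are hypothetical names for illustration.
pt <- new_rcrd(
  list(lat = c(51.5, 48.9), lon = c(-0.1, 2.4)),
  class = "point_rcrd"
)

typeof(pt)          # "list": storage is a list of fields
obj_is_list(pt)     # FALSE: no explicit "list" class
obj_is_vector(pt)   # TRUE: vctrs treats the record as a vector
vec_size(pt)        # 2: one element per record, not one per field
```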

Other good examples are the Duration, Interval, and Period S4 classes from lubridate.

JosiahParry commented 1 month ago

Thank you all for the very clear and thoughtful responses! Following the details section in Vector Checks (which should be more discoverable; imo it's really great writing!), this issue can be addressed by simply adding a new vec_proxy() method.

Overall, what I take away is that a comparison between rlang and vctrs should use the _bare_ functions in rlang. vctrs also permits vector "status" to be obtained through S3 generic methods (notably vec_proxy()).

library(spdep)
library(vctrs)

# create listw object 
nb <- cell2nb(10, 10)
listw <- nb2listw(nb)

# these tests should be the same
rlang::is_bare_list(nb)
#> [1] FALSE

rlang::is_bare_vector(nb)
#> [1] FALSE

vctrs::obj_is_list(nb)
#> [1] FALSE

vctrs::obj_is_vector(nb)
#> [1] FALSE

# tell {vctrs} that nb _is_ a vector 
vec_proxy.nb <- function(x, ...) {
  unclass(x)
}

# run the {vctrs} tests again: nb is now a vector
# but still not a list
vctrs::obj_is_list(nb)
#> [1] FALSE

vctrs::obj_is_vector(nb)
#> [1] TRUE

# give a format method for the record
format.swm_rcrd <- function(x, ...) {
  nbs <- field(x, "neighbours")
  card <- spdep::card(nbs)
  out <- paste("(", vapply(nbs, toString, character(1)), ")", sep = "")
  out[which(card == 0)] <- NA
  out
}

# try and create a record
x <- new_rcrd(listw, class = "swm_rcrd")

head(x)
#> <swm_rcrd[6]>
#> [1] (2, 11)    (1, 3, 12) (2, 4, 13) (3, 5, 14) (4, 6, 15) (5, 7, 16)

Created on 2024-10-23 with reprex v2.1.0