Closed hadley closed 1 year ago
I'm going to leave this for now — it only affects a few less important arguments, and it would add a substantial amount of boilerplate.
I was wondering if there was a way to automatically add metadata when writing a pin, e.g. based on the class of the pinned object.
For example, I work a lot with DGEList
objects from the edgeR package. These objects contain multiple slots, including information about the features (genes) and columns (samples). It would be great if e.g. the names of sample annotation data.frame
could somehow be added to the pins' metadata.
Here is an example of what I have in mind, based on the edgeR::DGEList
example:
ngenes <- 1000
nsamples <- 4
Counts <- matrix(rnbinom(ngenes*nsamples,mu=5,size=2),ngenes,nsamples)
rownames(Counts) <- paste0("gene", 1:ngenes)
x <- DGEList(
counts=Counts,
group=rep(1:2,each=2),
# adding two sample annotation columns, which could be used e.g. as covariates
samples = data.frame(age = sample(100, 4), weight = sample(60:80, 4))
)
Right now, I would need to compile metadata myself before calling pin_write
:
metadata <- c(
list(
# just examples of metadata that could be added automatically
dimensions = list(features = nrow(x), samples = ncol(x)),
sample_anno = colnames(x$samples),
features = head(row.names(x), 5)
)
)
pin_write(x, "my_DGEList", metadata = metadata)
The legacy API used S3 generics for the pins method and implemented different methods e.g. for data.frames
.
That enabled users to add additional S3 methods themselves, e.g. a pin.DGEList
S3 method that would automatically extract metadata from the object and add it to the metadata list before calling the pin()
generic. Similarly, the pin_preview()
method could be extended to display the additional metadata in a pretty way.
Do you have advice on how to extend the functionality of pins in that way with the new API? Is that something you have considered already?
Hey--what would the final storage format (type) be in this example? Rds?
I think making pin_write extensible is useful, but for now it seems like wrapping pin_write is the way to go.
Here is how vetiver does it: https://github.com/rstudio/vetiver-r/blob/main/R/pin-read-write.R#L35
One advantage to this approach is vetiver isn't necessary to read the pin (but it is necessary to go from pin to vetiver object)
Yes, the storage format would be Rds (or qs). Thanks a ton for the pointer to vetiver's wrapper code. That's super helpful, and I will explore that direction.
I don't think we should make pin_write()
(or pin_read()
?) generics for now, but let's maybe add some documentation somewhere to point out the possibility of folks writing functions to wrap pin_read()
/pin_write()
for customized use cases. Maybe it should be another section of our new "custom formats" vignette?
Check out the new vignette we now have on customized/standardized metadata.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
e.g.
Then
board_rsconnect()
can document its methods:Need to consider whether or not this additional layer is worth the effort.