traversc / qs

Quick serialization of R objects

feature request: add meta data to qs file #66

Closed statquant closed 5 months ago

statquant commented 2 years ago

Hello, first, thanks again for the amazing package. I have a feature request that I hope you'll find useful: I think it would be extremely useful to be able to add metadata to a qs file. For instance, if you save a table with a date column, it could be useful to know the date range stored in the table. This comes in very handy for anything that involves caching (and I think everyone who caches things uses qs or fst). fst offers some flavor of this (typically column classes). The way I see it, a function like the one below would do the trick, and by default simple metadata could be attached (column types, nrow, ncol for data.frame and classes that inherit from it, ...).

add_metadata <- function(path, data) {
 # attach data to qs file 
}

Many thanks

tdeenes commented 2 years ago

Note that you can serialize any R object with qs, not only data.frame-like structures. So it would be better to implement a generic in qs that dispatches on the class of the object to be serialized. Something like:

add_metadata <- function(x, ...) UseMethod("add_metadata")
add_metadata.data.frame <- function(x, ...) {
  # prepare an object which will be serialized by qs and attached to the original object
}
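To make the dispatch idea concrete, here is a minimal base R sketch of what deriving default metadata per class could look like. The generic name `describe_metadata` and the fields it returns are assumptions made for illustration; nothing like this exists in qs.

```r
# Hypothetical generic (not part of qs): derive a small metadata list from an object.
describe_metadata <- function(x, ...) UseMethod("describe_metadata")

# Fallback for arbitrary R objects: class and length only.
describe_metadata.default <- function(x, ...) {
  list(class = class(x), length = length(x))
}

# Data frames, and classes inheriting from data.frame, also get shape and column classes.
describe_metadata.data.frame <- function(x, ...) {
  list(
    class = class(x),
    nrow = nrow(x),
    ncol = ncol(x),
    col_classes = vapply(x, function(col) class(col)[1], character(1))
  )
}

df <- data.frame(date = as.Date("2024-01-01") + 0:2, value = c(1.5, 2.5, 3.5))
meta <- describe_metadata(df)

# Caller-supplied extra, such as the date range mentioned in the request.
meta$date_range <- range(df$date)
```

The default method keeps the generic usable for any object qs can serialize, while the data.frame method covers the column type, nrow and ncol defaults mentioned in the request.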

One could already do this outside of qs, but the whole purpose of this feature would be to retrieve the metadata of a serialized object efficiently. For example, creating a list like list(metadata = <metadata>, data = <data>) and serializing it does not help here, because the current implementation of qs has no random access, i.e. we cannot deserialize the metadata slot of the serialized object without deserializing the whole object.
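The limitation of the wrapper-list approach can be sketched as follows, using `qsave()` and `qread()` from qs. The data frame and the metadata fields are made up for illustration.

```r
library(qs)

df <- data.frame(date = as.Date("2024-01-01") + 0:2, value = c(1.5, 2.5, 3.5))
path <- tempfile(fileext = ".qs")

# Bundle metadata and data into one object and serialize it as a whole.
wrapped <- list(
  metadata = list(nrow = nrow(df), date_range = range(df$date)),
  data = df
)
qsave(wrapped, path)

# qread() returns the complete object: the data slot is deserialized too,
# even though only the metadata slot is used afterwards.
meta <- qread(path)$metadata
```

For a three-row table this costs nothing, but for a large cached table the read time is dominated by the data slot, which is exactly what a metadata lookup is meant to avoid.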

@traversc If you could implement this it would be an amazing feature.

traversc commented 2 years ago

I would be opposed to building metadata directly into the qs file format, because I see it as more of a convenience feature (since you could simply store metadata externally) and not something core. It adds complexity, backwards compatibility issues and the possibility of additional bugs.

However, if it's a pure R convenience function written on top of qs somehow, that would be a possibility. Can you expand on what that would look like? If you add metadata, where would it be stored?
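One possible shape for such a pure R layer, sketched under the assumption that metadata lives in a separate sidecar file next to the data file. The helper names `qsave_with_meta` and `qread_meta` and the `.meta` suffix are hypothetical; only `qsave()` and `qread()` come from qs.

```r
library(qs)

# Hypothetical helper: write the object with qsave() and its metadata to a
# sidecar file next to it (the ".meta" suffix is an arbitrary choice).
qsave_with_meta <- function(x, path, meta = list()) {
  qsave(x, path)
  qsave(meta, paste0(path, ".meta"))
  invisible(path)
}

# Hypothetical helper: read only the small sidecar file.
qread_meta <- function(path) {
  qread(paste0(path, ".meta"))
}

df <- data.frame(date = as.Date("2024-01-01") + 0:2, value = c(1.5, 2.5, 3.5))
path <- tempfile(fileext = ".qs")
qsave_with_meta(df, path, meta = list(nrow = nrow(df), date_range = range(df$date)))

meta <- qread_meta(path)  # does not touch the data file
data <- qread(path)       # full object, read only when needed
```

This keeps the qs file format unchanged, at the cost of managing two files that can drift out of sync if one is moved or overwritten without the other.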