Open gmbecker opened 9 years ago
I like this idea too and it has been in the back of my mind for a while. I seem to remember there being an effort to implement it a long time ago, but I think there was a hiccup with respect to preserving references shared across objects.
We need it badly, hope it gets selected!
I'm very interested in this as well. I suppose there is a question about what type of data you are thinking of. If you are thinking large data frames and you want random access to rows, that's one thing.
What I'm particularly interested in is storing large lists of arbitrary objects as key-value pairs and having random access by key, with a serverless solution. R lacks sqlite-equivalent support for this case. There was a Berkeley DB R package that would have achieved this, but it seems to be abandoned, and there are many other similar embedded key-value technologies it would be great for R to support.
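To make the key-value use case concrete, here is a minimal sketch of a serverless store for arbitrary R objects, using one `.rds` file per key in a directory. All function names (`kv_open`, `kv_put`, etc.) are hypothetical illustrations, not an existing package API; a real solution would want a single-file format with an index rather than a directory of files.

```r
# Hypothetical minimal key-value store: one .rds file per key.
kv_open <- function(dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  dir
}

kv_put <- function(store, key, value) {
  # Each value is serialized independently, so any one key can be
  # read back without touching the others.
  saveRDS(value, file.path(store, paste0(key, ".rds")))
  invisible(key)
}

kv_get <- function(store, key) {
  readRDS(file.path(store, paste0(key, ".rds")))
}

kv_keys <- function(store) {
  sub("\\.rds$", "", list.files(store, pattern = "\\.rds$"))
}

store <- kv_open(file.path(tempdir(), "kvdemo"))
kv_put(store, "model", lm(mpg ~ wt, data = mtcars))
kv_put(store, "meta", list(created = Sys.time(), n = nrow(mtcars)))

# Random access by key: only the requested object is deserialized.
m <- kv_get(store, "meta")
```

This gets random access by key but loses what a real embedded database provides: transactions, compaction, and sane behavior with millions of keys.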
Anyone following this ooold thread: I have started work on indexing rds here
The corpus package by @patperry has an interesting implementation of memory-mapping strings within JSON objects. It's one of the best working examples of accessing data within an object on disk without loading the entire thing into memory.
Serialized R objects are everywhere, from cluttering our workspaces to providing package data. Currently, however, such objects are "all or nothing": to get any piece of the saved object, or even to determine what objects are saved in a particular rda/RData file, we have to load the whole thing into memory.
It would be nice to have a serialization format amenable to inspection and "random" - in the access sense - subset retrieval.
Packages such as bigmemory offer something like this for matrices, but I'm talking about a general solution which could act as a drop-in replacement for save().
Self-describing data formats such as Avro (https://avro.apache.org/) and some form of external indexing akin to tabix are two approaches that seem promising.
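The external-indexing idea can be sketched in base R: serialize each object independently into one binary file, record the byte offsets in a small side-car index, and later seek to a single object and unserialize just that one. The helper names (`save_indexed`, `load_one`) and the `.idx` side-car convention are assumptions for illustration, not an existing format.

```r
# Sketch: external index over independently serialized objects.
save_indexed <- function(objects, path) {
  con <- file(path, "wb")
  on.exit(close(con))
  index <- list()
  for (name in names(objects)) {
    start <- seek(con)                 # current byte offset in the file
    serialize(objects[[name]], con)    # write this object's bytes
    index[[name]] <- c(offset = start, length = seek(con) - start)
  }
  saveRDS(index, paste0(path, ".idx")) # the external index, tabix-style
}

load_one <- function(path, name) {
  index <- readRDS(paste0(path, ".idx"))
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, index[[name]][["offset"]]) # jump straight to the object
  unserialize(con)                     # read only that object
}

path <- file.path(tempdir(), "objects.bin")
save_indexed(list(a = 1:10, b = letters, big = matrix(0, 100, 100)), path)
b <- load_one(path, "b")  # reads object "b" without loading "a" or "big"
```

Note the hiccup mentioned earlier in the thread: because each object is serialized independently, references shared across objects (e.g. two list elements pointing at the same environment) are duplicated rather than preserved, which is exactly the hard part of doing this properly.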