qri-io / rfcs

Request For Comments (RFCs) documenting changes to Qri

RFC Request: Dataset Previews #16

Open b5 opened 6 years ago

b5 commented 6 years ago

https://github.com/qri-io/rfcs/pull/14#discussion_r210939011 articulates our need for "dataset previews":

"dataset preview" that is the head of a dataset, plus a sampling of the body

We want datasets to be as big as they need to be: a 1TB dataset should be nothing more than a storage space problem. But we also need fast, compact ways of describing datasets that move around easily.

Dataset previews should aim to produce a size-bounded version of a dataset. The hard part is getting sampling right. This could start as just the dataset head plus the first N entries of the body, stopping once the body reaches a set size limit. It should be possible for no preview to be available if even a single row exceeds that limit.
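
To make the starting point concrete, here is a rough sketch of "head plus leading rows under a size bound". It is illustrative only and not Qri's implementation: `build_preview`, `max_body_bytes`, and the choice to measure each row by its JSON-encoded byte length are all assumptions made for this example. It also takes one reading of the "no preview" case, namely that a preview is unavailable when the first row alone does not fit.

```python
import json
from typing import Any, Iterable, Optional


def build_preview(head: dict, rows: Iterable[Any], max_body_bytes: int) -> Optional[dict]:
    """Return the head plus as many leading rows as fit within max_body_bytes.

    Hypothetical helper: row size is approximated as the length of the
    row's JSON encoding in UTF-8 bytes (an assumption, not a Qri rule).
    Returns None when the dataset has rows but not even the first one fits.
    """
    body = []
    used = 0
    saw_row = False
    for row in rows:
        saw_row = True
        size = len(json.dumps(row).encode("utf-8"))
        if used + size > max_body_bytes:
            # Stop at the first row that would push the body past the bound.
            break
        body.append(row)
        used += size
    if saw_row and not body:
        # The first row alone exceeds the bound: no preview is available.
        return None
    return {"head": head, "body": body}
```

Because this only reads from the front of the body, it never has to load the full dataset, but it is not a real sample: rows later in the body never appear in the preview.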

This RFC should also try to articulate all of the size variations of a dataset: