A "dataset preview" is the head of a dataset plus a sampling of the body.
We want datasets to be as big as they need to be: a 1TB dataset shouldn't pose anything other than a storage-space problem. But we also need fast, shorthand ways of describing datasets that move around easily.
Dataset previews should aim to produce a size-bounded version of a dataset. The hard part is getting sampling right. This could start as just the dataset head plus the leading entries of the body, taken until the body reaches a certain size. It should also be possible for no preview to be available if even one row exceeds that maximum size.
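As a starting point for discussion, the "head plus leading entries" approach might look like the following minimal sketch. Everything named here is hypothetical and not part of any existing API: `build_preview`, `row_size`, the `MAX_PREVIEW_BODY_BYTES` value, and the choice to measure a row as compact JSON are all assumptions. The sketch also only checks the rows it actually reads, since scanning an entire 1TB body to find an oversized row would defeat the purpose.

```python
import json
from typing import Any, Iterable, Optional

# Hypothetical bound; the RFC would need to pick the real value.
MAX_PREVIEW_BODY_BYTES = 1024


def row_size(row: Any) -> int:
    """Size of one row, measured as compact UTF-8 JSON (an assumption)."""
    return len(json.dumps(row, separators=(",", ":")).encode("utf-8"))


def build_preview(
    head: dict,
    rows: Iterable[Any],
    max_bytes: int = MAX_PREVIEW_BODY_BYTES,
) -> Optional[dict]:
    """Return the head plus the leading rows that fit within max_bytes.

    Returns None (no preview available) if any row read is larger than
    max_bytes on its own.
    """
    body = []
    used = 0
    for row in rows:
        size = row_size(row)
        if size > max_bytes:
            # A single row exceeds the bound: no preview is available.
            return None
        if used + size > max_bytes:
            # Adding this row would exceed the bound: stop reading.
            break
        body.append(row)
        used += size
    return {"head": head, "body": body}
```

Because `rows` is consumed lazily, this reads only as much of the body as the bound allows. Proper sampling, as opposed to taking the leading rows, would replace the loop and remains the open question.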
This RFC should also try to articulate all of the size variations of a dataset:
https://github.com/qri-io/rfcs/pull/14#discussion_r210939011 articulates our need for "dataset previews".