varchar-io / nebula

A distributed block-based data storage and compute engine
https://nebula.bz
Apache License 2.0

Ingest sampling support #47

Open shawncao opened 3 years ago

shawncao commented 3 years ago

Some use cases need sampling support during data ingestion. For very heavy data sources, sampling lets users get insights without scanning the full data.

a few initial thoughts

chenqin commented 3 years ago

There is more to sampling when it comes to data science and statistical (the old name for ML) use cases. We can start with a simple one-pass algorithm.
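
One candidate for a simple one-pass algorithm is reservoir sampling (Algorithm R), which keeps a uniform random sample of `k` rows from a stream of unknown length using O(k) memory. The thread does not name a specific algorithm, so this is only an illustrative sketch, not Nebula's implementation. The function name `reservoir_sample` and the parameters `rows`, `k`, and `seed` are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k rows drawn uniformly at random from `rows` in one pass.

    Hypothetical helper for illustration only; not part of Nebula.
    """
    if k < 0:
        raise ValueError("k must be non-negative")

    rng = random.Random(seed)
    reservoir: List[T] = []

    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Pick a slot in [0, i]; the new row replaces an existing one
            # only if the slot falls inside the reservoir, which happens
            # with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row

    return reservoir


if __name__ == "__main__":
    # Sample 5 rows from a simulated stream of 1,000,000 row ids.
    sample = reservoir_sample(range(1_000_000), k=5, seed=42)
    print(sample)
```

The sample size is fixed up front, so memory stays bounded no matter how large the source is. If a fixed sampling rate is preferred over a fixed sample size, keeping each row independently with some probability would be another one-pass option.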