risingwavelabs / risingwave

Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming and batch. PostgreSQL compatible.
https://go.risingwave.com/slack
Apache License 2.0

Tracking: Bummock as an indexing storage for various data models. #585

Closed: wyhyhyhyh closed this issue 2 years ago

wyhyhyhyh commented 2 years ago
  1. Relational: Bummock will serve as a unified indexing interface for table storage built on top of Hummock. See https://singularity-data.quip.com/lYozAfUcsvXV/Bummock-as-an-indexing-storage for the design doc.

A roadmap includes:

  2. Nested (protobuf, json): We support the nested data model through an indexing interface. See https://singularity-data.quip.com/63ipA5OTEqnD/Considerations-on-Schemaless-Bummock

A brief roadmap:

Long term:

neverchanje commented 2 years ago

I think this issue mostly overlaps with https://github.com/singularity-data/risingwave-dev/issues/515 so maybe you should only include those that are not assigned here.

wyhyhyhyh commented 2 years ago

> I think this issue mostly overlaps with #515 so maybe you should only include those that are not assigned here.

Yes, this tracking issue focuses on storage.

fuyufjh commented 2 years ago

> We support protobuf data model as a separate storage engine in Bummock.

This description confuses me... From my understanding, the "Protobuf data model" is a data model, and it must be supported in both Hummock and Bummock. In other words, "Protobuf data model" and "Bummock" are orthogonal topics to me... Is that correct?

wyhyhyhyh commented 2 years ago

> > We support protobuf data model as a separate storage engine in Bummock.
>
> This description confuses me... From my understanding, the "Protobuf data model" is a data model, and it must be supported in both Hummock and Bummock. In other words, "Protobuf data model" and "Bummock" are orthogonal topics to me... Is that correct?

In my understanding:

Bummock is a framework; both the relational model and the protobuf data model (nested relation) fit inside it.

neverchanje commented 2 years ago

There's no such thing as a "protobuf data model", btw; even Google didn't call it that when they invented Dremel. The most widely used term is "nested data model".

wyhyhyhyh commented 2 years ago

> There's no such thing as a "protobuf data model", btw; even Google didn't call it that when they invented Dremel. The most widely used term is "nested data model".

True. Thanks.

fuyufjh commented 2 years ago

In my understanding:

  • Hummock is the KV layer at the bottom, serving as the unified storage engine.

  • Bummock is supposed to provide a unified interface for collection access, e.g. scan, insert, delete, etc. (similar to the position of the current ScannableTable, but extended with more functionality and finer-grained interfaces). MViewTable, indexes, and nested relations (the so-called protobuf data model) will all be wrapped up inside Bummock.

> Bummock is a framework; both the relational model and the protobuf data model (nested relation) fit inside it.

You mean Bummock will be the storage interface for the compute layer instead of Hummock? That's the first time I've heard that😥
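To make the layering in the bullets above concrete, here is a minimal sketch, not RisingWave code. It assumes an in-memory `BTreeMap` as a stand-in for the KV layer, and every name in it (`KvStore`, `Collection`, `KvCollection`, the 4-byte table-id key prefix) is hypothetical, chosen only for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the bottom KV layer (the role Hummock plays in this thread).
#[derive(Default)]
pub struct KvStore {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvStore {
    pub fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.data.insert(key, value);
    }

    pub fn delete(&mut self, key: &[u8]) -> bool {
        self.data.remove(key).is_some()
    }

    /// Returns all entries whose key starts with `prefix`, in key order.
    pub fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.data
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Hypothetical collection-level interface (scan, insert, delete),
/// i.e. the role the bullets assign to Bummock.
pub trait Collection {
    fn insert(&mut self, pk: u64, row: &str);
    fn delete(&mut self, pk: u64) -> bool;
    fn scan(&self) -> Vec<(u64, String)>;
}

/// One collection mapped onto the shared KV store under a table-id prefix.
pub struct KvCollection<'a> {
    table_id: u32,
    kv: &'a mut KvStore,
}

impl<'a> KvCollection<'a> {
    pub fn new(table_id: u32, kv: &'a mut KvStore) -> Self {
        Self { table_id, kv }
    }

    /// Key layout (an assumption): big-endian table id, then big-endian primary key,
    /// so byte order matches numeric order.
    fn key(&self, pk: u64) -> Vec<u8> {
        let mut k = self.table_id.to_be_bytes().to_vec();
        k.extend_from_slice(&pk.to_be_bytes());
        k
    }
}

impl Collection for KvCollection<'_> {
    fn insert(&mut self, pk: u64, row: &str) {
        let key = self.key(pk);
        self.kv.put(key, row.as_bytes().to_vec());
    }

    fn delete(&mut self, pk: u64) -> bool {
        let key = self.key(pk);
        self.kv.delete(&key)
    }

    fn scan(&self) -> Vec<(u64, String)> {
        self.kv
            .scan_prefix(&self.table_id.to_be_bytes())
            .into_iter()
            .map(|(k, v)| {
                // Strip the 4-byte table prefix and decode the 8-byte primary key.
                let mut pk = [0u8; 8];
                pk.copy_from_slice(&k[4..12]);
                (u64::from_be_bytes(pk), String::from_utf8_lossy(&v).into_owned())
            })
            .collect()
    }
}
```

In this sketch the KV layer knows nothing about tables; the collection wrapper owns the key encoding, which is one way to read "wrapped up inside Bummock".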

twocode commented 2 years ago

> You mean Bummock will be the storage interface for the compute layer instead of Hummock? That's the first time I've heard that😥

The interface between the compute layer and storage has always been Table (unchanged since the day Table was introduced). Bummock will provide indexing interfaces to support Table.
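A rough sketch of that split, under stated assumptions: the compute side only ever calls `Table`, while an indexing interface sits underneath it. The `Index` trait, `BTreeIndex`, and the single `name` column are hypothetical and are not taken from the RisingWave codebase or the design doc.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical indexing interface (the role described for Bummock above).
pub trait Index {
    fn put(&mut self, key: &str, pk: u64);
    fn remove(&mut self, key: &str, pk: u64);
    fn lookup(&self, key: &str) -> Vec<u64>;
}

/// Simple ordered index over (indexed value, primary key) pairs.
#[derive(Default)]
pub struct BTreeIndex {
    entries: BTreeSet<(String, u64)>,
}

impl Index for BTreeIndex {
    fn put(&mut self, key: &str, pk: u64) {
        self.entries.insert((key.to_string(), pk));
    }

    fn remove(&mut self, key: &str, pk: u64) {
        self.entries.remove(&(key.to_string(), pk));
    }

    fn lookup(&self, key: &str) -> Vec<u64> {
        // All pairs sharing `key` are contiguous in the ordered set.
        let lo = (key.to_string(), u64::MIN);
        let hi = (key.to_string(), u64::MAX);
        self.entries.range(lo..=hi).map(|(_, pk)| *pk).collect()
    }
}

/// What the compute layer sees: a table with rows keyed by primary key.
/// Index maintenance is hidden behind the `Index` trait.
pub struct Table<I: Index> {
    rows: BTreeMap<u64, String>, // pk -> value of the (single) `name` column
    by_name: I,
}

impl<I: Index> Table<I> {
    pub fn new(index: I) -> Self {
        Self { rows: BTreeMap::new(), by_name: index }
    }

    /// Inserts or overwrites a row and keeps the index consistent.
    pub fn insert(&mut self, pk: u64, name: &str) {
        if let Some(old) = self.rows.insert(pk, name.to_string()) {
            self.by_name.remove(&old, pk);
        }
        self.by_name.put(name, pk);
    }

    pub fn delete(&mut self, pk: u64) -> bool {
        match self.rows.remove(&pk) {
            Some(name) => {
                self.by_name.remove(&name, pk);
                true
            }
            None => false,
        }
    }

    pub fn find_by_name(&self, name: &str) -> Vec<u64> {
        self.by_name.lookup(name)
    }
}
```

Because `Table` is generic over `Index`, the index implementation could be swapped without touching callers, which matches the claim that the compute-facing interface stays unchanged.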

wyhyhyhyh commented 2 years ago

> In my understanding:
>
>   • Hummock is the KV layer at the bottom, serving as the unified storage engine.
>   • Bummock is supposed to provide a unified interface for collection access, e.g. scan, insert, delete, etc. (similar to the position of the current ScannableTable, but extended with more functionality and finer-grained interfaces). MViewTable, indexes, and nested relations (the so-called protobuf data model) will all be wrapped up inside Bummock.
>
> > Bummock is a framework; both the relational model and the protobuf data model (nested relation) fit inside it.
>
> You mean Bummock will be the storage interface for the compute layer instead of Hummock? That's the first time I've heard that😥

Yes, it's very tricky and confusing for me too.

wyhyhyhyh commented 2 years ago

Closed, as the Bummock design needs revision.