RFC: Support disable checkpoint

fuyufjh commented 2 years ago

Background

In real-world use cases, not all streaming tasks require checkpoints or persisted state. For example, in most ETL use cases, users can simply recover from some offset in the source MQ, so persistence is effectively guaranteed by the source side.

On the other hand, Hummock is still under heavy development, and it's hard for us to guarantee backward compatibility of persisted data at this stage. Thus I guess early users may have no choice but to use RisingWave as a "non-persisted" streaming database.

Design

RisingWave was designed to have persisted storage from day 1. Luckily, since we are working on barrier-checkpoint decoupling (#4290), it's possible to support this with relatively few changes.

Here are my rough ideas. Correct me if anything is missing or wrong.

Never emits checkpoint barrier
Keeps all data in the shared buffer
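
As a rough illustration of the first point, here is a minimal sketch of a barrier scheduler that keeps emitting plain barriers but never emits checkpoint barriers when the option is on. The names `BarrierKind`, `BarrierScheduler`, `checkpoint_frequency` and `unsafe_disable_checkpoint` are all hypothetical and are not taken from the RisingWave codebase.

```rust
/// Kind of barrier injected into the stream (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BarrierKind {
    /// Only advances the epoch so that downstream results stay fresh.
    Barrier,
    /// Additionally asks the storage layer to persist the epoch.
    Checkpoint,
}

/// Hypothetical scheduler that decides which kind of barrier to emit next.
pub struct BarrierScheduler {
    /// Emit one checkpoint barrier every `checkpoint_frequency` barriers.
    checkpoint_frequency: u64,
    /// When true, checkpoint barriers are never emitted.
    unsafe_disable_checkpoint: bool,
    /// Number of barriers emitted so far.
    emitted: u64,
}

impl BarrierScheduler {
    pub fn new(checkpoint_frequency: u64, unsafe_disable_checkpoint: bool) -> Self {
        assert!(checkpoint_frequency > 0, "checkpoint_frequency must be positive");
        Self {
            checkpoint_frequency,
            unsafe_disable_checkpoint,
            emitted: 0,
        }
    }

    /// Returns the kind of the next barrier to inject.
    pub fn next_barrier(&mut self) -> BarrierKind {
        self.emitted += 1;
        let due = self.emitted % self.checkpoint_frequency == 0;
        if due && !self.unsafe_disable_checkpoint {
            BarrierKind::Checkpoint
        } else {
            BarrierKind::Barrier
        }
    }
}
```

In this sketch, nothing ever asks storage to persist an epoch once `unsafe_disable_checkpoint` is set, which is what would leave all data in the shared buffer.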

Further Optimization

An optional optimization is to keep the state of streaming operators in the operator cache only. Otherwise, all rows of internal state will be stored twice in memory: once in the operator cache and once in the shared buffer. To achieve this:

The relational table can simply drop all writes and return None for all reads.
Eliminate the upper bound of the operators' cache, so that the full data set is cached in memory.
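
The two points above could look roughly like the following sketch. `BlackholeTable` and `CachedState` are hypothetical names used only for illustration; the real relational table and operator cache have different interfaces.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the relational table:
/// every write is dropped and every read returns `None`.
#[derive(Default)]
pub struct BlackholeTable;

impl BlackholeTable {
    pub fn insert(&mut self, _key: Vec<u8>, _value: Vec<u8>) {
        // Intentionally drop the write.
    }

    pub fn get(&self, _key: &[u8]) -> Option<Vec<u8>> {
        None
    }
}

/// Hypothetical operator state whose cache has no upper bound,
/// so the cache holds the only copy of the data.
#[derive(Default)]
pub struct CachedState {
    /// Never evicted: there is nothing behind it to reload from.
    cache: HashMap<Vec<u8>, Vec<u8>>,
    table: BlackholeTable,
}

impl CachedState {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.cache.insert(key.to_vec(), value.to_vec());
        // The write-through is kept for symmetry, but the table discards it.
        self.table.insert(key.to_vec(), value.to_vec());
    }

    pub fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        // On a cache miss we fall back to the table, which always returns `None`.
        self.cache
            .get(key)
            .cloned()
            .or_else(|| self.table.get(key))
    }
}
```

The sketch also shows why the cache bound has to go: with a table that returns `None` for every read, an evicted entry would be lost for good.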

Notes

This feature should not be exposed to end users, especially cloud users; it is only for internal testing purposes. Perhaps we should name this option unsafe_xxx to highlight that.
This feature is not compatible with scaling, which must depend on persistent shared storage (another reason for 'unsafe').

twocode commented 2 years ago

If the purpose is for the compute team to benchmark streaming performance, I suggest we keep barriers, since they bring only trivial cost anyway while keeping the semantics the same. Keeping all data in the shared buffer will not help with performance stability, since it could thrash the memory system.

If correctness matters and we want to cater to real customers without the long-tail persistence cost, we can use an in-memory object store. If correctness is not the main target, I suggest we implement a blackhole object store that swallows all I/O operations.
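
To make the two options concrete, here is a minimal sketch with a simplified, hypothetical `ObjectStore` trait (the actual object store interface in RisingWave is different and async). Having the blackhole store return empty data on reads is an assumption made for this sketch.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical object store interface.
pub trait ObjectStore {
    fn upload(&mut self, path: &str, data: Vec<u8>) -> Result<(), String>;
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
}

/// Keeps objects in memory: correct reads, no persistence cost.
#[derive(Default)]
pub struct InMemObjectStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore for InMemObjectStore {
    fn upload(&mut self, path: &str, data: Vec<u8>) -> Result<(), String> {
        self.objects.insert(path.to_string(), data);
        Ok(())
    }

    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.objects
            .get(path)
            .cloned()
            .ok_or_else(|| format!("object not found: {}", path))
    }
}

/// Swallows all I/O: uploads are discarded and reads return no data
/// (an assumption of this sketch), so results are not correct.
#[derive(Default)]
pub struct BlackholeObjectStore;

impl ObjectStore for BlackholeObjectStore {
    fn upload(&mut self, _path: &str, _data: Vec<u8>) -> Result<(), String> {
        Ok(())
    }

    fn read(&self, _path: &str) -> Result<Vec<u8>, String> {
        Ok(Vec::new())
    }
}
```

Because both stores sit behind the same trait, the rest of the pipeline, barriers included, would stay unchanged.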

neverchanje commented 2 years ago

FYI @lmatz is working on a benchmark that compares the performance of Flink and RisingWave on stateless computation. We will compare the CPU and memory consumption when each system handles a heavily loaded stream. In this case, the benchmark would be unfair, since RisingWave materializes the result while Flink doesn't. So we are planning to hack a "blackhole MaterializeExecutor" that does nothing but return an OK(). This executor won't be merged into the main branch; it is only for testing. So it would be great if we had a better solution.

lmatz commented 2 years ago

I suggest we keep barriers

I thought "Never emits checkpoint barrier" means that the system will keep freshness barriers, but will not emit the checkpoint barriers that tell the system to checkpoint. So probably the same thing? I'd like some clarification on this.

fuyufjh commented 2 years ago

I thought "Never emits checkpoint barrier" means that the system will keep freshness barriers, but will not emit the checkpoint barriers that tell the system to checkpoint.

Yes, I think we are talking about the same thing.