liftbridge-io / liftbridge

Lightweight, fault-tolerant message streams.
https://liftbridge.io
Apache License 2.0

Transparent offload to object storage #110

Open bruth opened 5 years ago

bruth commented 5 years ago

Pulsar and Gazette both support this feature. Closed, immutable segments are copied to object storage so brokers only need to keep the open segments on local disk. When a consumer needs to read older segments, the broker transparently reads them from object storage.

Obviously not a pressing feature until storage scale becomes an issue, but just wanted to create an issue to discuss.

tylertreat commented 5 years ago

Interesting, I imagine archiving would somehow be based on retention policy? Thinking through how data gets offloaded and how it gets reloaded by consumers.

bruth commented 5 years ago

I believe both systems decouple those two concepts, segment size vs. retention period. Once the max segment size has been reached on the broker, it is closed and a new segment is created. The closed segment is then copied to object storage and freed up on the broker. A bit of metadata is maintained in the broker to fetch the segment from object storage rather than local disk when needed in a consumer stream.

Retention would likely be another bit of metadata which stores the earliest offset that is still retained. Any whole segments prior to that offset can be deleted from storage.

I believe Pulsar allows for time-based segments as well and potentially on-demand segmentation via the admin API. Gazette goes as far as having the client libraries fetch segments directly from object storage via signed URLs (so the data doesn't go through the broker).
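
A minimal sketch of the bookkeeping described in this comment, assuming the broker keeps one small metadata record per sealed segment. The names `Segment`, `locate`, and `apply_retention`, and the `"local"`/`"remote"` labels, are hypothetical and are not taken from Liftbridge, Pulsar, or Gazette:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    base_offset: int          # offset of the first message in the segment
    last_offset: int          # offset of the last message in the segment
    location: str = "local"   # hypothetical label: "local" disk or "remote" object storage


def locate(segments: List[Segment], offset: int) -> Optional[Segment]:
    """Return the segment that contains `offset`, or None if no segment does."""
    for seg in segments:
        if seg.base_offset <= offset <= seg.last_offset:
            return seg
    return None


def apply_retention(segments: List[Segment], earliest_retained: int) -> List[Segment]:
    """Keep only segments that still hold at least one retained offset.

    A segment is dropped only when it lies wholly before `earliest_retained`.
    """
    return [s for s in segments if s.last_offset >= earliest_retained]


segments = [
    Segment(0, 99, "remote"),
    Segment(100, 199, "remote"),
    Segment(200, 299, "local"),
]
seg = locate(segments, 150)
source = "object storage" if seg.location == "remote" else "local disk"
print(f"offset 150 is served from {source}")
```

Keeping retention as a single earliest-retained offset means segment size and retention period stay independent: a segment that straddles the marker is kept whole until the marker moves past its last offset.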

tylertreat commented 5 years ago

> Once the max segment size has been reached on the broker, it is closed and a new segment is created. The closed segment is then copied to object storage and freed up on the broker.

This would have some pretty serious performance implications for consumers if we offload a segment every time it's sealed, particularly if segment.max.bytes is small. I would think we would want to have some sort of configurable "tail" of segments that are retained before offloading to object storage?
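
A rough sketch of what such a "tail" policy could look like, assuming sealed segments are tracked oldest-first and a setting says how many of the newest sealed segments stay on local disk. The function name `segments_to_offload` and the `tail` parameter are hypothetical; no such Liftbridge option exists in this thread:

```python
from typing import List


def segments_to_offload(sealed: List[int], tail: int) -> List[int]:
    """Return the sealed segments that may be moved to object storage.

    `sealed` holds the base offsets of sealed segments, oldest first.
    The newest `tail` sealed segments are kept on local disk.
    """
    if tail < 0:
        raise ValueError("tail must be non-negative")
    cutoff = max(0, len(sealed) - tail)
    return sealed[:cutoff]


# Four sealed segments with a tail of two: only the two oldest are offloaded.
print(segments_to_offload([0, 100, 200, 300], 2))
```

With `tail` set to 0 this reduces to the extreme case of offloading every segment as soon as it is sealed; a larger value keeps recent reads on local disk even when segments are small.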

> I believe Pulsar allows for time-based segments as well

Liftbridge also does this (see log.roll.time).

bruth commented 5 years ago

> I would think we would want to have some sort of configurable "tail" of segments that are retained before offloading to object storage?

Yes, that makes sense. My example was definitely the extreme case.

tylertreat commented 5 years ago

FYI, looks like Kafka is considering this feature as well. KIP for reference: https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage

gedw99 commented 4 years ago

The Proximo Go project has this with NATS. They call it freezer or something.

github.com/uw-labs/proximo

gedw99 commented 4 years ago

Here is Proximo Freezer: https://github.com/uw-labs/substrate/blob/master/go.mod#L19

nphard commented 4 years ago

It would be great if this could be integrated with MinIO. We could point clients at MinIO directly, at least for replaying data for machine learning training.