tenzir / public-roadmap

The public roadmap of Tenzir
https://docs.tenzir.com/roadmap
4 stars 0 forks source link

Continuous Exports #63

Closed dominiklohmann closed 8 months ago

dominiklohmann commented 12 months ago

The export operator currently only supports querying historical events. We want to implement an option for the operator that makes it also load partitions that were newly added to the catalog.

### Definition of Done
- [x] Agree on the new option's name (follow, live, continuous, …) -> `--live`
- [x] Agree on the desired protocol between the operator and the catalog.
- [x] Implement the changes.
mavam commented 10 months ago

It makes sense for continuous exports to operate at the partition level. I would like to better understand the delineation to similar feature that doesn't involve storage of data at the node at all, similar to publish and subscribe. These operators merely emit a stream of record batches, and I think this is a valuable abstraction to have at the node level. It might even be easier to implement, as it could be exactly reusing the operators for a publishing to a single node channel and subscribing to a node. However, there is semantic overlap with import and export since they convey "ingest into a node" and "start from persisted data at a node".

Any thoughts on this?

dominiklohmann commented 10 months ago

I agree this would be nice to have, but we need to carefully consider this with the additional implementation effort in mind. The default active partition timeout is just 30 seconds, so the gain here is relatively small.

Iff we decide to not implement a direct channel between the export and the import operators for newly ingested data, then we need to carefully make sure that we do not introduce a gap in the stream of events. That is, any yet unpersisted event that already left the import operator must be also be forwarded to the export operator directly. With the current implementation of the two operators this is not easily possible.

dominiklohmann commented 9 months ago

Preparing a demo today I noticed that this low-effort item would've really helped with the demo setup. It's the easiest way to demonstrate node-local storage and working with it effectively.