Closed by dominiklohmann 8 months ago
It makes sense for continuous exports to operate at the partition level. I would like to better understand the delineation from a similar feature that doesn't involve storing data at the node at all, similar to `publish` and `subscribe`. These operators merely emit a stream of record batches, and I think this is a valuable abstraction to have at the node level. It might even be easier to implement, as it could reuse exactly the operators for publishing to a node-local channel and subscribing to it. However, there is semantic overlap with `import` and `export`, since they convey "ingest into a node" and "start from persisted data at a node."
Any thoughts on this?
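To make the distinction concrete, here is a minimal Python sketch of what such node-level channels could look like. This is purely illustrative, not the actual operator implementation; the `Node`, `subscribe`, and `publish` names are hypothetical stand-ins. The point is that batches flow through the node without ever touching storage:

```python
from collections import defaultdict
from queue import Queue

class Node:
    """Toy node-level channels: publish/subscribe moves record batches
    through the node without ever touching storage."""

    def __init__(self):
        # topic -> queues of all current subscribers
        self.channels = defaultdict(list)

    def subscribe(self, topic):
        queue = Queue()
        self.channels[topic].append(queue)
        return queue

    def publish(self, topic, batch):
        # Fan the batch out to every subscriber of the topic.
        for queue in self.channels[topic]:
            queue.put(batch)

node = Node()
sub = node.subscribe("suricata")
node.publish("suricata", {"event_type": "alert"})
print(sub.get_nowait())  # → {'event_type': 'alert'}
```

Nothing here overlaps with persistence, which is exactly where the semantic tension with `import`/`export` comes from.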
I agree this would be nice to have, but we need to weigh it carefully against the additional implementation effort. The default active partition timeout is just 30 seconds, so the gain here is relatively small.
If we decide not to implement a direct channel between the `export` and `import` operators for newly ingested data, then we need to make sure that we do not introduce a gap in the stream of events. That is, any yet-unpersisted event that has already left the `import` operator must also be forwarded directly to the `export` operator. This is not easily possible with the current implementation of the two operators.
While preparing a demo today, I noticed that this low-effort item would have really helped with the setup. It's the easiest way to demonstrate node-local storage and how to work with it effectively.
The `export` operator currently only supports querying historical events. We want to implement an option for the operator that makes it also load partitions that are newly added to the catalog.
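A rough sketch of what such an option could mean, as a toy Python catalog. The `live` flag and all names here are hypothetical illustrations, not the real operator's interface: a plain export snapshots the catalog, while a live export additionally watches for partitions added after the query started:

```python
class Catalog:
    """Toy catalog: a plain export sees only partitions that existed
    when the query started; with live=True it also picks up partitions
    added while the query is running."""

    def __init__(self):
        self.partitions = []
        self.watchers = []

    def add_partition(self, partition):
        self.partitions.append(partition)
        # Notify running live exports about the new partition.
        for watcher in self.watchers:
            watcher.append(partition)

    def export(self, live=False):
        if live:
            newly_added = []
            self.watchers.append(newly_added)
        for partition in list(self.partitions):  # historical partitions
            yield from partition
        if live:
            # A real node would block here for new partitions;
            # the toy just drains whatever arrived so far.
            while newly_added:
                yield from newly_added.pop(0)

catalog = Catalog()
catalog.add_partition(["a", "b"])
live = catalog.export(live=True)
print(next(live), next(live))  # → a b (historical events)
catalog.add_partition(["c"])   # added to the catalog mid-query
print(next(live))              # → c (newly added partition)
```

Registering the watcher before iterating the snapshot is what keeps new partitions from being either missed or double-counted in this sketch.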