Open dominiklohmann opened 1 year ago
I think one big thing to keep in mind here is that this turns the `tenzir` process essentially into a process manager. So we have to be careful about keeping track of all these forked processes we spawn, and how to clean them up again on shutdown.
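To make the bookkeeping concern concrete, here is a minimal sketch in Python of a parent that records every child it spawns and cleans them all up on shutdown. It is not how Tenzir does or would implement this; `ChildRegistry` and its methods are hypothetical names, and a real supervisor would additionally have to handle `SIGCHLD`, process groups, and children that fork grandchildren.

```python
import subprocess
import sys


class ChildRegistry:
    """Hypothetical helper: remembers spawned children so none get lost."""

    def __init__(self):
        self._children = {}

    def spawn(self, name, argv):
        """Start a child process and remember it under the given name."""
        proc = subprocess.Popen(argv)
        self._children[name] = proc
        return proc

    def reap_finished(self):
        """Forget children that already exited and return their exit codes."""
        finished = {}
        for name, proc in list(self._children.items()):
            code = proc.poll()
            if code is not None:
                finished[name] = code
                del self._children[name]
        return finished

    def shutdown(self, grace_seconds=2.0):
        """Ask every running child to stop, then force-kill stragglers."""
        for proc in self._children.values():
            if proc.poll() is None:
                proc.terminate()  # SIGTERM on POSIX
        codes = {}
        for name, proc in self._children.items():
            try:
                # The grace period applies to each child in turn.
                codes[name] = proc.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                proc.kill()  # SIGKILL on POSIX
                codes[name] = proc.wait()
        self._children.clear()
        return codes


if __name__ == "__main__":
    registry = ChildRegistry()
    registry.spawn("sleeper", [sys.executable, "-c", "import time; time.sleep(60)"])
    print(registry.shutdown())
```

The two-phase shutdown (polite signal first, forced kill after a grace period) is one common pattern; whether it fits the node depends on how pipelines are expected to flush state on exit.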
Our friends at Zeek put a lot of energy into getting process supervision right. It may make sense to study the supervisor framework at a very high level to avoid re-experiencing all the weird POSIX gotchas.
We should also take a close look at a deeper cgroup integration of the node; after all, this kind of scenario (i.e., bounding memory/CPU usage of a group of processes and not losing track of them) is what they were invented for.
This came up again today when preparing for a demo. It's a real bummer to have a node crash because of a bug in a third-party library used in a connector. That sometimes is just out of our control.
On Wednesday we had a discussion round together with @dominiklohmann and @jachris.
We agreed on the high-level outlines of this feature:
`local` for these pipelines refers to a newly spawned process

Related documents:
https://docs.google.com/document/d/1b-zpDp796fRr1FPpObCkia2Dyuh8IEF-XaBmv5lvszs/edit#heading=h.um2utrvlnup8
https://app.excalidraw.com/s/6dBWEFf9h1l/8J1RozwXFXV
I attempted to write a proof of concept for this last night / this morning and got to a point where I can run a pipeline in a forked `tenzir-node` process as a whole. The core behavior change of reducing the blast radius of a crashing pipeline would be fulfilled by that. But the majority of the work is yet to be done.
Notably:
Pipelines currently always run in the same process as the node when run through the API. This is a reliability problem, because a pipeline running out of memory also causes the node to go down, and with it all other pipelines.
The goal here is to decouple that risk by running pipelines in a child process instead.
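As a rough illustration of the isolation idea, the following Python sketch runs a unit of work in a child process and reports how it ended, so a crash in the child does not take the parent down with it. This is only a sketch under assumptions: `run_isolated` is a hypothetical name, the "pipeline" is stood in for by a snippet of Python source, and none of this reflects Tenzir's actual API or implementation.

```python
import subprocess
import sys


def run_isolated(code, timeout=30.0):
    """Run Python source in a child interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        timeout=timeout,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return f"killed by signal {-proc.returncode}"
    return f"exited with status {proc.returncode}"


if __name__ == "__main__":
    # The second snippet aborts the child; the parent keeps running.
    for snippet in ("pass", "import os; os.abort()", "import sys; sys.exit(3)"):
        print(repr(snippet), "->", run_isolated(snippet))
```

The parent only learns the exit status here. Anything richer (metrics, diagnostics, partial results) would need an explicit channel between parent and child, which is presumably where most of the remaining work lies.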