ten-nancy / porto

Porto is yet another Linux container management system.

Question: StartParents when creating child #12

Closed AlexIvchenko closed 4 weeks ago

AlexIvchenko commented 3 months ago

Hello!

I have a YT exec node that runs under Porto. I want to stop it, so I send a stop command via RPC. Porto log:

00:45:36.247 portod-RW15[223743][02938940]: REQ Stop np1/<exec-node>/run  timeout=10 ms from CL62:java(214749) CT1:/
00:45:36.280 portod-RW8[223736][344d9cd9]: REQ CreateFromSpec run/jm/s_183/jp-28f087c5-a37554eb container { name: "run/jm/s_183/jp-28f087c5-a37554eb" command: "\'/usr/bin/ytserver-job-proxy\'...name: "PORTO_NAME" value: "np1/<exec-node>/run"
00:45:39.701 portod-RW15[223743][02938940]: RSP Stop np1/<exec-node>/run  timeout=10 ms Ok to CL62:java(214749) CT1:/ lock=6 ms time=0+3454 ms
00:45:39.701 portod-EV0[223768]: Long lock LockAction operation time=3454 ms
00:45:39.703 portod-RW8[223736][344d9cd9]: ACT Start CT3692:np1/<exec-node>/run
  1. Porto receives the stop command for the YT exec node and acquires the lock
  2. Porto terminates the child containers first, so the YT exec node, still unaware that it is being stopped, may create subcontainers
  3. Porto receives a CreateFromSpec request from the YT exec node to create ytserver-job-proxy; the request waits for the lock held by the stop operation
  4. Porto stops the YT exec node and releases the lock
  5. The CreateFromSpec request acquires the lock
  6. Porto starts the parents, including the YT exec node that was just stopped
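
The sequence above can be reproduced with a toy model. This is a minimal sketch, not Porto's code: the `Container` class, the `stop_subtree` and `create_child` helpers, the state names, and the container names are all hypothetical, and the lock is modelled simply by running the two requests one after the other.

```python
# Toy model of the race described above; not Porto's implementation.
# Container, stop_subtree and create_child are hypothetical names.

class Container:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.state = "running"
        if parent is not None:
            parent.children.append(self)


def stop_subtree(ct, log):
    """Stop children first, then the container itself (child-to-parent order)."""
    for child in ct.children:
        stop_subtree(child, log)
    ct.state = "stopped"
    log.append(f"stop {ct.name}")


def create_child(parent, name, log):
    """Create a child and start any stopped ancestors, as in step 6."""
    chain = []
    node = parent
    while node is not None and node.state == "stopped":
        chain.append(node)
        node = node.parent
    for ancestor in reversed(chain):  # start the topmost stopped ancestor first
        ancestor.state = "running"
        log.append(f"start {ancestor.name}")
    child = Container(name, parent)
    log.append(f"create {child.name}")
    return child


log = []
root = Container("np1")
exec_node = Container("np1/exec-node", root)
job = Container("np1/exec-node/job-1", exec_node)

# The Stop request holds the lock, so the queued create runs only afterwards.
stop_subtree(exec_node, log)
create_child(exec_node, "np1/exec-node/job-2", log)

print(log)
print(exec_node.state)
```

In this model the exec node ends up running again right after it was stopped, which matches the `ACT Start` line at the end of the log.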

I think there are two issues:

  1. Porto stops the subtree from child to parent (which allows subcontainers to be created during the stop)
  2. Porto starts parent containers when a child is created

Questions

Why does Porto start parents when a child container is created instead of returning an error?

As far as I remember, YT can spawn jobs in several ways:

  1. by creating subprocesses
  2. by creating a CRI container
  3. by creating a nested container in Porto

Options 1 and 2 don't lead to the exec node being started again, but using Porto does.

Why does Porto stop containers from child to parent?

It seems to me that SIGTERM should be sent from parent to children for a graceful shutdown.

PORTO-1010?

I also see a PORTO-1010 issue mentioned in comments; it prohibits starting containers if the client is portod. Could you please explain what this ticket is about? I think it may be related to my question.

Thank you in advance

frostoov commented 2 months ago

Yes, starting a container with a stopped parent is not a good idea.

https://github.com/ten-nancy/porto/commit/f8f513f9e69f61819d1f65043cb6b95ce21fbea0 added a config option to disable this behavior:

/etc/default/portod.conf:
...
container {
    enable_start_parents: false
}
...

AlexIvchenko commented 2 months ago

@frostoov thank you. Could you please also clarify the graceful stop of a container tree? Usually, graceful shutdown is done from dependent to dependencies, e.g. if the parent depends on the child (parent -> child), then the parent is stopped first. If the child is gone but the parent is still unaware of the shutdown and keeps accepting requests, then all of those requests fail because the child is unavailable (actor -> parent -x-> child).

Because of that, it seems to me that SIGTERM should be sent to the parent first, and only if the parent doesn't stop the child itself should Porto stop the orphaned child. Does that make sense?

frostoov commented 1 month ago

> Usually, graceful shutdown is done from dependent to dependencies e.g. if parent depends on child

The child definitely depends on the parent: there can't be a child without a parent.

> Because of that, it seems to me, that SIGTERM should be sent to parent beforehand

This approach violates the following invariant: the parent of a running container cannot be in the stopped state.
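
The invariant can be written as a small check over a container tree. This is a sketch under assumptions: the dictionary layout and the `invariant_holds` and `with_state` helpers are made up for illustration and are not part of Porto.

```python
# Hypothetical tree layout: container name -> (parent name or None, state).

def invariant_holds(tree):
    """Return True if no running container has a stopped parent."""
    for name, (parent, state) in tree.items():
        if state == "running" and parent is not None:
            if tree[parent][1] == "stopped":
                return False
    return True


def with_state(tree, name, state):
    """Return a copy of the tree with one container's state changed."""
    updated = dict(tree)
    updated[name] = (tree[name][0], state)
    return updated


tree = {
    "parent": (None, "running"),
    "parent/child": ("parent", "running"),
}

# Child-to-parent order: every intermediate state satisfies the invariant.
step1 = with_state(tree, "parent/child", "stopped")
step2 = with_state(step1, "parent", "stopped")
child_first_ok = invariant_holds(step1) and invariant_holds(step2)

# Parent-to-child order: the first intermediate state already breaks it.
parent_first_ok = invariant_holds(with_state(tree, "parent", "stopped"))

print(child_first_ok, parent_first_ok)  # True False
```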

> Because of that, it seems to me, that SIGTERM should be sent to parent beforehand

You can use the Kill(parent_container, SIGTERM) API call to achieve such behavior.
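
A minimal sketch of that approach: signal the parent first, give it time to shut its children down, then let Porto stop whatever is left. The `graceful_stop` helper and the `FakeConnection` stand-in are hypothetical. The `Kill`, `Wait` and `Stop` methods on the connection object are assumed to mirror the Porto client API, so check the names, argument order, and timeout units against the client you actually use.

```python
import signal


def graceful_stop(conn, name, grace_ms):
    """Send SIGTERM to the container, wait, then stop the remaining subtree.

    `conn` is any object exposing Kill(name, sig), Wait(names, timeout_ms)
    and Stop(name); these signatures are assumptions, not a verified API.
    """
    conn.Kill(name, signal.SIGTERM)   # the parent learns about shutdown first
    conn.Wait([name], grace_ms)       # give it time to exit on its own
    conn.Stop(name)                   # then stop whatever is still running


class FakeConnection:
    """Stand-in client that only records calls, so the sketch runs anywhere."""

    def __init__(self):
        self.calls = []

    def Kill(self, name, sig):
        self.calls.append(("Kill", name, int(sig)))

    def Wait(self, names, timeout_ms):
        self.calls.append(("Wait", tuple(names), timeout_ms))
        return names[0]

    def Stop(self, name):
        self.calls.append(("Stop", name))


conn = FakeConnection()
graceful_stop(conn, "parent_container", 5000)
print(conn.calls)
```

With a real client in place of `FakeConnection`, the final `Stop` still runs child-to-parent inside Porto, so the invariant about running containers and stopped parents is preserved.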