just-containers / s6-overlay

s6 overlay for containers (includes execline, s6-linux-utils & a custom init)

Using s6-overlay with ECS Exec #559

Closed: humphd closed this issue 1 month ago

humphd commented 10 months ago

I'm using s6-overlay in containers that I'm running on AWS ECS. I was hoping to use ECS Exec for remote debugging in the hosted containers, but I can't get my containers to start when this is enabled.

I suspect that ECS Exec wants to own pid 1, which is breaking s6-overlay, but I wanted to see if anyone had recommendations on how to get these to play nicely together.

Thanks for s6-overlay, it's been amazing for us!

skarnet commented 10 months ago

From a cursory reading of the page, I have the same hunch as you, that ECS Exec wants to own pid 1.

This kind of software really needs to stop pretending it provides a container environment when it does not. This is aggravating.

humphd commented 10 months ago

Looking into this a bit more, I wonder if my Terraform ECS module is causing this, and whether I can override it so that it doesn't happen:

InitProcessEnabled

Run an init process inside the container that forwards signals and reaps processes. This parameter maps to the --init option to docker run. This parameter requires version 1.25 of the Docker Remote API or greater on your container instance. To check the Docker Remote API version on your container instance, log in to your container instance and run the following command: sudo docker version --format '{{.Server.APIVersion}}'

Required: No

Type: Boolean

Update requires: Replacement

phpmathan commented 7 months ago

I'm also facing the same issue with ECS using an EC2 deployment. I'm using Pulumi for infra/stack management, with the following settings in the task definition.

containerDefinitions: {
    linuxParameters: {
        initProcessEnabled: true
    }
}

I need to enable ECS execute command because I need to run some commands and restart s6 services (like s6-svc -r /run/service/serviceName) based on some triggers.

I'm trying to figure out a solution for this. If anyone has one, please post it here.

Thanks

skarnet commented 7 months ago

s6-overlay relies on being pid 1 for your container. It cannot work, and will never work, in a situation where another pid 1 is provided by the so-called container manager. Sorry.

If you need to run early commands that don't fit with the s6-overlay model, the only suitable place is S6_STAGE2_HOOK. If you need to restart services depending on triggers, you can always define other services that listen to triggers and send s6-svc -r commands to services you choose.

phpmathan commented 7 months ago

@skarnet - Thanks for the confirmation and the suggestion.

I wrote a Go program that listens on TCP and executes a received payload only if it matches a whitelisted system command. This listener runs as another s6 service in the container.
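For readers who want a feel for the approach described above, here is a minimal sketch in Python (not the Go program mentioned, which is not shown in this thread). It assumes a line-based protocol where the client sends a single token. The tokens, the service names web and worker, and port 9000 are all hypothetical. Only s6-svc -r and the /run/service/ path come from the discussion.

```python
import socketserver
import subprocess

# Hypothetical whitelist: request token -> exact argv to run.
# Clients never supply the command line itself, only a token.
ALLOWED = {
    "restart-web": ["s6-svc", "-r", "/run/service/web"],
    "restart-worker": ["s6-svc", "-r", "/run/service/worker"],
}


def resolve_command(payload: bytes):
    """Return the argv for a whitelisted token, or None if it is not allowed."""
    token = payload.decode("utf-8", errors="replace").strip()
    return ALLOWED.get(token)


class TriggerHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Read one short line and look it up in the whitelist.
        argv = resolve_command(self.rfile.readline(256))
        if argv is None:
            self.wfile.write(b"rejected\n")
            return
        result = subprocess.run(argv, capture_output=True)
        self.wfile.write(b"ok\n" if result.returncode == 0 else b"failed\n")


def serve(host: str = "127.0.0.1", port: int = 9000) -> None:
    """Run the listener; binds to loopback only (assumed port 9000)."""
    with socketserver.TCPServer((host, port), TriggerHandler) as server:
        server.serve_forever()
```

Calling serve() from the run script of a dedicated s6 service would keep the listener supervised like any other service. Mapping tokens to fixed argv lists, rather than passing the payload to a shell, means a client cannot inject arbitrary commands.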

skarnet commented 7 months ago

Sounds good!

Just in case, know that the s6 suite provides most of the infrastructure for this:

mustanggb commented 6 months ago

Not sure if this is helpful or not, but I've started using ECS Exec in the last month without issue.

So far only tried on Fargate without setting anything for initProcessEnabled (does it default to false?).

Maybe AWS changed something, or maybe something is different with your setup, but just wanted to let you know it does seem possible to make it work with s6-overlay out of the box.

skarnet commented 6 months ago

If it works, then it is almost certainly because initProcessEnabled is false.

skarnet commented 3 months ago

Can I close this issue?

oliparcol commented 1 month ago

I just created a new cluster with Fargate as capacity and initProcessEnabled disabled, and I'm still getting the s6-overlay-suexec: fatal: can only run as pid 1 error.

I investigated a little and found that pid 1 is taken by the /pause process, which comes from the amazon-ecs-pause container that AWS ECS uses for task networking (https://aws.amazon.com/blogs/compute/under-the-hood-task-networking-for-amazon-ecs/). For each task launched on ECS, a pause container seems to be launched, and the pid namespace is shared between that pause container and the one running the task.
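A quick way to repeat this kind of check is to look at what pid 1 actually is from inside the container. The sketch below is one possible approach, assuming a Linux container with /proc mounted and a Python interpreter available. The function names are made up for illustration.

```python
from pathlib import Path


def parse_cmdline(raw: bytes) -> list:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into argv."""
    return [part.decode("utf-8", errors="replace") for part in raw.split(b"\0") if part]


def pid1_argv(proc_root: str = "/proc") -> list:
    """Read the command line of pid 1 as seen from the current pid namespace."""
    return parse_cmdline((Path(proc_root) / "1" / "cmdline").read_bytes())


def report() -> None:
    """Print what is running as pid 1 (call this inside the container)."""
    argv = pid1_argv()
    print("pid 1 is:", " ".join(argv) or "(empty cmdline)")
```

If report() prints something like /pause rather than the container's own entrypoint, then another process owns pid 1 in that namespace, which matches the error above.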

EDIT: Sorry, this is my bad. I misread the AWS documentation (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#task_definition_pidmode), which states "On Fargate for Linux containers, the only valid value is task." However, it is also possible to leave this value unspecified. After removing it, everything seems to work as expected.
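Putting the findings in this thread together, the two task definition settings reported to take pid 1 away from s6-overlay are pidMode and linuxParameters.initProcessEnabled. The following is a rough sketch of a pre-deploy check, assuming the task definition is available as a dict in the ECS JSON shape. The helper name pid1_conflicts is hypothetical.

```python
def pid1_conflicts(task_def: dict) -> list:
    """List settings that, per this thread, stop the container's entrypoint from being pid 1."""
    problems = []

    # A shared pid namespace means some other process (e.g. /pause) is pid 1.
    pid_mode = task_def.get("pidMode")
    if pid_mode is not None:
        problems.append(f"pidMode is set to {pid_mode!r}; leave it unspecified")

    # initProcessEnabled maps to docker run --init, which inserts its own pid 1.
    for container in task_def.get("containerDefinitions", []):
        linux = container.get("linuxParameters") or {}
        if linux.get("initProcessEnabled"):
            name = container.get("name", "?")
            problems.append(f"container {name!r} sets initProcessEnabled to true")

    return problems
```

An empty list does not guarantee that s6-overlay will start. It only rules out the two causes discussed here.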

skarnet commented 1 month ago

Glad you made it work!