terraform-aws-modules / terraform-aws-ecs

Terraform module to create AWS ECS resources ๐Ÿ‡บ๐Ÿ‡ฆ
https://registry.terraform.io/modules/terraform-aws-modules/ecs/aws
Apache License 2.0
573 stars 542 forks source link

Service module defaulting user to root #221

Open thenom opened 3 months ago

thenom commented 3 months ago

Description

The container definition sub module is configured to set the user to null if not specified as it should. When using the container definition submodule in the service sub module, it gets set to 0 when not supplied.

https://github.com/terraform-aws-modules/terraform-aws-ecs/blob/3b70e1e46e1b96a2da7fbfe6e2c11d44009607f1/modules/service/main.tf#L573C3-L573C93

user                     = try(each.value.user, var.container_definition_defaults.user, 0)

This issue is listed and i totally agree with the original posters comment by @jackylamhk: https://github.com/terraform-aws-modules/terraform-aws-ecs/pull/190#issuecomment-2080202995

I also spent too long trying to find out why my container was failing when moving to this module due to this issue. Defaulting to root (especially when un-documented) is surely a security issue.

I am posting this again as i noticed that the original one was closed/locked which means this it drops off the radar. I get the breaking change but surely security should be the priority and be cause enough for this to stay on the radar.

bryantbiggs commented 3 months ago

And what value should it be set to knowing that not setting a value causes conflicts with the ECS APIs?

thenom commented 3 months ago

The AWS documentation states that the user property of a container definition is not required: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerDefinition.html

"The user to use inside the container. This parameter maps to User in the docker container create command and the --user option to docker run."

This suggests that if it does map to the docker --user then if not provided then the User specified in the image is used. Are you saying that the problem is that the service module always provides user so you need a value? Surely if its null then it just doesnt get provided at all or am i missing something?

Out of interest, what conflict do you get when its not set? To be honest, this is the first time i have encountered the user property in a task definition as i have always driven it with the dockerfile.

bryantbiggs commented 3 months ago

the issue is not within this module, but a conflict between the ECS API and the Terraform AWS provider. You can see here for more details https://github.com/hashicorp/terraform-provider-aws/issues/11526#issuecomment-1202921276

While I agree that using the root user is far from ideal - it then becomes a question of what value do you use? We cannot simply guess at a UID/GUID and "hope" that works - the root user tends to work most often because of its permissions and therefore a safer default value to ensure things "just work" in the majority of cases. This then puts the onus on users to explicitly set a value that works for their container that is not the root user ID

jackylamhk commented 3 months ago

Setting the user to null also works and lets ECS use the user set in the Dockerfile. If the value 0 is needed to work around the issue mentioned, I would suggest including this in the documentation so new users are warned about this default behavior.

For example:

Note: the default user is set to 0 as a workaround for an upstream issue. If you want to change this behavior, set the user to null.

Happy to create a new PR if it looks good to you.

bryantbiggs commented 3 months ago

Happy to create a new PR if it looks good to you.

Thank you but no thank you - if folks aren't reading their Terraform plan output, why do we think they'll read the obscure notes we put in the repo docs ๐Ÿ˜…

thenom commented 3 months ago

As a new greenfield deploy there is a lot to take in during a plan which could have (and obviously has a few times) been missed. It wouldn't do any harm to highlight this in the docs as a someone doing a new setup would be referencing them. Not sure why the offer to update them for you is not wanted, is there something i am missing here that would be a negative?

bryantbiggs commented 3 months ago

Not sure why the offer to update them for you is not wanted, is there something i am missing here that would be a negative?

Its not a negative, I just don't see any value in that one line blurb. For example, we have this which is under an explicit docs/ directory and yet users fail to read it:

I am all for improvements but we have been down this path and it hasn't yielded the results expected so why would we go down that same path? if there is a different way of making this information more apparent, or in this case, a different way to resolve this specific issue around diffs with the API and needing to set a default of the root users - I am all ears. In fact, if anyone wants to submit a pull request, I would really really REALLY appreciate if you can submit one on the provider that removes this issue so we can set null as default and things "just work" as one would expect. That would be very useful

thenom commented 3 months ago

Ok, I still think a note on the user variable would be worthwhile but I do like a golang challenge and will take a look at the provider ๐Ÿ˜›

psantus commented 2 months ago

Using root is not such a big issue if you stick to read-only file system

gmeligio commented 2 months ago

Hi. Thank you for maintaining this module and caring for improvements. I hope this gives a new perspective.

TLDR

The default should be null:

user                     = try(each.value.user, var.container_definition_defaults.user, null)

Context

I was in the process of creating a PR now because I've been using my fork in production for the past month and it's been working OK. The only change needed for this issue is that line mentioned in the issue description.

I was drafting a PR today because I saw a month ago this past issue https://github.com/terraform-aws-modules/terraform-aws-ecs/issues/102 that was closed without resolution by the github-actions bot because it was stale. But when creating the PR and adding the description, I noticed this new issue and wanted your feedback.

I'm happy to discuss this further and try to help because I would like to continue using this great module.

Longer explanation for the module

The default should be null, causing AWS ECS to pass null to the container runtime and it will either resolve the the USER directive passed in the image or to root if it's not specified. So, essentially, the null value keeps compatibility with the ECS API. From my point of view, it's not the responsibility of this module nor Terraform to guess the container user. The possible values would be like this depending on the use case:

I quote two explanations from the AWS and Docker documentation below.

AWS ECS "Run containers as a non-root user"

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-tasks-containers.html#security-tasks-containers-recommendations-run-non-root-users

You should run containers as a non-root user. By default, containers run as the root user unless the USER directive is included in your Dockerfile. The default Linux capabilities that are assigned by Docker restrict the actions that can be run as root, but only marginally. For example, a container running as root is still not allowed to access devices.

Docker blog "Understanding the Docker USER Instruction"

https://www.docker.com/blog/understanding-the-docker-user-instruction

Remember, if you donโ€™t set a USER in your Dockerfile, the user will default to root. Always explicitly set a user, even if itโ€™s just to make it clear who the container will run as.

bryantbiggs commented 2 months ago

hi @gmeligio - tl;dr - its an upstream issue, not a module issue. we're talking about things that are inconsequential unless its fixed at the source

gmeligio commented 2 months ago

But the module sets the root (value 0) by default, changing how the upstream services would work by default (USER directive). Shouldn't the module have sensible defaults without changing upstream behaviour? The default root (value 0) it's currently modifying the security best practices of non-root user, which are supported by the upstream services if we pass null.

I'm not following what's the upstream issue. Could you elaborate a bit more?

gmeligio commented 2 months ago

Sorry for not reading your specific comment. To me it's not the same issue as this one.

the issue is not within this module, but a conflict between the ECS API and the Terraform AWS provider. You can see here for more details hashicorp/terraform-provider-aws#11526 (comment)

Regarding this upstream issue you mentioned, there is a comment that says

Still an issue with 4.63.0. Setting values to the defaults or null helps.

I might be missing something and it could be related. Anyways, the workaround of passing null doesnt' work currently in this module. I tried that but it keeps forcing the root user (value 0) because the try function in Terraform fails if it finds null, unless is the last argument. Maybe using coalesce could work in this case.

gmeligio commented 2 months ago

I was analyzing more the following links:

After analyzing a bit more, I think I understand now why the change was introduced in https://github.com/terraform-aws-modules/terraform-aws-ecs/pull/101, but maybe it was unintended to force root and not allow the default ECS behaviour, and a way of passing user = null was expected to work.

The tradeoff between the options I see is:

  1. Option 1 Security best practice: Having a default user = null that is compatible with upstream ECS, which will cause permanent diffs and replacement of the task definition, because of the upstream issue https://github.com/hashicorp/terraform-provider-aws/issues/11526.
  2. Option 2 Predictable plan: Having default user = 0 will keep the plans happy by default and won't replace the task definition. But it will behave differently than ECS, enforcing the root user and not allowing to pass null to use the default ECS behavior because of the use of the try function.

For my use case, I always replace the task definition in the CI/CD pipeline because the image is rebuilt in each pipeline run, so I didn't experience the upstream issue. That's why I considered only Option 1. Maybe applying Option 2 is the best for the users of this module, and I'm OK with that. I would only need a workaround for setting user = null, until the upstream issue is resolved as you mentioned. I've tried some alternatives without success so far.

Does someone know a workaround when consuming this module to pass user = null to use the USER directive from the image and bypass the enforcement of root in the container definition in this line?

user                     = try(each.value.user, var.container_definition_defaults.user, 0)
github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

jackylamhk commented 1 month ago

bump

richardgavel-ordinaryexperts commented 1 month ago

An additional issue is that the variable documentation for var.user no longer matches the actual module execution:

The user to run as inside the container. Can be any of these formats: user, user:group, uid, uid:gid, user:gid, uid:group. The default (null) will use the container's configured USER directive or root if not set

bryantbiggs commented 6 days ago

in v6.0 we will be reverting the root user back to null - any diff issues *MUST be reported to the upstream provider - there is nothing we can do for that here in this module