Enable running Docker container as non-root user

schnerring commented 3 years ago

Let me know if I should rather open the issue in https://github.com/umputun/baseimage, I'm not sure if this belongs here.

I run remark42 on Kubernetes and currently try to reduce container privileges of my workloads. I can't get it to work with Remark42:

Run the container as user / group 1001 / 1001
Set the root filesystem to read only, so only mounted volumes are writable.

Currently this doesn't work since the init.sh script requires root privileges to work.

Would it be possible to implement it in such a way that we can run Docker containers as non-root?

paskal commented 3 years ago

It seems like the intention of init.sh is to work under non-root user as well, what error do you see while you run it as non-root? Also, I agree that the issue should be recreated in https://github.com/umputun/baseimage unless the problem is in remark42.

umputun commented 3 years ago

I don't think this is about running remark42 process itself as non-root, because this already in place and works as designed

docker exec -it remark42 ps -ef
PID   USER     TIME  COMMAND
    1 root      0:00 {init.sh} /sbin/dinit /bin/sh /init.sh /srv/remark42 serve
    6 app       0:05 /srv/remark42 server
 1988 root      0:00 ps -ef

I think the problem is the init process running under root. Technically this is not an issue by itself, but likely prevents from running remark42 container in something like podman or some other docker configuration with stripped root privileges.

schnerring commented 3 years ago

The logs when running as user / gid 1001 / non-root:

execute /srv/init.sh
chmod: /srv/init.sh: Operation not permitted
prepare environment
/srv/init.sh failed

Does this happen because of the chmod inside init.sh?

https://github.com/umputun/baseimage/blob/master/base.alpine/files/init.sh#L26

The Dockerfile takes care of that already:

https://github.com/umputun/remark42/blob/master/Dockerfile#L81

But I guess it's required for non-Docker deployments.

paskal commented 3 years ago

@schnerring you need to add set -x to the beginning of the docker-init.sh locally, rebuild the image and report the result of its run under user 1001: then we'll know which line is failing.

schnerring commented 3 years ago

@paskal Any way to override this via CMD without rebuilding the image? As I mentioned earlier, I'm running the container on k8s and I can't build docker images on my current machine.

paskal commented 3 years ago

You can change that file and mount the altered version to the container in the k8s instead of the version in the build.

If you don't mind my asking, why do you want to run it under non-root when the init script runs as root but after initial setup runs the app as a non-root user anyway?

schnerring commented 3 years ago

Because it's generally good practice to run containers as non-root. It went smoothly for every app except remark.

Sorry, I don't really understand how the initialization process works but why does it require to run as root?If it really is a requirement to run the container as root because of the init script, there is no point in running the container as non-root.

Would it be possible to initialize Remark inside a root init container and run remark in a non-root container? Or is the stuff that's initialized ephemeral?

paskal commented 3 years ago

Would it be possible to initialize Remark inside a root init container and run remark in a non-root container? Or is the stuff that's initialized ephemeral?

I don't know the answer, I think the only way to know for you is to try.

If it really is a requirement to run the container as root because of the init script, there is no point in running the container as non-root.

I wouldn't know either till the moment you will provide the failed run of the non-root container run with set -x set inside it. The previous error you reported was fixable, and the one you see now is invisible so I don't know if it's fixable or not.

umputun commented 3 years ago

as @paskal correctly pointed out above - remark42 process already runs under a non-root user. My comment shows ps inside the running container.

The reason why we have root init process is mostly our attempt to be user-friendly. This is the typical scenario root-init process allows us to avoid: user mounts a persistent volume to the host directory but doesn't have that directory created prior to the first run. In this cased docker will happily make this directory with the root owner and remark42 will fail miserably on the first attempt to write anything. This is the reason why we do chown here

schnerring commented 3 years ago

I don't know the answer, I think the only way to know for you is to try.

For this the init process would need to be persistent? If I understand correctly, the init process seds inside /srv which is ephemeral. So anything initialized inside the init container would be lost and unavailable to the container started after.

umputun commented 3 years ago

not sure I understand. it does chown to /srv/var which is mounted to persistent storage/volume. Like here, in the official example

The sed part is fine to be ephemeral, it runs as soon as the container started.

I still don't get why you want to get rid of root-init as in my opinion, it provides no security benefits. However, if this what you really need, the easy way is to overwrite entrypoint on your side by mounting your own init script and set it as the entrypoint. To do this you don't need to rebuild the container.

this is what I did, maybe will help to demonstrate the idea:

created my custom init.sh (with +x):

> cat init.sh

#!/bin/sh
echo "custom init"
exec $@

started the regular umputun/remark42 container with that init.sh as the entrypoint:

> docker run -it --rm -v $(pwd)/init.sh:/srv/init.sh --entrypoint=/srv/init.sh umputun/remark42 /srv/remark42 --dbg

custom init
remark42 v1.8.1-82f8f6c-20210504-18:13:45
the required flags `--secret' and `--url' were not specified

As you can see it is easy to eliminate all the init logic and replace it with your custom implementation

schnerring commented 3 years ago

I still don't get why you want to get rid of root-init as in my opinion, it provides no security benefits.

Countless articles exist that disagree with your opinion. I'm no expert and pretty much follow best practices I have read about. An example:

Even though a container uses namespaces and cgroups to limit its process(es), all it takes is one misconfiguration in its deployment settings to grant those processes access to resources on the host. If that process runs as root, it has the same access as the host root account to those resources. Additionally, if other pod or container settings are used to reduce constraints (i.e. procMount or capabilities), having a root UID compounds the risks of any exploitation of them. Unless you have a very good reason, you should never run a container as root.

An additional step would be configuring readonlyRootFilesystem, because (same article):

If your container gets compromised, and it has a read-write filesystem, an attacker is free to change its configuration, install software, and potentially launch other exploits. Having a read-only file system helps prevent these kinds of escalations by limiting the actions that an attacker can perform.

Sounds reasonable to me.

user mounts a persistent volume to the host directory but doesn't have that directly created prior to the first run

To me that sounds like a Docker host configuration issue that Remark is trying to fix. Those issues might or might not exist on other container platforms.

I wouldn't know either till the moment you will provide the failed run of the non-root container run with set -x set inside it. The previous error you reported was fixable, and the one you see now is invisible so I don't know if it's fixable or not.

So, I mounted a custom init.sh script with set -x. Switching to non-root. But now the following happens (not overriding CMD):

execute /srv/init.sh
chmod: /srv/init.sh: Read-only file system
prepare environment
+ echo 'prepare environment'
+ find . -regex '.*\.\(html\|js\|mjs\)$' -print -exec sed -i 's|{% REMARK_URL %}|https://remark42.schnerring.net|g' '{}' ';'
./web/pl.js
./web/fr.mjs
./web/bg.mjs
./web/index.html
./web/embed.mjs
./web/bp.mjs
./web/bp.js
./web/vi.js
./web/de.js
./web/de.mjs
./web/ua.js
./web/es.mjs
./web/ko.mjs
./web/privacy.html
./web/bg.js
./web/node-emoji.mjs
./web/zh.mjs
./web/ko.js
./web/fi.mjs
./web/markdown-help.html
./web/deleteme.mjs
./web/remark.js
./web/last-comments.js
./web/remark.mjs
./web/tr.mjs
./web/node-emoji.js
./web/ru.js
./web/pl.mjs
./web/counter.mjs
./web/last-comments.html
./web/deleteme.js
./web/embed.js
./web/ja.mjs
./web/zh.js
./web/counter.js
./web/ja.js
./web/es.js
./web/vi.mjs
./web/fr.js
./web/tr.js
./web/ua.mjs
./web/counter.html
./web/ru.mjs
./web/be.js
./web/iframe.html
./web/be.mjs
./web/fi.js
./web/last-comments.mjs
./web/deleteme.html
+ '[' -n  ]
+ '[' -d /srv/var ]
+ chown -R app:app /srv/var
/srv/init.sh failed

Setting CMD to /srv/init.sh:

+ echo 'prepare environment'
+ find . -regex '.*\.\(html\|js\|mjs\)$' -print -exec sed -i 's|{% REMARK_URL %}|https://remark42.schnerring.net|g' '{}' ';'
prepare environment
./web/pl.js
./web/fr.mjs
./web/bg.mjs
./web/index.html
./web/embed.mjs
./web/bp.mjs
./web/bp.js
./web/vi.js
./web/de.js
./web/de.mjs
./web/ua.js
./web/es.mjs
./web/ko.mjs
./web/privacy.html
./web/bg.js
./web/node-emoji.mjs
./web/zh.mjs
./web/ko.js
./web/fi.mjs
./web/markdown-help.html
./web/deleteme.mjs
./web/remark.js
./web/last-comments.js
./web/remark.mjs
./web/tr.mjs
./web/node-emoji.js
./web/ru.js
./web/pl.mjs
./web/counter.mjs
./web/last-comments.html
./web/deleteme.js
./web/embed.js
./web/ja.mjs
./web/zh.js
./web/counter.js
./web/ja.js
./web/es.js
./web/vi.mjs
./web/fr.js
./web/tr.js
./web/ua.mjs
./web/counter.html
./web/ru.mjs
./web/be.js
./web/iframe.html
./web/be.mjs
./web/fi.js
./web/last-comments.mjs
./web/deleteme.html
+ '[' -n  ]
+ '[' -d /srv/var ]
+ chown -R app:app /srv/var

Any ideas how to debug this further? Maybe I'm too tired and fail to debug this correctly? Is something exiting with a non-zero exit code? I also removed 2> /dev/null from chown -R app:app /srv/var 2> /dev/null but no additional output is printed.

umputun commented 3 years ago

I have no illusions about Docker's isolation, which is close to none and this is why remark42 container runs remark42 process under a non-root user. Running remark42 process this way as safe (or as dangerous) as running remark42 process under the same user without any docker, directly on the box. I don't see how provided articles even related to this case.

Generally, speaking, running remark42 on scratch base image with RO in my todo list but we don't have such a thing yet. So, you may continue experiments with a custom init or wait till we implement it someday.

To me that sounds like a Docker host configuration issue that Remark is trying to fix. Those issues might or might not exist on other container platforms.

This is correct, but practically docker's runtime is by far the most popular platform to run remark42.

Setting CMD to /srv/init.sh:

I think you are trying to do two things at once - run non-root init process and make it work with RO FS as well. In your log I see chmod: "/srv/init.sh: Read-only file system" error, which has nothing to do with non-root init but rather an indication of RO fs. Making sed replacement on the read-only files will fail for sure. I'd suggest trying just one change at a time and see how it fails or pass.

I also don't think you followed my recommendations to override the entry-point. In the second log, you just provided a custom CMD, not the entry point. The reason why I suggested making you custom init as an entry point was the full control of what it does and the ability to make it as sealed as you like. Just take a closer look at what the current init does. All you need to make remark42 work in your non-root/ro/custom environment is this:

replace {% REMARK_URL %} by content of REMARK_URL variable
replace "site_id: 'remark'" by SITE_ID
run /srv/remark42

this is it, there is no hidden magic here and none of this requires root privilege. However, saving the change (sed) as a part of init requires writable fs. I can think of several ways to prepare your custom container with ready-to-use files but all of them require building your own container.

hope it helps

schnerring commented 3 years ago

Generally, speaking, running remark42 on scratch base image with RO in my todo list but we don't have such a thing yet. So, you may continue experiments with a custom init or wait till we implement it someday.

I'm looking forward to this.

I also don't think you followed my recommendations to override the entry-point. In the second log, you just provided a custom CMD, not the entry point.

I meant to say command instead of CMD, which is the k8s equivalent of ENTRYPOINT, my mistake.

I'd suggest trying just one change at a time and see how it fails or pass.

Ultimately, I'd like to set the root FS to read-only, but I'm currently not doing that.

In your log I see chmod: "/srv/init.sh: Read-only file system" error, which has nothing to do with non-root init but rather an indication of RO fs.

No, the FS is writable. sed suceeds. I greped the files for the replacement values and they're there. It's only the custom init.sh I mount for debugging which isn't writable. For me this indicates that the /srv/remark42 server tries to chmod that init.sh file. The k8s way of mounting static files is by using config maps or secrets and mounting them into the container. Those files are inherently read-only, though. So the error is introduced by the way I mount the debugging script.

Even with chown errors, the container starts successfully when running as root:

init container
set timezone America/Chicago (Sun May 23 23:27:25 CDT 2021)
custom APP_UID not defined, using default uid=1001
chown: /srv/init.sh: Read-only file system
execute /srv/init.sh
chmod: /srv/init.sh: Read-only file system
prepare environment
+ echo 'prepare environment'
+ find . -regex '.*\.\(html\|js\|mjs\)$' -print -exec sed -i 's|{% REMARK_URL %}|https://remark42.schnerring.net|g' '{}' ';'
./web/pl.js
./web/fr.mjs
./web/bg.mjs
./web/index.html
./web/embed.mjs
./web/bp.mjs
./web/bp.js
./web/vi.js
./web/de.js
./web/de.mjs
./web/ua.js
./web/es.mjs
./web/ko.mjs
./web/privacy.html
./web/bg.js
./web/node-emoji.mjs
./web/zh.mjs
./web/ko.js
./web/fi.mjs
./web/markdown-help.html
./web/deleteme.mjs
./web/remark.js
./web/last-comments.js
./web/remark.mjs
./web/tr.mjs
./web/node-emoji.js
./web/ru.js
./web/pl.mjs
./web/counter.mjs
./web/last-comments.html
./web/deleteme.js
./web/embed.js
./web/ja.mjs
./web/zh.js
./web/counter.js
./web/ja.js
./web/es.js
./web/vi.mjs
./web/fr.js
./web/tr.js
./web/ua.mjs
./web/counter.html
./web/ru.mjs
./web/be.js
./web/iframe.html
./web/be.mjs
./web/fi.js
./web/last-comments.mjs
./web/deleteme.html
+ '[' -n  ]
+ '[' -d /srv/var ]
+ chown -R app:app /srv/var
execute "/srv/remark42 server"
remark42 master-b856b6d-20210520-18:55:18
2021/05/23 23:27:27.582 [INFO]  start server on port :8080
2021/05/23 23:27:27.582 [INFO]  root url=https://remark42.schnerring.net
2021/05/23 23:27:27.582 [INFO]  make data store, type=bolt
2021/05/23 23:27:27.582 [INFO]  bolt store for sites [{FileName:./var/schnerring.net.db SiteID:schnerring.net}], options {Timeout:30s NoGrowSync:false NoFreelistSync:false FreelistType: ReadOnly:false MmapFlags:0 InitialMmapSize:0 PageSize:0 NoSync:false OpenFile:<nil>}
2021/05/23 23:27:27.651 [INFO]  make admin store, type=shared
2021/05/23 23:27:27.651 [INFO]  make cache, type=mem
2021/05/23 23:27:27.651 [INFO]  make avatar store, type=fs
2021/05/23 23:27:27.651 [INFO]  init oauth2 service github
2021/05/23 23:27:27.652 [INFO]  make notify, types=[email]
2021/05/23 23:27:27.652 [INFO]  create notifier service, queue size=100, destinations=1
2021/05/23 23:27:27.652 [INFO]  activate auto-backup for schnerring.net under ./var/backup, duration 24h0m0s
2021/05/23 23:27:27.655 [INFO]  activate http rest server on :8080
2021/05/23 23:27:27.655 [INFO]  start pictures cleanup, staging ttl=12m30s
2021/05/23 23:27:27.655 [INFO]  run file server for ./web, path /web

I'm really sorry, I must have made a mistake earlier because when removing 2> /dev/null. Because I can now see the errors caused by chown as I initially suspected (running as non-root):

execute /srv/init.sh
chmod: /srv/init.sh: Read-only file system
+ echo 'prepare environment'
+ find . -regex '.*\.\(html\|js\|mjs\)$' -print -exec sed -i 's|{% REMARK_URL %}|https://remark42.schnerring.net|g' '{}' ';'
prepare environment
./web/pl.js
./web/fr.mjs
./web/bg.mjs
./web/index.html
./web/embed.mjs
./web/bp.mjs
./web/bp.js
./web/vi.js
./web/de.js
./web/de.mjs
./web/ua.js
./web/es.mjs
./web/ko.mjs
./web/privacy.html
./web/bg.js
./web/node-emoji.mjs
./web/zh.mjs
./web/ko.js
./web/fi.mjs
./web/markdown-help.html
./web/deleteme.mjs
./web/remark.js
./web/last-comments.js
./web/remark.mjs
./web/tr.mjs
./web/node-emoji.js
./web/ru.js
./web/pl.mjs
./web/counter.mjs
./web/last-comments.html
./web/deleteme.js
./web/embed.js
./web/ja.mjs
./web/zh.js
./web/counter.js
./web/ja.js
./web/es.js
./web/vi.mjs
./web/fr.js
./web/tr.js
./web/ua.mjs
./web/counter.html
./web/ru.mjs
./web/be.js
./web/iframe.html
./web/be.mjs
./web/fi.js
./web/last-comments.mjs
./web/deleteme.html
+ '[' -n  ]
+ '[' -d /srv/var ]
+ chown -R app:app /srv/var
chown: /srv/var/avatars/13/8991b04e9d7be01b8ef560bc616fd7e7240f68c3.image: Operation not permitted
chown: /srv/var/avatars/13: Operation not permitted
chown: /srv/var/avatars/13: Operation not permitted
chown: /srv/var/avatars: Operation not permitted
chown: /srv/var/avatars: Operation not permitted
chown: /srv/var/backup/backup-schnerring.net-20210513.gz: Operation not permitted
chown: /srv/var/backup/backup-schnerring.net-20210514.gz: Operation not permitted
chown: /srv/var/backup/backup-schnerring.net-20210515.gz: Operation not permitted
chown: /srv/var/backup/backup-schnerring.net-20210516.gz: Operation not permitted
chown: /srv/var/backup/backup-schnerring.net-20210517.gz: Operation not permitted
chown: /srv/var/backup/backup-schnerring.net-20210518.gz: Operation not permitted
chown: /srv/var/backup/backup-schnerring.net-20210519.gz: Operation not permitted
chown: /srv/var/backup/backup-schnerring.net-20210520.gz: Operation not permitted
chown: /srv/var/backup/backup-schnerring.net-20210521.gz: Operation not permitted
chown: /srv/var/backup/backup-schnerring.net-20210522.gz: Operation not permitted
chown: /srv/var/backup: Operation not permitted
chown: /srv/var/backup: Operation not permitted
chown: /srv/var/pictures: Operation not permitted
chown: /srv/var/pictures: Operation not permitted
chown: /srv/var/schnerring.net.db: Operation not permitted
chown: /srv/var/var/avatars: Operation not permitted
chown: /srv/var/var/avatars: Operation not permitted
chown: /srv/var/var/backup: Operation not permitted
chown: /srv/var/var/backup: Operation not permitted
chown: /srv/var/var/pictures: Operation not permitted
chown: /srv/var/var/pictures: Operation not permitted
chown: /srv/var/var/schnerring.net.db: Operation not permitted
chown: /srv/var/var: Operation not permitted
chown: /srv/var/var: Operation not permitted
chown: /srv/var: Operation not permitted
chown: /srv/var: Operation not permitted
/srv/init.sh failed

I think we're finally getting to the bottom of this. Ahhhh, I'm really sorry but it's actually an issue of the how AKS mounts Azure Files volumes via SMB. The permissions are messed up. SMB-mounted Azure Files volumes don't support chown, it's just ignored. This is something Remark can't fix from the inside of the container, this has to be taken care of from the host-side. This shows why it's probably a bad idea that Remark is concerned about Docker specifics :P

/srv/var $ ls -alh
total 36K
drwxrwxrwx    2 root     app            0 May 10 14:50 .
drwxr-xr-x    1 app      app         4.0K May 24 00:02 ..
drwxrwxrwx    2 root     app            0 May 10 14:51 avatars
drwxrwxrwx    2 root     app            0 May 10 14:51 backup
drwxrwxrwx    2 root     app            0 May 10 14:51 pictures
-rwxrwxrwx    1 root     app        32.0K May 23 23:51 schnerring.net.db
drwxrwxrwx    2 root     app            0 May 16 17:02 var

I guess to summarize, the only real issue I see with Remark is that the init.sh script's chown statement redirects errors to /dev/null.

paskal commented 3 years ago

The first error means that /srv/var which you are mounting into the container, I believe if you'll do chown 1001:1001 on the directory which is mounted as /srv/var inside the container, that command will not fail and your container will start successfully under user 1001.

UPD: nvm, I've re-read the thread and see what is missing. I think we can add proper logging to this chmod run.

paskal commented 3 years ago

@schnerring, please check the current master image and let us know if you think anything is left to improve.

schnerring commented 3 years ago

Just to clarify, it was your intention to just echo a warning and start remark after?

...
chown: /srv/var/var/schnerring.net.db: Operation not permitted
chown: /srv/var/var: Operation not permitted
chown: /srv/var/var: Operation not permitted
chown: /srv/var: Operation not permitted
chown: /srv/var: Operation not permitted
WARNING: /srv/var ownership change failed, if application will fail that might be the reason
execute "/srv/remark42 server"
remark42 master-9fa23cc-20210524-17:45:21
2021/05/24 20:11:31.828 [INFO]  start server on port :8080
2021/05/24 20:11:31.828 [INFO]  root url=https://remark42.schnerring.net
...

As you can see, because echo is the last command to run in the init script, the script exits with code 0 (because echo succeeds), and Remark is started.

paskal commented 3 years ago

That's the intention: it now clearly let you know that if the app fails, the problem is likely in the /srv/var ownership which the script was unable to set.

schnerring commented 3 years ago

Then everything works as it's supposed to. Thanks for the hard work and thanks for creating Remark ❤️

umputun / remark42

Enable running Docker container as non-root user #1007