haraka / Haraka

A fast, highly extensible, and event driven SMTP server
https://haraka.github.io
MIT License
5.07k stars 661 forks source link

Inconsistent user/group ownership when writing files in hooks #3262

Closed DoobleD closed 6 months ago

DoobleD commented 10 months ago

In my Haraka setup, I'm using multiple hooks in a custom plugin to log stuff to files on disk (hook_reset_transaction, hook_disconnect, hook_log, ...). I use node's fs.promises functions to write the log files.

I recently upgraded my node version from 16 to 20, and I noticed that these log files are now often written with root:root as owner, instead of the user and group that I provided in my smtp.ini. That later creates some permission issues in my app.

This is perhaps not so suprising, as since v18 node no longer tries to manage file ownerhsip when a script is run as root. And I do run Haraka as root, as per the docs.

What's more surprising, is that the ownership seems to vary. The vast majority of the times, it is root:root, but sometimes a file is written with the user and group of my smtp.ini as owner. I can't seem to find any pattern though, it doesn't seem to be a specific hook that behaves differently for instance. It looks random.

When displaying process.getuid() and process.getgid() in hooks, these are always equal to the smtp.ini user and group, no matter if the file written just after checking the uid/gid is written as root:root or as the intended user/group.

Another thing to note is that when using sync functions (e.g. fs.writeFileSync), the file is always wrttien as the user/group from my smtp.ini. My best guess is that it's because fs.promises functions use node's underlying threadpool to perform fs operations, and these threads probably haven't dropped privileges like the main event loop thread did. Whereas sync functions run (and block) the event loop thread, which has dropped privileges.

I suppose from time to time fs.promise functions execute in the main event loop depending on what thread is idle. That would explain the randomness.

Any idea how to fix this? I'm looking for a way to make the ownership consistent, either alway root or always what's in my smtp.ini.

analogic commented 10 months ago

I ran into this bug recently - it seems that process.setgid(gid) and process.setuid(uid) are not working properly in v20. I am also unable to understand how it works inside the system, and how it is even possible for a process with downgraded privileges to still work as root.

I've reported this bug to Nodejs (https://github.com/nodejs/node/issues/51148), but as you can see... well... you can't really see. I think it is pretty serious security issue.

You can try it for your self, just use setuid and create dir/file and see which rights are there.

sh@zumpa:~$ sudo docker run -v ~/:/mnt node:16 /mnt/test.js /tmp/test Dropped privileges to UID: 1, GID: 1 Directory '/tmp/test' created successfully. Directory owner UID: 1, GID: 1

sh@zumpa:~$ sudo docker run -v ~/:/mnt node:18 /mnt/test.js /tmp/test Dropped privileges to UID: 1, GID: 1 Directory '/tmp/test' created successfully. Directory owner UID: 1, GID: 1

sh@zumpa:~$ sudo docker run -v ~/:/mnt node:20 /mnt/test.js /tmp/test Dropped privileges to UID: 1, GID: 1 Directory '/tmp/test' created successfully. Directory owner UID: 0, GID: 0

sh@zumpa:~$ sudo docker run -v ~/:/mnt node:21 /mnt/test.js /tmp/test Dropped privileges to UID: 1, GID: 1 Directory '/tmp/test' created successfully. Directory owner UID: 0, GID: 0

DoobleD commented 10 months ago

Thanks for the feedback @analogic.

I was able to reproduce it with a simple node script too. I think the node team most likely knows about this behavior, since they purposefuelly removed ownership management for root scripts in node 18.14:

Explanation: when run as root previous versions of npm attempted to manage file ownership automatically on the user's behalf. this behavior was problematic in many cases and has been removed in favor of allowing users to manage their own filesystem permissions

how it is even possible for a process with downgraded privileges to still work as root.

I'm not 100% sure but it looks like that's because fs.promises.* functions are executed in node's underlying thread pool most of the time, i.e. not in the event loop thread. And the setuid/setgid functions probably only apply to the event loop thread, or threads created after calls to setuid/setgid. Node's thread pool must be created well before the Haraka code dropping privileges is executed.

The fs/promises API provides asynchronous file system methods that return promises.

The promise APIs use the underlying Node.js threadpool to perform file system operations off the event loop thread.

Another thing that lead me to this is that when using sync fs functions - which do execute in the event loop thread - the issue doesn't happen.

For now the only workaround I got is to have the hooks code write into temporary files, and have a separate root script running as daemon that fixes theses temp files ownership and then moves them to the final place. Or if that's acceptable in your case you could use the sync functions (e.g. fs.writeFileSync).

DoobleD commented 10 months ago

One way to resolve the issue would be to not run Haraka as root. But I suspect there are very good reasons why the docs instructs to run it as root and have it drop privileges afterwards instead.

analogic commented 10 months ago

One way to resolve the issue would be to not run Haraka as root. But I suspect there are very good reasons why the docs instructs to run it as root and have it drop privileges afterwards instead.

Sure, ports below 1024 were historically meant to be opened only by root.

analogic commented 10 months ago

I'm not 100% sure but it looks like that's because fs.promises.* functions are executed in node's underlying thread pool most of the time, i.e. not in the event loop thread. And the setuid/setgid functions probably only apply to the event loop thread, or threads created after calls to setuid/setgid. Node's thread pool must be created well before the Haraka code dropping privileges is executed.

Yes, it could be. My view may be simplistic, but if "ps" lists the process as user-owned, I would expect that if nodejs tries to do root stuff from a process with dropped privileges, the kernel would forbid this behaviour.

I originally thought that all theads inherit the same rights from the process, but... https://stackoverflow.com/a/45398180

And if I go deeper and test it:

# NodeJS v20
sh@zumpa:~$ ps -T -p 306187 -f
UID          PID    SPID    PPID  C STIME TTY          TIME CMD
daemon    306187  306187  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test
root      306187  306210  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306187  306211  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test
root      306187  306212  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306187  306213  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306187  306214  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306187  306215  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306187  306216  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test
root      306187  306217  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306187  306218  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test
root      306187  306219  306168  0 20:55 ?        00:00:00 node /mnt/test.js /tmp/test

...and...

# NodeJS v18
sh@zumpa:~$ ps -T -p 306629 -f
UID          PID    SPID    PPID  C STIME TTY          TIME CMD
daemon    306629  306629  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306629  306650  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306629  306651  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306629  306652  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306629  306653  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306629  306654  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306629  306655  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306629  306656  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306629  306657  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306629  306658  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test
daemon    306629  306659  306606  0 21:05 ?        00:00:00 node /mnt/test.js /tmp/test

The only thing I am surprised about is that it was not discovered and fixed earlier.

DoobleD commented 10 months ago

Interesting. My previous guess may be wrong: the setuid description linked in the stackoverflow post says that a uid change should apply to all threads according to POSIX.

I'm unsure what's going on. But yeah I would have expected the same behavior as you.

DoobleD commented 10 months ago

Node is apparently not fully POSIX compliant. Not sure if node's setuid/gid are.

msimerson commented 10 months ago

As was pointed out earlier, running as a non-root user would create issues on any semi-normal *nix system that refuses to permit non-root users from binding to ports less than 1024.

Where we have code that is writing files to disk, and we care about the permissions, a potential solution is to check the ownership of the file immediately after create or append, maybe while we still have the filehandle, and use fs.chown if it doesn't match smtp.ini settings.

DoobleD commented 10 months ago

Thanks @msimerson. Calls to chown fail with an EPERM error. For some reasons these appear to be executed as the smtp.ini user rather than root, but since the files are written as root...

analogic commented 10 months ago

I'm not getting any notifications or replies to my "hidden" bug report, so I've personally downgraded to v18. I still hope someone at Nodejs is working on this, as it makes process.setuid essentially useless and potentially dangerous.

analogic commented 6 months ago

This issue seems to be finally resolved. See:

https://github.com/nodejs/nodejs.org/blob/main/pages/en/blog/release/v20.11.1.md