google / fscrypt

Go tool for managing Linux filesystem encryption
Apache License 2.0

Obscure error message due to process address space limit #394

Closed by maxnikulin 7 months ago

maxnikulin commented 7 months ago

Installing pam_fscrypt breaks authentication in the Dovecot IMAP server. The error message is rather obscure, so I hope it is possible to improve error handling.

The default address space limit (AS) for auth-worker processes is 256M, which is not enough for the Go runtime (at least as the PAM module is built in Debian).

After installing libpam-fscrypt I realized that I could no longer access IMAP folders, even though no login protector is configured for this particular user:

dovecot[72165]: auth-worker: Error: fatal error: failed to reserve page summary memory
dovecot[72165]: auth-worker: Error:
dovecot[72165]: auth-worker: Error: runtime stack:
dovecot[72165]: auth-worker: Error: runtime.throw({0x7f552c418194?, 0x7f552c1feb10?})
dovecot[72165]: auth-worker: Error:         runtime/panic.go:1047 +0x5f fp=0x7f552c1feac0 sp=0x7f552c1fea90 pc=0x7f552c28a53f
dovecot[72165]: auth-worker: Error: runtime.(*pageAlloc).sysInit(0x7f552c5f6fd0)
dovecot[72165]: auth-worker: Error:         runtime/mpagealloc_64bit.go:82 +0x195 fp=0x7f552c1feb48 sp=0x7f552c1feac0 pc=0x7f552c280ef5
dovecot[72165]: auth-worker: Error: runtime.(*pageAlloc).init(0x7f552c5f6fd0, 0x7f552c5f6fc0, 0x0?)
dovecot[72165]: auth-worker: Error:         runtime/mpagealloc.go:324 +0x70 fp=0x7f552c1feb70 sp=0x7f552c1feb48 pc=0x7f552c27eb50
dovecot[72165]: auth-worker: Error: runtime.(*mheap).init(0x7f552c5f6fc0)
dovecot[72165]: auth-worker: Error:         runtime/mheap.go:729 +0x13f fp=0x7f552c1feba8 sp=0x7f552c1feb70 pc=0x7f552c27bf5f
dovecot[72165]: auth-worker: Error: runtime.mallocinit()
dovecot[72165]: auth-worker: Error:         runtime/malloc.go:407 +0xb2 fp=0x7f552c1febd0 sp=0x7f552c1feba8 pc=0x7f552c260e72
dovecot[72165]: auth-worker: Error: runtime.schedinit()
dovecot[72165]: auth-worker: Error:         runtime/proc.go:693 +0xab fp=0x7f552c1fec30 sp=0x7f552c1febd0 pc=0x7f552c28df0b
dovecot[72165]: auth-worker: Error: runtime.rt0_go()
dovecot[72165]: auth-worker: Error:         runtime/asm_amd64.s:345 +0x120 fp=0x7f552c1fec38 sp=0x7f552c1fec30 pc=0x7f552c2b7c20
dovecot[72165]: auth: Error: auth-worker: Aborted PASSV request for mailuser: Worker process died unexpectedly
dovecot[72165]: auth-worker: Fatal: master: service(auth-worker): child 72211 returned error 2

Actually it is even worse, since the same failure happens for invalid users as well:

curl -v 'imap://bad:bad@localhost/'

It seems to happen rather early during initialization of the Go runtime. I suspected some conflict between the C and Go runtimes, but strace of a Dovecot auth process confirmed that it is indeed a memory allocation problem:

4460  14:27:49 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f3d547ff990, parent_tid=0x7f3d547ff990, exit_signal=0, stack=0x7f3d53fff000, stack_size=0x7ffd00, tls=0x7f3d547ff6c0} <unfinished ...>
4460  14:27:49 <... clone3 resumed> => {parent_tid=[4468]}, 88) = 4468

4468  14:27:49 mmap(NULL, 536870912, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
4468  14:27:49 <... mmap resumed>)      = -1 ENOMEM (Cannot allocate memory)
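The failing call in the trace can be reproduced without Go or Dovecot. The following is a minimal sketch, assuming Linux and Python 3: it forks a child, caps the child's address space with RLIMIT_AS, and attempts the same kind of PROT_NONE anonymous reservation. The function name reserve_under_limit is made up for illustration, and the 256 MiB and 512 MiB figures are taken from the limit and the mmap call reported in this issue.

```python
import errno
import mmap
import os
import resource


def reserve_under_limit(limit_bytes, reserve_bytes):
    """Fork a child, cap its address space, and try a PROT_NONE reservation.

    Returns "ok" if the mapping succeeded, otherwise the errno name.
    (Hypothetical helper, not part of fscrypt or Dovecot.)
    """
    pid = os.fork()
    if pid == 0:
        # Child: always leave through os._exit so it never returns to the caller.
        status = 255
        try:
            resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
            # prot=0 is PROT_NONE: the mapping reserves address space only.
            region = mmap.mmap(-1, reserve_bytes,
                               flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                               prot=0)
            region.close()
            status = 0
        except OSError as exc:
            status = exc.errno or 255
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else 255
    return "ok" if code == 0 else errno.errorcode.get(code, "exit %d" % code)


if __name__ == "__main__":
    # 512 MiB reservation under a 256 MiB limit, then under a 2 GiB limit.
    print(reserve_under_limit(256 * 2**20, 512 * 2**20))
    print(reserve_under_limit(2 * 2**30, 512 * 2**20))
```

The reservation touches no memory, yet it is refused as soon as the requested size would push the total mapped size of the process past the limit.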

The actual limit for this kind of process:

prlimit --pid 4608
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                268435456 268435456 bytes
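The same value can be read programmatically. This is a small sketch, assuming Linux and Python 3, that uses resource.prlimit from the standard library to query RLIMIT_AS for a given PID, much like the prlimit command above. The helper names address_space_limit and describe are made up for illustration, and querying another user's process normally requires sufficient privileges.

```python
import os
import resource


def address_space_limit(pid):
    """Return the (soft, hard) RLIMIT_AS pair for pid (Linux only)."""
    return resource.prlimit(pid, resource.RLIMIT_AS)


def describe(value):
    """Format a limit in bytes, mapping RLIM_INFINITY to 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes (%d MiB)" % (value, value // 2**20)


if __name__ == "__main__":
    # Inspect the current process; pass an auth-worker PID instead to check Dovecot.
    soft, hard = address_space_limit(os.getpid())
    print("AS soft:", describe(soft))
    print("AS hard:", describe(hard))
```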

While experimenting with this limit I saw various other error messages, e.g.

auth[4822]: pam_unix(dovecot:auth): check pass; user unknown
auth[4822]: pam_unix(dovecot:auth): authentication failure; logname= uid=0 euid=0 tty=dovecot ruser=bad rhost=127.0.0.1
dovecot[4815]: auth-worker: Error: runtime/cgo: pthread_create failed: Resource temporarily unavailable
dovecot[4815]: auth: Error: auth-worker: Aborted PASSV request for bad: Worker process died unexpectedly
dovecot[4815]: auth-worker: Fatal: master: service(auth-worker): child 4822 killed with signal 6 (core dumps disabled - https://dovecot.org/bugreport.html#coredumps)

Just for completeness, a Dovecot configuration snippet that fixes the issue:

service auth-worker {
  vsz_limit = 2G
}

The default is default_vsz_limit = 256M.

I realize that fscrypt is hardly compatible with a mail server, and I am not going to use login protectors and mailboxes for the same users. Still, I find it unfortunate that the packages are incompatible out of the box.

Debian 12 bookworm, Linux kernel 6.1.55-1, libpam-fscrypt 0.3.3-1+b6. This is not the latest fscrypt version, but it should include error handlers for PAM methods. I am unsure whether the issue is caused by build flags specific to Debian.

From my point of view, a 256M limit should be enough to avoid errors when a PAM module is not supposed to do anything useful. That is why I am requesting a documentation update and, if possible, improved error handling.

ebiggers commented 7 months ago

I'm not sure that anything can be done about this while pam_fscrypt is written in Go, given that the address space is being allocated by the Go runtime when it starts up, and this is working as intended for any Go program (see https://github.com/golang/go/issues/38010). Keep in mind that it's address space, not memory, and thus your first two requests ("It would be nice to postpone allocation of significant amount of memory" and "not allocate almost ~1G of RAM if a user does not have login protector") describe current behavior already. Allocating address space does cause problems when programs use RLIMIT_AS to limit the address space, but that feature is largely obsolete now that memory cgroups exist and allow actually limiting memory. It might be worth reaching out to the application you're using that uses RLIMIT_AS and seeing if it's what they really want to be using; most likely they really intended to limit memory, not address space.
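The distinction between address space and memory can be observed directly. Below is a minimal sketch, assuming Linux and Python 3: it reserves a PROT_NONE region and compares VmSize and VmRSS from /proc/self/status before and after. The helper names are made up for illustration, and the 512 MiB size simply mirrors the mmap call from the strace output in this issue.

```python
import mmap


def vm_status_kb():
    """Return (VmSize, VmRSS) of this process in kB from /proc/self/status."""
    fields = {}
    with open("/proc/self/status") as status:
        for line in status:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                fields[key] = int(rest.split()[0])
    return fields["VmSize"], fields["VmRSS"]


def reserve_and_measure(size_bytes):
    """Reserve size_bytes with PROT_NONE; return (VmSize delta, VmRSS delta) in kB."""
    size_before, rss_before = vm_status_kb()
    # prot=0 is PROT_NONE: pages are reserved but never backed by RAM.
    region = mmap.mmap(-1, size_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                       prot=0)
    try:
        size_after, rss_after = vm_status_kb()
    finally:
        region.close()
    return size_after - size_before, rss_after - rss_before


if __name__ == "__main__":
    d_size, d_rss = reserve_and_measure(512 * 2**20)
    print("VmSize grew by", d_size, "kB; VmRSS grew by", d_rss, "kB")
```

VmSize is what RLIMIT_AS restricts, while resident memory, which is closer to what a memory cgroup accounts for, should stay essentially unchanged by the reservation.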

maxnikulin commented 7 months ago

Thanks for pointing me to the Go issue. I admit I was imprecise in using "memory" and "address space" interchangeably. It is a pity that Go has no knobs, such as build or link flags, to adjust the behavior of the memory allocator. In this case I would accept some performance degradation in exchange for running under stricter limits. I see that this is out of the control of fscrypt developers and of package maintainers in the various Linux distributions.

It reminds me of some FORTRAN 77 libraries that required the caller to explicitly prepare a large enough arena for later "dynamic" memory allocations (COMMON/PAWC/... and HLIMIT in http://labmaster.mi.infn.it/wwwasdoc.web.cern.ch/wwwasdoc/hbook_html3/node12.html).

Dovecot is a fairly complex mail server. It uses a squad of processes to minimize the privileges of each one, so it will likely be non-trivial to organize the group of daemons into a suitable cgroup subtree. Anyway, I posted a description of the issue in the hope that it might influence later decisions of the developers: Authentication failure due to address space limit. Wed, 6 Dec 2023 18:06:05.

I found a similar post about another PAM module: dovecot: lmtp: Error: fatal error: failed to reserve page summary memory. Thu, Sep 17 2020 12:20:12. In that case the solution was to use Rust instead of Go.

ebiggers commented 7 months ago

At this point I'd prefer Rust over Go too. Of course, the implementation language of fscrypt was chosen back in 2016 when Rust wasn't as well established, so we're stuck with it unless someone wants to rewrite it which would be a large effort.

maxnikulin commented 7 months ago

> At this point I'd prefer Rust over Go too.

The PAM module from fscrypt is among the top search results: https://pkg.go.dev/search?q=pam A warning about RLIMIT_AS in the library docs might help other developers decide whether Go should be used for their projects.

maxnikulin commented 7 months ago

Originally I suggested vsz_limit = 1G for auth-worker, but that still sometimes causes silent failures that drive Thunderbird crazy. I hope 2G will be enough (I updated the initial comment so as not to confuse readers).

It happens even when /etc/pam.d/dovecot does not load pam_fscrypt.so (through common-auth or common-session). The module is loaded from /etc/pam.d/common-account. Adding debug there does not produce any message, whether authentication succeeds or fails. Dovecot just logs

pam_authenticate() failed: Authentication failure (Password mismatch?)

To reproduce, force a new auth-worker on every request (or restart Dovecot after each request):

service auth-worker {
  service_count = 1
}

and

while curl -v imap://test:test@127.0.0.1/ ; do sleep 0.1 ; done

Thunderbird opens multiple connections, which makes failures rather frequent.
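For counting how many logins succeed before the first intermittent failure, the curl loop can be approximated in Python. This is only a sketch, assuming Python 3.9 or later (for the timeout argument of imaplib.IMAP4) and the test:test account on 127.0.0.1 used in the curl command. The function names are made up for illustration, and unlike curl the sketch only logs in and does not list folders.

```python
import imaplib
import time


def imap_login_ok(host, user, password, timeout=5.0):
    """Return True if an IMAP LOGIN succeeds, False on any failure."""
    try:
        conn = imaplib.IMAP4(host, timeout=timeout)
    except (OSError, imaplib.IMAP4.error):
        return False
    try:
        conn.login(user, password)
        return True
    except (OSError, imaplib.IMAP4.error):
        return False
    finally:
        try:
            conn.logout()
        except (OSError, imaplib.IMAP4.error):
            pass


def successes_before_failure(attempt, max_tries=100, delay=0.1):
    """Call attempt() until it returns False; return how many calls succeeded."""
    for done in range(max_tries):
        if not attempt():
            return done
        time.sleep(delay)
    return max_tries


if __name__ == "__main__":
    count = successes_before_failure(
        lambda: imap_login_ok("127.0.0.1", "test", "test"))
    print("successful logins before first failure:", count)
```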