Improve error message for "Unable to locate pem file"

ratijas commented 3 years ago

Summary

After splitting config with single pem into separate key/dhparam/certificate files (or rather, leaving them as is provided by letsencrypt/certbot), mounting them on a docker volume (local bind to /etc/letsencrypt), and upgrading to 1.8.2 (presumably from 1.7.x — something taggedlatest` from 2 years earlier)... official Docker image of znc failed to start.

version: '3.7'
volumes:
  znc:
    external: true
services:
  znc:
    image: znc:1.8
    volumes:
      - znc:/znc-data
      - "/etc/letsencrypt:/etc/letsencrypt:ro"
    ports:
      - '6697:6697'
    restart: always

$ docker-compose up
Creating network "znc_default" with the default driver
Creating znc_znc_1 ... done
Attaching to znc_znc_1
znc_1  | Building modules ./push.cpp...
...
... # building modules
...
znc_1  | Checking for list of available modules... 
znc_1  | Opening config [/znc-data/configs/znc.conf]... 
znc_1  | Found old config from ZNC 1.7.2. Saving a backup of it.
znc_1  | Creating a config backup... [ File exists ]
znc_1  | Loading global module [webadmin]... 
znc_1  | Loading global module [lastseen]... 
znc_1  | Binding to port [+6697]... [ Unable to locate pem file: /etc/letsencrypt/live/ratijas.tk/fullchain.pem ]
znc_1  | Unrecoverable config error.

And when running manually with equivalent set of options, znc presents interactive dialog:

$ docker run --rm -it -p 6697:6697 -v znc:/znc-data -v /etc/letsencrypt:/etc/letsencrypt:ro znc:1.8.2
Building modules ./push.cpp...
...
... # building modules
...
[ .. ] Checking for list of available modules...
[ .. ] Opening config [/znc-data/configs/znc.conf]...
[ ** ] Found old config from ZNC 1.7.2. Saving a backup of it.
[ .. ] Creating a config backup...
[ !! ] File exists
[ .. ] Loading global module [webadmin]...
[ .. ] Loading global module [lastseen]...
[ .. ] Binding to port [+6697]...
[ !! ] Unable to locate pem file: /etc/letsencrypt/live/ratijas.tk/fullchain.pem
[ ?? ] Would you like to create a new pem file? (yes/no) [yes]: no
[ ** ] Unrecoverable config error.

However, the pem file sure exists. It can be opened and read, as demonstrated below.

$ docker run -p 6697:6697 -v znc:/znc-data -v /etc/letsencrypt:/etc/letsencrypt:ro --rm -it --entrypoint=/bin/sh znc:1.8.2
/ # ls -lF /etc/letsencrypt/live/ratijas.tk/fullchain.pem
lrwxrwxrwx    1 root     root            40 Nov 29 07:12 /etc/letsencrypt/live/ratijas.tk/fullchain.pem -> ../../archive/ratijas.tk/fullchain11.pem
/ # head -n1 /etc/letsencrypt/live/ratijas.tk/fullchain.pem
-----BEGIN CERTIFICATE-----
/ #

Version

Docker:

docker version
Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:40:58 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 05:59:55 2019
  OS/Arch:          linux/amd64
  Experimental:     false

ZNC: znc:1.8.2

[PART 1]

The first thing is, there are different permissions for different types of files under /etc/letsencrypt/archive/domain/. The private key is the only one with most restrictive mode ower: rw, others: nothing.

$ # Inside znc:1.8.2 docker container
$ cd /etc/letsencrypt/archive/ratijas.tk
$ ls -lh *11*
-rw-r--r-- 1 root root 1.9K Nov 29 07:12 cert11.pem
-rw-r--r-- 1 root root 1.7K Nov 29 07:12 chain11.pem
-rw-r--r-- 1 root root 3.5K Nov 29 07:12 fullchain11.pem
-rw------- 1 root root 1.7K Nov 29 07:12 privkey11.pem
$ su-exec znc:znc head -n1 cert11.pem 
-----BEGIN CERTIFICATE-----
$ su-exec znc:znc head -n1 privkey11.pem
head: privkey11.pem: Permission denied

Now, I wonder about two things:

How nginx works?

They have quite similar-looking RUN script in Dockerfile, which also adds non-root user & group, namely nginx:nginx. But it's an option, meaning it's there for our convenience, but not enabled by default.

Nginx container also fails, when switched to nginx:nginx user.

me@server$ docker run --rm -it \
-v /etc/letsencrypt:/etc/letsencrypt:ro \
--workdir /etc/letsencrypt/archive/ratijas.tk \
--entrypoint=/bin/bash nginx:latest
root@docker$ head -n1 privkey11.pem 
-----BEGIN PRIVATE KEY-----
root@docker$ su -s /bin/bash -g nginx nginx
nginx@docker$ id
uid=101(nginx) gid=101(nginx) groups=101(nginx)
nginx@docker$ head -n1 privkey11.pem
head: cannot open 'privkey11.pem' for reading: Permission denied

What would be the downsides of running ZNC as root in docker?
- Certificates can be mounted read-only anyway, so that's probably not a concern.
- Current strict default configuration basically renders ZNC incompatible with arguably most popular certificate authority out of the box.

[PART 2]

Here lies a problem. The code is so busy checking for file's mere "existence" that it forgot what it was supposed do.

Checking for file status before making an attempt to open it leads to inherent race conditions and poor control flow with redundant error checking. There are plenty of articles on this topic (just google for TOCTOU), I don't have to repeat myself, but the short of it is this:

DO NOT check for file's existence, attributes etc.
Just open it, for whatever mode intended.
Check for any error from the previous step (which you MUST do anyway).
Handle those errors.

psychon commented 3 years ago

Checking for file status before making an attempt to open it leads to inherent race conditions and poor control flow with redundant error checking.

Well, we can just drop the check for that file's existence. This replaces the error from "ZNC refuses to start since it cannot access the SSL certificate" with "ZNC closes all incoming SSL connections without answering the handshake and no clear error message anywhere". The socket library that ZNC uses loads the certificate for each incoming connection. (Well, if you run ZNC in --debug mode, one of these messages become visible: https://github.com/jimloco/Csocket/blob/e8d9e0bb248c521c2c7fa01e1c6a116d929c41b4/Csocket.cc#L1858-L1884) (Oh and while we are talking about "do not use access": Feel free to open an issue on Csocket: https://github.com/jimloco/Csocket/blob/e8d9e0bb248c521c2c7fa01e1c6a116d929c41b4/Csocket.cc#L1842-L1854)

Also, this is not a security check. We are not checking the file's owner or doing magic. This is just a convenience feature / quality of life thing. If you read the wikipedia article for TOCTOU, it all sounds quite security-centric.

ratijas commented 3 years ago

This replaces the error from "ZNC refuses to start since it cannot access the SSL certificate" with "ZNC closes all incoming SSL connections without answering the handshake and no clear error message anywhere".

What are you talking about? In this case instead of reporting the real underlying problem of opening a file, it prints custom misleading error message about file purportedly being not found.

This check does nothing right. If the point is to verify that https/SSL is set up correctly — how about doing self-test GET request? It would prove much more than simply file being found or not.

If you read the wikipedia article for TOCTOU, it all sounds quite security-centric.

"Sounds"? I am not officially qualified to make comments on security either. Are you 100% sure it can not be misused (apart from already masking real errors, that is)?

Let's see. When the code takes on "Unable to locate pem file" branch, it then may attempt to prompt user with message "Would you like to create a new pem file?". If the answer is "yes", program would attempt to write to that same pem file as returned by the same CZNC::GetPemLocation() function. Let's say, what would happen if at that exact moment certbot on the host triggered by a periodic systemd timer updates certificates and re-creates links? I don't know how certbot works — it may replace soft links atomically, it may not. What it ZNC creates it's PEM file at exact moment between deletion and creation? I totally don't know. Sounds like an undefined behavior to me.

DarthGandalf commented 3 years ago

What would be the downsides of running ZNC as root in docker?

That's a security risk? It's easier to escape the container as root than as non-root.

it prints custom misleading error message about file purportedly being not found.

I agree, the message could be improved to tell what exactly went wrong.

This check does nothing right.

PRs with fixes are welcome.

how about doing self-test GET request?

That could work. I'm not sure it worth though. Does nginx do anything like that?

Again, we could just remove the check. But currently ZNC reads the key upon accepting a connection, and the current error message is better than failing to start handshake, because that's harder to debug. The proper fix would be to read the file on startup, and provide a way for certbot to trigger re-reading it.

Are you 100% sure it can not be misused (apart from already masking real errors, that is)?

Correct.

If the answer is "yes", program would attempt to write to that same pem file as returned by the same CZNC::GetPemLocation() function.

If that file doesn't... khem... exist due to its location being inaccessible, then ZNC won't be able to write to it either.

ratijas commented 3 years ago

That could work. I'm not sure it worth though. Does nginx do anything like that?

Nginx reads certificate & keys once. And only reloads them on configuration change (signal nginx -s reload). It doesn't just check for file existence, but actually loads the keys. So if that went smoothly, I'm sure it will be alright from that point onward.

Again, we could just remove the check. But currently ZNC reads the key upon accepting a connection, and the current error message is better than failing to start handshake, because that's harder to debug. The proper fix would be to read the file on startup, and provide a way for certbot to trigger re-reading it.

Yeah, that's kinda wasteful. Definitely something to improve on.

Are you 100% sure it can not be misused (apart from already masking real errors, that is)?

Correct.

How may I verify this statement for correctness?

If that file doesn't... khem... exist due to its location being inaccessible, then ZNC won't be able to write to it either.

Think carefully, think twice.

Assuming file is not readable by ZNC, but the directory itself is writable, so can create files.
ZNC checks file via symbolic link — not accessible
Certbot initiates renewal
ZNC prompts to create PEM file — because it "thinks" file does not exist
Certbot removes old link
Just before certbot installs new link...
ZNC creates dummy PEM where the link was just a moment ago.
< -- Undefined state with race condition starts here

DarthGandalf commented 3 years ago

Think carefully, think twice.

Could you be less aggressive, please?

Step 3.5 in your scenario: user clicks Yes on that prompt. Does the user not know that he configured certbot before? If user doesn't know what he's doing, we can't protect from that. User has access to run ZNC and/or to overwrite any files he pleases.

Anyway, this is dupe of #1215 - when ZNC gains ability to read the cert once, on startup, it won't need such separate check on startup.

ratijas commented 3 years ago

Yeah, sorry. I'm just feeling terribly annoyed, because the first step toward a solution is acknowledging the problem at hands, which is like... not happening here.

Does the user not know that he configured certbot before?

First, I don't ever assume that users knows what they are doing. Second, they might know, and might not; they might even not be administrators (znc is designed to be run as non-root anyway). I, for example, share my server with friends to run some web stuff like znc, and I am the one doing DNS and certificates management — I certainly wouldn't wanna let them mess it up even by pure accident, i.e. without malicious intentions.

Anyway, this is dupe of #1215 - when ZNC gains ability to read the cert once, on startup, it won't need such separate check on startup.

Not much of a duplicate, as I've raised totally different concerns here. Although a good solution might resolve both issues at once. Thanks for pointing out, I'll take a look at what can be done there as well.

Closing this means that (1) an error message became adequate, (2) race conditions are eliminated. Possibly, (3) an option to run ZNC as root or otherwise integrate nicely with letsencrypt/certbot out-of-the-box. For option (3) updating some documentation on ZNC wiki wouldn't hurt.

Thus, please, consider reopening.

mrusme commented 3 years ago

This issue seems to only describe half of what could be the problem. Under OpenBSD 6.8 I was experiencing the following scenario:

I placed the cert into the .znc directory and added it to the config:

SSLCertFile = ~/.znc/cert1.pem

znc kept nagging that it wasn't able to locate the pem file, even though the same user it was running under could cat the file. At some point I didn't know what to do and simply pressed y, to let znc create the pem. This was what it resulted in:

Turns out that in my config I had a whitespace at the end of the certificate name. I kid you not. My config was SSLCertFile = ~/.znc/cert1.pem instead of SSLCertFile = ~/.znc/cert1.pem. Hence znc wasn't unable to find the pem and so it created a file that had a whitespace at the end of its name.

znc / znc