znc / znc

Official repository for the ZNC IRC bouncer
https://znc.in/
Apache License 2.0
2k stars 378 forks source link

Improve error message for "Unable to locate pem file" #1769

Open ratijas opened 3 years ago

ratijas commented 3 years ago

Summary

After splitting config with single pem into separate key/dhparam/certificate files (or rather, leaving them as is provided by letsencrypt/certbot), mounting them on a docker volume (local bind to /etc/letsencrypt), and upgrading to 1.8.2 (presumably from 1.7.x — something taggedlatest` from 2 years earlier)... official Docker image of znc failed to start.

version: '3.7'
volumes:
  znc:
    external: true
services:
  znc:
    image: znc:1.8
    volumes:
      - znc:/znc-data
      - "/etc/letsencrypt:/etc/letsencrypt:ro"
    ports:
      - '6697:6697'
    restart: always
$ docker-compose up
Creating network "znc_default" with the default driver
Creating znc_znc_1 ... done
Attaching to znc_znc_1
znc_1  | Building modules ./push.cpp...
...
... # building modules
...
znc_1  | Checking for list of available modules... 
znc_1  | Opening config [/znc-data/configs/znc.conf]... 
znc_1  | Found old config from ZNC 1.7.2. Saving a backup of it.
znc_1  | Creating a config backup... [ File exists ]
znc_1  | Loading global module [webadmin]... 
znc_1  | Loading global module [lastseen]... 
znc_1  | Binding to port [+6697]... [ Unable to locate pem file: /etc/letsencrypt/live/ratijas.tk/fullchain.pem ]
znc_1  | Unrecoverable config error.

And when running manually with equivalent set of options, znc presents interactive dialog:

$ docker run --rm -it -p 6697:6697 -v znc:/znc-data -v /etc/letsencrypt:/etc/letsencrypt:ro znc:1.8.2
Building modules ./push.cpp...
...
... # building modules
...
[ .. ] Checking for list of available modules...
[ .. ] Opening config [/znc-data/configs/znc.conf]...
[ ** ] Found old config from ZNC 1.7.2. Saving a backup of it.
[ .. ] Creating a config backup...
[ !! ] File exists
[ .. ] Loading global module [webadmin]...
[ .. ] Loading global module [lastseen]...
[ .. ] Binding to port [+6697]...
[ !! ] Unable to locate pem file: /etc/letsencrypt/live/ratijas.tk/fullchain.pem
[ ?? ] Would you like to create a new pem file? (yes/no) [yes]: no
[ ** ] Unrecoverable config error.

However, the pem file sure exists. It can be opened and read, as demonstrated below.

$ docker run -p 6697:6697 -v znc:/znc-data -v /etc/letsencrypt:/etc/letsencrypt:ro --rm -it --entrypoint=/bin/sh znc:1.8.2
/ # ls -lF /etc/letsencrypt/live/ratijas.tk/fullchain.pem
lrwxrwxrwx    1 root     root            40 Nov 29 07:12 /etc/letsencrypt/live/ratijas.tk/fullchain.pem -> ../../archive/ratijas.tk/fullchain11.pem
/ # head -n1 /etc/letsencrypt/live/ratijas.tk/fullchain.pem
-----BEGIN CERTIFICATE-----
/ # 

Version

Docker:

docker version
Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:40:58 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 05:59:55 2019
  OS/Arch:          linux/amd64
  Experimental:     false

ZNC: znc:1.8.2

Related

Issue #1267 looks exactly like mine, but it was closed for reasons either not explained, or the ones I gon't understand.

DarthGandalf commented 3 years ago

Because head works for you (thanks for that test), this is different from #1267.

By providing --entrypoint, you're skipping the step of changing user from root to znc. Perhaps znc user doesn't have enough permissions to read one of directories in path to /etc/letsencrypt/live/ratijas.tk/fullchain.pem?

ratijas commented 3 years ago

You are goddamn right! (And I should have had some sleep before posting issues)

From now on, this issue will be split in two.


[PART 1]

The first thing is, there are different permissions for different types of files under /etc/letsencrypt/archive/domain/. The private key is the only one with most restrictive mode ower: rw, others: nothing.

$ # Inside znc:1.8.2 docker container
$ cd /etc/letsencrypt/archive/ratijas.tk
$ ls -lh *11*
-rw-r--r-- 1 root root 1.9K Nov 29 07:12 cert11.pem
-rw-r--r-- 1 root root 1.7K Nov 29 07:12 chain11.pem
-rw-r--r-- 1 root root 3.5K Nov 29 07:12 fullchain11.pem
-rw------- 1 root root 1.7K Nov 29 07:12 privkey11.pem
$ su-exec znc:znc head -n1 cert11.pem 
-----BEGIN CERTIFICATE-----
$ su-exec znc:znc head -n1 privkey11.pem
head: privkey11.pem: Permission denied

Now, I wonder about two things:


[PART 2]

Here lies a problem. The code is so busy checking for file's mere "existence" that it forgot what it was supposed do.

Checking for file status before making an attempt to open it leads to inherent race conditions and poor control flow with redundant error checking. There are plenty of articles on this topic (just google for TOCTOU), I don't have to repeat myself, but the short of it is this:

  1. DO NOT check for file's existence, attributes etc.
  2. Just open it, for whatever mode intended.
  3. Check for any error from the previous step (which you MUST do anyway).
  4. Handle those errors.
psychon commented 3 years ago

Checking for file status before making an attempt to open it leads to inherent race conditions and poor control flow with redundant error checking.

Well, we can just drop the check for that file's existence. This replaces the error from "ZNC refuses to start since it cannot access the SSL certificate" with "ZNC closes all incoming SSL connections without answering the handshake and no clear error message anywhere". The socket library that ZNC uses loads the certificate for each incoming connection. (Well, if you run ZNC in --debug mode, one of these messages become visible: https://github.com/jimloco/Csocket/blob/e8d9e0bb248c521c2c7fa01e1c6a116d929c41b4/Csocket.cc#L1858-L1884) (Oh and while we are talking about "do not use access": Feel free to open an issue on Csocket: https://github.com/jimloco/Csocket/blob/e8d9e0bb248c521c2c7fa01e1c6a116d929c41b4/Csocket.cc#L1842-L1854)

Also, this is not a security check. We are not checking the file's owner or doing magic. This is just a convenience feature / quality of life thing. If you read the wikipedia article for TOCTOU, it all sounds quite security-centric.

ratijas commented 3 years ago

This replaces the error from "ZNC refuses to start since it cannot access the SSL certificate" with "ZNC closes all incoming SSL connections without answering the handshake and no clear error message anywhere".

What are you talking about? In this case instead of reporting the real underlying problem of opening a file, it prints custom misleading error message about file purportedly being not found.

This check does nothing right. If the point is to verify that https/SSL is set up correctly — how about doing self-test GET request? It would prove much more than simply file being found or not.

If you read the wikipedia article for TOCTOU, it all sounds quite security-centric.

"Sounds"? I am not officially qualified to make comments on security either. Are you 100% sure it can not be misused (apart from already masking real errors, that is)?

Let's see. When the code takes on "Unable to locate pem file" branch, it then may attempt to prompt user with message "Would you like to create a new pem file?". If the answer is "yes", program would attempt to write to that same pem file as returned by the same CZNC::GetPemLocation() function. Let's say, what would happen if at that exact moment certbot on the host triggered by a periodic systemd timer updates certificates and re-creates links? I don't know how certbot works — it may replace soft links atomically, it may not. What it ZNC creates it's PEM file at exact moment between deletion and creation? I totally don't know. Sounds like an undefined behavior to me.

DarthGandalf commented 3 years ago

What would be the downsides of running ZNC as root in docker?

That's a security risk? It's easier to escape the container as root than as non-root.

it prints custom misleading error message about file purportedly being not found.

I agree, the message could be improved to tell what exactly went wrong.

This check does nothing right.

PRs with fixes are welcome.

how about doing self-test GET request?

That could work. I'm not sure it worth though. Does nginx do anything like that?

Again, we could just remove the check. But currently ZNC reads the key upon accepting a connection, and the current error message is better than failing to start handshake, because that's harder to debug. The proper fix would be to read the file on startup, and provide a way for certbot to trigger re-reading it.

Are you 100% sure it can not be misused (apart from already masking real errors, that is)?

Correct.

If the answer is "yes", program would attempt to write to that same pem file as returned by the same CZNC::GetPemLocation() function.

If that file doesn't... khem... exist due to its location being inaccessible, then ZNC won't be able to write to it either.

ratijas commented 3 years ago

That could work. I'm not sure it worth though. Does nginx do anything like that?

Nginx reads certificate & keys once. And only reloads them on configuration change (signal nginx -s reload). It doesn't just check for file existence, but actually loads the keys. So if that went smoothly, I'm sure it will be alright from that point onward.

Again, we could just remove the check. But currently ZNC reads the key upon accepting a connection, and the current error message is better than failing to start handshake, because that's harder to debug. The proper fix would be to read the file on startup, and provide a way for certbot to trigger re-reading it.

Yeah, that's kinda wasteful. Definitely something to improve on.

Are you 100% sure it can not be misused (apart from already masking real errors, that is)?

Correct.

How may I verify this statement for correctness?

If that file doesn't... khem... exist due to its location being inaccessible, then ZNC won't be able to write to it either.

Think carefully, think twice.

  1. Assuming file is not readable by ZNC, but the directory itself is writable, so can create files.
  2. ZNC checks file via symbolic link — not accessible
  3. Certbot initiates renewal
  4. ZNC prompts to create PEM file — because it "thinks" file does not exist
  5. Certbot removes old link
  6. Just before certbot installs new link...
  7. ZNC creates dummy PEM where the link was just a moment ago.
  8. < -- Undefined state with race condition starts here
DarthGandalf commented 3 years ago

Think carefully, think twice.

Could you be less aggressive, please?

Step 3.5 in your scenario: user clicks Yes on that prompt. Does the user not know that he configured certbot before? If user doesn't know what he's doing, we can't protect from that. User has access to run ZNC and/or to overwrite any files he pleases.

Anyway, this is dupe of #1215 - when ZNC gains ability to read the cert once, on startup, it won't need such separate check on startup.

ratijas commented 3 years ago

Yeah, sorry. I'm just feeling terribly annoyed, because the first step toward a solution is acknowledging the problem at hands, which is like... not happening here.

Does the user not know that he configured certbot before?

First, I don't ever assume that users knows what they are doing. Second, they might know, and might not; they might even not be administrators (znc is designed to be run as non-root anyway). I, for example, share my server with friends to run some web stuff like znc, and I am the one doing DNS and certificates management — I certainly wouldn't wanna let them mess it up even by pure accident, i.e. without malicious intentions.

Anyway, this is dupe of #1215 - when ZNC gains ability to read the cert once, on startup, it won't need such separate check on startup.

Not much of a duplicate, as I've raised totally different concerns here. Although a good solution might resolve both issues at once. Thanks for pointing out, I'll take a look at what can be done there as well.

Closing this means that (1) an error message became adequate, (2) race conditions are eliminated. Possibly, (3) an option to run ZNC as root or otherwise integrate nicely with letsencrypt/certbot out-of-the-box. For option (3) updating some documentation on ZNC wiki wouldn't hurt.

Thus, please, consider reopening.

mrusme commented 3 years ago

This issue seems to only describe half of what could be the problem. Under OpenBSD 6.8 I was experiencing the following scenario:

I placed the cert into the .znc directory and added it to the config:

SSLCertFile = ~/.znc/cert1.pem 

znc kept nagging that it wasn't able to locate the pem file, even though the same user it was running under could cat the file. At some point I didn't know what to do and simply pressed y, to let znc create the pem. This was what it resulted in:

Screen Shot 2021-04-01 at 6 12 34 PM

Turns out that in my config I had a whitespace at the end of the certificate name. I kid you not. My config was SSLCertFile = ~/.znc/cert1.pem instead of SSLCertFile = ~/.znc/cert1.pem. Hence znc wasn't unable to find the pem and so it created a file that had a whitespace at the end of its name.