Closed JT-theDM closed 8 months ago
Hey @JT-theDM, thanks for the incredibly detailed report! That's all very helpful.
It does sound like the changes for the certificate fix are causing set -e (which tells bash to exit when any command has an error) to exit the script early. An improvement here would be to add a trap to the script so it can output the error. It's odd that none of the commands are outputting an error of their own, though.
@haozhun would you mind taking a look at this as you said you tested the changes? Thanks!
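The trap suggested above could look roughly like the sketch below. It uses the POSIX EXIT trap so that it does not depend on which shell runs the script; report_exit and the message text are made-up names for illustration, not taken from the actual tailscale.sh.

```shell
#!/bin/sh
# Sketch only: report_exit is a hypothetical helper, not part of the real script.
set -e

# Print a message when the script stops with a non-zero status,
# so an early exit caused by set -e is never silent.
report_exit() {
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "tailscale.sh: aborted with exit status $status" >&2
    fi
}

trap report_exit EXIT
```

In bash specifically, an ERR trap that prints $BASH_COMMAND would additionally name the command that failed.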
In my original PR, I mentioned that I've verified that the fix is idempotent.
However, in reality, I did actually run into one issue when I repeatedly ran the steps. I couldn't reproduce it, so I chalked it up to me accidentally doing something stupid. However, it looks like that is not the case, considering that this report matched exactly the error that I ran into.
The issue is that the /usr/local/share/ca-certificates/ directory somehow does not exist. As a result, curl fails.
It's odd that none of the commands are outputting an error of their own, though.
Indeed. Your comment reminded me to take a look at the curl options. Normally, one would use -sS in a script, so that curl is mostly silent if everything goes smoothly, but still prints errors if something bad happens. In this case, I copy-pasted the command from https://community.ui.com/questions/Fix-Solution-Lets-Encrypt-DST-Root-CA-X3-Expiration-Problems-with-IDS-IPS-Signature-Updates-HTTPS-E/0404a626-1a77-4d6c-9b4c-17ea3dea641d, and I didn't think about it. As a result, curl is run with -s, so it's completely silent, which is a bad idea in general, and especially so in a script.
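To make the difference concrete, here is a small sketch that forces a transfer error and runs curl both ways. The file:// URL is a deliberately unreadable placeholder chosen so the failure is reproducible offline; it is not the URL used by the script.

```shell
# A deliberately unreadable source, used only to force a transfer error.
url="file:///nonexistent/isrgrootx1.pem"

# -s alone: no progress meter and no error message;
# only the exit status reveals the failure.
curl -s "$url" -o /dev/null || echo "curl -s exited with status $?"

# -sS: still no progress meter, but the error message is printed to stderr.
curl -sS "$url" -o /dev/null || echo "curl -sS exited with status $?"
```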
As I set out to change -s to -sS, I realized that the command is run with -k (short for --insecure). This bypasses HTTPS certificate validation. That is a terrible, terrible thing to put in this repo. It is understandable why it's there in the first place: the letsencrypt.org domain uses a TLS cert that is recursively signed by ISRG_Root_X1, so we need to break the circular dependency somehow, and adding -k is the simplest solution. Nevertheless, it is terrible.
Therefore, I request that you revert my previous commit.
I can think of two paths forward here:
Thank you very much for this detailed analysis!
You're right: -k is the simplest, but it's also terrible.
I think it would be acceptable to have a copy of the ISRG X1 root certificate in this repo and install it that way. It's a public certificate so it's easily verifiable.
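A minimal sketch of that idea, assuming the certificate is shipped next to the script. install_bundled_cert and the source path in the usage comment are hypothetical; only the destination directory and file name come from this thread.

```shell
# install_bundled_cert SRC DEST_DIR (hypothetical helper)
# Copy a certificate shipped with the repo into DEST_DIR, creating it if needed.
install_bundled_cert() {
    src=$1
    dest_dir=$2
    if [ ! -r "$src" ]; then
        echo "missing certificate: $src" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/ISRG_Root_X1.crt"
}

# Intended use on the router (the source path is an assumption):
#   install_bundled_cert /config/tailscale/ISRG_Root_X1.crt /usr/local/share/ca-certificates
#   update-ca-certificates
```

Anyone who wants to check the bundled copy can print its fingerprint with `openssl x509 -noout -fingerprint -sha256 -in ISRG_Root_X1.crt` and compare it with the value published by Let's Encrypt.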
Tried to install today on an EdgeRouter 4 with EdgeOS v2.0.9-hotfix.7.
Changed the curl command to this:
curl https://letsencrypt.org/certs/isrgrootx1.pem --create-dirs -o /usr/local/share/ca-certificates/ISRG_Root_X1.crt
Installed successfully.
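For reference, the same command can be combined with the -sS suggestion from earlier in the thread. This is only a sketch: fetch_cert is a hypothetical wrapper, and -f is an extra flag added here so that HTTP errors also count as failures.

```shell
# fetch_cert URL DEST (hypothetical helper): download URL to DEST.
fetch_cert() {
    # -f: treat HTTP errors as failures
    # -sS: no progress meter, but still print error messages
    # --create-dirs: create DEST's parent directories if they are missing
    curl -fsS --create-dirs -o "$2" "$1"
}

# Intended use on the router:
#   fetch_cert https://letsencrypt.org/certs/isrgrootx1.pem \
#       /usr/local/share/ca-certificates/ISRG_Root_X1.crt
```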
I removed set -e from the script and it continued successfully. curl seems to work fine but returns a non-zero status, so the rest of the script is not executed.
Why doesn't set -e (or set -o errexit, or trap ERR) do what I expected?
In case anyone still has a fresh enough system that exhibits the problem, can you help me verify whether the curl is even needed?
On my ER4 it seems like the ISRG Root X1 cert is already present:
root@gw02:~# ls -l /usr/share/ca-certificates/mozilla/ISRG_Root_X1.crt
-rw-r--r-- 1 root root 1939 Jun 5 2020 /usr/share/ca-certificates/mozilla/ISRG_Root_X1.crt
And, as far as I can tell on this system (which is not a clean system), the sed and update-ca-certificates commands are enough, without fetching the newer X1 cert. On this system, the cert in /usr/share is identical to the one downloaded with curl.
If we can get away with removing the curl command from the setup script that will fix this.
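Others who want to repeat the comparison on their own router could use something like the sketch below. same_cert is a hypothetical helper, and the /tmp path in the usage comment is an assumed location for a freshly downloaded copy.

```shell
# same_cert A B (hypothetical helper): succeed only if both files exist
# and are byte-for-byte identical.
same_cert() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# Intended use on the router (the /tmp path is an assumption):
#   if same_cert /usr/share/ca-certificates/mozilla/ISRG_Root_X1.crt /tmp/isrgrootx1.pem; then
#       echo "system copy matches the download"
#   fi
```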
I have that file as well
Linux EdgeRouter-4 4.9.79-UBNT #1 SMP Thu Jun 15 11:34:36 UTC 2023 mips64
Welcome to EdgeOS
Last login: Tue Jan 23 13:48:02 2024 from 192.168.1.45
admin@EdgeRouter-4:~$ ls -l /usr/share/ca-certificates/mozilla/ISRG_Root_X1.crt
-rw-r--r-- 1 root root 1939 Jun 5 2020 /usr/share/ca-certificates/mozilla/ISRG_Root_X1.crt
Notice also that I don't have the file specified in the curl download:
admin@EdgeRouter-4:~$ ls /usr/local/share/ca-certificates/ISRG_Root_X1.crt
ls: /usr/local/share/ca-certificates/ISRG_Root_X1.crt: No such file or directory
I guess curl failed because the folder /usr/local/share/ca-certificates doesn't exist.
Removed the curl command completely and it worked fine. EdgeRouter X, v2.0.9-hotfix.7.
Thanks for that confirmation. I've committed a change that removes the curl.
Not sure if this is just my issue, but I wasn't able to install Tailscale following your instructions. I believe I followed the instructions accurately up through running the first script at /config/scripts/firstboot.d/tailscale.sh. When I tried to run the next script, I got a "no such file or directory" error.
Router: EdgeRouter X. OS: EdgeOS v2.0.9-hotfix.7. Notes: this is the first thing I did after a factory reset and configuration via the basic setup wizard. Connected over SSH from a Windows 11 command line.
Steps to replicate:
I was able to get this installed after some troubleshooting. I think the cause of the failure has to do with the certificate fix added in the last update to the script.
I started troubleshooting by trying to verify that the /config/tailscale/ path was created, which is one of the first steps after the certs fix and the easiest way to verify that any part of the script was successful. The tailscale path had not been created, despite no error being shown when running /config/scripts/firstboot.d/tailscale.sh.
Next I decided to run the certificate fix outside of the script to see if I got an error, and everything worked as it should, despite the lines being identical.
Then I edited /config/scripts/firstboot.d/tailscale.sh to comment out the certificate fix section. The only other edit I made to the file was to change "set -e" to "set -x" (I have no real experience with shell and, based on some quick and lazy googling, I was hoping that would show me more of the script's output so I could look for errors).
start of the file after the edit was
the rest of the file was unchanged.
Running the script again after those edits, it worked the first time I ran it, so unless nobody else can replicate this issue, I think there may be a problem with the new section of the tailscale script.
Please let me know if anyone is able to recreate this issue. I'm attaching the SSH lines in case anyone feels the need to check them for an embarrassing mistake on my part.
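For anyone else unsure about the "set -e" to "set -x" change described above, the two flags do different things: -e stops the script at the first failing command, while -x only prints each command as it runs. The sketch below runs a three-step throwaway script in a child shell under each setting; demo_flags is a hypothetical helper, and nothing in it is taken from tailscale.sh.

```shell
# demo_flags FLAGS (hypothetical helper): run a three-step script in a
# child shell with the given set flags and show everything it prints.
demo_flags() {
    sh -c "set $1; echo first; false; echo last" 2>&1 || true
}

demo_flags -e    # stops at 'false' without saying why
demo_flags -x    # traces each command and keeps going past 'false'
demo_flags -ex   # traces each command and stops at 'false'
```

Using both flags together (set -ex) keeps the early exit while showing which command triggered it.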