hakwerk / labca

A private Certificate Authority for internal (lab) use, based on the open source ACME Automated Certificate Management Environment implementation from Let's Encrypt (tm).
https://lab-ca.net

LabCA not starting properly #60

Closed radicalrabbit closed 1 year ago

radicalrabbit commented 2 years ago

Hello @hakwerk,

first of all: Thank you for providing this amazing project and thank you for working on it. I would be very happy to get some help with the following issue. In my case the installation of LabCA finished and all containers are up and running. I was able to set username, password and create the root certificates. Nevertheless LabCA is trapped in some kind of endless loop saying "Restart - Almost there. Now we will request a certificate for this website and restart one more time...".

[screenshot 1]

I checked all the logs as you suggested in https://github.com/hakwerk/labca#troubleshooting. The acme_tiny.log is missing somehow:

[screenshot 2]

Boulder control is continuously trying to fetch some packages:

[screenshot 3]

And boulder is complaining about an error starting a publisher service:

[screenshot 4]

Any help and/or advice is highly appreciated!

Cheers Alex

tvontheinternet commented 2 years ago

Hello, I too am experiencing the same behavior on a new install on a Debian 11 VM.

I also do not see an acme_tiny.log anywhere.

My boulder and boulder-control logs are a bit different, attaching.

boulderLog.log control.log

Additionally, the boulder-control-1 container is continually restarting.

boulder

I have tried rerunning the install script, which completes without error but results in the same behavior.

tvontheinternet commented 2 years ago

I do see a lot of this in boulder-labca-1 logs:

boulder-labca-1 | 2022/10/03 03:21:06 errorHandler: err=dial tcp: lookup control on 127.0.0.11:53: no such host

hakwerk commented 2 years ago

I think this issue is now resolved in the latest v22.10 release. It looks like something changed when installing the docker-ce package.

By the way, the acme_tiny.log file does not exist yet because the setup is unable to reach the point where that file gets created...

radicalrabbit commented 2 years ago

Hi @hakwerk,

I did both a reinstall on the existing installation and a fresh install on a Debian 11. Still not able to use the web application. After logging in it says "Almost there! Now we will request a certificate for this website and restart one more time...". Containers are in state running. But the bmysql log logs some connection errors:

[screenshots 1, 2, 3]

hakwerk commented 2 years ago

I'm afraid there is not much I can tell from the first bmysql log; these aborted connection messages sometimes occur when the boulder container is restarting, but normally not every minute. Please have a look at the boulder-boulder-1 logs. It looks like that container is actually crashing over and over, so hopefully there is a clue in that log. Do a docker ps -a to see if the STATUS and CREATED values match: if the status is much more recent than the creation time, then docker-compose just restarted the container.
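A minimal sketch of that check, assuming a Docker CLI that supports `--format` on `docker ps`. The helper name `flag_restarting` is hypothetical and not part of LabCA; it only filters tab-separated name/status lines fed on stdin, so it can be tried without touching the containers.

```shell
#!/bin/sh
# Hypothetical helper: print the names of containers whose STATUS column
# shows a restart loop or an uptime still counted in seconds.
# Expects lines of "<name><TAB><status>" on stdin.
flag_restarting() {
    awk -F '\t' '$2 ~ /^Restarting/ || $2 ~ /^Up (Less than a second|[0-9]+ seconds?)/ { print $1 }'
}

# Typical use on the LabCA host (not run here):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | flag_restarting
```

A container that keeps showing up in this list over several minutes is most likely the one being restarted, and its logs are the ones worth reading first.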

A quite common issue is that the boulder container cannot resolve the IP for the FQDN of your LabCA server.

The slow_log errors might be caused by out of memory, disk space etc...

tvontheinternet commented 2 years ago

I created a fresh Debian 11 VM, and ran the installer after the release of v22.10, but am seeing the same behavior.

I then rolled that new VM back to a snapshot taken after the Debian install but before installing LabCA, and tried again: same behavior.

The VM OS is able to resolve its own FQDN.

The web page behavior is the same, "almost there, now we will request a certificate for this website and restart one more time"

The boulder-control-1 container running control.sh is constantly restarting

[screenshot]

From the logging there, that container is having trouble with dpkg.

I did play around and set the container's restart policy to no, started it with docker-compose up -d, connected to it with docker exec -it bash, and ran apt update and install commands without issue.

Logs for all 5 containers attached.

boulder.log control.log labca.log mysql.log nginx.log

If the constantly restarting control container is the right place to look: it logs "boulder-control-1 | ./control.sh: line 97: docker: command not found", but I think that's normal, since docker ps >/dev/null won't redirect stderr (docker ps >/dev/null 2>&1 would). After that, install_docker runs apt update and dpkg complains.

I could be barking up the wrong tree there, though. And I did get that container started forcefully, connected to it and installed docker without error (on a previous attempt; I am leaving this one as is).
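The redirection point can be checked in isolation. This is a minimal sketch in which `missing_cmd` is a hypothetical stand-in for the absent docker binary; it is not taken from control.sh.

```shell
#!/bin/sh
# Stand-in for a missing binary: writes only to stderr and fails,
# like "docker: command not found".
missing_cmd() {
    echo "docker: command not found" >&2
    return 127
}

# Redirects stdout only; the error message still reaches stderr.
stdout_only() { missing_cmd >/dev/null; }

# Redirects stdout, then points stderr at the same place; nothing is printed.
both_streams() { missing_cmd >/dev/null 2>&1; }
```

Calling `stdout_only` still prints the error, which matches the line seen in the control log, while `both_streams` stays silent.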

mwstamant commented 2 years ago

@radicalrabbit I can confirm I am experiencing the same symptoms on a Debian 11 base image on Azure.

tntounis commented 2 years ago

I managed to move forward with the setup. I added in the /home/labca/boulder/docker-compose.yml file one volume in control image section. - /etc/resolv.conf:/etc/resolv.conf This mapped the DNS servers from the host to the docker container. I restarted using docker-compose up -d. After a while, I receive an error message 'curl: (6) Could not resolve host: boulder'. I removed the entry from docker-compose.yml file. Restarted again the containers and the installation finished.

tvontheinternet commented 2 years ago

> I managed to move forward with the setup. I added in the /home/labca/boulder/docker-compose.yml file one volume in control image section. - /etc/resolv.conf:/etc/resolv.conf This mapped the DNS servers from the host to the docker container. I restarted using docker-compose up -d. After a while, I receive an error message 'curl: (6) Could not resolve host: boulder'. I removed the entry from docker-compose.yml file. Restarted again the containers and the installation finished.

That is interesting, boulder-control is supposed to be connected to the network boulder-bluenet:

control:
    image: *boulder_image
    networks:

As are all other containers (I think control is not listed because it's perpetually restarting on my system):

docker network inspect boulder_bluenet | grep Name
"Name": "boulder_bluenet",
"Name": "boulder-nginx-1",
"Name": "boulder-boulder-1",
"Name": "boulder-labca-1",
"Name": "boulder-bmysql-1",

The other containers on the network do have internal DNS: I connected to boulder-labca, installed ping, and was able to ping the local host by FQDN. boulder-labca is not connected to both rednet and bluenet either, just bluenet.

/etc/resolv.conf on boulder-labca:

root@ba37f499208b:/go/src/labca# cat /etc/resolv.conf
search localdomain tvent.local
nameserver 127.0.0.11
options ndots:0
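The nameserver 127.0.0.11 in this output is Docker's embedded DNS resolver, which answers container-name lookups on user-defined networks. As a rough sketch, the same check could be scripted for each container; the helper name `uses_docker_dns` is hypothetical, and it only inspects resolv.conf text passed on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the resolv.conf text on stdin lists
# 127.0.0.11 (Docker's embedded DNS) as a nameserver.
uses_docker_dns() {
    awk '$1 == "nameserver" && $2 == "127.0.0.11" { found = 1 } END { exit !found }'
}

# Typical use against a running container (not run here):
#   docker exec boulder-labca-1 cat /etc/resolv.conf | uses_docker_dns && echo "embedded DNS"
```

A container that fails this check is resolving names through something other than the embedded resolver, for example a bind-mounted host resolv.conf, and may then be unable to look up other containers such as boulder by name.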

I can try to replicate your fix later on, I just don't get why boulder-control would have dns trouble while the other containers don't.

tvontheinternet commented 2 years ago

I did not replicate @tntounis's steps exactly, but I think I was able to get up and running, and I believe all I did of consequence was force the control container to stay started.

I ran:

docker update --restart no (control container)
docker-compose up -d

The UI eventually threw a generic "something went wrong, here are some error logs to look at" message. I connected to the container with docker exec -it bash and was messing around installing ping, checking DNS and whatnot, and eventually noticed that the LabCA page had updated to look as if the install was complete.

I had to add the root CA to Firefox for it to stop complaining about an unknown issuer for the LabCA site; it looks like LabCA successfully grabbed a certificate this time. Chrome and Edge did not need anything beyond adding the root CA to the local computer's trusted root certificate store.

tntounis commented 2 years ago

@tvontheinternet I don't understand why this worked as a workaround, but the control container was rebooting all the time. After the change, I got the error message that it couldn't resolve boulder, so reverting the entry worked for me. The control container was not crashing anymore, and I noticed that it restarted the boulder container after a while. I refreshed the page and could see the dashboard and the new certificate. Moreover, I noticed in the logs that the ACME request happened 1-2 minutes before the setup finished. I believe the setup process didn't request a certificate.

hakwerk commented 1 year ago

I think the issue is now resolved in the latest version (v22.10.1), it appears that my installer was restarting the control container while it was still building. For me it only happens in 10% or so of fresh installations, so it is very hard to confirm if the problem is indeed gone now.

In general, if people still have issues like this, it is safe to remove the hanging/restarting container and rebuild it:

cd /home/labca/boulder
docker-compose stop control
docker-compose rm -f control
docker-compose up -d

Give it a few minutes and the container should be up and the web GUI should work.

hakwerk commented 1 year ago

In my experience, the latest release (v22.10.2) should further reduce DNS-related issues.

shizzgar commented 1 year ago

I found out what the problem was. I am running my own DNS server in Docker, so this solution resolved (lol) the DNS conflicts... https://github.com/pi-hole/docker-pi-hole/issues/1166#issuecomment-1475374876

By the way, your work is awesome, thank you very much!