Open MathewBiddle opened 2 months ago
maybe live restore??
Okay, testing live-restore:
$ more /etc/docker/daemon.json
{
"live-restore": true
}
$ sudo systemctl start docker
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ec3b94b319fe axiom/docker-erddap:2.23-jdk17-openjdk "/entrypoint.sh cata…" 7 days ago Up 5 seconds 0.0.0.0:80->8080/tcp, :::80->8080/tcp, 0.0.0.0:443->8443/tcp, :::443->8443/tcp erddap_gold_standard
I will check back in a few weeks to see if this fixes the issue. Luckily we have plenty of checks hitting this server, so we will know quickly when it breaks.
To confirm the change was accepted:
$ docker info | grep Live
Live Restore Enabled: true
Do you have access to the docker daemon logs? Also what are the docker and kernel versions?
Do you have access to the docker daemon logs?
I have access to /var/log, which has a few messages files. I think those are the logs as documented here.
Also what are the docker and kernel versions?
$ docker --version
Docker version 20.10.25, build b82b9f3
$ uname -sr
Linux 5.10.210-201.852.amzn2.x86_64
Live Restore seems to be working. From status:
Current time is 2024-05-06T15:44:10+00:00
Startup was at 2024-04-17T13:24:27+00:00
I'll keep this open until 2 months have passed without the daemon crashing.
Boo... looks like it crashed again.
$ docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Restarted with:
/usr/local/erddap-gold-standard$ sudo systemctl start docker
/usr/local/erddap-gold-standard$ docker-compose restart
Restarting erddap_gold_standard ... done
/usr/local/erddap-gold-standard$ docker info | grep Live
Live Restore Enabled: true
Boo... looks like it crashed again.
Same frequency as before, sooner, or later? We need to inspect the logs here to see if we can understand what is going on.
much later - almost 3 months vs a few weeks. I looked at the logs and they are gobbledygook to me 😵
Well, maybe that is a (small) win. I have never looked into ERDDAP logs; we should probably ask the experts (Ben, Chris, Shane) for help here.
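As a starting point for reading the daemon logs, here is a rough sketch that pulls the Docker daemon's journal entries and filters them for common crash indicators. It assumes the host uses systemd with a unit named docker.service (consistent with the systemctl commands in this thread, but not verified); the function names and the keyword list are made up for illustration and are not a complete set of failure signatures.

```shell
#!/usr/bin/env bash
# Sketch only: helper functions for hunting crash clues in the Docker daemon logs.
# Assumption: the daemon runs as the systemd unit "docker.service".

# Print daemon log lines from the journal, starting at the given time
# (defaults to the last 7 days). May need sudo to see all entries.
docker_daemon_logs() {
    journalctl --unit docker.service --since "${1:-7 days ago}" --no-pager
}

# Print kernel messages for the current boot (useful for spotting the OOM killer).
kernel_logs() {
    journalctl --dmesg --no-pager
}

# Read log text on stdin and keep only lines that hint at a crash.
# The keyword list is illustrative, not exhaustive.
crash_hints() {
    grep -i -E 'out of memory|oom|killed process|panic|fatal|segfault'
}
```

For example, `docker_daemon_logs "3 days ago" | crash_hints` and `kernel_logs | crash_hints` would narrow the output to lines worth sharing here.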
When running erddap-gold-standard on AWS, the Docker daemon for the deployment crashes every few weeks.
It's a simple fix to get it up and running again using:
I'm curious if other folks have experienced this before with an ERDDAP deployed using Docker on AWS??
I've discussed this with @patrick-tripp, and the current workaround would be to set up a cron job that checks the URL and, if the check fails, restarts Docker.
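A minimal sketch of what that cron workaround might look like is below. The URL, the `--run` flag, and the function names are assumptions for illustration; the compose directory and the two restart commands are the ones used manually in this thread. It is meant to run from root's crontab, so the commands are called without sudo.

```shell
#!/usr/bin/env bash
# Sketch only: restart Docker when the ERDDAP URL stops responding.
# ERDDAP_URL is an assumed address; COMPOSE_DIR is the directory used in this thread.
ERDDAP_URL="${ERDDAP_URL:-http://localhost/erddap/index.html}"
COMPOSE_DIR="${COMPOSE_DIR:-/usr/local/erddap-gold-standard}"

# Return 0 if the URL answers with a successful HTTP status within 30 seconds.
url_is_up() {
    curl --silent --fail --max-time 30 --output /dev/null "$1"
}

# Start the daemon and restart the compose project (the manual fix, automated).
restart_stack() {
    systemctl start docker
    (cd "$COMPOSE_DIR" && docker-compose restart)
}

# Check the URL once; restart only when the check fails.
watchdog() {
    if url_is_up "$ERDDAP_URL"; then
        return 0
    fi
    echo "$(date '+%Y-%m-%dT%H:%M:%S%z') ERDDAP check failed, restarting docker" >&2
    restart_stack
}

# Only act when called with --run, so the file can be sourced safely.
if [[ "${1:-}" == "--run" ]]; then
    watchdog
fi
```

Saved under a hypothetical path such as /usr/local/bin/erddap-watchdog.sh, it could be scheduled in root's crontab with a line like `*/10 * * * * /usr/local/bin/erddap-watchdog.sh --run`. A single failed check triggers a restart here; requiring two or three consecutive failures would make it less trigger-happy.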
cc: @mwengren, @ocefpaf, @patrick-tripp.