maptiler / tileserver-gl

Vector and raster maps with GL styles. Server side rendering by MapLibre GL Native. Map tile server for MapLibre GL JS, Android, iOS, Leaflet, OpenLayers, GIS via WMTS, etc.
https://tileserver.readthedocs.io/en/latest/

Docker: "Startup complete" fails for unknown reason after SIGHUP #1383

Open larsschwarz opened 1 week ago

larsschwarz commented 1 week ago

I'm running the Docker version and frequently update some mbtiles files, which requires a SIGHUP to reload the server.

This works fine for about two days, after which the server becomes unavailable. The log always looks like this:

Caught signal SIGHUP, refreshing
Stopping server and reloading config
Starting server
Listening at http://[::]:8080/
Startup complete
Caught signal SIGHUP, refreshing
Stopping server and reloading config
Starting server
Listening at http://[::]:8080/

As you can see, "Startup complete" isn't logged after the final SIGHUP.

Afaik there is no more verbose mode available, so I have no idea how to debug this issue. On a side note, I was wondering why no timestamps are printed. Without them it's even harder to check other logs for possibly related issues that occurred at the same time.
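
Until the server prints timestamps itself, they can be added from outside the process. A minimal sketch, assuming a POSIX shell with `date`; the helper name `stamp` is made up for illustration and is not part of tileserver-gl:

```shell
#!/bin/sh
# stamp: prefix every line read from stdin with a UTC timestamp.
# Hypothetical helper for illustration only.
stamp() {
  while IFS= read -r line; do
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
  done
}
```

It could be used as `docker logs -f <container> 2>&1 | stamp`. Note that this stamps lines when they are read, not when they were written; `docker logs --timestamps` prints the times Docker itself recorded.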

Happens with Docker 27.1.2, build d01f264 running Ubuntu 22.04.4 LTS.

acalcutt commented 1 week ago

Which version of TileServer-GL? Is there a version that was working for you? "Startup complete" seems to come from https://github.com/maptiler/tileserver-gl/blob/master/src/server.js#L615 and "Listening at" comes from https://github.com/maptiler/tileserver-gl/blob/master/src/server.js#L635 .

It looks like "Startup complete" gets shown when all the promises have resolved, so it seems like maybe something isn't completing. Not sure what that would be...

--verbose is still an option, but it isn't always that helpful. I would welcome a PR to improve its usefulness. In my own testing I usually start throwing in console.logs to see what values things are actually getting, but I end up with so many that they wouldn't be useful as verbose output because it gets hard to follow.

Timestamps would be a good addition

larsschwarz commented 1 week ago

Version is v4.12.0, haven't tried the latest yet. Haven't tried any versions before 4.12.0 either.

Seems like the health check obviously fails afterwards, however Docker does not seem to restart the container even though I tried using --restart unless-stopped to circumvent the issue.
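
As far as I know, Docker's restart policies react to the container exiting, not to a failing health check, which would explain why --restart unless-stopped does not help here. A hedged sketch of an external watchdog that could be run from cron, assuming the image defines a health check; the function name and the container name `tileserver` are made up for illustration:

```shell
#!/bin/sh
# Hypothetical watchdog: restart a container whose health check reports "unhealthy".
# DOCKER can be overridden, e.g. to point at a different binary.
DOCKER=${DOCKER:-docker}

restart_if_unhealthy() {
  name=$1
  # Fails (and returns 1) if the container does not exist or has no health check.
  status=$("$DOCKER" inspect --format '{{.State.Health.Status}}' "$name") || return 1
  if [ "$status" = "unhealthy" ]; then
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $name is unhealthy, restarting"
    "$DOCKER" restart "$name"
  fi
}
```

Calling `restart_if_unhealthy tileserver` every minute or so would only work around the hang, not explain it, but it keeps the tiles available while debugging.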

Guess I have to look into adding a few more debug logs in server.js then. Unfortunately it runs just fine for approximately two days with a SIGHUP reload every 15 minutes, so this specific issue is a little bit hard to debug 😩

acalcutt commented 1 week ago

Yes, I hate bugs like that. If you are reloading every 15 minutes, you would think you would see it more often... maybe something like memory filling up in the Docker container? I had read an issue somewhere saying that the SIGHUP restart didn't seem to be releasing memory.
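
One way to check the memory theory is to record the container's memory use alongside each reload and see whether it climbs over the two days. A rough sketch, assuming the Docker CLI is available; the function name and the container name `tileserver` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print one timestamped memory sample for a container.
DOCKER=${DOCKER:-docker}

log_memory() {
  name=$1
  # --no-stream takes a single sample instead of refreshing continuously.
  usage=$("$DOCKER" stats --no-stream --format '{{.MemUsage}}' "$name") || return 1
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "$usage"
}
```

Running `log_memory tileserver >> memory.log` from the same job that sends the SIGHUP would give one sample per reload to compare against the time of the hang.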

acalcutt commented 1 week ago

I would also check whether 4.13.3 or 5.0.0 behaves differently, since there have been a few fixes for font bugs in 4.12.0, which caused some strange rendering for me.

larsschwarz commented 1 week ago

Thanks Andrew. Guess I'll switch back to the non-Docker setup first and see how that works. Btw: is there a "recommended" way to reload the non-Docker server, like a built-in reload function?

acalcutt commented 1 week ago

On my server I set up some systemd services for Xvfb and Tileserver: https://github.com/acalcutt/wifidb-tileserver-gl/tree/master/tileserver-gl/systemd

These notes are a bit outdated node version wise, but pretty much how I set it up https://github.com/acalcutt/wifidb-tileserver-gl/blob/master/tileserver-gl/Notes.txt

Then I have a nightly process that just restarts it with 'systemctl restart Tileserver.service' when new files get copied over, though I think SIGHUP should still be possible also.

acalcutt commented 1 week ago

I think SIGHUP without Docker basically means sending HUP to the node process with kill. I tested this and it seems to work: kill -s HUP $(ps aux | grep "[n]ode" | awk '{print $2}')


Also, I notice in the systemd logs I do get a timestamp.

okimiko commented 1 week ago

I'm using SIGHUP in a Docker environment, too (since version 4.4.x, up to the latest 5.0.0), and never had this kind of issue. I'm not reloading at an interval, but on demand to update the styles. So maybe the problem is related to the files that have been changed in your setup? Is there any cronjob running? Are all changes flushed? Are there bigger updates after two days (or at a specific time)? Are you using Docker volumes? Maybe even the filesystem is involved.
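
If half-written files turn out to be the cause, one way to rule them out is to write the new file under a temporary name and rename it into place before sending the SIGHUP. A minimal sketch, assuming source and destination are regular files and the temporary file sits in the destination directory; the function name and the paths in the usage line are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: replace a tile file so readers see either the old
# or the new file, never a partial copy.
publish_file() {
  src=$1
  dest=$2
  tmp="$dest.tmp.$$"
  cp "$src" "$tmp" || return 1   # copy next to the destination first
  sync                           # ask the OS to flush pending writes to disk
  mv -f "$tmp" "$dest"           # a rename within one filesystem is atomic
}
```

A reload could then look like `publish_file new.mbtiles /data/region.mbtiles && docker kill --signal=HUP tileserver`. This assumes the whole directory is mounted into the container; with a single-file bind mount the container may keep seeing the old file after the rename.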