Open boldtrn opened 6 months ago
I am not aware of any change that would cause that, but can you check whether the just-released 4.11.0 helps?
We had to revert to 4.5.1 for now. I will give this another try soon :)
Where are you running the command above, inside the docker image or on the host the docker is running on? Does it take time to build up like that?
I ran this on the host. I believe these might have been Docker processes that were killed or something like that, as the user is systemd+ and Docker is managed by systemd, but this is only an idea at this point.
I can verify that the error still persists with the latest release. I can't see anything obvious in the logs, but I have to admit it's a production system, so there are a lot of logs. If you have a hint about what to search for in the logs, I can give this a try. Obvious stuff like ERROR or FATAL did not show anything interesting.
Unfortunately, I don't have any good answers on what to look for. If I had to guess, it would be a rendering issue, since rendering starts its own threads. I find that when maplibre-native has an issue, it doesn't always give back an error.
When I am troubleshooting stuff like that, I try to find a URL that isn't loading as expected. I then test that URL in a more controllable instance. Usually in testing I uncomment https://github.com/maptiler/tileserver-gl/blob/master/src/serve_rendered.js#L874 to get an idea of what is being loaded when maplibre-native fails.
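To narrow down which URL isn't loading as expected, something like the following rough sketch could be run against a test instance. It is not part of tileserver-gl; the `probe` helper and any URLs passed to it are hypothetical. It uses a timeout because, as mentioned above, a failing render does not always give back an error and may simply never answer.

```python
import time
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch one URL and report status, size and elapsed time.

    `opener` is injectable so the function can be exercised without a
    running server; by default it is urllib.request.urlopen.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as resp:
            body = resp.read()
            return {
                "url": url,
                "status": resp.status,
                "bytes": len(body),
                "seconds": time.monotonic() - start,
                "error": None,
            }
    except Exception as exc:
        # HTTPError carries a .code attribute; timeouts and connection
        # errors do not, so status stays None for those.
        return {
            "url": url,
            "status": getattr(exc, "code", None),
            "bytes": 0,
            "seconds": time.monotonic() - start,
            "error": repr(exc),
        }
```

Feeding it a list of tile and static image URLs taken from the access logs and sorting the results by `seconds` or filtering on `error` should show which requests hang or fail.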
Have you seen anything that is failing to load with the new version? You were using static images, right?
We are using raster and vector tiles as well as static images. I haven't seen anything failing, but we are serving several million tile requests per day, so it's hard to track down isolated issues. We had some performance issues, but I doubt these are related to the version. We are currently running different versions of tileserver-gl, and CPU usage etc. looks somewhat similar (actually the latest version seems to consume about 5% fewer resources).
Just an FYI, I did find an issue in the Docker build caused by the change to use "is-ci" when dev utils were not included. I put that back to the old method in https://github.com/maptiler/tileserver-gl/pull/1250 . That should be fixed in 4.11.1.
I'm not sure it has anything to do with your issue, but I thought it could be a possibility.
I will give this a try, thanks 👍
Ok, I think the latest update 4.11.1 did indeed fix the zombie processes; I haven't seen them since. Thanks for looking into this @acalcutt 👍
Unfortunately, I have to reopen this issue. Zombie processes just reappeared yesterday on one of our servers. The container even went down and we had to restart it. Again the logs did not show anything new.
We recently updated from 4.5.1 to 4.10.3. After the update we have seen quite some performance issues with our tile server. One thing that stands out to me is that we are getting zombie processes. We are using the Docker image.
The zombie processes are node commands apparently, so maybe there was an issue introduced along the way?
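For anyone who wants to check their own host for the same symptom, here is a minimal sketch of how the zombies can be listed. It assumes a Linux host with the procps `ps` command; the function names are made up for illustration. A zombie is a process whose state column starts with `Z`, and its command usually shows up as e.g. `node <defunct>`.

```python
import subprocess


def find_zombies(ps_output):
    """Return (pid, ppid, user, command) tuples for processes in state Z.

    Expects the output of `ps -eo pid,ppid,stat,user,comm`, header included.
    """
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 4)  # command may contain spaces
        if len(fields) < 5:
            continue
        pid, ppid, stat, user, comm = fields
        if stat.startswith("Z"):
            zombies.append((int(pid), int(ppid), user, comm))
    return zombies


def current_zombies():
    """Run ps on this machine and return the zombie processes it reports."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,user,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_zombies(result.stdout)
```

Calling `current_zombies()` periodically and logging the count would show whether the zombies build up over time, and the parent PID tells you which process is not reaping its children.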