mmmaxwwwell / space-engineers-dedicated-docker-linux

Space Engineers Dedicated Server running in Docker for Linux
MIT License
175 stars, 42 forks

Massive Lag Spike on Autosave #38

Closed Treep3r closed 4 months ago

Treep3r commented 2 years ago

So I've been playing on a server using this image and at some point I started to notice regular lag spikes. They appeared quite periodically, so it was rather obvious that they were caused by the autosaves. It got worse and worse over time, to the point where the server just halts for like 2-3 seconds on each autosave.

Now I assume they got worse cause the save got bigger, but is this really normal behaviour? I'm running it on a dedicated Debian root server with an AMD 3900, 128GB of ram and SSD storage only. I can only imagine the CPU being somewhat of a bottleneck in this case, but an autosave still shouldn't cause that much lag.

locutus13 commented 2 years ago

I can confirm that saving slows down the game considerably: 1.0 simspeed goes down to 0.2 for a few seconds, and sometimes I get a "connection problem" pause... When my game was fresh it was barely noticeable. Now that the save is ~200MB in size it becomes unbearable.

I was checking the server performance with htop and iotop: File-Saving happens at 50MB/s on an SSD and CPU load goes up to approx 300% (on a 12core system).

I am using an AMD 3600 with 64GB RAM - dedicated root server too.

Next step might be to try putting the world on a ramdrive and see if that suffers from the same issues...

UPDATE: As this was quite easy to do I can confirm that putting the world on a ramdrive shortens the slowdown a bit - but does not change the "low sim speed" spike to 0.2

UPDATE 2: Testing some more with more grids using admin commands the save is now 500MB in size and as long as I don't save (or it autosaves) it is quite playable (0.8 - 0.9 simspeed). But during those saves it stalls too much. Almost every time I get a "connection problem" now ...

mmmaxwwwell commented 2 years ago

@locutus13 could you try swapping the bind mount to a volume mount, and see if it makes a difference? Ramdisk should be instant, and if you're still seeing a lag spike, I'd bet docker has something to do with it. I'm researching that theory over here, and I'm finding some anecdotal evidence saying bind mounts (the ones specified in the docker-compose.yml file) are slow on some platforms.

Another thing that I discovered recently is that the proxy docker uses to expose container ports is slow as well. Try setting this under the se-server section in docker-compose.yml: network_mode: "host". It's a little less secure, but should be more performant.
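For reference, a minimal sketch of where that setting would sit, using the service and image names that appear later in this thread. The rest of the service definition (mounts and so on) is left out, and in host mode published ports do not apply, so a ports section would typically be removed.

```yaml
version: '3'

services:
  se-server:
    image: mmmaxwwwell/space-engineers-dedicated-docker-linux:latest
    # Use the host's network stack directly instead of Docker's port publishing.
    network_mode: "host"
```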

locutus13 commented 2 years ago

All of your suggestions are already part of my docker-compose.yml ... I added network_mode: "host" at the very beginning to improve overall performance (not while saving) and the mountpoints are all volumes - as is the default in the current main branch.

One thing I have noticed is that the host performance is critically bad (up to getting unreachable via SSH for some time) while the integrated steam client downloads anything... or saves anything to the disk ... no indication in syslog or docker logs what is going wrong there ... (or maybe I am just blind)

mmmaxwwwell commented 2 years ago

@locutus13 check this article out, it covers the difference between volume and bind mounts.

https://blog.logrocket.com/docker-volumes-vs-bind-mounts/

The ones in the default docker-compose.yml are bind mounts, even though they are listed in the volumes section. Both bind and volume mounts go there.
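To make the distinction concrete, here is a small sketch of the two forms side by side. The host path, container paths, and volume name are all hypothetical placeholders, not the ones from the repository's compose file.

```yaml
version: '3'

services:
  se-server:
    image: mmmaxwwwell/space-engineers-dedicated-docker-linux:latest
    volumes:
      # Bind mount: the left side is a path on the host (hypothetical).
      - /srv/se/plugins:/path/in/container/plugins
      # Volume mount: the left side is a name declared under the top-level volumes key.
      - se-instances:/path/in/container/instances

# Named volumes are declared here, at the same level as services.
volumes:
  se-instances:
```

Both entries live under the same volumes list; what tells them apart is whether the part before the colon is a path or a volume name.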

If you're unsure how to proceed, I can create a branch that starts a new world up using the docker volume mounts instead of bind mounts. I'm busy this week, but this weekend I can spend some time getting it ready, then provide you with instructions on how to copy your world into the volume mounts.

Your statement about the whole machine being slow while downloading the bins makes me think we're onto something here.

locutus13 commented 2 years ago

Sorry, I am a little inexperienced with docker - but google is my friend.... I am guessing for volume mounting it should look like this:

` version: '3'

services: se-server: image: mmmaxwwwell/space-engineers-dedicated-docker-linux:latest container_name: space-engineers-dedicated-docker-linux restart: unless-stopped network_mode: "host" volumes:

EDIT: well ... the indentation got lost ...

mmmaxwwwell commented 2 years ago

Hey, don't apologize! I appreciate you taking the initiative to learn!

Check this doc out: https://docs.docker.com/compose/compose-file/compose-file-v3/#volume-configuration-reference

Basically, there are 3 steps you need to do:

1. Add the volume definitions. It's another entry in the docker-compose.yml file at the same level as the services block.

2. Update the mounts. The mounts for the container are currently in the format "hostpath:containerpath". When you create the volume definitions, you will name them. You will have to update the mounts to the format "volumename:containerpath".

3. Copy the world into the new volume mounts. This is a little tricky, and there are a few ways to do it. You could just copy data into the mount path under /var/lib/docker/, or you could spin up an ephemeral container with the old world and new volume mounted, with a command to copy from the old path to the new path.
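The three steps above could be sketched roughly as follows. This is not the v2 branch's actual file: the volume name se-instances, the host path /srv/se/instances, the container path, and the copy-world helper service are all assumptions made up for illustration.

```yaml
version: '3'

services:
  se-server:
    image: mmmaxwwwell/space-engineers-dedicated-docker-linux:latest
    restart: unless-stopped
    network_mode: "host"
    volumes:
      # Step 2: "volumename:containerpath" instead of "hostpath:containerpath".
      - se-instances:/path/in/container/instances   # hypothetical container path

  # Step 3: hypothetical one-off helper that copies the old world into the new volume.
  copy-world:
    image: alpine:3
    volumes:
      - /srv/se/instances:/old:ro   # hypothetical host path of the old bind mount
      - se-instances:/new
    # Copy the contents of /old (including hidden files) into the volume.
    command: ["cp", "-a", "/old/.", "/new/"]

# Step 1: volume definitions, at the same level as the services block.
volumes:
  se-instances:
```

With the server stopped, something like `sudo docker-compose run --rm copy-world` should run the copy once. The helper service would then be deleted from the file so it does not run again on the next `up`.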

I can help at the end of the week, a bit busy in the first half.

locutus13 commented 2 years ago

Thanks for the assistance ... I have prepared a docker-compose.yml to the best of my abilities and it does not produce an error when I execute

sudo docker-compose pull

Had to update to version 3.2 to use that formatting that I was trying above...
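The file itself was never posted, but the formatting in question is presumably the long syntax for mounts, which was added in Compose file format 3.2. A sketch of what that form looks like, with a hypothetical volume name and container path:

```yaml
version: '3.2'

services:
  se-server:
    image: mmmaxwwwell/space-engineers-dedicated-docker-linux:latest
    volumes:
      # Long syntax: the mount type is spelled out instead of inferred.
      - type: volume
        source: se-instances                       # hypothetical volume name
        target: /path/in/container/instances       # hypothetical container path

volumes:
  se-instances:
```

The explicit type field removes the ambiguity between a bind mount and a volume mount that the short "left:right" form leaves to the reader.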

I also added the volume definitions - all I need is to test it now.... but the server is currently in use for some other ppl for playing a different game and from the past I have learned that things might crash and/or stall on SE startup...

So I will continue later or tomorrow evening (GMT+2 timezone here) ... Thanks again for your support.

locutus13 commented 2 years ago

well I think this may get a little too much over my head ... so many directories - so many mountpoints ... it started - briefly and then crashed so I tried removing the binaries to force a redownload -> this worked and downloading the client did not stall the server (good sign so far)

Guess my mountpoints are wrong or there is something hardcoded in the entrypoint script ... It has time - enjoy easter this weekend and respond whenever you get around to changing this to volume mounts ...

Thanks again

Neshura87 commented 1 year ago

Experienced the same issue (on an SSD to boot, so it's definitely not a write speed issue). Mounting the save via a volume rather than a bind mount has resolved the issue so far (only checked during 3 autosaves so far).

Only mounted the save as a volume and nothing else, since I doubt the lag comes from the binary itself. The server had a couple of "No IP" errors after restarting the docker image, but that goes away after 3-4 attempts.

Edit: I created the volume manually and then listed it as an external volume in the compose file, no idea if the proposed format would work, just that volumes apparently fix the issue
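A sketch of what that setup might look like, assuming the volume was created beforehand with `docker volume create se-save`. The volume name and the container path are hypothetical, since the actual file was not shared.

```yaml
version: '3'

services:
  se-server:
    image: mmmaxwwwell/space-engineers-dedicated-docker-linux:latest
    volumes:
      # Only the save is moved to a volume; other mounts stay as they were.
      - se-save:/path/in/container/save   # hypothetical container path

volumes:
  se-save:
    # Compose will not create this volume; it must already exist on the host.
    external: true
```

Marking the volume as external also means Compose uses the name as given and will not remove the volume on `docker-compose down -v`.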

mmmaxwwwell commented 4 months ago

Based on the feedback and learnings in this thread, and with help from other developers, I've created a v2 branch which switches from bind mounts to volume mounts. Performance seems improved. Closing this. Thank you all for your contributions.