lloesche / valheim-server-docker

Valheim dedicated gameserver with automatic update, World backup, BepInEx and ValheimPlus mod support
https://hub.docker.com/r/lloesche/valheim-server
Apache License 2.0
1.94k stars 272 forks source link

Saving causes server to disconnect, sometimes lost progress? #84

Closed nodu closed 3 years ago

nodu commented 3 years ago

Periodically we disconnected after a blinking connection icon in the top left. Sometimes we seem to lose progress in the world.

No clue what it was, did a bit of debugging, now I can manually reproduce the issue by opening the console, manually saving, client disconnects.

Container: latest/49c71d4e4253

See attached logs.

02/22/2021 23:27:06: clone 1018

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

02/22/2021 23:27:10: Saved 331406 zdos

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

02/22/2021 23:27:10: World saved ( 4211.948ms )

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

/usr/local/bin/valheim-server: line 28:  2201 Killed                  "$valheim_server" -nographics -batchmode -name "$SERVER_NAME" -port $SERVER_PORT -world "$WORLD_NAME" -password "$SERVER_PASS" -public $SERVER_PUBLIC
2021-02-22 23:27:42,916 INFO exited: valheim-server (exit status 0; expected)
2021-02-22 23:27:43,948 INFO spawned: 'valheim-server' with pid 2372
Running Valheim Server
Found path: /opt/valheim/valheim_server.x86_64
Mono path[0] = '/opt/valheim/valheim_server_Data/Managed'
Mono config path = '/opt/valheim/valheim_server_Data/MonoBleedingEdge/etc'
Preloaded 'libsteam_api.so'
Initialize engine version: 2019.4.20f1 (6dd1c08eedfa)
[Subsystems] Discovering subsystems at path /opt/valheim/valheim_server_Data/UnitySubsystems
Forcing GfxDevice: Null
GfxDevice: creating device client; threaded=0
NullGfxDevice:
    Version:  NULL 1.0 [1.0]
    Renderer: Null Device
    Vendor:   Unity Technologies
Begin MonoManager ReloadAssembly
- Completed reload, in  0.069 seconds
UnloadTime: 1.211916 ms
02/22/2021 23:27:44: Starting to load scene:start

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

Couldn't create a Convex Mesh from source mesh "RearBig" within the maximum polygons limit (256). The partial hull will be used. Consider simplifying your mesh. 
(Filename:  Line: 91)

2021-02-22 23:27:45,455 INFO success: valheim-server entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
HDR Render Texture not supported, disabling HDR on reflection probe. 
(Filename:  Line: 277)

HDR Render Texture not supported, disabling HDR on reflection probe. 
(Filename:  Line: 277)

Unloading 6 Unused Serialized files (Serialized files now loaded: 0)
UnloadTime: 5.141160 ms
Only custom filters can be played. Please add a custom filter or an audioclip to the audiosource (Amb_MainMenu). 
(Filename:  Line: 509)

HDR Render Texture not supported, disabling HDR on reflection probe. 
(Filename:  Line: 277)

02/22/2021 23:27:54: Initializing world generator seed: ( 0 )   menu:True  worldgen version:1

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

02/22/2021 23:27:54: Using mountain distance: 1000

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

02/22/2021 23:27:54: Render threading mode:SingleThreaded

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

02/22/2021 23:27:54: Get create world xxxxxxxxx

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

02/22/2021 23:27:54: Using environment steamid 892970

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

02/22/2021 23:27:54: Using steam APPID:xxxxxxxxx

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

CAppInfoCacheReadFromDiskThread took 3 milliseconds to initialize
CApplicationManagerPopulateThread took 0 milliseconds to initialize (will have waited on CAppInfoCacheReadFromDiskThread)
RecordSteamInterfaceCreation (PID 2373): SteamGameServer012 / GameServer
RecordSteamInterfaceCreation (PID 2373): SteamUtils009 / Utils
Setting breakpad minidump AppID = xxxxxxxxx
RecordSteamInterfaceCreation (PID 2373): SteamGameServer012 / GameServer
RecordSteamInterfaceCreation (PID 2373): SteamUtils009 / Utils
RecordSteamInterfaceCreation (PID 2373): SteamNetworking005 / Networking
RecordSteamInterfaceCreation (PID 2373): SteamGameServerStats001 / GameServerStats
RecordSteamInterfaceCreation (PID 2373): STEAMHTTP_INTERFACE_VERSION002 / HTTP
RecordSteamInterfaceCreation (PID 2373): STEAMINVENTORY_INTERFACE_V002 / Inventory
RecordSteamInterfaceCreation (PID 2373): STEAMUGC_INTERFACE_VERSION010 / UGC
RecordSteamInterfaceCreation (PID 2373): STEAMAPPS_INTERFACE_VERSION008 / Apps
02/22/2021 23:27:54: Server ID 123

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

02/22/2021 23:27:54: Steam game server initialized

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

02/22/2021 23:27:54: Missing audio clip in music respawn
nodu commented 3 years ago

Any assistance here?

brcoding commented 3 years ago

I have seen the same issue, but it is always isolated to a single player. On voice we can determine that only one person has an issue.

On Wed, Feb 24, 2021 at 2:39 PM Matt Nodurfth notifications@github.com wrote:

Any assistance here?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lloesche/valheim-server-docker/issues/84#issuecomment-785431828, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABELGOSX34JGI75AEX6PNDTAV53VANCNFSM4YBOG34A .

nodu commented 3 years ago

Happened several times: Disconnected while finding body, re-connect, world is reset body/loot lost 😢

lloesche commented 3 years ago

Never seen this or had any blinking connection icon or disconnects. Regular restarts will save the world. You should be able to restore the world from the hourly backups. The player inventory is stored client side so that is lost if the player hasn't created a local backup.

johnposten commented 3 years ago

We noticed this last night and got stranded in the middle of the ocean with no boat because of it.

My log output is very similar (although I don't have it handy right now-- I will try and update later). I'm hosting this as an Azure Container Instance, with /config volume mounted to a Azure File storage account

My gut feeling is that it seemed like it was related to the frequent server restarts, and it doesn't seem to save the last 5 minutes or so of progress before the restart in our case.

lloesche commented 3 years ago

seemed like it was related to the frequent server restarts

@johnposten can you elaborate on this? I do not have frequent restarts on my server. By default there should be one scheduled restart once per night at 5am and then whenever a server update is available which is currently every couple of days. If you see more than like two restarts in a day something is wrong.

Also, during a regular restart the server is shut down gracefully and will 100% save the world. No 5 minutes lost.

That said, if the server crashes for whatever reason it will automatically get restarted and would likely not have saved the past couple of minutes.

Can you gather from your logs the reason for the restarts? Also how much core memory are you allowing the server to use? On my system the idle(!) server currently consumes 3 GB RSS and 10 GB VSZ.

johnposten commented 3 years ago

Ahh, very interesting. Thanks for the summary of info, sorry I haven't had much time to get fully acquainted with your project yet, I just discovered this gem yesterday :)

Perhaps it has to do with how I deployed the ACI. I'll get more detailed information later tonight.

Update: Server didn't crash last night at all, of course I was too scared to try sailing again.

Azure container memory chart is showing about 1.8gb used memory. Here are the ps stats:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        12  0.0  0.0   6728  3356 ?        S    Feb26   0:00 /bin/bash /usr/local/bin/valheim-backup
root     29741  0.0  0.0   6860  3536 ?        S    13:54   0:00 /bin/bash /usr/local/bin/valheim-updater
root     30896  0.0  0.0   6860  3480 ?        S    14:36   0:00 /bin/bash /usr/local/bin/valheim-server
root     30899 16.1 19.7 9877784 3246044 ?     Sl   14:36   4:20 /opt/valheim/server/valheim_server.x86_64 -nographics -batchmode -name something -port 2456 -world something -password notrealpass -public 0
root     30901  0.0  0.0   6860  1820 ?        S    14:36   0:00 /bin/bash /usr/local/bin/valheim-server
root     30902  0.0  0.0   6860  1908 ?        S    14:36   0:00 /bin/bash /usr/local/bin/valheim-server
root     31682  0.0  0.0   3084   824 ?        S+   15:03   0:00 grep valheim

Here is a previous crash log from the day before, there were a couple of these within a 2hr window, which aligns with my experience of the previous day.

Waiting for user info...OK
 Update state (0x5) verifying install, progress: 0.60 (6291456 / 1051292475)
 Update state (0x5) verifying install, progress: 40.68 (427656084 / 1051292475)
 Update state (0x5) verifying install, progress: 80.34 (844558468 / 1051292475)
Success! App '896660' fully installed.
.d..t...... ./
Valheim Server is already the latest version
02/27/2021 09:37:47:  Connections 0 ZDOS:1225829  sent:0 recv:0
Unloading 0 Unused Serialized files (Serialized files now loaded: 0)
Unloading 0 unused Assets to reduce memory usage. Loaded Objects now: 1116779.
Total: 1050.388500 ms (FindLiveObjects: 69.768800 ms CreateObjectMapping: 16.512500 ms MarkObjects: 962.328700 ms  DeleteObjects: 1.774100 ms)
Backing up Valheim server worlds to /config/backups/worlds-20210227-094136.zip
  adding: worlds/ (stored 0%)
  adding: worlds/Dedicated.db (deflated 51%)
  adding: worlds/Dedicated.fwl (deflated 4%)
  adding: worlds/Dedicated.fwl.old (deflated 4%)
  adding: worlds/something.db (deflated 69%)
  adding: worlds/something.db.old (deflated 69%)
  adding: worlds/something.fwl (deflated 4%)
  adding: worlds/something.fwl.old (deflated 4%)
Removing backups older than 3 days
Waiting 3600 seconds before next backup run
02/27/2021 09:47:48:  Connections 0 ZDOS:1225829  sent:0 recv:0
Unloading 0 Unused Serialized files (Serialized files now loaded: 0)
Unloading 0 unused Assets to reduce memory usage. Loaded Objects now: 1116779.
Total: 994.524000 ms (FindLiveObjects: 64.099800 ms CreateObjectMapping: 15.039000 ms MarkObjects: 913.681000 ms  DeleteObjects: 1.699100 ms)
02/27/2021 09:47:51: clone 1870
2021-02-27 09:48:29,074 INFO reaped unknown pid 21773
2021-02-27 09:48:29,074 INFO reaped unknown pid 21776
/usr/local/bin/valheim-server: line 65: 21772 Killed                  "$valheim_server" -nographics -batchmode -name "$SERVER_NAME" -port $SERVER_PORT -world "$WORLD_NAME" -password "$SERVER_PASS" -public $SERVER_PUBLIC > >(filter stdout) 2> >(filter stderr >&2)
2021-02-27 09:48:29,074 INFO exited: valheim-server (exit status 0; expected)
2021-02-27 09:48:29,074 INFO reaped unknown pid 21777
2021-02-27 09:48:29,074 INFO reaped unknown pid 21775
2021-02-27 09:48:29,076 INFO spawned: 'valheim-server' with pid 22890
Running Valheim Server
Found path: /opt/valheim/server/valheim_server.x86_64
Mono path[0] = '/opt/valheim/server/valheim_server_Data/Managed'
Mono config path = '/opt/valheim/server/valheim_server_Data/MonoBleedingEdge/etc'
Preloaded 'libsteam_api.so'
Initialize engine version: 2019.4.20f1 (6dd1c08eedfa)
[Subsystems] Discovering subsystems at path /opt/valheim/server/valheim_server_Data/UnitySubsystems
Forcing GfxDevice: Null
GfxDevice: creating device client; threaded=0
NullGfxDevice:
    Version:  NULL 1.0 [1.0]
    Renderer: Null Device
    Vendor:   Unity Technologies
Begin MonoManager ReloadAssembly
- Completed reload, in  0.061 seconds
UnloadTime: 0.993000 ms
02/27/2021 09:48:30: Starting to load scene:start
2021-02-27 09:48:30,275 INFO success: valheim-server entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
HDR Render Texture not supported, disabling HDR on reflection probe. 
lloesche commented 3 years ago

Just an FYI, on our server where we've built a little village with around 8 friends the server constantly consumes 3 GB RSS and 10 GB VSZ.

/usr/local/bin/valheim-server: line 65: 21772 Killed "$valheim_server" -nographics -batchmode -name "$SERVER_NAME" -port $SERVER_PORT -world "$WORLD_NAME" -password "$SERVER_PASS" -public $SERVER_PUBLIC > >(filter stdout) 2> >(filter stderr >&2)

This line means that the server process did not crash it was killed by an external process. The only thing that comes to mind going around doing that usually is the Linux kernel OOM killer.

pjbates commented 3 years ago

I've got a similar issue. Running on ACI as well, Starts and runs fine, can connect and play with no issues.

Whenever I stop / start the container the world is resetting. Same seed, but all build is missing from the world. World files and backups are all persisting in /config just fine.

I can see it loading the world files on start

2021-02-28 06:06:04 [13] (INFO) Downloading/updating/validating Valheim server from Steam adding: worlds/ (stored 0%) adding: worlds/Helios.db (deflated 66%) adding: worlds/Helios.db.old (deflated 66%) adding: worlds/Helios.fwl (deflated 5%) adding: worlds/Helios.fwl.old (deflated 5%) 2021-02-28 06:06:04 [11] (INFO) Removing backups older than 3 days

It's just not persisting anything I've built.

I've seen some references pointing to this been an issue with how the server is shutdown. Could Azure Container Instances be doing something crazy and ignoring the shutdown procedure you've specified in the container? (Assuming that's how its done, docker nub here)

Any logs I can grab that would help? Thanks

lloesche commented 3 years ago

@pjbates if Azure is sending a normal docker stop to the container resulting in a SIGTERM inside of it then valheim will shut down gracefully. If they docker kill (or whatever the equivalent of their containerizer) then the server will be killed and not have a chance to save the world. The servers save interval I believe is every 20 minutes.

lloesche commented 3 years ago

@pjbates what happens if you use admin commands to save the world manually. Are world changes persisted then?

pjbates commented 3 years ago

@pjbates what happens if you use admin commands to save the world manually. Are world changes persisted then?

Good suggestion. Just tried it and the game states persists properly with the manual save command then a container restart.

Sounds like Azure Container Instances is not using docker stop in the expected fashion. I'll see if I can find any documentation from MS that clarifies.

pjbates commented 3 years ago

Sounds like someone else has a similar issue https://stackoverflow.com/questions/66377009/gracefully-stop-azure-container-instance

No answer though.

lloesche commented 3 years ago

Question to the group. How are you running this container in Azure? For debugging of https://github.com/lloesche/valheim-server-docker/issues/120 I just tried to run the container using az container create but I'm getting the same result as @Nirco96 /opt/steamcmd/linux32/steamcmd: cannot execute binary file: Exec format error. I'm unable to run any 32bit binaries on the Azure Linux container host. So storage and restart issues aside, how did you get Azure to even run 32bit steamcmd? Is this an extra option that needs to be provided upon container creation? A quick search did not yield any results for me.

pjbates commented 3 years ago

I used the template from https://github.com/jtucker/valheim-aci Automates the storage and networking setup then pulls your image. Runs fine for me in ACI, just doesn't save properly.

johnposten commented 3 years ago

Mine is running fine in West Central US ACI. I use the following command:

az container create \
     --cpu 2 \
     --memory 4 \
     --resource-group dev \
     --name something \
     --image lloesche/valheim-server \
     --dns-name-label something \
     --ports 2456 2457 2458 \
     --protocol UDP \
     --restart-policy Never \
     --environment-variables SERVER_NAME=something SERVER_PASS=secret WORLD_NAME=something \
     --azure-file-volume-account-name something \
     --azure-file-volume-account-key blahblahblah \
     --azure-file-volume-share-name something \
     --azure-file-volume-mount-path /config

Also when I stop the container it does not seem to be doing a graceful shutdown either. It only saves from a manual save, or the 20min interval.

lloesche commented 3 years ago

Interesting. So that means Azure is using 32bit capable container hosts in the US but 64bit only kernels in Europe? I couldn't find any documentation on this, but through trial and error this seems to be the case. Maybe we need to create a separate image that has the game server already downloaded in it. steamcmd is the only 32 bit program in the container. Valheim server as well as the steam library it's linked against are 64 bit.

lloesche commented 3 years ago

@nodu were you able to figure this out? Are the disconnects still happening? Were you able to test with more RAM (~8 GB) or on a different Cloud provider? If you found a solution would be great if you can share it here so others can learn from your work.

nodu commented 3 years ago

Heya, Yes as soon as I expanded Docker's total available memory/cpu resources the server started running smoothly. Thanks for the help folks!

On Sun, Mar 7, 2021 at 4:13 AM Lukas Lösche notifications@github.com wrote:

@nodu https://github.com/nodu were you able to figure this out? Are the disconnects still happening? Were you able to test with more RAM (~8 GB) or on a different Cloud provider? If you found a solution would be great if you can share it here so others can learn from your work.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lloesche/valheim-server-docker/issues/84#issuecomment-792268942, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA4NGI5H5QXQA6DVKPAJSPDTCNUWJANCNFSM4YBOG34A .