unitystation / unitystation

The original unitystation
https://unitystation.org
GNU Affero General Public License v3.0
704 stars 648 forks source link

Server Breaking B: 65 #2646

Closed chairbender closed 4 years ago

chairbender commented 4 years ago

Description

We've noticed on the headless servers we have running 24/7, that the server can end up in a sort of broken state where there appears to be lots of bugs and many things don't work properly. This ticket is for tracking this sort of problem, even if it may actually be due to multiple problems.

This probably has one or a few root causes that causes the server to stop working constantly. It might not even be the same symptoms each time. I believe a round restart usually fixes these.

For example, when the server was once in this state I joined as a new player and spawned apparently at hiddenpos and suffocated.

Most likely the root cause will reveal itself in the server logs. So just look at the logs and try to see what went wrong prior to the issues appearing. It may even be reproducible locally if you just leave a local headless running long enough.

chairbender commented 4 years ago

This is an important and complicated one so putting at least 65 on it for now.

Bod9001 commented 4 years ago

Part of the issue is that the update manager when something breaks inside of that it breaks everything Else that needs updating

chairbender commented 4 years ago

Ah, and I'm not sure if we ever added logic to properly de-register and re-register everything from UpdateManager on round restart. That could be part of the issue.

So it might require some combination of restarting round and then waiting

chairbender commented 4 years ago

I would like to try to get a local reproduction of this, I think I'll leave a local headless running today and periodically check in. It will make debugging easier so we don't have to login to server admin portal

chairbender commented 4 years ago

I logged out and back in and got put into a different ghost.

chairbender commented 4 years ago

Getting a lot of lerpbacks as well:

[Movement] Dist 1.754898 > 1:[Move #45, localPos:(24.8, 53.0), worldPos:(25.8, 54.0) NoLerp:False, WorldImpulse:(0.0, 0.0), reset: False, flight: False, follow: False, matrix #0]
Target    :[Move #47, localPos:(23.0, 53.0), worldPos:(24.0, 54.0) NoLerp:False, WorldImpulse:(0.0, 0.0), reset: False, flight: False, follow: False, matrix #0]

(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

[Movement] Elijah Leach: Server pending actions overflow! (More than 10).
Either server lagged or player is attempting speedhack

Perhaps there is an issue with how the movement messages are being processed, I'm not entirely sure it's actually the server lagging.

chairbender commented 4 years ago

EDIT: I will move this to a different ticket as it's seemingly unrelated since it happens after restart

Just logged in after server restart and I got the Character Settings panel and the player HUD overlayed on top of it: image

but I stopped getting lots of lerpbacks / spawning as ghost

Bod9001 commented 4 years ago

Interaction stuff broke, hands get randomly swapped are inverted as Inverted compared to the Server state ,

chairbender commented 4 years ago

We got bad lag after we initially joined after the server had been running a round for a long time with no players. Went away after restart.

chairbender commented 4 years ago

We also saw that atmos froze and even after a round restart it didn't start working again. By "froze" we mean that the contents of tiles weren't being updated and you weren't being sucked out of holes by space wind.

JesterX666 commented 4 years ago

For atmos, that sounds like if the thread was completly dead (not running at all). Perhaps the main thread should try to see if "it's alive" and report error + restart it.

Bod9001 commented 4 years ago

Interestingly when I was testing EU, I think the start method wasn't being called properly because stuff like reagent containers didn't have anything in them and the lockers didn't have anything in them as well

chairbender commented 4 years ago

I am increasingly believing UpdateManager plays a part in this issue. I think anything which is relying on UpdateManager (which is quite a few things) are the sorts of things we see breaking, for example progress actions not starting in #2656 .

I might suggest removing it and instead using Unity's purpose-built alternatives (whatever that might be, maybe just Update loops) or figuring out how to make it so that exceptions in any of its actions don't break the entire game. It is just too damn fragile.

EDIT: Progress actions don't actually use UpdateManager!

chairbender commented 4 years ago

Re: Bod's comment about start method not being properly called...it might be the lifecycle system I added, specifically OnSpawnServer, not being reliably called.

Bod9001 commented 4 years ago

I seem to notice that when the scene complains about not being saved since it hasn't generated unet IDs the same issue happens with the OnSpawnServer

DooblyNoobly commented 4 years ago

it's pretty stable now. Just a few smaller NRE's to take care of:

Exception in OnStartServer:Object reference not set to an instance of an object   at HydroponicsTray.ProduceCrop () [0x00055] in /home/unitystation/UnityProject/Assets/Scripts/Botany/hydroponics tray/HydroponicsTray.cs:524 
  at HydroponicsTray.OnStartServer () [0x00078] in /home/unitystation/UnityProject/Assets/Scripts/Botany/hydroponics tray/HydroponicsTray.cs:62 
  at Mirror.NetworkIdentity.OnStartServer (System.Boolean allowNonZeroNetId) [0x00113] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/NetworkIdentity.cs:595 
NullReferenceException: Object reference not set to an instance of an object
  at Meleeable.WillInteract (PositionalHandApply interaction, NetworkSide side) [0x000a1] in /home/unitystation/UnityProject/Assets/Scripts/Weapons/Meleeable.cs:72 
  at InteractionUtils.CheckInteractInternal[T] (IBaseInteractable`1[T] interactable, T interaction, NetworkSide side, System.Boolean& wasClientInteractable) [0x000cf] in /home/unitystation/UnityProject/Assets/Scripts/Input System/InteractionV2/InteractionUtils.cs:89 
  at InteractionUtils.ServerCheckInteract[T] (IBaseInteractable`1[T] interactable, T interaction) [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/Input System/InteractionV2/InteractionUtils.cs:151 
  at RequestInteractMessage.ProcessInteraction[T] (T interaction, UnityEngine.GameObject processorObj) [0x000a3] in /home/unitystation/UnityProject/Assets/Scripts/Messages/Client/Interaction/RequestInteractMessage.cs:225 
  at RequestInteractMessage+<Process>d__21.MoveNext () [0x00185] in /home/unitystation/UnityProject/Assets/Scripts/Messages/Client/Interaction/RequestInteractMessage.cs:113 
  at UnityEngine.SetupCoroutine.InvokeMoveNext (System.Collections.IEnumerator enumerator, System.IntPtr returnValueAddress) [0x00028] in /home/builduser/buildslave/unity/build/Runtime/Export/Scripting/Coroutines.cs:17 
DooblyNoobly commented 4 years ago
NullReferenceException: Object reference not set to an instance of an object
  at MobFlee.CanNPCAccessDoor (DoorController doorController) [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/NPC/AI/MobFlee.cs:183 
  at MobFlee+<FindValidWayPoint>d__9.MoveNext () [0x00121] in /home/unitystation/UnityProject/Assets/Scripts/NPC/AI/MobFlee.cs:70 
  at UnityEngine.SetupCoroutine.InvokeMoveNext (System.Collections.IEnumerator enumerator, System.IntPtr returnValueAddress) [0x00028] in /home/builduser/buildslave/unity/build/Runtime/Export/Scripting/Coroutines.cs:17 
UnityEngine.MonoBehaviour:StartCoroutineManaged2(IEnumerator)
UnityEngine.MonoBehaviour:StartCoroutine(IEnumerator) (at /home/builduser/buildslave/unity/build/Runtime/Export/Scripting/MonoBehaviour.bindings.cs:91)
MobFlee:TryToFlee() (at /home/unitystation/UnityProject/Assets/Scripts/NPC/AI/MobFlee.cs:47)
MobFlee:FollowCompleted() (at /home/unitystation/UnityProject/Assets/Scripts/NPC/AI/MobFlee.cs:39)
<PerformFollowPath>d__35:MoveNext() (at /home/unitystation/UnityProject/Assets/Scripts/NPC/AI/MobPathFinder.cs:387)
UnityEngine.SetupCoroutine:InvokeMoveNext(IEnumerator, IntPtr) (at /home/builduser/buildslave/unity/build/Runtime/Export/Scripting/Coroutines.cs:17)
NullReferenceException
  at (wrapper managed-to-native) UnityEngine.Component.get_gameObject(UnityEngine.Component)
  at CustomNetTransform.NotifyPlayers () [0x0001d] in /home/unitystation/UnityProject/Assets/Scripts/Transform/CustomNetTransform.cs:533 
  at PushPull.NotifyPlayers () [0x00016] in /home/unitystation/UnityProject/Assets/Scripts/Objects/PushPull.cs:904 
  at PlayerSync.RollbackPosition () [0x00045] in /home/unitystation/UnityProject/Assets/Scripts/Player/PlayerSync.Server.cs:423 
  at PlayerSync.NextStateServer (PlayerState state, PlayerAction action) [0x00104] in /home/unitystation/UnityProject/Assets/Scripts/Player/PlayerSync.Server.cs:498 
  at PlayerSync.TryUpdateServerTarget () [0x0009d] in /home/unitystation/UnityProject/Assets/Scripts/Player/PlayerSync.Server.cs:448 
  at PlayerSync.Synchronize () [0x000c5] in /home/unitystation/UnityProject/Assets/Scripts/Player/PlayerSync.cs:575 
  at PlayerSync.Update () [0x00106] in /home/unitystation/UnityProject/Assets/Scripts/Player/PlayerSync.cs:532 
NullReferenceException: Object reference not set to an instance of an object
  at Meleeable.WillInteract (PositionalHandApply interaction, NetworkSide side) [0x000a1] in /home/unitystation/UnityProject/Assets/Scripts/Weapons/Meleeable.cs:72 
  at InteractionUtils.CheckInteractInternal[T] (IBaseInteractable`1[T] interactable, T interaction, NetworkSide side, System.Boolean& wasClientInteractable) [0x000cf] in /home/unitystation/UnityProject/Assets/Scripts/Input System/InteractionV2/InteractionUtils.cs:89 
  at InteractionUtils.ServerCheckInteract[T] (IBaseInteractable`1[T] interactable, T interaction) [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/Input System/InteractionV2/InteractionUtils.cs:151 
  at RequestInteractMessage.ProcessInteraction[T] (T interaction, UnityEngine.GameObject processorObj) [0x000a3] in /home/unitystation/UnityProject/Assets/Scripts/Messages/Client/Interaction/RequestInteractMessage.cs:225 
  at RequestInteractMessage+<Process>d__21.MoveNext () [0x00185] in /home/unitystation/UnityProject/Assets/Scripts/Messages/Client/Interaction/RequestInteractMessage.cs:113 
  at UnityEngine.SetupCoroutine.InvokeMoveNext (System.Collections.IEnumerator enumerator, System.IntPtr returnValueAddress) [0x00028] in /home/builduser/buildslave/unity/build/Runtime/Export/Scripting/Coroutines.cs:17 
DooblyNoobly commented 4 years ago

Looks like security records and cargo console do not reset on round restarts

DooblyNoobly commented 4 years ago

^^^ All those issues above should be fixed.

New server nres:

NullReferenceException
  at (wrapper managed-to-native) UnityEngine.Component.get_gameObject(UnityEngine.Component)
  at ElectricalOIinheritance.GameObject () [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/Electricity/Inheritance/ElectricalIOInheritance.cs:83 
  at ElectricalSynchronisation.CircuitSearchLoop (ElectricalOIinheritance Thiswire) [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/Electricity/Electrical processes/ElectricalSynchronisation.cs:519 
  at PowerSupplyFunction.PowerUpdateStructureChangeReact (ModuleSupplyingDevice Supply) [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/Electricity/FunctionsAndClasses/PowerSupplyFunction.cs:22 
  at ModuleSupplyingDevice.PowerUpdateStructureChangeReact () [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/Electricity/NodeModules/ModuleSupplyingDevice.cs:56 
  at ElectricalNodeControl.UpPowerUpdateStructureChangeReact () [0x00032] in /home/unitystation/UnityProject/Assets/Scripts/Electricity/NodeModules/ElectricalNodeControl.cs:254 
  at ElectricalNodeControl.PowerUpdateStructureChangeReact () [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/Electricity/NodeModules/ElectricalNodeControl.cs:96 
  at ElectricalSynchronisation.PowerUpdateStructureChangeReact () [0x0003d] in /home/unitystation/UnityProject/Assets/Scripts/Electricity/Electrical processes/ElectricalSynchronisation.cs:312 
  at ElectricalSynchronisation.DoTick () [0x0002b] in /home/unitystation/UnityProject/Assets/Scripts/Electricity/Electrical processes/ElectricalSynchronisation.cs:231 
  at ElectricalSynchronisation.DoUpdate () [0x000c4] in /home/unitystation/UnityProject/Assets/Scripts/Electricity/Electrical processes/ElectricalSynchronisation.cs:211 
  at ElectricalManager.Update () [0x0001b] in /home/unitystation/UnityProject/Assets/Scripts/Electricity/Electrical processes/ElectricalManager.cs:26 
Macoron commented 4 years ago

Got this one on a late join to headless:


UnityEngine.DebugLogHandler:Internal_Log(LogType, LogOption, String, Object)
UnityEngine.DebugLogHandler:LogFormat(LogType, Object, String, Object[])
UnityEngine.Logger:LogFormat(LogType, String, Object[])
UnityEngine.Debug:LogWarningFormat(String, Object[])
Logger:TryLog(String, LogLevel, Category, Object[]) (at /home/unitystation/UnityProject/Assets/Scripts/Debug/Logger.cs:146)
Logger:LogWarningFormat(String, Category, Object[]) (at /home/unitystation/UnityProject/Assets/Scripts/Debug/Logger.cs:85)
<WaitFor>d__4:MoveNext() (at /home/unitystation/UnityProject/Assets/Scripts/Messages/GameMessageBase.cs:32)
UnityEngine.SetupCoroutine:InvokeMoveNext(IEnumerator, IntPtr) (at C:\buildslave\unity\build\Runtime\Export\Scripting\Coroutines.cs:17)

(Filename: /home/unitystation/UnityProject/Assets/Scripts/Debug/Logger.cs Line: 146)

NullReferenceException: Object reference not set to an instance of an object
  at HealthConsciousMessage+<Process>d__3.MoveNext () [0x0004c] in /home/unitystation/UnityProject/Assets/Scripts/Messages/Server/HealthMessages/HealthConsciousMessage.cs:18 
  at UnityEngine.SetupCoroutine.InvokeMoveNext (System.Collections.IEnumerator enumerator, System.IntPtr returnValueAddress) [0x00028] in C:\buildslave\unity\build\Runtime\Export\Scripting\Coroutines.cs:17 

(Filename: /home/unitystation/UnityProject/Assets/Scripts/Messages/Server/HealthMessages/HealthConsciousMessage.cs Line: 18)```
Macoron commented 4 years ago

Also this one after grenade explosion:


KeyNotFoundException: The given key was not present in the dictionary.
  at System.Collections.Generic.Dictionary`2[TKey,TValue].get_Item (TKey key) [0x0001e] in <567df3e0919241ba98db88bec4c6696f>:0 
  at TileDictionary.get_Item (TileType type, System.String name) [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/Tilemaps/TileManager.cs:31 
  at TileManager.GetTile (TileType tileType, System.String key) [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/Tilemaps/TileManager.cs:62 
  at TileChangeManager.InternalUpdateTile (UnityEngine.Vector3 position, TileType tileType, System.String tileName) [0x00001] in /home/unitystation/UnityProject/Assets/Scripts/Tilemaps/Behaviours/Meta/TileChangeManager.cs:180 
  at TileChangeManager.RpcUpdateTile (UnityEngine.Vector3 position, TileType tileType, System.String tileName) [0x0000e] in /home/unitystation/UnityProject/Assets/Scripts/Tilemaps/Behaviours/Meta/TileChangeManager.cs:175 
  at TileChangeManager.InvokeRpcRpcUpdateTile (Mirror.NetworkBehaviour obj, Mirror.NetworkReader reader) [0x0002e] in <45900f365ec1419093d458fa2d092c06>:0 
  at Mirror.NetworkBehaviour.InvokeHandlerDelegate (System.Int32 cmdHash, Mirror.MirrorInvokeType invokeType, Mirror.NetworkReader reader) [0x00020] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/NetworkBehaviour.cs:436 
  at Mirror.NetworkIdentity.HandleRemoteCall (System.Int32 componentIndex, System.Int32 functionHash, Mirror.MirrorInvokeType invokeType, Mirror.NetworkReader reader) [0x00083] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/NetworkIdentity.cs:946 
  at Mirror.NetworkIdentity.HandleRPC (System.Int32 componentIndex, System.Int32 rpcHash, Mirror.NetworkReader reader) [0x00001] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/NetworkIdentity.cs:975 
  at Mirror.ClientScene.OnRPCMessage (Mirror.NetworkConnection _, Mirror.RpcMessage msg) [0x0005e] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/ClientScene.cs:690 
  at (wrapper delegate-invoke) System.Action`2[Mirror.NetworkConnection,Mirror.RpcMessage].invoke_void_T1_T2(Mirror.NetworkConnection,Mirror.RpcMessage)
  at Mirror.MessagePacker+<>c__DisplayClass5_0`1[T].<MessageHandler>b__0 (Mirror.NetworkMessage networkMessage) [0x0005d] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/MessagePacker.cs:133 
  at Mirror.NetworkConnection.InvokeHandler (System.Int32 msgType, Mirror.NetworkReader reader) [0x00036] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/NetworkConnection.cs:293 
  at Mirror.NetworkConnection.TransportReceive (System.ArraySegment`1[T] buffer) [0x0007f] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/NetworkConnection.cs:334 
  at Mirror.NetworkClient.OnDataReceived (System.ArraySegment`1[T] data) [0x0000e] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/NetworkClient.cs:165 
  at (wrapper delegate-invoke) UnityEngine.Events.UnityAction`1[System.ArraySegment`1[System.Byte]].invoke_void_T0(System.ArraySegment`1<byte>)
  at UnityEngine.Events.InvokableCall`1[T1].Invoke (T1 args0) [0x00018] in C:\buildslave\unity\build\Runtime\Export\UnityEvent\UnityEvent.cs:207 
  at UnityEngine.Events.UnityEvent`1[T0].Invoke (T0 arg0) [0x00025] in C:\buildslave\unity\build\Runtime\Export\UnityEvent\UnityEvent\UnityEvent_1.cs:58 
  at Mirror.TelepathyTransport.ProcessClientMessage () [0x0003c] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/Transport/TelepathyTransport.cs:65 
  at Mirror.TelepathyTransport.LateUpdate () [0x00005] in /home/unitystation/UnityProject/Assets/Mirror/Runtime/Transport/TelepathyTransport.cs:92 

(Filename: <567df3e0919241ba98db88bec4c6696f> Line: 0)```
DooblyNoobly commented 4 years ago

multiple parties worked on this ticket and the majority agreed to return bounty to the pool