liyunfan1223 / mod-playerbots

AzerothCore Playerbots Module
https://discord.gg/NQm5QShwf9
GNU Affero General Public License v3.0

becomes increasingly sluggish. After one hour, all players are completely frozen #537

Open zbhcn opened 1 week ago

zbhcn commented 1 week ago

I'm running the September 19 version. After the server has been running for about 15 minutes, the response time of .server info rises from 10 milliseconds to more than 50 milliseconds and keeps getting more sluggish. After one hour, all players are completely frozen, but the server does not crash.

zbhcn commented 1 week ago

AzerothCore rev. unknown 1970-01-01 00:00:00 +0000 (Archived branch) (Win64, Release, Static)
Connected players: 0. Characters in world: 260.
Connection peak: 1.
Server uptime: 47 minute(s) 22 second(s)
Update time diff: 76ms. Last 500 diffs summary:
|- Mean: 162ms
|- Median: 80ms
|- Percentiles (95, 99, max): 391ms, 1303ms, 2541ms

zbhcn commented 1 week ago

AzerothCore rev. unknown 1970-01-01 00:00:00 +0000 (Archived branch) (Win64, Release, Static)
Connected players: 0. Characters in world: 260.
Connection peak: 1.
Server uptime: 48 minute(s) 19 second(s)
Update time diff: 96ms. Last 500 diffs summary:
|- Mean: 187ms
|- Median: 79ms
|- Percentiles (95, 99, max): 423ms, 1355ms, 5002ms

zbhcn commented 1 week ago

Server uptime: 53 minute(s) 45 second(s)
Update time diff: 1322ms. Last 500 diffs summary:
|- Mean: 258ms
|- Median: 119ms
|- Percentiles (95, 99, max): 525ms, 1837ms, 3447ms
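To compare snapshots like the ones above without reading them by eye, the diff summary can be pulled out with a small script. This is only a sketch: the function name `parse_server_info` and the dictionary keys are made up for illustration and are not part of AzerothCore, and the regex assumes the `.server info` output keeps the exact wording shown in this thread.

```python
import re

# Hypothetical helper, not part of AzerothCore. Assumes the ".server info"
# wording seen in this thread; works on flattened or multi-line output.
DIFF_PATTERN = re.compile(
    r"Update time diff:\s*(?P<current>\d+)ms.*?"
    r"Mean:\s*(?P<mean>\d+)ms.*?"
    r"Median:\s*(?P<median>\d+)ms.*?"
    r"Percentiles \(95, 99, max\):\s*"
    r"(?P<p95>\d+)ms,\s*(?P<p99>\d+)ms,\s*(?P<max>\d+)ms",
    re.DOTALL,
)


def parse_server_info(text: str) -> dict[str, int]:
    """Extract the update-diff figures (in milliseconds) from one snapshot."""
    match = DIFF_PATTERN.search(text)
    if match is None:
        raise ValueError("no update-diff summary found")
    return {key: int(value) for key, value in match.groupdict().items()}


# The 47-minute and 53-minute snapshots posted above.
sample_47m = (
    "Update time diff: 76ms. Last 500 diffs summary: |- Mean: 162ms "
    "|- Median: 80ms |- Percentiles (95, 99, max): 391ms, 1303ms, 2541ms"
)
sample_53m = (
    "Update time diff: 1322ms. Last 500 diffs summary: |- Mean: 258ms "
    "|- Median: 119ms |- Percentiles (95, 99, max): 525ms, 1837ms, 3447ms"
)

early = parse_server_info(sample_47m)
late = parse_server_info(sample_53m)

# Increase of the mean update diff between the two snapshots, in ms.
print(late["mean"] - early["mean"])  # 96
```

Logging these values every few minutes would show whether the mean climbs steadily or jumps at a specific point.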

zbhcn commented 1 week ago

image

zbhcn commented 1 week ago

I haven't made any changes to the source code and haven't installed any other mods yet!!!

manstfu commented 1 week ago

I got the same problem today: the server froze with no crash and no player disconnects, and it stayed frozen until a manual world restart.

noisiver commented 1 week ago

image

It says Unknown 1970-01-01 00:00:00 +0000 (Archived Branch) so there's no way of knowing what you actually have and there's no guarantee that it's not something on your end causing it.

Edit: It's also compiled in Release, which you shouldn't do. It should always be compiled in RelWithDebInfo and on Windows you should never remove the .pdb files.

manstfu commented 1 week ago

I had it in rev. 9780dbab7e11: world server frozen, auth stuck on 'Connected'.

noisiver commented 1 week ago

image Well, I'm not suffering from it so I don't know what to say. I compiled this about 12 hours ago so I know it's up-to-date.

zbhcn commented 1 week ago

my code SHA: ef4064cfc1bf19a8732099d94a532d99015bc1d1

Azraxiel commented 1 week ago

image Well, I updated yesterday and again two hours ago (750 bots). I had some spikes around 3000-4000ms, purged the bots and restarted; everything still seems fine.

zbhcn commented 1 week ago

image

manstfu commented 1 week ago

12 hours of server freeze, no crash log, no gdb errors, thats it image

zbhcn commented 1 week ago

12 hours of server freeze, no crash log, no gdb errors, thats it image

me too,I look forward to the author addressing this issue.

serverwar commented 1 week ago

image I use the latest version. When the problem occurred, I checked the server's save time and, after changing the map, increased it. As for the freezing, I found something about it.

serverwar commented 1 week ago

https://github.com/azerothcore/azerothcore-wotlk/issues/18543 - This is the main cause of freezes without logs.

serverwar commented 1 week ago

AiPlayerbot.RandomBotUpdateInterval = 60
AiPlayerbot.RandomBotCountChangeMinInterval = 3600
AiPlayerbot.RandomBotCountChangeMaxInterval = 14400
AiPlayerbot.MinRandomBotInWorldTime = 1800
AiPlayerbot.MaxRandomBotInWorldTime = 21600
AiPlayerbot.MinRandomBotRandomizeTime = 86400
AiPlayerbot.MaxRandomBotRandomizeTime = 604800
AiPlayerbot.RandomBotsPerInterval = 100
AiPlayerbot.MinRandomBotReviveTime = 30
AiPlayerbot.MaxRandomBotReviveTime = 120
AiPlayerbot.MinRandomBotTeleportInterval = 1800
AiPlayerbot.MaxRandomBotTeleportInterval = 7200
AiPlayerbot.RandomBotInWorldWithRotationDisabled = 2592000

Random bot count

AiPlayerbot.MinRandomBots = 70
AiPlayerbot.MaxRandomBots = 100

Random bot account

AiPlayerbot.RandomBotAccountCount = 200

serverwar commented 1 week ago

image I started monitoring everything and making small changes and corrections. Today the server has zero crashes and zero lag. I hope this helps you.

zbhcn commented 1 week ago

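image I started monitoring everything and making small changes, some corrections and I was monitoring, server today has zero crashes and zero lag, I hope I helped you.

Thank you, I will try.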

EricksOliveira commented 1 week ago

AiPlayerbot.RandomBotUpdateInterval = 60
AiPlayerbot.RandomBotCountChangeMinInterval = 3600
AiPlayerbot.RandomBotCountChangeMaxInterval = 14400
AiPlayerbot.MinRandomBotInWorldTime = 1800
AiPlayerbot.MaxRandomBotInWorldTime = 21600
AiPlayerbot.MinRandomBotRandomizeTime = 86400
AiPlayerbot.MaxRandomBotRandomizeTime = 604800
AiPlayerbot.RandomBotsPerInterval = 100
AiPlayerbot.MinRandomBotReviveTime = 30
AiPlayerbot.MaxRandomBotReviveTime = 120
AiPlayerbot.MinRandomBotTeleportInterval = 1800
AiPlayerbot.MaxRandomBotTeleportInterval = 7200
AiPlayerbot.RandomBotInWorldWithRotationDisabled = 2592000

Random bot count

AiPlayerbot.MinRandomBots = 70
AiPlayerbot.MaxRandomBots = 100

Random bot account

AiPlayerbot.RandomBotAccountCount = 200

I'm using it and I can really feel the difference! Thank you very much for the Report.

zbhcn commented 1 week ago

I found the cause: the bots' use of trinkets. As a workaround, just disable this action: in Racialsstrategy.cpp, comment out line 37 so that it reads: //new NextAction("use trinket", ACTION_NORMAL + 4)

Dreathean commented 1 week ago

Er, so that will remove their ability to use trinkets? Seems like it needs to be fixed rather than that capability removed entirely, though I guess until it gets fixed that helps people keep playing for now.

The other fix with adjusting the settings/intervals seemed to have worked for the others, no?

zbhcn commented 1 week ago

Er, so that will remove their ability to use trinkets? Seems like it needs to be fixed rather than that capability removed entirely, though I guess until it gets fixed that helps people keep playing for now.

The other fix with adjusting the settings/intervals seemed to have worked for the others, no?

Yes, but currently I don't have a better way. This will cause all bots to lose this function.

serverwar commented 1 week ago

image 1 day online without crashes

serverwar commented 1 week ago

azerothcore/azerothcore-wotlk#18543 - This is the main error of freezer without logs

image

noisiver commented 1 week ago

Deadlocks don't freeze the server, and those are a result of increasing SQL threads for extremely minimal gain (pretty much none).

Trus3683 commented 1 week ago

No issues until updating, experiencing similar issues.

serverwar commented 1 week ago

No issues until updating, experiencing similar issues.

@liyunfan1223 Friend, if you can take a look at this high load causing the server to freeze, it happened to me but I solved it by reducing the load in other places.

hermensbas commented 1 week ago

What's your memory consumption and CPU usage when this behaviour happens?

zbhcn commented 1 week ago

Whats your memory consumption and cpu when this behaviour happens?

Neither was at a high level at that point.

EricksOliveira commented 1 week ago

No issues until updating, experiencing similar issues.

@liyunfan1223 Friend, if you can take a look at this high load causing the server to freeze, it happened to me but I solved it by reducing the load in other places.

After applying the settings from your report further up, I managed to greatly reduce the crashes. It happened again after starting AddClass, and I received a complaint about delay in a raid (10N). I believe the delay started after excessive use of the attack command. Your suggestion ( https://github.com/liyunfan1223/mod-playerbots/issues/547 ) will partially resolve this abuse of commands!

serverwar commented 1 week ago

No issues until updating, experiencing similar issues.

@liyunfan1223 Friend, if you can take a look at this high load causing the server to freeze, it happened to me but I solved it by reducing the load in other places.

After implementing your report further up. I managed to greatly reduce the crash. It happened again after starting AddClass and I received a complaint about Delay in Raid. 10N I believe the delay started after excessive use of the attack command. Your suggestion ( #547 ) will partially resolve this abuse of commands!

image It is very rare for it to freeze, sometimes it happens so I am waiting for a solution.

Trus3683 commented 1 week ago

Whats your memory consumption and cpu when this behaviour happens?

On my end, looks like some sort of memory leak. I monitored memory usage - looks like my worldserver boots at about 6GB utilization and gradually increases until it just stops working around 9GB and frees up all the memory. As it does this, update time diff gradually increases as well - server eventually hits 150+ and becomes unplayable.

noisiver commented 1 week ago

I don't know the specific reason but it will continue to eat RAM until there's none left. My server was up for about 15 hours and hit 16GB at that point but at about 12 hours it was already at 13GB. This isn't something that happened before, and is probably something recent. Not long ago it was still at 8-9GB after several days of uptime.
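As a rough reading of these figures, the growth rate can be estimated from two samples. The sketch below assumes the growth is roughly linear between samples, which nothing in this thread confirms; the function names and the 32 GB host are hypothetical and only there for illustration.

```python
def leak_rate_gb_per_hour(t0_h: float, mem0_gb: float,
                          t1_h: float, mem1_gb: float) -> float:
    """Average memory growth between two samples, assuming linear growth."""
    if t1_h <= t0_h:
        raise ValueError("second sample must be later than the first")
    return (mem1_gb - mem0_gb) / (t1_h - t0_h)


def hours_until_limit(current_gb: float, limit_gb: float,
                      rate_gb_per_hour: float) -> float:
    """Hours left before memory reaches limit_gb at the given growth rate."""
    if rate_gb_per_hour <= 0:
        return float("inf")  # not growing, so the limit is never reached
    return max(0.0, (limit_gb - current_gb) / rate_gb_per_hour)


# Samples from the comment above: about 13 GB at 12 hours, 16 GB at 15 hours.
rate = leak_rate_gb_per_hour(12, 13, 15, 16)
print(rate)  # 1.0

# Hypothetical host with 32 GB of RAM (not a figure from this thread).
print(hours_until_limit(16, 32, rate))  # 16.0
```

If later samples give a noticeably different rate, the linear assumption does not hold and the estimate should not be trusted.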

liyunfan1223 commented 6 days ago

It might be related to https://github.com/liyunfan1223/mod-playerbots/pull/514. RandomBotUpdateInterval = 1 may cause RandomPlayerbotMgr to scan all random bots and to execute ProcessBot() too intensively.

If you have this issue, try using the old configuration to see if the problem still occurs:

AiPlayerbot.RandomBotUpdateInterval = 20
AiPlayerbot.RandomBotsPerInterval = 500

or just increase RandomBotUpdateInterval

AiPlayerbot.RandomBotUpdateInterval = 5
AiPlayerbot.RandomBotsPerInterval = 60
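To compare how hard each pair of settings drives the bot manager, here is a back-of-the-envelope sketch. It assumes that RandomBotsPerInterval bots are processed once every RandomBotUpdateInterval seconds, which is inferred from the option names rather than checked against the code; the helper names and the 1000-bot population are hypothetical.

```python
import math


def bots_per_second(update_interval_s: float, bots_per_interval: int) -> float:
    """Average number of bots processed per second (assumed semantics)."""
    return bots_per_interval / update_interval_s


def full_pass_seconds(total_bots: int, update_interval_s: float,
                      bots_per_interval: int) -> float:
    """Seconds needed to touch every bot once, one batch per interval."""
    intervals = math.ceil(total_bots / bots_per_interval)
    return intervals * update_interval_s


# The two configurations suggested above.
print(bots_per_second(20, 500))  # 25.0
print(bots_per_second(5, 60))    # 12.0

# Hypothetical population of 1000 random bots.
print(full_pass_seconds(1000, 20, 500))  # 40
print(full_pass_seconds(1000, 5, 60))    # 85
```

Under that assumption, the second pair of settings processes bots in smaller, more frequent batches and at a lower overall rate than the old configuration.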

hermensbas commented 6 days ago

Kinda had the feeling it was a resource issue, based on the behaviours described.

Not entirely sure, though: I've been running those settings for quite a while without any problems. That being said, I am not convinced either way whether the update interval creates an issue. I never noticed it until recently. I will try to replicate it with the latest code builds.

It might be the case that a memory leak was introduced somewhere and the update simply speeds up the process? Lowering the value might then delay the OOM but not fix the issue itself?

liyunfan1223 commented 6 days ago

Agreed. The overhead of the update should be relatively small; even once per second should not cause the server to get stuck.

I would like to inspect the running state of the program when the freeze occurs, but I have not been able to reproduce it so far.

hermensbas commented 6 days ago

I am running ah-bot next to playerbot, 5700x 8 cores assigned to VM with 16GB

Playerbots.conf

AiPlayerbot.MinRandomBots = 3480
AiPlayerbot.MaxRandomBots = 3500
AiPlayerbot.RandomBotMinLevel = 1
AiPlayerbot.RandomBotMaxLevel = 70
AiPlayerbot.RandomBotMaxLevelChance = 0.01
AiPlayerbot.RandomBotFixedLevel = 0
AiPlayerbot.DisableRandomLevels = 0
AiPlayerbot.RandombotStartingLevel = 5
AiPlayerbot.SyncLevelWithPlayers = 0
AiPlayerbot.AutoTeleportForLevel = 1
AiPlayerbot.RandomBotMaps = 0,1,530,571
AiPlayerbot.DisableDeathKnightLogin = 0
AiPlayerbot.BotActiveAlone = 100

AiPlayerbot.EnableRotation = 1
AiPlayerbot.RotationPoolSize = 4000
AiPlayerbot.RandomBotAccountCount = 500

After 30 minutes I am already heading towards 12GB of memory usage; I can't recall having that amount of usage before. I will keep it running for a bit longer, then try with a static bot amount and rotation disabled, and then try without ah-bot.

Still nothing verified regarding this issue

EricksOliveira commented 6 days ago

AiPlayerbot.RandomBotUpdateInterval = 60
AiPlayerbot.RandomBotCountChangeMinInterval = 3600
AiPlayerbot.RandomBotCountChangeMaxInterval = 14400
AiPlayerbot.MinRandomBotInWorldTime = 1800
AiPlayerbot.MaxRandomBotInWorldTime = 21600
AiPlayerbot.MinRandomBotRandomizeTime = 86400
AiPlayerbot.MaxRandomBotRandomizeTime = 604800
AiPlayerbot.RandomBotsPerInterval = 100
AiPlayerbot.MinRandomBotReviveTime = 30
AiPlayerbot.MaxRandomBotReviveTime = 120
AiPlayerbot.MinRandomBotTeleportInterval = 1800
AiPlayerbot.MaxRandomBotTeleportInterval = 7200
AiPlayerbot.RandomBotInWorldWithRotationDisabled = 2592000

Random bot count

AiPlayerbot.MinRandomBots = 70
AiPlayerbot.MaxRandomBots = 100

Random bot account

AiPlayerbot.RandomBotAccountCount = 200

I did some tests with the new configuration. After the worldserver had been online for 1 day and 10 hours, there was a lot of delay and a restart was necessary. It started at 3.5/6.0 GB of memory, and after 1 day and 10 hours memory consumption was already at 5.9/6.0 GB.

hermensbas commented 6 days ago

Executing the '.playerbot rndbot init' command, which invokes the bot initialization process where most of these numbers are used, doesn't show any instability. Not even when forcing all updates at once, nor do I see any movement in the memory footprint while doing those actions.

There are a lot of variables in the config which impact the memory footprint, e.g.

PreloadAllNonInstancedMapGrids = 0
SetAllCreaturesWithWaypointMovementActive = 0
DontCacheRandomMovementPaths = 1

The map grids, for example, are lazy-loaded by default, which basically means a map grid is only loaded into memory when there is a reason to. Because bot activity is spread over the maps, more sections get loaded into memory. Nevertheless, I see memory growing bit by bit, and I can't really relate the behaviour to the interval timings, other than maybe the speed at which the memory footprint grows.

PS: when the memory is maxed out the server doesn't necessarily crash. It might simply start acting strangely: some spells might stop casting, players disconnect and cannot reconnect, server latency gets very high, or the server simply crashes.

I can't quite explain it, nor do I really have the skillset to profile the runtime against the memory usage :(

Trus3683 commented 5 days ago

It might be related to #514. RandomBotUpdateInterval = 1 may cause RandomPlayerbotMgr to scan all random bots and to execute ProcessBot() too intensively.

If you have this issue, try using the old configuration to see if the problem still occurs:

AiPlayerbot.RandomBotUpdateInterval = 20
AiPlayerbot.RandomBotsPerInterval = 500

or just increase RandomBotUpdateInterval

AiPlayerbot.RandomBotUpdateInterval = 5
AiPlayerbot.RandomBotsPerInterval = 60

Increased RandomBotUpdateInterval from 1 to 5. It appeared to work at first, but the server still hung after about 2 hours. This time it wasn't even at max utilization: it was only using about 6GB when it hung.

Zaedon33 commented 5 days ago

I've noticed the same issue. In addition, there is input lag that appears when in a group with any bots while doing trivial quests, grinding or even disenchanting. I have tried disabling random bots and resetting random bots, to no avail.

I don't think this is new to the latest updates though. I noticed it after updating the core and mod about a month or two ago.

I have experienced this on both Linux (Mint) and Windows 10. I am starting a clean Linux server to ensure nothing else may be causing the issues.

noisiver commented 5 days ago

If by input lag you mean that a spell isn't cast when you press it but has a delay to it that's because the server is lagging. The same goes for all actions that are processed by the server. Look at the update diff and it should show a high value.

EricksOliveira commented 5 days ago

image My diff, with 70 bots.

Dreathean commented 5 days ago

That's really high, is it that high after a restart too? I see this is after over a day. Oh I see from previous comments that you have 6 GB RAM? You really need 16 GB to run azerothcore without RAM-related issues, with or without playerbots.

EricksOliveira commented 5 days ago

Even if the requirement is 16GB, I didn't have this delay a while ago: I ran 200 bots without any delay. Now, even with 50 bots, there is a delay after a certain time online.

Zaedon33 commented 4 days ago

If by input lag you mean that a spell isn't cast when you press it but has a delay to it that's because the server is lagging. The same goes for all actions that are processed by the server. Look at the update diff and it should show a high value.

You are correct! My update times are almost a mirror image of yours. This started a while ago. I used to run 1500 bots without a problem. Now, even with random bots disabled, I have this lag. It does appear to be minimal so long as I don't bring my alt playerbots online.

noisiver commented 4 days ago

That's very weird. I have no issues running the bots, although I only have max bot count set at 500 since I don't need more than that. I'll try bumping it up to see what happens.

noisiver commented 4 days ago

image I'll leave this running for a little bit but at least on startup it's very stable even with this many bots. The update diff is increased because I'm actually over the limit on what this CPU can handle and I can see the cores worldserver uses are capped or near capped.

hermensbas commented 4 days ago

AiPlayerbot.MinRandomBots = 1900
AiPlayerbot.MaxRandomBots = 2000
AiPlayerbot.RandomBotMinLevel = 1
AiPlayerbot.RandomBotMaxLevel = 70
AiPlayerbot.RandomBotFixedLevel = 0
AiPlayerbot.SyncLevelWithPlayers = 0
AiPlayerbot.AutoTeleportForLevel = 1
AiPlayerbot.RandomBotMaps = 0,1,530,571
AiPlayerbot.BotActiveAlone = 100

AiPlayerbot.EnableRotation = 1
AiPlayerbot.RotationPoolSize = 3000
AiPlayerbot.RandomBotAccountCount = 500

After 2-3 hours the worldserver is OOM with 16GB. This is with the latest code at this time.