pvpgn / pvpgn-server

Next generation of PvPGN server
https://pvpgn.pro
GNU General Public License v2.0
553 stars 156 forks source link

Memory leak? #419

Closed StevenFredette closed 2 years ago

StevenFredette commented 4 years ago

So not sure if i'm going crazy or pvpgn is using a lot more cpu over time.

I edited the LUA script. handle_game.lua

function handle_game_create(game) channel_send_message("War2BNE", tostring(game.mapname) .. " hosted by " .. tostring(game.owner), message_type_info) end

Works great to notify channel of game hosted but after 12 hours or so the cpu sadly went from 3% to 95% had to terminate the process and reopen to make server stable again.

I cannot use the latest development build with email because it has my cpu at 90% default.

RElesgoe commented 4 years ago

Try the latest commit (c2167e18a6fc26c40220f921b51698dc7d104db8)

StevenFredette commented 4 years ago

I'm still learning github. So I would need to recompile do you have a compiled plain link?

RElesgoe commented 4 years ago

You can either compile or download from https://pvpgn.pro/download.html

StevenFredette commented 4 years ago

I am now on the latest dev version.

The memory leak might have been fixed (Prior Updates). I need about 12-24 hours to tell.

This definitely runs a lot better on the CPU. Thank YOU!

If the memory leak on the LUA script is gone. I will close tomorrow. Thanks again!

StevenFredette commented 4 years ago

Sadly,

The leaks are still happening with the most current updated development.

image

About 8 hours idle with 12 bots, the CPU went from 3% to 90%. If there is something I can do please let me know. The only difference from default should be the Lua script I mentioned above.

EDIT: hmm... not sure if Lua was causing this now. I removed all 12 bot's on the same IP. Be back in 12 hours with an update.

StevenFredette commented 4 years ago

The CPU stayed at 0 near most of the night. The RAM only increased by .5 MB. (The leak did not occur while the bots were offline.)

The name & version of the bot. image

Bot configuration:

  1. 14 accounts running (pvpgn handles multiple connections per IP.)
  2. Spoof ping to 1ms
  3. Same Channel
  4. W2BN Client

I'm not sure if it's the packets going back and forth between the server or server code for concurrent connections is leaking. Hopefully, this update will help track down whatever is causing that CPU to rocket overtime when those bots connect.

I had about 15 players for 5-6 hours play many games with no issues last night as well. These players are from all over the world.

HarpyWar commented 4 years ago

Try run bots on different machine than pvpgn. Probably undefined udp packets cause this behavior because of the same udp port. Set full logging loglevels = fatal,error,warn,info,debug,trace and look which actions from there may cause cpu load.

StevenFredette commented 4 years ago

Try run bots on different machine than pvpgn. Probably undefined udp packets cause this behavior because of the same udp port. Set full logging loglevels = fatal,error,warn,info,debug,trace and look which actions from there may cause cpu load.

Yes, the bot's (my desktop), pvpgn (my VPS). I enabled more verbose debugging and the CPU is already at 5 CPU% for pvpgn. There is not much coming in the debug console.

There is a bad echo timing when they connect.

Aug 30 09:29:30 [info ] pvpgn::bnetd::conn_destroy: [628] closed bnet connection Aug 30 09:29:30 [debug] pvpgn::bnetd::ipbanlist_check: lastcheck: 1598779563, now: 1598779770, now-lc: 207. Aug 30 09:29:30 [debug] pvpgn::bnetd::ipbanlist_check: checking 50.90.128.16 Aug 30 09:29:31 [debug] pvpgn::bnetd::ipbanlist_check: address 50.90.128.16 does not match exact 76.119.98.17 Aug 30 09:29:31 [debug] pvpgn::bnetd::ipbanlist_check: address 50.90.128.16 does not match exact 76.119.98.17 Aug 30 09:29:31 [info ] pvpgn::bnetd::sd_accept: [628] accepted connection from 50.90.128.16:61078 on 0.0.0.0:6112 Aug 30 09:29:31 [debug] pvpgn::bnetd::conn_create: [628][616] sessionkey=0x1598800988 sessionnum=0x00000014 Aug 30 09:29:31 [debug] pvpgn::bnetd::sd_accept: [628] client connected to a bnet listening address Aug 30 09:29:31 [info ] pvpgn::bnetd::handle_init_packet: [628] client initiated bnet connection Aug 30 09:29:31 [trace] pvpgn::bnetd::conn_shutdown: [628] connection already closed Aug 30 09:29:31 [debug] pvpgn::bnetd::conn_set_class: added latency check timer Aug 30 09:29:31 [debug] pvpgn::bnetd::_client_auth_info: [628] AUTH_INFO packet { protocol=0x0, platform=IX86, product=W2BN, versionid=0x4f, language=SUne, localip=0x1dca8c0, tzbias=012c, locale=1033, language=1033, country=USA.United States } Aug 30 09:29:31 [warn ] pvpgn::bnetd::conn_set_gamelang: got UNKNOWN gamelang Aug 30 09:29:31 [debug] pvpgn::bnetd::_client_auth_info: [628] selected "ver-IX86-1.mpq" "A=3845581634 B=880823580 C=1363937103 4 A=A-S B=B-C C=C-A A=A-B" Aug 30 09:29:31 [warn ] pvpgn::bnetd::_client_echoreply: [628] bad timing in echo reply: now=480442 then=0 Aug 30 09:29:31 [info ] pvpgn::bnetd::_client_authreq109: [628] CLIENT_AUTHREQ_109 ticks=0x5dc056da, verstr=2.0.2.0 exeinfo="WarCraft II BNE.exe 08/27/20 03:22:17 712704" versionid=0x0000004f gameversion=0x02000200 checksum=0xb52bad87 Aug 30 09:29:31 [info ] pvpgn::bnetd::_client_authreq109: [628] client matches versiontag "W2BN_202A" Aug 30 09:29:31 [info ] pvpgn::bnetd::_client_authreq109: [628] no upgrade is available Aug 30 09:29:31 [info ] pvpgn::bnetd::_client_loginreq2: [628] "Admin" logged in (correct password) Aug 30 09:29:32 [error] pvpgn::bnetd::handle_bnet_packet: [628] unknown (logged in) bnet packet type 0x14ff, len 8 Aug 30 09:29:32 [error] pvpgn::bnetd::handle_bnet_packet: [628] unknown (logged in) bnet packet type 0x2dff, len 4 Aug 30 09:29:32 [info ] pvpgn::bnetd::conn_set_channel: [628] joined channel "The Void" Aug 30 09:29:32 [debug] pvpgn::bnetd::class_topic::class_topiclist::get: returning nullptr Aug 30 09:29:34 [info ] pvpgn::bnetd::conn_set_channel: [628] joined channel "War2USA"

I do have these which seem common with or without a leak. <-- could be wrong Aug 30 09:22:32 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=350854750 Aug 30 09:23:00 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=1572973296 Aug 30 09:24:02 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=350944747 Aug 30 09:24:30 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=1573063296 Aug 30 09:25:32 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=351034744 Aug 30 09:26:00 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=1573153296

EDIT: The server is near 10% already. The UDP packets are the only thing coming in the logs. Aug 30 09:36:30 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=1573783296 Aug 30 09:37:32 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=351754751 Aug 30 09:38:00 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=1573873296 Aug 30 09:39:02 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=351844748 Aug 30 09:39:30 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=1573963296 Aug 30 09:40:32 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=351934745 Aug 30 09:41:00 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=1574053296 Aug 30 09:42:02 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=352024742

I closed the bot the server instantly went to 0% CPU and I'm still getting those UDP packets. Remember the server and bot are from different networks and hosts. Aug 30 09:42:46 [info ] pvpgn::bnetd::conn_destroy: [628] "Admin" logged out Aug 30 09:42:46 [info ] pvpgn::bnetd::conn_destroy: [628] closed bnet connection Aug 30 09:42:47 [debug] pvpgn::bnetd::sd_tcpinput: [632] read returned -1 (closing connection) Aug 30 09:42:47 [info ] pvpgn::bnetd::conn_destroy: [632] "PG-13" logged out Aug 30 09:42:47 [info ] pvpgn::bnetd::conn_destroy: [632] closed bnet connection Aug 30 09:42:48 [debug] pvpgn::bnetd::sd_tcpinput: [636] read returned -1 (closing connection) Aug 30 09:42:48 [info ] pvpgn::bnetd::conn_destroy: [636] "Owned" logged out Aug 30 09:42:48 [info ] pvpgn::bnetd::conn_destroy: [636] closed bnet connection Aug 30 09:42:49 [debug] pvpgn::bnetd::sd_tcpinput: [640] read returned -1 (closing connection) Aug 30 09:42:49 [info ] pvpgn::bnetd::conn_destroy: [640] "Love" logged out Aug 30 09:42:49 [info ] pvpgn::bnetd::conn_destroy: [640] closed bnet connection Aug 30 09:42:50 [debug] pvpgn::bnetd::sd_tcpinput: [644] read returned -1 (closing connection) Aug 30 09:42:50 [info ] pvpgn::bnetd::conn_destroy: [644] "Win" logged out Aug 30 09:42:50 [info ] pvpgn::bnetd::conn_destroy: [644] closed bnet connection Aug 30 09:42:51 [debug] pvpgn::bnetd::sd_tcpinput: [648] read returned -1 (closing connection) Aug 30 09:42:51 [info ] pvpgn::bnetd::conn_destroy: [648] "Bot" logged out Aug 30 09:42:51 [info ] pvpgn::bnetd::conn_destroy: [648] closed bnet connection Aug 30 09:42:52 [debug] pvpgn::bnetd::sd_tcpinput: [652] read returned -1 (closing connection) Aug 30 09:42:52 [info ] pvpgn::bnetd::conn_destroy: [652] "Mod" logged out Aug 30 09:42:52 [info ] pvpgn::bnetd::conn_destroy: [652] closed bnet connection Aug 30 09:42:53 [debug] pvpgn::bnetd::sd_tcpinput: [656] read returned -1 (closing connection) Aug 30 09:42:53 [info ] pvpgn::bnetd::conn_destroy: [656] "YouTube" logged out Aug 30 09:42:53 [info ] pvpgn::bnetd::conn_destroy: [656] closed bnet connection Aug 30 09:42:54 [debug] pvpgn::bnetd::sd_tcpinput: [660] read returned -1 (closing connection) Aug 30 09:42:54 [info ] pvpgn::bnetd::conn_destroy: [660] "Twitch" logged out Aug 30 09:42:54 [info ] pvpgn::bnetd::conn_destroy: [660] closed bnet connection Aug 30 09:42:56 [debug] pvpgn::bnetd::sd_tcpinput: [664] read returned -1 (closing connection) Aug 30 09:42:56 [info ] pvpgn::bnetd::conn_destroy: [664] "Discord" logged out Aug 30 09:42:56 [info ] pvpgn::bnetd::conn_destroy: [664] closed bnet connection Aug 30 09:42:57 [debug] pvpgn::bnetd::sd_tcpinput: [668] read returned -1 (closing connection) Aug 30 09:42:57 [info ] pvpgn::bnetd::conn_destroy: [668] "Insta-Win" logged out Aug 30 09:42:57 [info ] pvpgn::bnetd::conn_destroy: [668] closed bnet connection Aug 30 09:42:59 [debug] pvpgn::bnetd::sd_tcpinput: [676] read returned -1 (closing connection) Aug 30 09:42:59 [info ] pvpgn::bnetd::conn_destroy: [676] "Thomas" logged out Aug 30 09:42:59 [info ] pvpgn::bnetd::conn_destroy: [676] closed bnet connection Aug 30 09:43:32 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=352114755 Aug 30 09:44:00 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=1574233312 Aug 30 09:45:02 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=352204752 Aug 30 09:45:30 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=1574323312 Aug 30 09:46:32 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=352294749 Aug 30 09:46:59 [debug] pvpgn::bnetd::handle_udp_packet: [616] got udpping unknown1=1574413312

HarpyWar commented 4 years ago

Did you try master branch to make sure it was not caused by last changes? Or even this one https://github.com/pvpgn/pvpgn-server/releases/tag/1.99.7.2.1 First thoughts there is a "leak" somewhere in the code which increases a queue or like that, and every new iteration of that array takes more time.

Immediately after you stop bots cpu load on pvpgn server becomes normal, Is that correct?

StevenFredette commented 4 years ago

Did you try master branch to make sure it was not caused by last changes? Or even this one https://github.com/pvpgn/pvpgn-server/releases/tag/1.99.7.2.1 First thoughts there is a "leak" somewhere in the code which increases a queue or like that, and every new iteration of that array takes more time.

Immediately after you stop bots cpu load on pvpgn server becomes normal, Is that correct?

It definitely wasn't caused by the latest build. I actualy havent updated since Finalize 1.99.7.2.1 release #362 is the version were I found the original leak. Now I'm on your latest dev build. Still here also.

Yes, once the bot got closed. The server CPU instantly went to 0%.

HarpyWar commented 4 years ago

The robust way (but may be not so fast in your case) is to try different commits to find where the issue was started. Firstly try with disabled Lua to exclude it from suspicion (dont every commit, just once to check it works fine or not).

Brief tutor how to do it: 1) Install Visual C++ as it described in readme of the Magic Builder repo in "Requirements" and download Magic Builder itself. 2) Open pvpgn commits history https://github.com/pvpgn/pvpgn-server/commits/master 3) Select any commit and click on <> to browse it's code. 4) Download the repo ZIP of this point and unpack into source directory which is in Magic Builder root path. 5) Run build_pvpgn.bat and follow instructions. After build copy release\pvpgnconsole.exe and replace on your server to check it. 6) Repeat from step 2 if required. (Hint: after first build you can run rebuild_pvpgn.bat to start build immediately with the parameters from the lastest build)

Don't try every commit, first find which works fine. It can be far away from the current commit, so pass several pages and start from there. When you find it - follow backward to find the commit which contain the issue.

RElesgoe commented 4 years ago

If you only run 1 bot, how long would it take for CPU usage to go up?

cen1 commented 4 years ago

Our SVN branch we use in production is using 100% CPU since we moved from freebsd to debian for unknown reasons. So it might be that there is some kind of resource leak even since 1.99. Luckily it does not seem to affect bnetd working normally in any way. I have not yet deep dived into the problem since we plan to move on from 1.99.

StevenFredette commented 4 years ago

The robust way (but may be not so fast in your case) is to try different commits to find where the issue was started. Firstly try with disabled Lua to exclude it from suspicion (dont every commit, just once to check it works fine or not).

Brief tutor how to do it:

  1. Install Visual C++ as it described in readme of the Magic Builder repo in "Requirements" and download Magic Builder itself.
  2. Open pvpgn commits history https://github.com/pvpgn/pvpgn-server/commits/master
  3. Select any commit and click on <> to browse it's code.
  4. Download the repo ZIP of this point and unpack into source directory which is in Magic Builder root path.
  5. Run build_pvpgn.bat and follow instructions. After build copy release\pvpgnconsole.exe and replace on your server to check it.
  6. Repeat from step 2 if required. (_Hint: after first build you can run rebuild_pvpgn.bat to start build immediately with the parameters from the lastest build_)

Don't try every commit, first find which works fine. It can be far away from the current commit, so pass several pages and start from there. When you find it - follow backward to find the commit which contain the issue.

I would have to prepare for this bounty. I think that would take a great amount of time. It would have to occur next weekend.

If you only run 1 bot, how long would it take for CPU usage to go up?

One bot connected and instantly seen CPU activity. Sits at 0% for 30 seconds then hops to .1% back to 0% hops to .2% back to 0% hops backs to .3%. Just in a matter of minutes.

EDIT: If someone is abusing this bot and bug against you're server. You can defend yourself with the /alert command. It causes the bot to error out.