toddw123 / RotMG_Clientless

Compatible with version X16.0.0
MIT License
11 stars 4 forks source link

"Failure(0): Lost connection to server" #26

Closed toddw123 closed 7 years ago

toddw123 commented 7 years ago

So one of the nice things about running 60+ lootbots has been basically stress testing various things. I have noticed that sometimes random bots will get the "Failure(0): Lost connection to server" message. Then when they reconnect it keeps kicking them. Other times ill see one get dc'd and then when it reconnects its fine. Really really strange. Im wondering if its due to the clients still not sending playerhit/enemyhit/otherhit/squarehit packets? In the nexus there wouldnt be any playerhit or enemyhit packets right? Another player shooting you doesnt trigger the playerhit, right? But what about otherhit? Or squarehit? Does the squarehit packet go out anytime a projectile passes over the square? I need to load up k-relay and do some testing i guess, but any insight would be nice. Im pretty sure this is the reason why random bots are getting dc'd, and other times it might be due to them running way way way too fast, but since i dont have DEBUG_OUTPUT defined im limited to what i have being printed on the screen for me.

I also really need to work on proper "Failure" handling. I did add a part in there that parses the Failure and puts the thread to sleep if it gets the "Account in use (X seconds until timeout)" message. But then another message like "Account has 1 active character in use" or something, which i dont know if i should put the thread to sleep for 10 minutes or what. I really want to get it set up to handle these and reconnects more smoothly. It seems to almost bruteforce the reconnect at the moment, i have it try to reconnect up to 6 times and then wait 2 minutes if they keep failing. And usually if it fails more then twice it will continue to fail over and over and over.

The other error i see sometimes that im not too sure on what it truely means is the client not getting the 5 bytes for the packet header. I have it attempt to do a reconnect when this occurs because otherwise it would throw off the packet data if i read it out of order. I modified how i read the initial 5 bytes in an attempt to prevent this error in the future but i have seen it happen still. Although i think its really just related to the server dc'ing the character for whatever reason, because it will generally be 1 or 2 bytes when i get this error. if 1 byte, its 0xff. if 2 bytes, its usually 0x00 0x00.

Hopefully any errors/bugs in the program i can find and solve during this testing/lootbot phase. I keep forgetting to check the total memory usage after long periods, the cpu for 60+ clients still stays <2% for me (generally around ~1%), but the mem is what im interested in. If theres a memory leak then it should just keep getting bigger and bigger and bigger. But just because the number gets big doesnt always mean theres a leak/problem, if it gets large but then eventually goes back down then that could just mean a lot of data was received and was being stored for a certain amount of time before it was destroyed. Ill try to remember to keep my eye on this.

Ps. im still surprised at the shit people drop. Ive gotten a few different tops, a fucking etherite dagger lol, and it seems people really dont like st rings as ive gotten a decent amount of those. haha so fun.

Zeroeh commented 7 years ago

A 0xff byte means the server is refusing the connection (usually due to a hello or reconnect packet error, maybe because the "keys" variable needs to be filled after the first connection). Not too sure what the two null bytes (0x00) mean however.

Squarehit, otherhit, ect don't need to be added as those are if the player (client) is hitting an enemy. As for receiving the failure packets when reconnecting, I think it has to do with your character staying in the map for an extended period but I see you took that into account with the acct in use check. Not too sure, maybe there's something wrong when it's parsing update packet?

toddw123 commented 7 years ago

I get the 0xff or 0x00 0x00 bytes after they are already connected and in game, thats the confusing part. It has nothing to do with the reconnect or hello at that point. Which is what i cant understand, especially because it seems to be so random.

edit: after like 12+ hours i went to look at a running instance of the lootbots i had and it had crashed for some reason. Im not sure why as i wasnt debugging it or anything. Im doubtful it was a memory leak problem because as i watched it earlier i was watching as the mem usage would go up and down as it should. Oh well. I did manage to get a Ring of Decades at some point so fuck it who cares haha

edit2: seems there might be a memory leak when i attempt to reconnect after getting dc'ed. Ill have to look into it. Also wanted to post, got my first pixie lol @ idiots dropping shit like that.

toddw123 commented 7 years ago

I made some additional changes to the lootbot setup. Changed the clients to an unordered_map because the vector .erase() was giving me shit.

I also solved (or believe i solved) the memory leak. Originally i was freeing the allocated char buffer in the recvThread at the end of the while loop, and i had originally added the free in places where i called break as well since i knew it would skip the free at the end. But in my half-assed attempt to add the auto-reconnect after being dc'ed i forgot about freeing the 2 char buffers used for packets. I was noticing the memory sit around 50MB+ after like 10 minutes, and i watch it climb as high as 300MB after hours of running. Which that really tipped me off that something was wrong. Finally realized my dumbassedness and moved the free() calls to the appropriate places (right after i no longer need the char buffers) and now ive been monitoring an instance for the past few hours and its remained around 20-40MB. The fact that it isnt sitting around 50+ anymore is a good sign, and ive watched it do reconnects quite a few times, which is where it use to leak memory before. There could very well still be memory leaks in this program, but its looking a lot better then it was a day ago in my opinion. Im going to keep monitoring it while im at work for the next 3-4 hours and by the time im off if it hasnt surpassed 50MB ill be pretty convinced ive fixed the memory leaks.

(this is all with about 63 clients running still, and cpu remains <2%, ~1% on average for me)

toddw123 commented 7 years ago

damn still a memory leak somewhere, but i have no idea where or what causes it yet. After watching the program run for about 8+ hours the memory stayed between 20-40MB. But then i left it running while i went to sleep and when i woke up it had somehow jumped to 800MB+ and crashed. No idea why for the first 8 hours it was fine, but the second it wasnt.

ebf34e12952930cf commented 7 years ago

did some debugging, and it looks like the memory leak is in PacketIOHelper::SendPacket. memory was slowly going up there, and now its staying stable after deleting both byte pointers

edit: ill add some screenshots later, of before and after

before:

after:

toddw123 commented 7 years ago

lol fucking hell. thanks for noticing, i completely forgot about those pointers in there. Im suprised still at how random it was acting for me, as i said the memory stayed pretty consistent while i was on my computer working, but when i got off for 8+ hours and came back the memory had exploded. Hopefully freeing those pointers will be the last of the memory leaks...for now.

ebf34e12952930cf commented 7 years ago

i just added 2 before the return, to fix this leak

delete[] encrypted;
delete[] pack;

should I make a PR or will do add them?

toddw123 commented 7 years ago

ill get them added here in a few minutes.

For some reason, not sure if it has something to do with the update the just got rolled out by deca, but now im not logging in. It connects and then i dont get a response from sending the hello packet. Could just be something funky going on with my computer so ill keep checking it out as im pretty sure nonthing i changed recently would cause this.

ebf34e12952930cf commented 7 years ago

everything is working fine for me, with straight copy from master and only these changes

toddw123 commented 7 years ago

ugh. stupid servers. my test clients are set up with uswest right now, and uswest is down. So im sitting here trying to figure out why the fuck its not letting me in the game on my bots, and that is why lol.

toddw123 commented 7 years ago

YOU ARE MY HERO HOLY SHIT!

The mem usage for 60+ lootbots is now back down to ~5MB instead of 40MB. My guess is that initial spike of incoming update packets which require an updateack response for each, across 60 clients, is what causes the program to sit around 40MB+ before. Because 40MB to 5MB is huge and im not sure what else would of caused that dramatic of a change.

This is why i love working with others. I hadnt looked at the packetiohelper class in forever, it wasnt even on my radar as far as what might be leaking memory. Amazing.

ebf34e12952930cf commented 7 years ago

i just looked at debugger snapshots and saw that there were a lot of char*[] (iirc) allocations and i looked where the oldest allocations of those types came from.

fyi, not sure if this is a big deal, but I see these warnings a lot

toddw123 commented 7 years ago

The compiler gives you a warning for free? or when you are debugging you get a warning about it? It looks like, from your picture there, that its showing the definition of free(). Which does exactly as it says, its used to free something allocated with new. Which is what its being used for, ive always used it over delete when it comes to freeing an allocated char/unsigned char array.

ebf34e12952930cf commented 7 years ago

nah there's nothing there when i hover over it on a non-array type. also,

edit: im sure it still works fine, the warning is marked as "Non-consistant resource acquisition-reclaim pair"

toddw123 commented 7 years ago

Hmm the compiler gives you the warnings? But either way, i like to avoid warnings so if you are getting warnings from free ill switch to delete. It doesnt effect it either way for me so whatever works best for everyone.

toddw123 commented 7 years ago

So fixing the memory leaks seems to have solved a lot of issues, or atleast ive been noticing a lot less repeated reconnect attempts. I feel like after time some client's packetio->sendpacket function would get corrupted in some way or something because i would watch the same clients reconnect over and over and over and over. They would connect, be logged in for a few seconds, then get a failure packet from the server. I couldnt figure out why. Now that the sendpacket function isnt leaking memory anymore this problem seems to have stopped. I still occasionally get dc'd on a client, but they usually reconnect just fine.

So with the fix i felt okay ramping my lootbots up from ~60 clients to now ~90 clients (just shy of 90 actually). 4 bots per server. After 10 minutes, cpu is hanging out ~1% as usual, memory is around 7MB right now and i havent even seen it pass 10MB yet. And the network column in taskmanager is claiming 2Mbps+ lol. Thats a lot of data going back and forth with ~90 clients.

edit: well, still an error somewhere it seems. Although i guess my code in lootbot is different in a lot of ways and i havent brought over all the updates just the ones i though were critical. But it still crashed on me after running for awhile, just out of nowhere. And im running in release outside of visual studios so i have no idea what triggered the crash. The memory at crash was only ~7MB so i highly doubt it was a mem leak that caused it to crash. So weird.

Zeroeh commented 7 years ago

free is a C function while delete is c++. I remember reading that free wont remove memory allocated with "new" since new is a function added in c++. I like to remember Malloc and free(c) : new and delete(c++)

Also: Will you be adding the bug fixes from lootbot into master branch, such as reconnect fix?

toddw123 commented 7 years ago

there was no problem with reconnect, remember? It was that the currentTarget wasnt reset after you reconnected to the bazaar.

but ive moved pretty much everything from lootbot over to master that im going to move.

and yes free is suppose to be for only malloc/alloc etc type stuff. It doesnt say it wont work with new[], it just says that it can have unknown results. It was working for me, always has been. But since BlackRayquaza was getting warning from the free's i switched them to deletes.