konstructs / client

A voxel based game client.
http://www.konstructs.org
MIT License

Game behaves very weird #55

Closed: Henningstone closed this issue 8 years ago.

Henningstone commented 9 years ago

I am on Windows and have tried both compiling the client myself from this repository and using the official download release. Both behave very weirdly when I try to join a server, whether it is the official server play.konstructs.org or my own local server. What actually happens:

Of course I have already started it in gdb, but I can't manage to get a stack trace. When it crashes, it only says recv: No error. The server prints the following text when the client crashes. No idea if it matters, but here it is:

[INFO] [07/18/2015 22:37:46.625] [main-akka.actor.default-dispatcher-5] [akka://main/user/plugin-loader/server/$a] Message [akka.io.TcpPipelineHandler$Init$Event] from Actor[akka://main/user/plugin-loader/server/$b#-732856434] to Actor[akka://main/user/plugin-loader/server/$a#-528818464] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [07/18/2015 22:37:46.625] [main-akka.actor.default-dispatcher-5] [akka://main/user/plugin-loader/server/$a] Message [akka.io.TcpPipelineHandler$Init$Event] from Actor[akka://main/user/plugin-loader/server/$b#-732856434] to Actor[akka://main/user/plugin-loader/server/$a#-528818464] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [07/18/2015 22:37:46.625] [main-akka.actor.default-dispatcher-5] [akka://main/user/plugin-loader/server/$a] Message [akka.io.TcpPipelineHandler$Init$Event] from Actor[akka://main/user/plugin-loader/server/$b#-732856434] to Actor[akka://main/user/plugin-loader/server/$a#-528818464] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [07/18/2015 22:37:46.679] [main-akka.actor.default-dispatcher-7] [akka://main/user/plugin-loader/server/$a] Message [akka.io.Tcp$Aborted$] from Actor[akka://main/user/plugin-loader/server/$b#-732856434] to Actor[akka://main/user/plugin-loader/server/$a#-528818464] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[INFO] [07/18/2015 22:37:47.742] [main-akka.actor.default-dispatcher-2] [akka://main/user/plugin-loader/universe/player-0] Message [konstructs.PlayerActor$StoreData$] from Actor[akka://main/deadLetters] to Actor[akka://main/user/plugin-loader/universe/player-0#-194649793] was not delivered. [5] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

nsg commented 9 years ago

Not really sure. Are you running the latest client, built on top of https://github.com/konstructs/client/commit/00bffa16e8a5a5cc603dc53b88b15dcf20053660? We changed the chunk format a little and updated both the server and the client yesterday.

It's also possible that we have a Windows bug here. Neither @petterarvidsson nor I have access to a Windows install, so we may have accidentally broken something.

We also just pushed a big change that removes a lot of dead code. Can you test the latest version from HEAD or https://bintray.com/konstructs/windows/client/view#files?

Henningstone commented 9 years ago

I just tried that out this minute. Running the client from your link didn't change anything, but merging the latest commits did the trick! I compiled that one, and now I can join... no wait, it crashed again. But I had been on the server for about a minute :o The crash occurred when I pressed E and scrolled the mouse wheel; no idea whether that matters. And huh, now the game tends to crash every time I join (I can actually join and see the world generating for a second). One more thing: when I log in to my previously created account (username useus, password paseus), I can see the world loading, but when I create a new account with any random name and password, it crashes instantly again, before a single block of stone appears :)

PS: Is there a way to debug this properly, or am I just too dumb (or too Windows) to find it? I mean getting a call stack and the code position on a crash, pausing execution, inspecting internal variables, etc.

PPS: My compiler emits a lot of warnings like the following. Is this another bunch of dead code, or my mistake again?

c:\mingw\include\inttypes.h:268:11: warning: inline function 'llabs' declared but never defined [enabled by default]
 long long llabs (long long);
           ^

nsg commented 9 years ago

The development binary is built with debug symbols (-g), so it should be possible to use gdb to make sense of the problem, with something like: gdb konstructs.exe -> run -> (crash) -> bt.

I do not have a native Windows install to test in, but I can test in a virtual machine. What Windows version are you using? 32 or 64 bit?

It does indeed sound like a bug, possibly something we missed when we changed the chunk loading. If you can't get a gdb session up and running, can you try disabling a few functions to track down where the problem is?

No idea about the llabs function. It is part of MinGW, and it's only a warning, so nothing to worry about. Like you said, probably dead code.

Henningstone commented 9 years ago

I am using Windows 7 SP1 64-bit. I just tried running CMake on the whole thing to compile it with Visual Studio 2010 (v10.0). It does generate VS10 project files, but when I try to build that project I get a lot of creepy errors, and I have no idea how to set it up properly... Why can't it just work by itself?!

nsg commented 9 years ago

I don't think the project builds with Microsoft's compiler; you need to build it with MinGW. Did gdb tell you anything?

nsg commented 9 years ago

Tested on both Win7 SP1 32-bit and Wine 1.6 under Linux with no problems. So I guess it is not a general problem with Windows, but something triggered by your system. We can probably track it down if you are able to build it on your system. Let me know if you get stuck.

Henningstone commented 9 years ago

I thought it wouldn't work with Windows' compiler, but I gave it a try anyway :) Then it came to my mind to compile it in my Linux VM (Ubuntu 15.04). It compiled fine, without any strange warnings or anything else that shouldn't have been there. Then I started it, logged in to the server, and my first thought was "HOLY.... That is awesome!!". I had never seen the world before, because it never generated properly! On Windows, if I see anything at all, it's a big block of stone. So I always thought the world looked like this, maybe for testing purposes. But no, the world just wasn't being loaded for me.

So OK... it looks like I now need to pin down the problem. It is obviously related to Windows and world loading, but you said it runs fine for you, even in Wine? Anyway, I don't think it is (only) my fault for doing something wrong when compiling it myself, because the binary I downloaded from your mirror fails too. Hm.

nsg commented 9 years ago

Have you had time to run the game in gdb? Found any solution to this, or have any ideas? Unfortunately I have never been able to trigger this bug myself, so it is hard to debug. We have also released a new version; maybe it is fixed there? (Try the one called "release 1".) :)

Plan B: if the crash is still there, I can probably craft a special "debug version" for you so we can track this down.

Henningstone commented 9 years ago

Release 1 doesn't work either. Since I am experiencing this issue only on Windows and not on Linux, I think it is a Windows-only thing. I also tried running the game on another Windows 7 computer as well as on a Windows XP computer; it's always the same: the game crashes after connecting. So I guess nobody on Windows will be able to play this game. gdb doesn't help either; it doesn't tell me anything. Maybe I'll manage to add a debug system to the client that automatically dumps a call trace to a file on crash. Dunno if it'll work, though, and it could take me some time...

nsg commented 9 years ago

Several different systems doesn't sound good. I have successfully played the game on Windows in a virtual machine; I need to find a Windows install to try it natively. I will keep you updated.

Henningstone commented 9 years ago

OK, forget what I wrote (maybe not completely, but well...): it FINALLY WORKS!! I have no clue what made it work; I only deleted my copy of the repository and fetched the master branch again. I compiled that, and I can join the server and play on it, yay! Unfortunately, the performance is surprisingly bad: significant FPS drops when I place/remove a block, and blocks that I removed do not disappear completely. Never mind, it works, I'm so happy :D

So what did not work was the "release 1" executable from your website. I only used that because I was too lazy to build my own from the code. That executable failed on all three of my test machines (no VMs!)

Henningstone commented 9 years ago

Hm, maybe I was happy a bit too early :( I didn't change anything, just compiled it again, and hey, it crashes again. Damn Windows.

// EDIT: The game uses >1 GB of RAM and 25-30% of my X4 3.1 GHz processor. Is that legit?!

nsg commented 9 years ago

This is great news, I'm happy it finally works! I guess one of these changes fixed something :) I also realised that currently (for a few days now) the server on play.konstructs.org requires "protocol version 5", while release 1 speaks version 4, and we do not catch that error properly at the moment.
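A version mismatch like this is easy to report clearly if the client checks it up front. Below is a minimal sketch. It assumes, hypothetically, that the client learns the server's protocol version during the handshake. CLIENT_PROTOCOL_VERSION and check_protocol_version are illustrative names, not the client's actual code.

```c
#include <stdio.h>

/* Hypothetical constant: the protocol version this client speaks. */
#define CLIENT_PROTOCOL_VERSION 5

/* Hypothetical check: compare the version the server reports with ours.
 * Returns 0 on a match. Otherwise prints a readable message and returns -1,
 * so the caller can stop cleanly instead of failing in a confusing way later. */
int check_protocol_version(int server_version)
{
    if (server_version == CLIENT_PROTOCOL_VERSION)
        return 0;
    fprintf(stderr,
            "Server speaks protocol version %d, this client speaks %d; "
            "please use a matching client release.\n",
            server_version, CLIENT_PROTOCOL_VERSION);
    return -1;
}
```

With the constant set to 5, check_protocol_version(4) returns -1 and prints the message.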

We probably need to set up a dev-play.konstructs.org or similar for the latest development version and keep play.konstructs.org for the latest stable release. I guess release 2 will be a good time for that.

I'm a little worried about the FPS drops. Are they verified FPS drops, or does it just "feel laggy"? Press F3 to enable the debug screen; the FPS is listed there. Or are you talking about the time it takes for the block to show up on screen?

nsg commented 9 years ago

Nah, well, I guess there are still a few Windows bugs to fix, then :)

nsg commented 9 years ago

I'm running the development version from the website now, against the server. Everything works, no crashes.

Windows 7, AMD Athlon II X2 250 @ 3 GHz, 4 GB RAM, Radeon HD 4200 (so: an old computer. Really laggy, but it works)

I see what you meant by "blocks that I removed do not disappear completely". It is actually fixed; not sure why it still happens under Windows. I need to investigate that.

Yes, the game will probably use plenty of both RAM and CPU, especially when loading chunks. A little over one gigabyte of RAM sounds about right.

It must be some specific combination of hardware/drivers/OS that causes this problem.

Henningstone commented 9 years ago

It is very strange that you can run it without problems, since I have nearly the same specs (nearly... but a bit better): Windows 7, AMD Athlon II X4 640 @ 3 GHz, 8 GB RAM, Radeon HD 6800

Maybe it would help to reinstall Windows (which I haven't done for over a year and a half)? Windows is quite shit; you need to reinstall it once or twice a year or it will completely annoy you. But I don't want to reinstall it, even though I'm supposed to :( Damn Windows.

OK, let's get back to the important things. The crash "depends" on which username I log in to the server with. With some usernames it crashes immediately; with others, I can see the world for a few seconds. So strange...

Henningstone commented 9 years ago

The only thing that would really (I mean, really really) help me find and fix the bug is if the game could be compiled with Visual Studio. cmake -G "Visual Studio 14 2015" generates everything I need, but the code itself is not compatible :( The VS debugger would show me the exact line where the game crashes and give me a complete call stack as well as a possible reason for the crash. That would indeed be very helpful. Maybe I'll refactor the code to make it compatible, but that will take some time...

Henningstone commented 9 years ago

Uhhh, I just walked through your code a bit, and now I know why the crash is so difficult to track: it is not a crash. Or at least, as far as the computer is concerned, it is not a crash. Before anything can crash, the program calls exit. Of course no debugger can catch a crash here, since exiting can't be called a "crash" :/

// EDIT: (I don't want to spam posts here :0) I had some kind of "success" tracking down the error. It has something to do with recv. I'm investigating this now... :()
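Since exit() is an ordinary, orderly termination, a debugger has no fault to stop on. In gdb, a breakpoint on exit (break exit) followed by bt should show who called it. Another option is a small diagnostic hook. The sketch below uses the standard atexit() function, with hypothetical helper names, so that every exit, including exit(1), leaves a line on stderr.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical diagnostic: runs on every normal termination,
 * i.e. exit(n) from anywhere or returning from main. */
static void report_exit(void)
{
    fputs("konstructs: process is exiting via exit() or return from main\n", stderr);
    fflush(stderr);
}

/* Register the hook once at startup; returns 0 on success, as atexit() does. */
int install_exit_trace(void)
{
    return atexit(report_exit);
}
```

It does not say where exit was called from, but it makes a silent exit distinguishable from a real crash, which would not run atexit handlers.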

Henningstone commented 9 years ago

OK, one more post after all. What happens when I log in with a wrong password? I join the server, and after a few seconds it "crashes" because of this:

(line 231)

if ((length = recv(sd, ((char *)&size) + t, HEADER_SIZE - t, 0)) <= 0) {
    /* recv() returned 0 (peer closed the connection) or -1 (error) */
    if (running) {
        perror("recv");
        exit(1);
    }
}

running != 0 here, so it exits.

Is there any way I can reach you other than GitHub? IRC, Skype, etc. would make for much easier and faster communication.

petterarvidsson commented 9 years ago

@Henningstone The server is a bit simple-minded and will close the connection if the password is invalid. This makes this recv call return 0, or possibly -1 (an error).

There is one non-error case for recv: the call being interrupted by a signal. I'll see if I can reproduce that and handle it properly. I suspect that cnd_signal might be implemented using process signals on some platforms, which could make the code fail in this way. I'll keep you updated.
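One way to make the receive path tolerant of that case is to retry when recv() fails with EINTR, while still treating 0 as "peer closed" and other errors as fatal. Below is a minimal POSIX sketch that also loops over partial reads, as the t offset in the snippet above does. recv_all is a hypothetical helper, not the client's code. Winsock reports failures through WSAGetLastError() rather than errno, so a Windows build would need that check instead.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read exactly `len` bytes from socket `sd`.
 * Returns len on success, 0 if the peer closed the connection before
 * all bytes arrived, and -1 on a real error (errno is left set). */
ssize_t recv_all(int sd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(sd, p + got, len - got, 0);
        if (n > 0) {
            got += (size_t)n;        /* partial read: keep going */
        } else if (n == 0) {
            return 0;                /* orderly shutdown by the peer */
        } else if (errno == EINTR) {
            continue;                /* interrupted by a signal: retry */
        } else {
            return -1;               /* genuine error, e.g. ECONNRESET */
        }
    }
    return (ssize_t)got;
}
```

The caller can then show a message such as "connection closed by server" for the 0 case instead of calling perror() and exit(1).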

petterarvidsson commented 9 years ago

@Henningstone Please see #76

Henningstone commented 9 years ago

OK, I tweaked your #76 a bit so that it prints the actual error code. And here we are: the problem seems to be on the server side :0

Look at this (each "Press a key" line marks a game crash):

recv: 10054: No error
 Press a key to restart the game...
package to large, received 50331648 bytes
 Press a key to restart the game...
package to large, received 117440512 bytes
 Press a key to restart the game...
recv: 10054: No error
 Press a key to restart the game...
recv: 10054: No error
 Press a key to restart the game...
package to large, received 1124073472 bytes
 Press a key to restart the game...
recv: 10054: No error
 Stopping!

This is the log from a little batch script I wrote that automatically restarts the game after a crash and records the error codes of every session. It doesn't matter whether I connect to the 'development' server.jar downloaded from your website or to the play.konstructs.org server; the errors are the same every time.
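The "package to large" message means the client decoded a size header and rejected it as too big. Below is a minimal sketch of such a guard. It assumes a 4-byte big-endian (network byte order) length prefix and a hypothetical MAX_PACKET_SIZE. decode_size and size_is_acceptable are illustrative names, not the client's code.

```c
#include <stdint.h>

/* Hypothetical limit; the client's real constant may differ. */
#define MAX_PACKET_SIZE (1024u * 1024u)

/* Decode a 4-byte big-endian (network order) length prefix. */
uint32_t decode_size(const unsigned char header[4])
{
    return ((uint32_t)header[0] << 24) | ((uint32_t)header[1] << 16) |
           ((uint32_t)header[2] << 8)  |  (uint32_t)header[3];
}

/* Returns 1 if a packet of `size` bytes may be read, 0 if it should be
 * rejected (the "package to large" case). */
int size_is_acceptable(uint32_t size)
{
    return size <= MAX_PACKET_SIZE;
}
```

Under that assumption, header bytes 03 00 00 00 decode to 50331648, the first size in the log above. Reading the header from the wrong place in the stream is one way to end up with implausibly large values like these.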

Microsoft's description of the code 10054 is as follows:

WSAECONNRESET (10054)

Connection reset by peer. An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close (see setsockopt for more information on the SO_LINGER option on the remote socket). This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.

petterarvidsson commented 9 years ago

I can indeed see a couple of buffer overruns on the server side.

[WARN] [08/21/2015 18:41:12.147] [main-akka.actor.default-dispatcher-7] [akka://main/user/plugin-loader/server] aborting connection (buffer overrun)

Are you still just seeing the skybox or does the client start downloading / displaying the block world?

petterarvidsson commented 9 years ago

Does it also have these overruns when you run it against a local server?

Henningstone commented 9 years ago

It actually depends on which username I log in with, which sounds quite strange.

For example, when I log in with name kkkkk, password lllll (a combination that doesn't crash), I see only the skybox, and the debug screen says that I don't receive any chunks. (But I should be requesting chunks, right?)

Speaking of your server's overrun warning... I just wanted to check whether I get that on my local server too, but you won't believe it: I can't get my game to crash again! Now that I want it to crash, it doesn't. Nice. (It's doing that on purpose, I'm sure of it!)

When I log in with the same credentials on my local server, I spawn in the world and can play (yay!). It loads correctly, and moving forward loads all new chunks as expected. (Nevertheless, it performs worse than Minecraft with a 256x texture pack and the SEUS Ultra shader, wtf, but that's another story...) By "worse performance" I mean 5-15 FPS. Nvm.

petterarvidsson commented 9 years ago

Interesting, so we might have solved one bug with my fixes. I'd say there is still the bug of the server's buffer overflowing (I will try increasing it), and the problem of the client closing before it can show helpful messages such as "Incorrect username/password". On top of this, your performance seems completely off. I get a smooth 60 FPS on my laptop with an Intel HD Graphics 4400, which can be considered pretty "low end", so my guess is that something is still not working properly on Windows.

CPU-wise, the client can be very CPU-intensive: it tries to compute chunks as fast as possible on 4 different threads, sometimes consuming all available CPU time on several cores. On the other hand, at least on Linux, this does not affect the rendering thread at all, which can still deliver a very nice FPS. If you stop moving the player and leave it for a few minutes, what is your FPS then? And what is the process CPU usage? When I leave it idle, it usually levels out at 25% CPU usage on one core.

Henningstone commented 9 years ago

Regarding the FPS: when I stand still and let the world load completely, I get up to 100 FPS. But when I move a little, the game loads new chunks so fast (because of the high FPS) that the FPS itself drops to less than 10 for a few seconds.

And 25% CPU utilization is the magic number an infinite loop produces on a quad core (100/4). Here it's the main loop, which has almost nothing to do when we are idle. So the while(1) loop "over-revs", like stepping on the gas while the car is in neutral. Maybe a shit comparison, though. Just try adding Sleep(1) at the end of the main loop and you'll see: when the world is loaded, we get 0-1% CPU, and the FPS stays stable anyway. And when not connected, we get 0% as well, instead of 25%, which seems completely excessive. Or, the other way round, write a new program that looks like this:

#include <stdbool.h>

int main(void) {
    while (true) { }  /* busy-waits forever: one core at 100%, i.e. 25% of a quad core */
}

It uses 25% too, you'll see ;)
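Henningstone's suggestion is a fixed Sleep(1) per iteration. The sketch below is a slightly different variant: instead of a fixed delay, it sleeps only for whatever is left of a per-frame time budget, so an idle loop yields the CPU instead of spinning. It uses the POSIX calls clock_gettime and nanosleep, and the helper names are hypothetical. On Windows, Sleep() would play the role of nanosleep().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: nanoseconds left in the frame budget for `target_fps`,
 * given how long the frame's work took. Returns 0 if the frame ran over. */
int64_t frame_sleep_ns(int64_t elapsed_ns, int target_fps)
{
    int64_t budget_ns = 1000000000LL / target_fps;
    return elapsed_ns < budget_ns ? budget_ns - elapsed_ns : 0;
}

/* Monotonic timestamp in nanoseconds. */
int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Call once at the end of each main-loop iteration with the frame start time. */
void limit_frame(int64_t frame_start_ns, int target_fps)
{
    int64_t rest = frame_sleep_ns(now_ns() - frame_start_ns, target_fps);
    if (rest > 0) {
        struct timespec ts = { (time_t)(rest / 1000000000LL),
                               (long)(rest % 1000000000LL) };
        nanosleep(&ts, NULL);  /* yields the CPU instead of spinning */
    }
}
```

Usage: take start = now_ns() at the top of each iteration and call limit_frame(start, 60) at the bottom. This caps the loop at roughly 60 iterations per second even when vsync is not blocking.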

petterarvidsson commented 9 years ago

@Henningstone Interesting! glfwSwapInterval sets the FPS of the main loop to that of your screen, and glfwSwapBuffers waits out any time left over before the buffers are swapped, pegging the FPS to the refresh rate of the screen. This means that on my computer, when nothing is loading, CPU usage levels out at the time my CPU needs to render the scene 60 times per second (the refresh rate of my screen). On my computer this is ~25% of a single core (it is not maxing out a core).

Chunk loading is limited by an artificial FPS of 60. This means that the client will never load more than (fps / 60) * MAX_PENDING_CHUNKS at a given time. Since your refresh rate seems to be 100, your client is loading chunks substantially more aggressively. This also explains why you were hitting the server's buffer limits (they have been tuned to the MAX_PENDING_CHUNKS value). Maybe you can try setting MAX_PENDING_CHUNKS to 25 and see if you get some improvement? That should load chunks less aggressively. Then we will need to figure out how to get the refresh rate so we can use it instead of the constant 60.
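The throttle described above can be written as a small function. The sketch below uses floating point because, if fps were an integer, fps / 60 would truncate to 0 below 60 FPS. chunk_budget is a hypothetical name, and the real client code may differ.

```c
/* Sketch of the throttle: budget = (fps / target_fps) * max_pending.
 * Returns the number of chunks that may be pending; 0 for invalid input. */
int chunk_budget(double fps, double target_fps, int max_pending)
{
    if (fps <= 0.0 || target_fps <= 0.0)
        return 0;
    return (int)((fps / target_fps) * max_pending);
}
```

With a hypothetical max_pending of 50, chunk_budget(100.0, 60.0, 50) gives 83 and chunk_budget(1000.0, 60.0, 50) gives 833. This shows how a loop running at 100 FPS, or above 1000 FPS, requests far more chunks than the 60 FPS tuning expected. Henningstone's later variant corresponds to a target of 100 with half the maximum.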

petterarvidsson commented 9 years ago

@Henningstone I updated #76 with a modified version of the algorithm that limits the number of chunks loaded. It should now limit chunk loading more aggressively when the FPS is falling. On my computer I am getting a smoother 60 FPS experience now; it would be interesting to know if it helps you as well!

Henningstone commented 9 years ago

Quoting @petterarvidsson: "@Henningstone Interesting! glfwSwapInterval sets the FPS of the main loop to that of your screen and glfwSwapBuffers will wait any amount of time left over before the buffers are swapped, pegging the FPS to refresh rate of the screen."

So, vsync? Setting the FPS to the refresh rate of the screen? But my screen has a refresh rate of 60 Hz too, and my FPS is anything but 60 :0

I changed (fps / 60) * MAX_PENDING_CHUNKS to (fps / 100) * (MAX_PENDING_CHUNKS / 2), and it helped a lot! I can join the server and the world loads! Not as fast as before, but the FPS is fairly stable while loading chunks, since I set the target FPS to 100 and halved the maximum.

Then I pulled your changes and tested them. It didn't really help... I don't receive any chunks from the server, and I get >1000 FPS because it only needs to render the sky.

petterarvidsson commented 9 years ago

@Henningstone Oh, then it seems that glfwSwapBuffers does not wait at all for you. I can clearly see how the FPS would first go super high, causing a huge number of chunks to be loaded at once, and then drop completely. I guess what we really need to fix is making glfwSwapBuffers not return before the screen has refreshed; otherwise it will do a lot of unnecessary updates that you won't even see.

Looking around a bit, there seem to be some issues with glfwSwapBuffers and DWM (the Desktop Window Manager). Are you using it? If so, can you try disabling it?

Henningstone commented 9 years ago

Oh damn, Windows Aero is the problem. You are right: DWM causes problems, and Windows Aero needs it. So to fix that, I implemented some code that temporarily disables Windows Aero, and therefore DWM, while the game is running. (It's legit Windows compatibility stuff.) I get a nice 60 FPS now, and it doesn't fluctuate anymore. Not even while loading chunks, because now your last commit to #76 works too, yay!

I think we can consider this fixed once you've merged my commit.

Henningstone commented 9 years ago

No, wait, this isn't done yet. It isn't as easy as I thought, because the required header is only available if the Windows SDK is installed. We would need to bundle the compiled library and call the functions from there... see: http://nicug.blogspot.de/2011/03/windows-7-taskbar-extensions-in-qt_24.html

petterarvidsson commented 9 years ago

https://github.com/glfw/glfw/commit/8309e0ecb09fe5cf670dff5df1e7ca71821c27bd supposedly makes it possible to use GLFW with DWM. It's not released yet, but I'll see if we can upgrade to it and try it out.

Henningstone commented 9 years ago

That would be really nice, because it is hard to access the DWM API when not compiling with Microsoft's compiler.

petterarvidsson commented 9 years ago

Got something for you to try in #76 again.

Henningstone commented 9 years ago

That works, but I would recommend that everybody on Windows 7 disable Aero to play this game, because the FPS may otherwise suffer significantly: right-click konstructs.exe -> Properties -> "Compatibility" tab -> check "Disable desktop composition".

Henningstone commented 9 years ago

I just installed Linux Mint and built the game there, and I'm so impressed by how easy and fast it was, and by the incredibly good performance! There are so many issues on Windows, and it is quite difficult to compile the source there. I don't know why I'm telling you this; I just wanted to say it. Another thing: perror doesn't work on Windows. For example, on Windows I see recv: No error, while on Linux I see recv: Connection reset by peer.
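That matches how Winsock works: socket functions report failures through WSAGetLastError() rather than errno, so perror() after a failed recv can print "No error". Below is a minimal portability sketch with hypothetical helper names. The Windows branch only prints the numeric code, which can be looked up in Microsoft's Windows Sockets error code list (for example, 10054 is WSAECONNRESET).

```c
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
#else
#include <errno.h>
#endif

/* Hypothetical helper: last socket error code on either platform. */
int last_socket_error(void)
{
#ifdef _WIN32
    return WSAGetLastError();  /* Winsock does not report through errno */
#else
    return errno;
#endif
}

/* Hypothetical replacement for perror() after a failed socket call. */
void report_socket_error(const char *what)
{
    int err = last_socket_error();  /* read before any other call can change it */
#ifdef _WIN32
    fprintf(stderr, "%s: Winsock error %d\n", what, err);
#else
    fprintf(stderr, "%s: %d: %s\n", what, err, strerror(err));
#endif
}
```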

petterarvidsson commented 9 years ago

@Henningstone @nsg Let's reach a preliminary conclusion here. I think we should continue working on improving the Windows experience, but for now we should recommend that everyone on Windows disable Aero, and we should upgrade to the new GLFW version when it is released.

I have updated the download page with the instructions provided. I will now merge #76, but without the quick-hack GLFW update. I'll leave this issue open, since it cannot be said to have been fully addressed yet, only improved a bit.

Henningstone commented 9 years ago

At least we know what the problem is and how to work around it. Leaving it open doesn't matter to me anymore, because I'm on Linux now, too :) (Multiboot, you know...)