jonkogan / nullpomino

Automatically exported from code.google.com/p/nullpomino
0 stars 0 forks source link

[NetServer] LinkedList: ConcurrentModificationException #55

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
From time to time, I get these FATAL log messages in the server log:

[main] FATAL NetServer: Non-IOException throwed on server mainloop 
java.util.ConcurrentModificationException at 
java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761) at 
java.util.LinkedList$ListItr.remove(LinkedList.java:729) at 
mu.nu.nullpo.game.net.NetServer.run(NetServer.java:1045) at 
mu.nu.nullpo.game.net.NetServer.main(NetServer.java:889)

They always happen exactly after a logout:
[main] INFO  NetServer: Logout: *ip* (*ip*)

Haven't observed any known related issues, but could be some potential memory 
leak or worse.

Original issue reported on code.google.com by bob.ins...@gmail.com on 16 Jan 2011 at 1:23

GoogleCodeExporter commented 8 years ago

Original comment by bob.ins...@gmail.com on 16 Jan 2011 at 1:24

GoogleCodeExporter commented 8 years ago
sry, it screwed up the formatting a bit (newlines):

[main] FATAL NetServer: Non-IOException throwed on server mainloop
java.util.ConcurrentModificationException at
java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761) at
java.util.LinkedList$ListItr.remove(LinkedList.java:729) at
mu.nu.nullpo.game.net.NetServer.run(NetServer.java:1045) at
mu.nu.nullpo.game.net.NetServer.main(NetServer.java:889)

Original comment by bob.ins...@gmail.com on 16 Jan 2011 at 1:27

GoogleCodeExporter commented 8 years ago

Original comment by Zirc...@gmail.com on 16 Jan 2011 at 4:03

GoogleCodeExporter commented 8 years ago
Added some try-catch statement in r675.
Judging from the line-number, I think it happens when delayed-disconnection 
(Used when someone tries to connect with wrong version or the IP is banned) is 
used, but I'm not really sure because it doesn't happen on my local PC for some 
reason.

Original comment by pbomqlu910963@gmail.com on 17 Jan 2011 at 8:54

GoogleCodeExporter commented 8 years ago
Updated dev server to r675.
What is the behavior now, when the exception is thrown and we want to handle 
the problem in the catch block? Does the entry still get removed? And do we get 
a log message, so we can check?

fyi: this FATAL error happend 37 times so far, in the last two weeks

Original comment by bob.ins...@gmail.com on 17 Jan 2011 at 10:23

GoogleCodeExporter commented 8 years ago
>What is the behavior now, when the exception is thrown and we want to handle 
the problem in the catch block? Does the entry still get removed? And do we get 
a log message, so we can check?
The try-catch block itself is just for debug log. But I also exchanged the 
place of "changes.remove();" and "logout(key)"  because the code of "logout" 
has a lot of LinkedList manipulations. I still don't know the exact source of 
the bug though.

Original comment by pbomqlu910963@gmail.com on 17 Jan 2011 at 11:10

GoogleCodeExporter commented 8 years ago
I just analyzed the log for 675 (running for ~30h):

On more occurence of ConcurrentModificationException:
02:32:58,621 [main] INFO  NetServer: Logout: *ip* (*ip*)
02:32:58,676 [main] FATAL NetServer: Non-IOException throwed on server mainloop
java.util.ConcurrentModificationException
        at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:761)
        at java.util.LinkedList$ListItr.next(LinkedList.java:696)
        at mu.nu.nullpo.game.net.NetServer.run(NetServer.java:1048)
        at mu.nu.nullpo.game.net.NetServer.main(NetServer.java:902)

and a new Exception (same source line):
02:21:39,959 [main] INFO  NetServer: Socket disconnected by IOException
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
        at sun.nio.ch.IOUtil.read(IOUtil.java:210)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at mu.nu.nullpo.game.net.NetServer.doRead(NetServer.java:1180)
        at mu.nu.nullpo.game.net.NetServer.run(NetServer.java:1096)
        at mu.nu.nullpo.game.net.NetServer.main(NetServer.java:902)
02:21:40,100 [main] INFO  NetServer: Logout: *ip* (*ip*)

Original comment by bob.ins...@gmail.com on 18 Jan 2011 at 3:52

GoogleCodeExporter commented 8 years ago
I added proper synchronization on pendingChanges and pendingData members in 
r681. Should resolve this issue, will report back, if I see anymore problems in 
the server log with this.

The second exception I posted in comment 7, I think is not related from what I 
could see, and might even be wanted at the moment? (it is caught and just 
thrown again). For a truely failed read a seperate exception is thrown though, 
not qutie sure if this still needs a fix atm, but as it looks not related to 
#55.

Original comment by bob.ins...@gmail.com on 18 Jan 2011 at 5:26

GoogleCodeExporter commented 8 years ago

Original comment by bob.ins...@gmail.com on 18 Jan 2011 at 5:37

GoogleCodeExporter commented 8 years ago
added another overlooked synchronization in r690

Original comment by bob.ins...@gmail.com on 19 Jan 2011 at 9:46

GoogleCodeExporter commented 8 years ago
Testing showed that this is not fixed, needs some deeper investigation.
Since the server does not show any problems due to these errors, it's not a 
show-stopper for 7.5.
We'll either patch it later for 7.5 compatible server, or bring the fix with 
7.6.

Original comment by bob.ins...@gmail.com on 20 Jan 2011 at 4:14

GoogleCodeExporter commented 8 years ago
We can patch it in 7.5.1 or something like that.

Original comment by Zirc...@gmail.com on 21 Jan 2011 at 4:06