Open febael opened 6 years ago
@febael Thank you very much for reporting this issue.
Your finding is very interesting. That message received on the network looks faulty, containing basically 0.
How much maximum heap memory do you give the java virtual machine when you start?
@fviale @luispinedamx What do you think about this memory issue?
@febael Sadly I cannot analyze your dump file. I downloaded it twice but I get following error:
java.io.IOException: Unrecognized magic number: 529205248 at com.sun.tools.hat.internal.parser.Reader.readFile(Reader.java:94) at com.sun.tools.hat.Main.main(Main.java:159)
@tobwiens hi,
There's no setting regarding the min/max memory.
Original huge dump file has the expected magic number : xxd java_pid6.hprof | head -1 00000000: 4a41 5641 2050 524f 4649 4c45 2031 2e30 JAVA PROFILE 1.0
Acting on the tar.gz file I get : xxd dump.tar.gz | head -1 00000000: 1f8b 0800 37e4 625b 0003 ec7c 7974 1377 ....7.b[...|yt.w echo $(( 16#1f8b0800 )) 529205248
So I assume you tried to directly open the tar.gz file :)
Hello @febael,
I analysed the heap dump you provided using Eclipse Memory Analyser and it does not show anything special. Like you said, the scheduler seems to be quite empty. No object occupying abnormally the memory (biggest retainer being the JVM class loader, but nothing alarming).
What you describe regarding the PNP message of 1GB is quite surprising though. A few questions below:
Thank you @fviale for your analysis. I used the jvisualvm tool (should be shipped with every jdk installation), and I could get all thread dumps and objects in the heap with that.
Do you have any idea on how to monitor the case for more clues?
Thank you for your detailed explanation, and I indeed observes what you describe in JVisualVM.
Apparently, one frame decoded by PNPServerFrameDecoder (part of the low-level proactive network protocol) seems excessively big. The maximum frame size set in the code is simply MAXINT bytes (which is over 2GB).
A low-level PNP message can be many many things, and based on the content of the frame bytes only, it is nearly impossible to find out the origin of the message.
What I suggest is that you change the log level of the PNP protocol to TRACE: scheduler_server/config/log/server.properties
log4j.logger.proactive.pnp = TRACE log4j.logger.proactive.remoteobject = DEBUG
This will be very verbose, but whenever the problem happens again, we might be able to find the provenance of the last message.
Thank you @fviale, done that and will inform you with findings.
Bonjour Mesdames et Messieurs,
We get OOM errors at server. dump file
"New I/O worker #2" prio=5 tid=16 RUNNABLE at java.lang.OutOfMemoryError.(OutOfMemoryError.java:48)
at org.jboss.netty.buffer.HeapChannelBuffer.(HeapChannelBuffer.java:42)
at org.jboss.netty.buffer.BigEndianHeapChannelBuffer.(BigEndianHeapChannelBuffer.java:34)
at org.jboss.netty.buffer.ChannelBuffers.buffer(ChannelBuffers.java:134)
at org.jboss.netty.buffer.HeapChannelBufferFactory.getBuffer(HeapChannelBufferFactory.java:68)
at org.jboss.netty.buffer.AbstractChannelBufferFactory.getBuffer(AbstractChannelBufferFactory.java:48)
at org.objectweb.proactive.extensions.pnp.PNPServerFrameDecoder.getBuffer(PNPServerFrameDecoder.java:194)
at org.objectweb.proactive.extensions.pnp.PNPServerFrameDecoder.decode(PNPServerFrameDecoder.java:138)
at org.objectweb.proactive.extensions.pnp.PNPServerFrameDecoder.callDecode(PNPServerFrameDecoder.java:104)
Local Variable: java.net.InetSocketAddress#157
at org.objectweb.proactive.extensions.pnp.PNPServerFrameDecoder.handleUpstream(PNPServerFrameDecoder.java:86)
Local Variable: org.objectweb.proactive.extensions.pnp.PNPServerFrameDecoder#3
Local Variable: org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext#20
Local Variable: org.jboss.netty.buffer.BigEndianHeapChannelBuffer#6
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
Local Variable: org.jboss.netty.channel.DefaultChannelPipeline#6
Local Variable: org.jboss.netty.channel.UpstreamMessageEvent#1
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
Local Variable: org.jboss.netty.channel.socket.nio.NioAcceptedSocketChannel#3
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
Local Variable: sun.nio.ch.SelectionKeyImpl#40
Local Variable: java.util.HashMap$KeyIterator#1
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
Local Variable: sun.nio.ch.EPollSelectorImpl#4
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
Local Variable: org.jboss.netty.channel.socket.nio.NioWorker#3
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
Local Variable: java.lang.String#8438
Local Variable: java.lang.String#8439
Local Variable: org.jboss.netty.util.ThreadRenamingRunnable#5
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Local Variable: org.jboss.netty.util.internal.DeadLockProofWorker$1#4
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Local Variable: java.util.concurrent.ThreadPoolExecutor$Worker#5
at java.lang.Thread.run(Thread.java:748)