timja / jenkins-gh-issues-poc-06-18


[JENKINS-34197] OutOfMemoryError when archiving large artifacts #3031

Open timja opened 8 years ago

timja commented 8 years ago

We are building suite installers for 14 languages, each of which is about 1 GB in size; the total is 19 GB. While archiving these artifacts, the node crashes with an OutOfMemoryError.

I've previously worked around this issue successfully by upping -Xmx to 1024m (from 256m).
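Where that -Xmx flag goes depends on how the slave JVM is launched. A hedged sketch for an agent started by hand over JNLP (the host and node names are placeholders, not values from this build):

```
java -Xmx1024m -jar slave.jar -jnlpUrl http://jenkins.example.com/computer/mynode/slave-agent.jnlp
```

For agents installed as a Windows service, the same option can typically be added to the `<arguments>` element of the service wrapper's jenkins-slave.xml.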

The build log shows the following:

Archiving artifacts
ERROR: Failed to archive artifacts: suites/*.zip,suites/*.xml,suites/*.md5
java.io.IOException: java.io.IOException: Failed to extract C:\Hudson\workspace\create.production.artifact.suite5/transfer of 42 files
    at hudson.FilePath.readFromTar(FilePath.java:2300)
    at hudson.FilePath.copyRecursiveTo(FilePath.java:2209)
    at jenkins.model.StandardArtifactManager.archive(StandardArtifactManager.java:61)
    at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:236)
    at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:78)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:723)
    at hudson.model.Build$BuildExecution.post2(Build.java:185)
    at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:668)
    at hudson.model.Run.execute(Run.java:1763)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:410)
Caused by: java.io.IOException: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@c80932[name=NOBUILDSRV03]
    at hudson.remoting.FastPipedInputStream.read(FastPipedInputStream.java:169)
    at hudson.util.HeadBufferingStream.read(HeadBufferingStream.java:61)
    at com.jcraft.jzlib.InflaterInputStream.fill(InflaterInputStream.java:175)
    at com.jcraft.jzlib.InflaterInputStream.read(InflaterInputStream.java:106)
    at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.read(TarArchiveInputStream.java:614)
    at java.io.InputStream.read(Unknown Source)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
    at hudson.util.IOUtils.copy(IOUtils.java:40)
    at hudson.FilePath.readFromTar(FilePath.java:2290)
    ... 13 more
Caused by: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@c80932[name=NOBUILDSRV03]
    at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:629)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(Unknown Source)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
    at sun.nio.ch.IOUtil.read(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
    at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137)
    at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310)
    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
    ... 6 more

    at hudson.FilePath.copyRecursiveTo(FilePath.java:2216)
    at jenkins.model.StandardArtifactManager.archive(StandardArtifactManager.java:61)
    at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:236)
    at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:78)
    at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
    at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
    at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:723)
    at hudson.model.Build$BuildExecution.post2(Build.java:185)
    at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:668)
    at hudson.model.Run.execute(Run.java:1763)
    at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
    at hudson.model.ResourceController.execute(ResourceController.java:98)
    at hudson.model.Executor.run(Executor.java:410)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@c80932[name=NOBUILDSRV03]
    at hudson.remoting.Request$1.get(Request.java:282)
    at hudson.remoting.Request$1.get(Request.java:207)
    at hudson.remoting.FutureAdapter.get(FutureAdapter.java:59)
    at hudson.FilePath.copyRecursiveTo(FilePath.java:2212)
    ... 12 more
Caused by: hudson.remoting.RequestAbortedException: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@c80932[name=NOBUILDSRV03]
    at hudson.remoting.Request.abort(Request.java:303)
    at hudson.remoting.Channel.terminate(Channel.java:847)
    at hudson.remoting.Channel$1.terminate(Channel.java:509)
    at hudson.remoting.AbstractByteArrayCommandTransport$1.terminate(AbstractByteArrayCommandTransport.java:71)
    at org.jenkinsci.remoting.nio.NioChannelHub$NioTransport.abort(NioChannelHub.java:208)
    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:629)
    at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Connection aborted: org.jenkinsci.remoting.nio.NioChannelHub$MonoNioTransport@c80932[name=NOBUILDSRV03]
    ... 8 more
Caused by: java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(Unknown Source)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
    at sun.nio.ch.IOUtil.read(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
    at org.jenkinsci.remoting.nio.FifoBuffer$Pointer.receive(FifoBuffer.java:137)
    at org.jenkinsci.remoting.nio.FifoBuffer.receive(FifoBuffer.java:310)
    at org.jenkinsci.remoting.nio.NioChannelHub.run(NioChannelHub.java:561)
    ... 6 more
Build step 'Archive the artifacts' changed build result to FAILURE
Started calculate disk usage of build
Finished Calculation of disk usage of build in 0 seconds
Finished: FAILURE

Digging into the logs on the slave I find this:

apr 13, 2016 10:06:14 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
SEVERE: Unexpected error in channel channel
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.io.ByteArrayOutputStream.grow(Unknown Source)
    at java.io.ByteArrayOutputStream.ensureCapacity(Unknown Source)
    at java.io.ByteArrayOutputStream.write(Unknown Source)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:87)
    at hudson.remoting.ChunkedInputStream.read(ChunkedInputStream.java:46)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:88)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)

Exception in thread "Channel reader thread: channel" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.io.ByteArrayOutputStream.grow(Unknown Source)
    at java.io.ByteArrayOutputStream.ensureCapacity(Unknown Source)
    at java.io.ByteArrayOutputStream.write(Unknown Source)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:87)
    at hudson.remoting.ChunkedInputStream.read(ChunkedInputStream.java:46)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:88)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:33)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
apr 13, 2016 10:06:14 AM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
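The slave-side trace above shows FlightRecorderInputStream.read appending everything it reads into a ByteArrayOutputStream, whose backing array is repeatedly doubled via Arrays.copyOf, so heap use grows with the amount of data recorded rather than staying bounded. For contrast, a minimal sketch (illustrative only, not Jenkins code; the class name is made up) of a fixed-buffer copy whose peak memory stays constant no matter how large the stream is:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopyDemo {
    // Copy with a fixed-size buffer: peak memory stays at the buffer size
    // regardless of stream length. Accumulating into a ByteArrayOutputStream
    // instead would need heap proportional to everything read so far.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB stands in for a large artifact
        long copied = copy(new ByteArrayInputStream(data), new ByteArrayOutputStream());
        System.out.println("copied " + copied + " bytes"); // prints "copied 1048576 bytes"
    }
}
```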

Originally reported by yngvedh, imported from: OutOfMemoryError when archiving large artifacts
  • status: Open
  • priority: Critical
  • resolution: Unresolved
  • imported: 2022/01/10
timja commented 8 years ago

yngvedh:

I tried increasing -Xmx to 1500m but it still runs out of memory. 2000m does not work because I run 32-bit Java. I will try upgrading to 64-bit Java to see if that lets me go to 2000m and beyond.

timja commented 8 years ago

danielbeck:

oleg_nenashev Is this a remoting issue?

timja commented 8 years ago

yngvedh:

I can give you a little more information while I run a test on 64-bit Java with -Xmx2500m.

We recently added 4 languages, which increased the total size of the artifacts from about 15 GB to about 20 GB. Before this increase, I had never encountered this issue.

I have also been keeping Jenkins up to date, so I upgrade regularly. That means that when we successfully built 15 GB of artifacts, we were almost certainly running an older version of Jenkins. At the time, the Jenkins slave also ran with the default heap size, a measly -Xmx250m. This suggests that this OOM issue may be newly introduced.

Looking at the memory usage of java.exe on the slave, I see that it increases stepwise as it grabs each artifact to upload. At the moment, one third of the way through, it is up to 800 MB peak usage.

timja commented 8 years ago

danielbeck:

Are you able to reproduce this behavior on a test instance? If so, could you share the full steps to reproduce?

Would also be interesting to learn whether this is a regression (I'm fairly sure it is), and in which release.

Would be good if you could compare behavior of Jenkins 1.626 and 1.627, as the latter contained a change related to archiving.

timja commented 8 years ago

danielbeck:

I wonder whether https://groups.google.com/d/msg/jenkinsci-users/M702ucKWUFI/vBhQs_npMAAJ could be related.

timja commented 8 years ago

yngvedh:

Upgrading to 64-bit Java with -Xmx2500m did the trick for me.

I will see what I can do about trying this on 1.626-7 but I can't promise anything. The biggest issue is the node because it needs an InstallShield license to build this particular job. If I can hook that node up to a clone of my existing server, I should be able to test it. It will have to wait a week or two since we're in the middle of crunch time.

timja commented 8 years ago

yngvedh:

Another update here:
I am only able to run the job once; if I try it a second time, I get an OOM error. I have to restart the service on the slave to be able to run the build again.

timja commented 8 years ago

yngvedh:

We just got a new Jenkins server. This one runs Windows Server 2012 R2. I copied the configuration from our old server and hooked our existing build node up to it.

I ran 1.626, 1.627, and the latest version, 2.6. Note that the reported build node version is 2.40 for all tests.
They all seem to consume about the same amount of memory when archiving artifacts (about 1.3 GB).

One observation: the node used a couple of hundred MB before the archiving started, and while archiving the memory usage slowly grew to the 1.3 GB cited above.
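For anyone wanting to watch this stepwise growth directly on the node: assuming a JDK (not just a JRE) is installed on the slave, one standard way is jstat against the agent's java.exe process ID, sampling heap-generation usage at a fixed interval (5000 is milliseconds; `<pid>` is a placeholder):

```
jstat -gc <pid> 5000
```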