tfoote closed this issue 8 years ago
Again, while running 4 ros_comm PR jobs:
Mar 17, 2016 5:38:07 PM jenkins.slaves.JnlpSlaveAgentProtocol$Handler$1 onClosed
WARNING: Computer.threadPoolForRemoting [#14262] for ip-172-31-0-248.us-west-1.compute.internal terminated
java.io.EOFException
at org.jenkinsci.remoting.nio.NioChannelHub$3.run(NioChannelHub.java:613)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:112)
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Mar 17, 2016 5:38:56 PM hudson.model.Run execute
Digging into the slave, we're running out of memory: https://gist.github.com/tfoote/a0472cf30a4420d67d11
There's also an older OOM kill at ~3036 seconds, likely from the previous instance.
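A minimal sketch of how the OOM kills could be pulled out of the kernel log so they can be lined up against the job timestamps. It assumes the log uses dmesg's default bracketed seconds-since-boot prefix (presumably where the ~3036 seconds figure comes from) and the "Killed process PID (name)" wording of the kernel OOM killer; the exact message text varies between kernel versions, the helper name is hypothetical, and the sample lines in the test are made up.

```python
import re

# Hypothetical example of a matching line:
#   "[ 3036.512345] Killed process 23330 (python) total-vm:382856kB, ..."
# The bracketed prefix is the kernel timestamp in seconds since boot.
# ".*?" also tolerates newer kernels that prefix "Out of memory: ".
OOM_KILL_RE = re.compile(r"^\[\s*(\d+\.\d+)\].*?\bKilled process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return (seconds_since_boot, pid, process_name) for each OOM kill line."""
    kills = []
    for line in lines:
        m = OOM_KILL_RE.match(line)
        if m:
            kills.append((float(m.group(1)), int(m.group(2)), m.group(3)))
    return kills
```

Comparing the seconds-since-boot values with the instance's boot time shows whether a kill belongs to the current slave instance or to an earlier one.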
@dirk-thomas can you link the jobs that failed so we can see what they were running at the time of failure?
Looking at these, they are all failing in the same test (sub_to_multiple_pubs.test):
16:47:42 Scanning dependencies of target _run_tests_test_rospy_rostest_test_rostest_sub_to_multiple_pubs.test
16:47:42 -- run_tests.py: execute commands
16:47:42 /tmp/catkin_workspace/install_isolated/share/rostest/cmake/../../../bin/rostest --pkgdir=/tmp/catkin_workspace/src/ros_comm/test/test_rospy --package=test_rospy --results-filename test_rostest_sub_to_multiple_pubs.xml --results-base-dir /tmp/catkin_workspace/test_results /tmp/catkin_workspace/src/ros_comm/test/test_rospy/test/rostest/sub_to_multiple_pubs.test
16:47:43 ... logging to /home/buildfarm/.ros/log/rostest-8eee27870d28-23320.log
16:47:43 [ROSUNIT] Outputting test results to /tmp/catkin_workspace/test_results/test_rospy/rostest-test_rostest_sub_to_multiple_pubs.xml
16:47:35 Scanning dependencies of target _run_tests_test_rospy_rostest_test_rostest_sub_to_multiple_pubs.test
16:47:35 -- run_tests.py: execute commands
16:47:35 /tmp/catkin_workspace/install_isolated/share/rostest/cmake/../../../bin/rostest --pkgdir=/tmp/catkin_workspace/src/ros_comm/test/test_rospy --package=test_rospy --results-filename test_rostest_sub_to_multiple_pubs.xml --results-base-dir /tmp/catkin_workspace/test_results /tmp/catkin_workspace/src/ros_comm/test/test_rospy/test/rostest/sub_to_multiple_pubs.test
16:47:36 ... logging to /home/buildfarm/.ros/log/rostest-6ff1f4c13a22-23330.log
16:47:37 [ROSUNIT] Outputting test results to /tmp/catkin_workspace/test_results/test_rospy/rostest-test_rostest_sub_to_multiple_pubs.xml
3
16:47:29 Scanning dependencies of target _run_tests_test_rospy_rostest_test_rostest_sub_to_multiple_pubs.test
16:47:30 -- run_tests.py: execute commands
16:47:30 /tmp/catkin_workspace/install_isolated/share/rostest/cmake/../../../bin/rostest --pkgdir=/tmp/catkin_workspace/src/ros_comm/test/test_rospy --package=test_rospy --results-filename test_rostest_sub_to_multiple_pubs.xml --results-base-dir /tmp/catkin_workspace/test_results /tmp/catkin_workspace/src/ros_comm/test/test_rospy/test/rostest/sub_to_multiple_pubs.test
16:47:30 ... logging to /home/buildfarm/.ros/log/rostest-a1620edfd07a-23335.log
16:47:30 [ROSUNIT] Outputting test results to /tmp/catkin_workspace/test_results/test_rospy/rostest-test_rostest_sub_to_multiple_pubs.xml
4
16:47:28 Scanning dependencies of target _run_tests_test_rospy_rostest_test_rostest_sub_to_multiple_pubs.test
16:47:28 -- run_tests.py: execute commands
16:47:28 /tmp/catkin_workspace/install_isolated/share/rostest/cmake/../../../bin/rostest --pkgdir=/tmp/catkin_workspace/src/ros_comm/test/test_rospy --package=test_rospy --results-filename test_rostest_sub_to_multiple_pubs.xml --results-base-dir /tmp/catkin_workspace/test_results /tmp/catkin_workspace/src/ros_comm/test/test_rospy/test/rostest/sub_to_multiple_pubs.test
16:47:28 ... logging to /home/buildfarm/.ros/log/rostest-a6958de2b94e-23328.log
16:47:29 [ROSUNIT] Outputting test results to /tmp/catkin_workspace/test_results/test_rospy/rostest-test_rostest_sub_to_multiple_pubs.xml
1
17:37:05 Scanning dependencies of target _run_tests_test_rospy_rostest_test_rostest_sub_to_multiple_pubs.test
17:37:05 -- run_tests.py: execute commands
17:37:05 /tmp/catkin_workspace/install_isolated/share/rostest/cmake/../../../bin/rostest --pkgdir=/tmp/catkin_workspace/src/ros_comm/test/test_rospy --package=test_rospy --results-filename test_rostest_sub_to_multiple_pubs.xml --results-base-dir /tmp/catkin_workspace/test_results /tmp/catkin_workspace/src/ros_comm/test/test_rospy/test/rostest/sub_to_multiple_pubs.test
17:37:06 ... logging to /home/buildfarm/.ros/log/rostest-88bd77a18b23-23329.log
17:37:06 [ROSUNIT] Outputting test results to /tmp/catkin_workspace/test_results/test_rospy/rostest-test_rostest_sub_to_multiple_pubs.xml
2
17:37:17 Scanning dependencies of target _run_tests_test_rospy_rostest_test_rostest_sub_to_multiple_pubs.test
17:37:17 -- run_tests.py: execute commands
17:37:17 /tmp/catkin_workspace/install_isolated/share/rostest/cmake/../../../bin/rostest --pkgdir=/tmp/catkin_workspace/src/ros_comm/test/test_rospy --package=test_rospy --results-filename test_rostest_sub_to_multiple_pubs.xml --results-base-dir /tmp/catkin_workspace/test_results /tmp/catkin_workspace/src/ros_comm/test/test_rospy/test/rostest/sub_to_multiple_pubs.test
17:37:18 ... logging to /home/buildfarm/.ros/log/rostest-e34615b7fbf5-23317.log
17:37:18 [ROSUNIT] Outputting test results to /tmp/catkin_workspace/test_results/test_rospy/rostest-test_rostest_sub_to_multiple_pubs.xml
3
17:37:02 Scanning dependencies of target _run_tests_test_rospy_rostest_test_rostest_sub_to_multiple_pubs.test
17:37:02 -- run_tests.py: execute commands
17:37:02 /tmp/catkin_workspace/install_isolated/share/rostest/cmake/../../../bin/rostest --pkgdir=/tmp/catkin_workspace/src/ros_comm/test/test_rospy --package=test_rospy --results-filename test_rostest_sub_to_multiple_pubs.xml --results-base-dir /tmp/catkin_workspace/test_results /tmp/catkin_workspace/src/ros_comm/test/test_rospy/test/rostest/sub_to_multiple_pubs.test
17:37:02 ... logging to /home/buildfarm/.ros/log/rostest-f9dd0c42ed32-23311.log
17:37:03 [ROSUNIT] Outputting test results to /tmp/catkin_workspace/test_results/test_rospy/rostest-test_rostest_sub_to_multiple_pubs.xml
4
17:37:04 Scanning dependencies of target _run_tests_test_rospy_rostest_test_rostest_sub_to_multiple_pubs.test
17:37:04 -- run_tests.py: execute commands
17:37:04 /tmp/catkin_workspace/install_isolated/share/rostest/cmake/../../../bin/rostest --pkgdir=/tmp/catkin_workspace/src/ros_comm/test/test_rospy --package=test_rospy --results-filename test_rostest_sub_to_multiple_pubs.xml --results-base-dir /tmp/catkin_workspace/test_results /tmp/catkin_workspace/src/ros_comm/test/test_rospy/test/rostest/sub_to_multiple_pubs.test
17:37:05 ... logging to /home/buildfarm/.ros/log/rostest-20380c9e4479-23317.log
17:37:05 [ROSUNIT] Outputting test results to /tmp/catkin_workspace/test_results/test_rospy/rostest-test_rostest_sub_to_multiple_pubs.xml
Note that I've clipped only the job names. Before the slave goes offline there are several warnings of this type:
17:37:56 pub = rospy.Publisher('chatter', String)
17:37:56 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:37:56 pub = rospy.Publisher('chatter', String)
17:37:56 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:37:56 pub = rospy.Publisher('chatter', String)
17:37:57 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:37:57 pub = rospy.Publisher('chatter', String)
17:37:58 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:37:58 pub = rospy.Publisher('chatter', String)
17:37:59 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:37:59 pub = rospy.Publisher('chatter', String)
17:38:01 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:38:01 pub = rospy.Publisher('chatter', String)
17:38:01 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:38:01 pub = rospy.Publisher('chatter', String)
17:38:02 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:38:02 pub = rospy.Publisher('chatter', String)
17:38:02 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:38:02 pub = rospy.Publisher('chatter', String)
17:38:03 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:38:03 pub = rospy.Publisher('chatter', String)
17:38:03 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:38:03 pub = rospy.Publisher('chatter', String)
17:38:03 /tmp/catkin_workspace/src/ros_comm/test/test_rospy/nodes/talker.py:47: SyntaxWarning: The publisher should be created with an explicit keyword argument 'queue_size'. Please see http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers for more information.
17:38:03 pub = rospy.Publisher('chatter', String)
17:38:07 Slave went offline during the build
17:38:07 ERROR: Node is being removed
17:38:07 Build step 'Execute shell' marked build as failure
17:38:07 ERROR: Step ‘Scan for compiler warnings’ failed: no workspace for Kpr__ros_comm__ubuntu_xenial_amd64 #24
17:38:07 [xUnit] [INFO] - Starting to record.
17:38:07 [xUnit] [INFO] - Processing GoogleTest-1.6
17:38:07 ERROR: Build step failed with exception
17:38:07 java.lang.NullPointerException
17:38:07 at org.jenkinsci.plugins.xunit.XUnitProcessor.performTests(XUnitProcessor.java:145)
17:38:07 at org.jenkinsci.plugins.xunit.XUnitProcessor.performXUnit(XUnitProcessor.java:88)
17:38:07 at org.jenkinsci.plugins.xunit.XUnitPublisher.perform(XUnitPublisher.java:142)
17:38:07 at org.jenkinsci.plugins.xunit.XUnitPublisher.perform(XUnitPublisher.java:134)
17:38:07 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
17:38:07 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:782)
17:38:07 at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:723)
17:38:07 at hudson.model.Build$BuildExecution.post2(Build.java:185)
17:38:07 at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:668)
17:38:07 at hudson.model.Run.execute(Run.java:1763)
17:38:07 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
17:38:07 at hudson.model.ResourceController.execute(ResourceController.java:98)
17:38:07 at hudson.model.Executor.run(Executor.java:410)
17:38:07 Build step 'Publish xUnit test result report' marked build as failure
17:38:07 ERROR: Step ‘E-mail Notification’ failed: no workspace for Kpr__ros_comm__ubuntu_xenial_amd64 #24
17:38:07 Setting status of 42bd33cdcb20b3d843e263c62a99f63664c012a2 to FAILURE with url http://build.ros.org/job/Kpr__ros_comm__ubuntu_xenial_amd64/24/ and message: 'Build finished. No test results found.'
17:38:07 Using context: Kpr__ros_comm__ubuntu_xenial_amd64
17:38:08 Finished: FAILURE
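To quantify how quickly talker.py nodes were being started in the last seconds before the slave dropped, the queue_size warnings can be counted per timestamp. This is a sketch assuming each warning is printed once per talker.py process (Python's default warning filter reports a given warning location only once per process) and that build-log lines carry the HH:MM:SS prefix shown above; the helper name is hypothetical.

```python
import re
from collections import Counter

# Matches build-log lines such as
#   "17:37:56 /tmp/.../talker.py:47: SyntaxWarning: ... 'queue_size' ..."
WARN_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \S*talker\.py:\d+: SyntaxWarning: .*queue_size")


def warnings_per_second(lines):
    """Count talker.py queue_size SyntaxWarnings, keyed by HH:MM:SS timestamp."""
    counts = Counter()
    for line in lines:
        m = WARN_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Applied to the excerpt above, it finds 12 warnings between 17:37:56 and 17:38:03, at most three in any one second.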
The OOM logs show a lot of python processes using memory:
Below is the OOM report with only the memory usage isolated. Note that total_vm is counted in 4 KiB pages: https://unix.stackexchange.com/questions/128642/debug-out-of-memory-with-var-log-messages
Looking at the tests, I see 384 python processes, a large fraction of which appear to be test harnesses each using a lot of RAM.
And clearly the docker instance and the jenkins-slave are also using large chunks.
$ cat total_vm.txt
total_vm name
4868 upstart-udev-br
12873 systemd-udevd
3814 upstart-socket-
2555 dhclient
489296 docker
3818 upstart-file-br
3634 getty
3634 getty
9803 dbus-daemon
3634 getty
3634 getty
3634 getty
15341 sshd
65018 rsyslogd
4784 atd
5913 cron
1091 acpid
10862 systemd-logind
4819 irqbalance
4241 nrsysmond
62933 nrsysmond
3634 getty
3196 getty
7861 ntpd
28230 python
7725 python
4686 daemon
1682681 java
1080 sh
93759 squid3
8402 log_file_daemon
8364 unlinkd
8431 pinger
1110 sh
8443 python3
1110 sh
8446 python3
1110 sh
8445 python3
56614 docker
56614 docker
1124 sh
9608 python3
1124 sh
9607 python3
60712 docker
1124 sh
9608 python3
1110 sh
8443 python3
58663 docker
1124 sh
9608 python3
1124 sh
10486 catkin_make_iso
1124 sh
10487 catkin_make_iso
1124 sh
10487 catkin_make_iso
1124 sh
10486 catkin_make_iso
2083 make
2083 make
2116 make
2116 make
2116 make
2116 make
2083 make
2116 make
2116 make
2083 make
2116 make
2116 make
2083 make
1128 sh
6381 python
1128 sh
190951 rostest
81747 rosout
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
2083 make
1128 sh
6381 python
1128 sh
190951 rostest
95714 python
95714 python
95714 python
2083 make
1128 sh
6381 python
1128 sh
190951 rostest
81747 rosout
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
81748 rosout
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
2083 make
1128 sh
6381 python
95714 python
1128 sh
190951 rostest
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
81784 rosout
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
97763 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
95714 python
21392 python
95714 python
21201 python
95714 python
95714 python
21392 python
95714 python
95714 python
95714 python
95714 python
21392 python
21201 python
95714 python
95714 python
21392 python
95714 python
95714 python
21392 python
21392 python
21392 python
77281 python
95714 python
21392 python
21201 python
95714 python
20485 python
21392 python
21392 python
21392 python
20547 python
21392 python
21392 python
21201 python
20547 python
21392 python
21201 python
21201 python
21201 python
20485 python
20483 python
20485 python
20485 python
20547 python
20420 python
19339 python
20485 python
19274 python
19338 python
19338 python
20483 python
19338 python
19274 python
19338 python
19338 python
6621 python
12110 python
19338 python
19338 python
17786 python
6014 python
7741 python
7833 python
17979 python
6014 python
5884 python
5393 python
6014 python
5981 python
5981 python
5787 python
5786 python
5885 python
5393 python
5884 python
5178 python
5786 python
5275 python
5275 python
5178 python
5275 python
5178 python
4968 python
5178 python
89 talker.py
120 sh
121 sh
121 sh
120 sh
120 sh
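To see which process groups dominate, the listing above can be aggregated by name. Since total_vm is counted in 4 KiB pages, one of the python test processes maps about 95714 × 4 KiB = 382856 KiB ≈ 374 MiB, and the java process about 1682681 × 4 KiB ≈ 6.4 GiB. Keep in mind that total_vm is virtual address space, not resident memory: shared libraries and reserved-but-untouched memory are counted in every process, so these sums overstate physical RAM use, and the rss column of the same OOM report is a better measure. The sketch assumes the two-column "total_vm name" format shown here; the function name is hypothetical.

```python
from collections import defaultdict

PAGE_KIB = 4  # total_vm is counted in 4 KiB pages


def summarize_total_vm(text):
    """Aggregate a 'total_vm name' listing by process name.

    Returns (name, process_count, total_mib) tuples, largest total first.
    """
    pages = defaultdict(int)
    counts = defaultdict(int)
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip the shell prompt, the header, and anything not "<int> <name>".
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        name = parts[1].strip()
        pages[name] += int(parts[0])
        counts[name] += 1
    rows = [(name, counts[name], pages[name] * PAGE_KIB / 1024) for name in pages]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

Grouping by name collapses the hundreds of identical python rows into one line with a process count, which makes the test harnesses' share easy to compare with docker and java.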
Since all the builds referenced in this ticket are no longer available, I don't expect that anyone can follow up on this specific case. Since there is #271, I will close this issue.
This was a manually configured node that had been running for a while:
Excerpt from the Jenkins log around that time:
The full slave log:
The slave has now recovered and is running just fine.
The "Discovering Jenkins master" line above gives no reason for the disconnect.