Closed nuclearsandwich closed 4 years ago
My first try was to take the machine out of the ROS buildfarm and connect it to build.osrfoundation.org to see if we could get more information from it. I launched the agent directly from the command line using the java -jar agent.jar ... invocation.
Unfortunately the errors persisted in the same way. I could not find anything in any log that points to the root cause.
Is this an example of the type of failure due to the server going offline?
https://ci.ros2.org/job/ci_osx/8271/console
15:31:50 Start 6: xmllint
15:31:50
15:31:50 6: Test command: /Users/osrf/jenkins-agent/workspace/ci_osx/venv/bin/python3 "-u" "/Users/osrf/jenkins-agent/workspace/ci_osx/ws/install/ament_cmake_test/share/ament_cmake_test/cmake/run_test.py" "/Users/osrf/jenkins-agent/workspace/ci_osx/ws/build/rmw/test_results/rmw/xmllint.xunit.xml" "--package-name" "rmw" "--output-file" "/Users/osrf/jenkins-agent/workspace/ci_osx/ws/build/rmw/ament_xmllint/xmllint.txt" "--command" "/Users/osrf/jenkins-agent/workspace/ci_osx/ws/install/ament_xmllint/bin/ament_xmllint" "--xunit-file" "/Users/osrf/jenkins-agent/workspace/ci_osx/ws/build/rmw/test_results/rmw/xmllint.xunit.xml"
15:31:50 6: Test timeout computed to be: 60
15:31:50 6: -- run_test.py: invoking following command in '/Users/osrf/jenkins-agent/workspace/ci_osx/ws/src/ros2/rmw/rmw':
15:31:50 6: - /Users/osrf/jenkins-agent/workspace/ci_osx/ws/install/ament_xmllint/bin/ament_xmllint --xunit-file /Users/osrf/jenkins-agent/workspace/ci_osx/ws/build/rmw/test_results/rmw/xmllint.xunit.xml
15:32:08 FATAL: command execution failed
15:32:08 java.nio.channels.ClosedChannelException
15:32:08 at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer.onReadClosed(ChannelApplicationLayer.java:209)
15:32:08 at org.jenkinsci.remoting.protocol.ApplicationLayer.onRecvClosed(ApplicationLayer.java:221)
15:32:08 at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.onRecvClosed(ProtocolStack.java:816)
15:32:08 at org.jenkinsci.remoting.protocol.FilterLayer.onRecvClosed(FilterLayer.java:287)
15:32:08 at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.onRecvClosed(SSLEngineFilterLayer.java:181)
15:32:08 at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.switchToNoSecure(SSLEngineFilterLayer.java:283)
15:32:08 at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processWrite(SSLEngineFilterLayer.java:503)
15:32:08 at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.processQueuedWrites(SSLEngineFilterLayer.java:248)
15:32:08 at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doSend(SSLEngineFilterLayer.java:200)
15:32:08 at org.jenkinsci.remoting.protocol.impl.SSLEngineFilterLayer.doCloseSend(SSLEngineFilterLayer.java:213)
15:32:08 at org.jenkinsci.remoting.protocol.ProtocolStack$Ptr.doCloseSend(ProtocolStack.java:784)
15:32:08 at org.jenkinsci.remoting.protocol.ApplicationLayer.doCloseWrite(ApplicationLayer.java:172)
15:32:08 at org.jenkinsci.remoting.protocol.impl.ChannelApplicationLayer$ByteBufferCommandTransport.closeWrite(ChannelApplicationLayer.java:314)
15:32:08 at hudson.remoting.Channel.close(Channel.java:1450)
15:32:08 at hudson.remoting.Channel.close(Channel.java:1403)
15:32:08 at hudson.slaves.SlaveComputer.closeChannel(SlaveComputer.java:824)
15:32:08 at hudson.slaves.SlaveComputer.access$100(SlaveComputer.java:107)
15:32:08 at hudson.slaves.SlaveComputer$2.run(SlaveComputer.java:733)
15:32:08 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
15:32:08 at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59)
15:32:08 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
15:32:08 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
15:32:08 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
15:32:08 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
15:32:08 at java.lang.Thread.run(Thread.java:748)
15:32:08 Caused: java.io.IOException: Backing channel 'JNLP4-connect connection from 70-35-50-58.static.wiline.com/70.35.50.58:49667' is disconnected.
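To check other console logs for the same signature, something like the following sketch could pull out the disconnected channel. The helper name find_disconnected_channels is hypothetical and not part of any buildfarm tooling; the sample text is copied from the log above.

```python
import re

# Matches the "Backing channel '...' is disconnected" line seen in the console log.
CHANNEL_RE = re.compile(r"Backing channel '([^']+)' is disconnected")


def find_disconnected_channels(console_text):
    """Return the channel description from every disconnect line (hypothetical helper)."""
    return CHANNEL_RE.findall(console_text)


# Two lines taken from the ci_osx console output quoted above.
sample = (
    "15:32:08 java.nio.channels.ClosedChannelException\n"
    "15:32:08 Caused: java.io.IOException: Backing channel 'JNLP4-connect connection "
    "from 70-35-50-58.static.wiline.com/70.35.50.58:49667' is disconnected.\n"
)

channels = find_disconnected_channels(sample)
print(channels)
```

A build whose console text yields a non-empty list here failed because the agent connection dropped, as opposed to a test or compile failure.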
Here's another build that looks like wonky network stuff
https://ci.ros2.org/job/ci_osx/8347/consoleText
---
Finished <<< rosgraph_msgs [1min 13s]
Starting >>> std_msgs
--- output: zstd_vendor
Not searching for unused variables given on the command line.
-- The C compiler identification is AppleClang 9.0.0.9000039
-- The CXX compiler identification is AppleClang 9.0.0.9000039
-- Check for working C compiler: /usr/local/opt/ccache/libexec/cc
-- Check for working C compiler: /usr/local/opt/ccache/libexec/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/local/opt/ccache/libexec/c++
-- Check for working CXX compiler: /usr/local/opt/ccache/libexec/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found ament_cmake: 0.8.1 (/Users/osrf/jenkins-agent/workspace/ci_osx/ws/install/ament_cmake/share/ament_cmake/cmake)
-- Found PythonInterp: /Users/osrf/jenkins-agent/workspace/ci_osx/venv/bin/python3 (found suitable version "3.7.6", minimum required is "3")
-- Using PYTHON_EXECUTABLE: /Users/osrf/jenkins-agent/workspace/ci_osx/venv/bin/python3
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/osrf/jenkins-agent/workspace/ci_osx/ws/build/zstd_vendor
Scanning dependencies of target zstd-1.4.4
[ 12%] Creating directories for 'zstd-1.4.4'
[ 25%] Performing download step (download, verify and extract) for 'zstd-1.4.4'
-- Downloading...
dst='/Users/osrf/jenkins-agent/workspace/ci_osx/ws/build/zstd_vendor/zstd-1.4.4-prefix/src/v1.4.4.zip'
timeout='60 seconds'
-- Using src='https://github.com/facebook/zstd/archive/v1.4.4.zip'
-- [download 100% complete]
-- Retrying...
-- Using src='https://github.com/facebook/zstd/archive/v1.4.4.zip'
-- [download 100% complete]
-- Retry after 5 seconds (attempt #2) ...
-- Using src='https://github.com/facebook/zstd/archive/v1.4.4.zip'
-- [download 100% complete]
-- [download 0% complete]
-- [download 1% complete]
-- [download 2% complete]
-- [download 3% complete]
-- [download 4% complete]
-- [download 5% complete]
-- [download 6% complete]
-- [download 7% complete]
-- [download 8% complete]
-- [download 9% complete]
-- Retry after 5 seconds (attempt #3) ...
-- Using src='https://github.com/facebook/zstd/archive/v1.4.4.zip'
-- [download 100% complete]
-- [download 0% complete]
-- [download 1% complete]
-- [download 2% complete]
-- [download 3% complete]
-- [download 4% complete]
-- [download 5% complete]
-- [download 6% complete]
-- [download 7% complete]
-- [download 8% complete]
-- [download 9% complete]
-- [download 10% complete]
-- [download 11% complete]
-- [download 12% complete]
-- [download 13% complete]
-- Retry after 15 seconds (attempt #4) ...
-- Using src='https://github.com/facebook/zstd/archive/v1.4.4.zip'
-- [download 0% complete]
-- [download 1% complete]
-- [download 2% complete]
-- [download 3% complete]
-- [download 4% complete]
-- [download 5% complete]
-- [download 6% complete]
-- [download 7% complete]
-- [download 8% complete]
-- [download 9% complete]
-- [download 10% complete]
-- Retry after 60 seconds (attempt #5) ...
-- Using src='https://github.com/facebook/zstd/archive/v1.4.4.zip'
-- [download 100% complete]
-- [download 0% complete]
-- [download 1% complete]
-- [download 2% complete]
-- [download 3% complete]
-- [download 4% complete]
-- [download 5% complete]
-- [download 6% complete]
-- [download 7% complete]
-- [download 8% complete]
-- [download 9% complete]
-- [download 10% complete]
CMake Error at zstd-1.4.4-stamp/download-zstd-1.4.4.cmake:159 (message):
Each download failed!
error: downloading 'https://github.com/facebook/zstd/archive/v1.4.4.zip' failed
status_code: 28
status_string: "Timeout was reached"
log:
--- LOG BEGIN ---
Trying 192.30.255.112...
TCP_NODELAY set
Connected to github.com (192.30.255.112) port 443 (#0)
ALPN, offering http/1.1
Cipher selection:
ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
successfully set certificate verify locations:
CAfile: /etc/ssl/cert.pem
CApath: none
TLSv1.2 (OUT), TLS handshake, Client hello (1):
[213 bytes data]
TLSv1.2 (IN), TLS handshake, Server hello (2):
[108 bytes data]
TLSv1.2 (IN), TLS handshake, Certificate (11):
[3085 bytes data]
TLSv1.2 (IN), TLS handshake, Server key exchange (12):
[300 bytes data]
TLSv1.2 (IN), TLS handshake, Server finished (14):
[4 bytes data]
TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
[37 bytes data]
TLSv1.2 (OUT), TLS change cipher, Client hello (1):
[1 bytes data]
TLSv1.2 (OUT), TLS handshake, Finished (20):
[16 bytes data]
TLSv1.2 (IN), TLS change cipher, Client hello (1):
[1 bytes data]
TLSv1.2 (IN), TLS handshake, Finished (20):
[16 bytes data]
SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
ALPN, server accepted to use http/1.1
Server certificate:
subject: businessCategory=Private Organization; jurisdictionCountryName=US; jurisdictionStateOrProvinceName=Delaware; serialNumber=5157550; C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=github.com
start date: May 8 00:00:00 2018 GMT
expire date: Jun 3 12:00:00 2020 GMT
subjectAltName: host "github.com" matched cert's "github.com"
issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 Extended Validation Server CA
SSL certificate verify ok.
GET /facebook/zstd/archive/v1.4.4.zip HTTP/1.1
Host: github.com
User-Agent: curl/7.51.0
Accept: */*
HTTP/1.1 302 Found
date: Fri, 17 Apr 2020 00:22:58 GMT
content-type: text/html; charset=utf-8
server: GitHub.com
status: 302 Found
vary: X-PJAX, Accept-Encoding, Accept, X-Requested-With
location: https://codeload.github.com/facebook/zstd/zip/v1.4.4
cache-control: max-age=0, private
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
expect-ct: max-age=2592000,
report-uri="https://api.github.com/_private/browser/errors"
content-security-policy: default-src 'none'; base-uri 'self';
block-all-mixed-content; connect-src 'self' uploads.github.com
www.githubstatus.com collector.githubapp.com api.github.com
www.google-analytics.com github-cloud.s3.amazonaws.com
github-production-repository-file-5c1aeb.s3.amazonaws.com
github-production-upload-manifest-file-7fdce7.s3.amazonaws.com
github-production-user-asset-6210df.s3.amazonaws.com cdn.optimizely.com
logx.optimizely.com/v1/events wss://live.github.com; font-src
github.githubassets.com; form-action 'self' github.com gist.github.com;
frame-ancestors 'none'; frame-src render.githubusercontent.com; img-src
'self' data: github.githubassets.com identicons.github.com
collector.githubapp.com github-cloud.s3.amazonaws.com
*.githubusercontent.com; manifest-src 'self'; media-src 'none'; script-src
github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com
Content-Length: 118
X-GitHub-Request-Id: D5E6:9ADE:AAC19:E88A4:5E98F6E1
Ignoring the response-body
[118 bytes data]
Connection #0 to host github.com left intact
Issue another request to this URL:
'https://codeload.github.com/facebook/zstd/zip/v1.4.4'
Trying 192.30.255.121...
TCP_NODELAY set
Connected to codeload.github.com (192.30.255.121) port 443 (#1)
ALPN, offering http/1.1
Cipher selection:
ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
successfully set certificate verify locations:
CAfile: /etc/ssl/cert.pem
CApath: none
TLSv1.2 (OUT), TLS handshake, Client hello (1):
[222 bytes data]
TLSv1.2 (IN), TLS handshake, Server hello (2):
[108 bytes data]
TLSv1.2 (IN), TLS handshake, Certificate (11):
[2851 bytes data]
TLSv1.2 (IN), TLS handshake, Server key exchange (12):
[300 bytes data]
TLSv1.2 (IN), TLS handshake, Server finished (14):
[4 bytes data]
TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
[37 bytes data]
TLSv1.2 (OUT), TLS change cipher, Client hello (1):
[1 bytes data]
TLSv1.2 (OUT), TLS handshake, Finished (20):
[16 bytes data]
TLSv1.2 (IN), TLS change cipher, Client hello (1):
[1 bytes data]
TLSv1.2 (IN), TLS handshake, Finished (20):
[16 bytes data]
SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
ALPN, server accepted to use http/1.1
Server certificate:
subject: C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=*.github.com
start date: Jul 8 00:00:00 2019 GMT
expire date: Jul 16 12:00:00 2020 GMT
subjectAltName: host "codeload.github.com" matched cert's "*.github.com"
issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
SSL certificate verify ok.
GET /facebook/zstd/zip/v1.4.4 HTTP/1.1
Host: codeload.github.com
User-Agent: curl/7.51.0
Accept: */*
HTTP/1.1 200 OK
Access-Control-Allow-Origin: https://render.githubusercontent.com
Content-Disposition: attachment; filename=zstd-1.4.4.zip
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline';
sandbox
Content-Type: application/zip
ETag: W/"c0404a1b438a4549e83b7323dadd897d3cf234e6fe6eb9101c5fcdb277420dc7"
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
Date: Fri, 17 Apr 2020 00:23:00 GMT
X-Varnish: 864982496
Age: 0
Via: 1.1 varnish (Varnish/6.0)
X-Cache: MISS
X-Cache-Hits: 0
Accept-Ranges: bytes
Transfer-Encoding: chunked
X-GitHub-Request-Id: D5E7:1BBA:1162E:2F991:5E98F6E2
[633 bytes data]
[1370 bytes data]
[54 bytes data]
[1370 bytes data]
[1370 bytes data]
[1370 bytes data]
...
[1370 bytes data]
[1370 bytes data]
[1370 bytes data]
[1370 bytes data]
[1370 bytes data]
Operation timed out after 57526 milliseconds with 229741 out of 2234160
bytes received
stopped the pause stream!
Closing connection 1
TLSv1.2 (OUT), TLS alert, Client hello (1):
[2 bytes data]
--- LOG END ---
make[2]: *** [zstd-1.4.4-prefix/src/zstd-1.4.4-stamp/zstd-1.4.4-download] Error 1
make[1]: *** [CMakeFiles/zstd-1.4.4.dir/all] Error 2
make: *** [all] Error 2
---
Failed <<< zstd_vendor [ Exited with code 2 ]
Aborted <<< action_msgs
Aborted <<< std_msgs
Aborted <<< rcl_interfaces
Summary: 151 packages finished [11min 39s]
1 package failed: zstd_vendor
3 packages aborted: action_msgs rcl_interfaces std_msgs
7 packages had stderr output: foonathan_memory_vendor qt_gui_cpp rcl_logging_spdlog rviz_rendering rviz_rendering_tests tracetools zstd_vendor
138 packages not processed
<== '. ../venv/bin/activate && . "/Applications/rti_connext_dds-5.3.1/resource/scripts/rtisetenv_x64Darwin16clang8.0.bash" && /Users/osrf/jenkins-agent/workspace/ci_osx/venv/bin/colcon build --base-paths "src" --build-base "build" --install-base "install" --event-handlers console_cohesion+ console_package_list+ --cmake-args -DBUILD_TESTING=ON --no-warn-unused-cli -DINSTALL_EXAMPLES=OFF -DSECURITY=ON' exited with return code '2'
Build step 'Execute shell' marked build as failure
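For a sense of how slow that last attempt was, here is a rough back-of-the-envelope calculation using only the numbers from the curl log above ("Operation timed out after 57526 milliseconds with 229741 out of 2234160 bytes received"). The variable names are just for illustration, and the estimate assumes the transfer rate would have stayed constant.

```python
# Figures copied from the curl log line in the failed zstd_vendor download.
received_bytes = 229741
total_bytes = 2234160
elapsed_ms = 57526

# Average transfer rate over the attempt, in bytes per second.
rate = received_bytes / (elapsed_ms / 1000.0)

# Fraction of the archive that arrived before the timeout fired.
fraction = received_bytes / total_bytes

# Time the full download would need at that rate (assumes a constant rate).
estimated_total_s = total_bytes / rate

print(f"rate: {rate:.0f} B/s")
print(f"received: {fraction:.1%}")
print(f"estimated full download: {estimated_total_s:.0f} s")
```

That works out to roughly 4 kB/s and about 10% of the archive, which matches the "[download 10% complete]" line. At that rate the full file would need over nine minutes, far longer than the 60 second timeout.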
Recent activity hasn't included any of the problems identified in this issue and I think it can be closed.
During the week of ROSCon, we wiped the CI nodes mini1 and mini2 and installed macOS Mojave. Since the reinstall, neither machine has reliably stayed on the CI cluster.
Oct 31: mini1 experienced up to 5% packet loss compared to my workstation on the other end of the office, which I thought could be related to the connection reliability. Since then I've relocated mini1 and the packet loss is down, but the disconnects continue.
Since then I've tried
Before each connection failure in the agent logs is a java.lang.NoClassDefFoundError for either hudson/util/ProcessTree or jenkins/util/java/JavaUtils, but browsing Jenkins issues hasn't yielded paydirt yet.
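To see how often each class shows up across the agent logs, a sketch like this could tally the errors. It assumes the log contains the usual "java.lang.NoClassDefFoundError: some/class/Name" text; the helper name and the sample lines are made up for illustration and are not real agent output.

```python
import re
from collections import Counter

# Captures the class name that follows a NoClassDefFoundError.
NCDFE_RE = re.compile(r"java\.lang\.NoClassDefFoundError:?\s+(\S+)")


def count_missing_classes(log_lines):
    """Tally NoClassDefFoundError occurrences per class name (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        match = NCDFE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, NOT copied from a real agent log.
sample_log = [
    "Exception in thread: java.lang.NoClassDefFoundError: hudson/util/ProcessTree",
    "some unrelated line",
    "java.lang.NoClassDefFoundError: jenkins/util/java/JavaUtils",
    "java.lang.NoClassDefFoundError: hudson/util/ProcessTree",
]

counts = count_missing_classes(sample_log)
for name, n in counts.most_common():
    print(name, n)
```

Running it over a real log would mean passing the open file object (or its lines) instead of sample_log.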