skupperproject / skupper

Skupper is an implementation of a Virtual Application Network, enabling rich hybrid cloud communication.
http://skupper.io
Apache License 2.0

Skupper on ARM keeps restarting #1279

Open michaelalang opened 10 months ago

michaelalang commented 10 months ago

Skupper 1.5.0 running on a Raspberry Pi 4 keeps restarting with the following logs:

2023-11-13 16:54:38.811339 +0000 SERVER (error) [C2884] Connection from ::1:53398 (to localhost:5672) failed: amqp:connection:framing-error connection aborted
2023-11-13 16:54:38.812755 +0000 SERVER (error) [C2891] Connection from ::1:53470 (to localhost:5672) failed: amqp:connection:framing-error connection aborted
2023-11-13 16:54:38.814129 +0000 SERVER (error) [C2883] Connection from ::1:53396 (to localhost:5672) failed: amqp:connection:framing-error connection aborted

and

2023-11-13 16:55:32.057049 +0000 SERVER (error) [C2880] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671
2023-11-13 16:55:32.227544 +0000 FLOW_LOG (info) LOG [8bkk6:2769] BEGIN END parent=8bkk6:0 logSeverity=3 logText=LOG_SERVER: [C2880] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671 sourceFile=/build/src/server.c sourceLine=1084
2023-11-13 16:55:34.092963 +0000 SERVER (error) [C2881] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671
2023-11-13 16:55:34.235120 +0000 FLOW_LOG (info) LOG [8bkk6:2770] BEGIN END parent=8bkk6:0 logSeverity=3 logText=LOG_SERVER: [C2881] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671 sourceFile=/build/src/server.c sourceLine=1084
2023-11-13 16:55:38.227183 +0000 SERVER (error) [C2882] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671
2023-11-13 16:55:38.246242 +0000 FLOW_LOG (info) LOG [8bkk6:2771] BEGIN END parent=8bkk6:0 logSeverity=3 logText=LOG_SERVER: [C2882] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671 sourceFile=/build/src/server.c sourceLine=1084

The deployment works for ~1-4 minutes: it shows the remotes and exposed services, and the various services are reachable as expected. After that, it seems that SSL gets out of sync (maybe due to a hardware limitation?) and the pods are restarted, after which the same behavior repeats (it works for 1-4 minutes, then stops working).

I understand we do not support Skupper on ARM in this respect at the moment, but I still want to make everyone aware of a possible issue we might face with ARM-based deployments.
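The logs in this comment show two distinct failure modes: inbound AMQP framing errors on localhost:5672 and outbound connection timeouts to 10.246.1.72:55671. Below is a minimal Python sketch for tallying them from a saved router log. It assumes every error line has exactly the shape of the excerpts quoted here. The regex is inferred from these lines, not from a documented skrouterd log format, and `parse`/`summarize` are hypothetical helper names.

```python
import re
from collections import Counter
from datetime import datetime

# Regex inferred from the "SERVER (error)" lines quoted in this issue.
# It is an assumption about the log shape, not a documented skrouterd format.
SERVER_ERROR_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \+0000 "
    r"SERVER \(error\) \[(?P<conn>C\d+)\] Connection (?P<dir>from|to) "
    r"(?P<peer>\S+?)(?: \(to (?P<local>\S+)\))? failed: (?P<reason>\S+)"
)


def parse(lines):
    """Return (timestamp, conn_id, direction, peer, reason) per error line.

    Lines that do not match (e.g. FLOW_LOG entries) are skipped.
    """
    events = []
    for line in lines:
        m = SERVER_ERROR_RE.match(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m["conn"], m["dir"], m["peer"], m["reason"]))
    return events


def summarize(events):
    """Count failures per (direction, reason) and collect the gaps, in
    seconds, between consecutive outbound failures to the same peer."""
    counts = Counter((d, reason) for _, _, d, _, reason in events)
    last_seen, gaps = {}, {}
    for ts, _, d, peer, _ in sorted(events):
        if d != "to":
            continue  # inbound peers use ephemeral client ports, so skip them
        if peer in last_seen:
            gaps.setdefault(peer, []).append((ts - last_seen[peer]).total_seconds())
        last_seen[peer] = ts
    return counts, gaps


# Lines copied from this issue; the FLOW_LOG line should be ignored.
SAMPLE_LINES = [
    "2023-11-13 16:54:38.811339 +0000 SERVER (error) [C2884] Connection from ::1:53398 (to localhost:5672) failed: amqp:connection:framing-error connection aborted",
    "2023-11-13 16:54:38.812755 +0000 SERVER (error) [C2891] Connection from ::1:53470 (to localhost:5672) failed: amqp:connection:framing-error connection aborted",
    "2023-11-13 16:54:38.814129 +0000 SERVER (error) [C2883] Connection from ::1:53396 (to localhost:5672) failed: amqp:connection:framing-error connection aborted",
    "2023-11-13 16:55:32.057049 +0000 SERVER (error) [C2880] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671",
    "2023-11-13 16:55:32.227544 +0000 FLOW_LOG (info) LOG [8bkk6:2769] BEGIN END parent=8bkk6:0 logSeverity=3",
    "2023-11-13 16:55:34.092963 +0000 SERVER (error) [C2881] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671",
    "2023-11-13 16:55:38.227183 +0000 SERVER (error) [C2882] Connection to 10.246.1.72:55671 failed: proton:io Connection timed out - disconnected 10.246.1.72:55671",
]

if __name__ == "__main__":
    counts, gaps = summarize(parse(SAMPLE_LINES))
    for (direction, reason), n in counts.items():
        print(f"{direction:4} {reason}: {n}")
    for peer, deltas in gaps.items():
        print(peer, [round(d, 3) for d in deltas])
```

On the excerpt from this issue, the sketch counts three framing errors and three timeouts. The outbound attempts to 10.246.1.72:55671 are spaced roughly 2.0 s and then 4.1 s apart. Running it over a full log covering one restart cycle would show whether the framing errors and timeouts coincide with the 1-4 minute window.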

michaelalang commented 10 months ago

Here's another error I was able to capture in the service-controller/flow-collector pod:

[Beacon detector module starting]
[API module starting]
API server listening on port 8010
Connection to the VAN is open
New ROUTER detected: zhg6s:0
New ROUTER detected: hbpjt:0
New ROUTER detected: qxdth:0
New CONTROLLER detected: cfa7a05c-d9bc-464c-a485-819add8f4a76
Sending FLUSH to sfe.zhg6s:0
Sending FLUSH to sfe.hbpjt:0
New CONTROLLER detected: 6e9774e6-02ff-42c6-8f85-9a63d0734605
New CONTROLLER detected: cb1c35c7-8d48-4eed-9b25-2d07f1ec15b3
New ROUTER detected: qgnsz:0
Sending FLUSH to sfe.qxdth:0
New CONTROLLER detected: 62737c3d-13d4-4c09-82bf-449625b5eeaf
New CONTROLLER detected: af10fd96-bce9-4fb7-8585-87f60810ff9e
New ROUTER detected: rg2dg:0
New ROUTER detected: 8bkk6:0
Sending FLUSH to sfe.cfa7a05c-d9bc-464c-a485-819add8f4a76
events.js:174
      throw er; // Unhandled 'error' event
      ^

TypeError: Cannot read property 'push' of undefined
    at new Record (/usr/src/src/data.js:122:32)
    at Object.exports.IncomingRecord (/usr/src/src/data.js:503:23)
    at recordList.forEach.item (/usr/src/src/network.js:123:18)
    at Array.forEach (<anonymous>)
    at Container.<anonymous> (/usr/src/src/network.js:121:20)
    at Container.emit (events.js:198:13)
    at Container.dispatch (/usr/src/node_modules/rhea/lib/container.js:41:33)
    at Connection.dispatch (/usr/src/node_modules/rhea/lib/connection.js:261:40)
    at Session.dispatch (/usr/src/node_modules/rhea/lib/session.js:456:41)
    at Receiver.link.dispatch (/usr/src/node_modules/rhea/lib/link.js:62:38)
Emitted 'error' event at:
    at Container.dispatch (/usr/src/node_modules/rhea/lib/container.js:41:33)
    at Connection.dispatch (/usr/src/node_modules/rhea/lib/connection.js:261:40)
    at Connection.input (/usr/src/node_modules/rhea/lib/connection.js:574:18)
    at TLSSocket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:288:12)
    at readableAddChunk (_stream_readable.js:269:11)
    at TLSSocket.Readable.push (_stream_readable.js:224:10)
    at TLSWrap.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)

grs commented 10 months ago

What image is that log from? (It is a Node.js-based image, which is not the standard flow collector.)

michaelalang commented 10 months ago

@grs it's based on https://github.com/skupperproject/skupper/blob/main/Dockerfile.flow-collector

grs commented 10 months ago

> @grs it's based on https://github.com/skupperproject/skupper/blob/main/Dockerfile.flow-collector

I don't think it can be, as that is a Go-based collector and the trace is clearly from a Node.js-based program.

ted-ross commented 10 months ago

For the record, that backtrace is from the prototype collector (Node.js). Can you run skupper version in that environment to see which images are being used?
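The output of skupper version is a two-column listing, and checking it by eye is easy to get wrong when images come from a private registry. Below is a minimal Python sketch that pulls out each component's image tag and lists the components reported as not-found. It assumes the layout printed in this thread: a label, two or more spaces, then a value. The helper names are hypothetical and not part of the Skupper CLI.

```python
import re

# Output copied from the `skupper version` run pasted in this issue.
SAMPLE = """\
client version                 1.4.1
transport version              quay.example.com/skupper/skupper-router:2.5.0 (sha256:51f8ab009232)
controller version             not-found
config-sync version            quay.example.com/skupper/config-sync:1.5.0 (sha256:e60cfee4c09a)
flow-collector version         not-found
"""


def parse_version_output(text):
    """Map each component label to the value printed next to it."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: label and value are separated by two or more spaces,
        # while labels themselves only contain single spaces.
        parts = re.split(r"\s{2,}", line, maxsplit=1)
        if len(parts) == 2:
            versions[parts[0]] = parts[1]
    return versions


def missing_components(versions):
    """Labels whose value is the literal string 'not-found'."""
    return sorted(k for k, v in versions.items() if v == "not-found")


def image_tag(value):
    """Return the tag of an 'image:tag (sha256:...)' value, else None."""
    image = value.split()[0]
    # Look only at the last path segment so a registry port is not mistaken
    # for a tag; digest-only references (repo@sha256:...) are not handled.
    last = image.rsplit("/", 1)[-1]
    return last.split(":", 1)[1] if ":" in last else None


if __name__ == "__main__":
    versions = parse_version_output(SAMPLE)
    print("client:", versions.get("client version"))
    for label, value in versions.items():
        tag = image_tag(value)
        if tag:
            print(f"{label}: tag {tag}")
    print("not found:", ", ".join(missing_components(versions)))
```

On the output pasted in this thread, the sketch reports a 1.4.1 client alongside a 2.5.0 router image and a 1.5.0 config-sync image, with the controller and flow collector not found. It only surfaces these facts and does not explain why the components are missing.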

michaelalang commented 10 months ago

Hi Ted,

I used the Dockerfiles from the repo ... :?

$ skupper -c pi4 -n skupper version
client version                 1.4.1
transport version              quay.example.com/skupper/skupper-router:2.5.0 (sha256:51f8ab009232)
controller version             not-found
config-sync version            quay.example.com/skupper/config-sync:1.5.0 (sha256:e60cfee4c09a)
flow-collector version         not-found

$ oc --context pi4 -n skupper exec -ti deploy/skupper-service-controller -- ./service-controller -version
1.5.0

[runner@skupper-router-ffb9458b9-nvnt8 bin]$ skrouterd -v
0.0.0
[runner@skupper-router-ffb9458b9-nvnt8 bin]$ skmanage --version
0.0.0
[runner@skupper-router-ffb9458b9-nvnt8 bin]$ skstat --version
0.0.0

[root@pi4 skupper-router]# git config remote.origin.url
https://github.com/skupperproject/skupper-router
[root@pi4 skupper-router]# git branch
* main
# Containerfile used for build

[root@pi4 skupper]# git config remote.origin.url
https://github.com/skupperproject/skupper.git
[root@pi4 skupper]# git branch
* (HEAD detached at 1.5.0)
  main

# Dockerfile.ci-test  Dockerfile.config-sync  Dockerfile.controller-podman  Dockerfile.flow-collector  Dockerfile.service-controller  Dockerfile.site-controller used for build