synadia-io / nex

The NATS execution engine
https://docs.nats.io/using-nats/nex
Apache License 2.0

msg="Did not receive NATS handshake from agent within timeout." #142

Closed gain620 closed 5 months ago

gain620 commented 5 months ago

Observed behavior

With the latest nex CLI, which includes the arm64 support update, the node does not receive the NATS handshake from the agents running inside the microVMs.

admin@ip-x-x-x-x:~$ nats-server -js

[11826] 2024/03/18 07:15:11.309007 [INF] Starting nats-server
[11826] 2024/03/18 07:15:11.309159 [INF]   Version:  2.10.12
[11826] 2024/03/18 07:15:11.309179 [INF]   Git:      [121169ea]
[11826] 2024/03/18 07:15:11.309199 [INF]   Name:     NCXQ6Y3BGPS4XJ4AXQWNTUO7SUYN65FH3UJPUSLNPFX4LE6LSAWRQSNN
[11826] 2024/03/18 07:15:11.309220 [INF]   Node:     8nbUCYL1
[11826] 2024/03/18 07:15:11.309238 [INF]   ID:       NCXQ6Y3BGPS4XJ4AXQWNTUO7SUYN65FH3UJPUSLNPFX4LE6LSAWRQSNN
[11826] 2024/03/18 07:15:11.309683 [INF] Starting JetStream
[11826] 2024/03/18 07:15:11.310617 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[11826] 2024/03/18 07:15:11.310643 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[11826] 2024/03/18 07:15:11.310661 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[11826] 2024/03/18 07:15:11.310679 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[11826] 2024/03/18 07:15:11.310695 [INF]
[11826] 2024/03/18 07:15:11.310712 [INF]          https://docs.nats.io/jetstream
[11826] 2024/03/18 07:15:11.310729 [INF]
[11826] 2024/03/18 07:15:11.310746 [INF] ---------------- JETSTREAM ----------------
[11826] 2024/03/18 07:15:11.310766 [INF]   Max Memory:      23.47 GB
[11826] 2024/03/18 07:15:11.310788 [INF]   Max Storage:     18.52 GB
[11826] 2024/03/18 07:15:11.310807 [INF]   Store Directory: "/tmp/nats/jetstream"
[11826] 2024/03/18 07:15:11.310825 [INF] -------------------------------------------
[11826] 2024/03/18 07:15:11.311415 [INF] Listening for client connections on 0.0.0.0:4222
[11826] 2024/03/18 07:15:11.312698 [INF] Server is ready
admin@ip-x-x-x-x:~$ sudo nex node preflight

Validating - Required CNI Plugins
          🔎Searching - /opt/cni/bin
          ✅ Dependency Satisfied - /opt/cni/bin/host-local [host-local CNI plugin]
          ✅ Dependency Satisfied - /opt/cni/bin/ptp [ptp CNI plugin]
          ✅ Dependency Satisfied - /opt/cni/bin/tc-redirect-tap [tc-redirect-tap CNI plugin]

Validating - Required binaries
          🔎Searching - /usr/local/bin
          ✅ Dependency Satisfied - /usr/local/bin/firecracker [Firecracker VM binary]

Validating - CNI configuration requirements
          🔎Searching - /etc/cni/conf.d
          ✅ Dependency Satisfied - /etc/cni/conf.d/fcnet.conflist [CNI Configuration]

Validating - VMLinux Kernel
          ✅ Dependency Satisfied - /tmp/wd/vmlinux [VMLinux Kernel]

Validating - Root Filesystem Template
          ✅ Dependency Satisfied - /tmp/wd/rootfs.ext4 [Root Filesystem Template]
admin@ip-x-x-x-x:~$ sudo nex node up

time=2024-03-18T07:17:28.694Z level=INFO msg="Generated keypair for node" public_key=NAJ25OZ52JHNPU43HPCYAZW75MJ3US3ORABO3JYYIRLK4RBTPFI5AWKH
time=2024-03-18T07:17:28.695Z level=INFO msg="Loaded node configuration" config_path=./config.json
time=2024-03-18T07:17:28.695Z level=INFO msg="Initialized telemetry"
time=2024-03-18T07:17:28.696Z level=INFO msg="Established node NATS connection" servers=""
time=2024-03-18T07:17:28.704Z level=INFO msg="Internal NATS server started" client_url=nats://0.0.0.0:44253
time=2024-03-18T07:17:28.705Z level=INFO msg="Virtual machine manager starting"
time=2024-03-18T07:17:28.705Z level=INFO msg="Resetting network"
time=2024-03-18T07:17:28.705Z level=INFO msg="Use this key as the recipient for encrypted run requests" public_xkey=XDIZCIFCIDEGBDR23JGIEKBX43RVXNC3MOJGTU2Q3VXNIQ3ZSH7HT4VC
time=2024-03-18T07:17:28.705Z level=INFO msg="NATS execution engine awaiting commands" id=NAJ25OZ52JHNPU43HPCYAZW75MJ3US3ORABO3JYYIRLK4RBTPFI5AWKH version=0.1.5
time=2024-03-18T07:17:28.707Z level=INFO msg="Publishing node started event"
time=2024-03-18T07:17:29.014Z level=INFO msg="Called startVMM(), setting up a VMM" firecracker=true vmmid=cnrul21t6e8i4vc1sf6g socket_path=/tmp/.firecracker.sock-11865-cnrul21t6e8i4vc1sf6g
time=2024-03-18T07:17:29.026Z level=INFO msg="VMM metrics disabled." firecracker=true vmmid=cnrul21t6e8i4vc1sf6g
time=2024-03-18T07:17:29.026Z level=INFO msg=refreshMachineConfiguration firecracker=true vmmid=cnrul21t6e8i4vc1sf6g err="[GET /machine-config][200] getMachineConfigurationOK  &{CPUTemplate: MemSizeMib:0x40002df748 Smt:0x40002df753 TrackDirtyPages:0x40002df754 VcpuCount:0x40002df740}"
time=2024-03-18T07:17:29.026Z level=INFO msg=PutGuestBootSource firecracker=true vmmid=cnrul21t6e8i4vc1sf6g err="[PUT /boot-source][204] putGuestBootSourceNoContent "
time=2024-03-18T07:17:29.026Z level=INFO msg="Attaching drive" firecracker=true vmmid=cnrul21t6e8i4vc1sf6g drive_path=/tmp/rootfs-cnrul21t6e8i4vc1sf6g.ext4 slot=1 root=true
time=2024-03-18T07:17:29.027Z level=INFO msg="Attached drive" firecracker=true vmmid=cnrul21t6e8i4vc1sf6g drive_path=/tmp/rootfs-cnrul21t6e8i4vc1sf6g.ext4 err="[PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "
time=2024-03-18T07:17:29.027Z level=INFO msg="Attaching NIC at index" firecracker=true vmmid=cnrul21t6e8i4vc1sf6g device_name=tap0 mac_addr=ca:7d:05:49:f0:b2 interface_id=1
time=2024-03-18T07:17:29.050Z level=INFO msg="startInstance successful" firecracker=true vmmid=cnrul21t6e8i4vc1sf6g err="[PUT /actions][204] createSyncActionNoContent "
time=2024-03-18T07:17:29.050Z level=INFO msg="Machine started" vmid=cnrul21t6e8i4vc1sf6g ip=192.168.127.2 gateway=192.168.127.1 netmask=ffffff00 hosttap=tap0 nats_host=192.168.127.1 nats_port=44253
time=2024-03-18T07:17:29.051Z level=INFO msg="SetMetadata successful" firecracker=true vmmid=cnrul21t6e8i4vc1sf6g
time=2024-03-18T07:17:29.051Z level=INFO msg="Adding new VM to warm pool" ip=192.168.127.2 vmid=cnrul21t6e8i4vc1sf6g
time=2024-03-18T07:17:29.358Z level=INFO msg="Called startVMM(), setting up a VMM" firecracker=true vmmid=cnrul29t6e8i4vc1sf70 socket_path=/tmp/.firecracker.sock-11865-cnrul29t6e8i4vc1sf70
time=2024-03-18T07:17:29.370Z level=INFO msg="VMM metrics disabled." firecracker=true vmmid=cnrul29t6e8i4vc1sf70
time=2024-03-18T07:17:29.371Z level=INFO msg=refreshMachineConfiguration firecracker=true vmmid=cnrul29t6e8i4vc1sf70 err="[GET /machine-config][200] getMachineConfigurationOK  &{CPUTemplate: MemSizeMib:0x40004d9398 Smt:0x40004d93a3 TrackDirtyPages:0x40004d93a4 VcpuCount:0x40004d9390}"
time=2024-03-18T07:17:29.371Z level=INFO msg=PutGuestBootSource firecracker=true vmmid=cnrul29t6e8i4vc1sf70 err="[PUT /boot-source][204] putGuestBootSourceNoContent "
time=2024-03-18T07:17:29.371Z level=INFO msg="Attaching drive" firecracker=true vmmid=cnrul29t6e8i4vc1sf70 drive_path=/tmp/rootfs-cnrul29t6e8i4vc1sf70.ext4 slot=1 root=true
time=2024-03-18T07:17:29.372Z level=INFO msg="Attached drive" firecracker=true vmmid=cnrul29t6e8i4vc1sf70 drive_path=/tmp/rootfs-cnrul29t6e8i4vc1sf70.ext4 err="[PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "
time=2024-03-18T07:17:29.372Z level=INFO msg="Attaching NIC at index" firecracker=true vmmid=cnrul29t6e8i4vc1sf70 device_name=tap0 mac_addr=4a:a8:e5:66:f8:a2 interface_id=1
time=2024-03-18T07:17:29.396Z level=INFO msg="startInstance successful" firecracker=true vmmid=cnrul29t6e8i4vc1sf70 err="[PUT /actions][204] createSyncActionNoContent "
time=2024-03-18T07:17:29.397Z level=INFO msg="Machine started" vmid=cnrul29t6e8i4vc1sf70 ip=192.168.127.3 gateway=192.168.127.1 netmask=ffffff00 hosttap=tap0 nats_host=192.168.127.1 nats_port=44253
time=2024-03-18T07:17:29.397Z level=INFO msg="SetMetadata successful" firecracker=true vmmid=cnrul29t6e8i4vc1sf70
time=2024-03-18T07:17:29.397Z level=INFO msg="Adding new VM to warm pool" ip=192.168.127.3 vmid=cnrul29t6e8i4vc1sf70
time=2024-03-18T07:17:29.701Z level=INFO msg="Called startVMM(), setting up a VMM" firecracker=true vmmid=cnrul29t6e8i4vc1sf7g socket_path=/tmp/.firecracker.sock-11865-cnrul29t6e8i4vc1sf7g
time=2024-03-18T07:17:29.713Z level=INFO msg="VMM metrics disabled." firecracker=true vmmid=cnrul29t6e8i4vc1sf7g
time=2024-03-18T07:17:29.714Z level=INFO msg=refreshMachineConfiguration firecracker=true vmmid=cnrul29t6e8i4vc1sf7g err="[GET /machine-config][200] getMachineConfigurationOK  &{CPUTemplate: MemSizeMib:0x4000408bb8 Smt:0x4000408bc3 TrackDirtyPages:0x4000408bc4 VcpuCount:0x4000408bb0}"
time=2024-03-18T07:17:29.714Z level=INFO msg=PutGuestBootSource firecracker=true vmmid=cnrul29t6e8i4vc1sf7g err="[PUT /boot-source][204] putGuestBootSourceNoContent "
time=2024-03-18T07:17:29.715Z level=INFO msg="Attaching drive" firecracker=true vmmid=cnrul29t6e8i4vc1sf7g drive_path=/tmp/rootfs-cnrul29t6e8i4vc1sf7g.ext4 slot=1 root=true
time=2024-03-18T07:17:29.715Z level=INFO msg="Attached drive" firecracker=true vmmid=cnrul29t6e8i4vc1sf7g drive_path=/tmp/rootfs-cnrul29t6e8i4vc1sf7g.ext4 err="[PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "
time=2024-03-18T07:17:29.715Z level=INFO msg="Attaching NIC at index" firecracker=true vmmid=cnrul29t6e8i4vc1sf7g device_name=tap0 mac_addr=36:88:7a:63:c5:62 interface_id=1
time=2024-03-18T07:17:29.740Z level=INFO msg="startInstance successful" firecracker=true vmmid=cnrul29t6e8i4vc1sf7g err="[PUT /actions][204] createSyncActionNoContent "
time=2024-03-18T07:17:29.740Z level=INFO msg="Machine started" vmid=cnrul29t6e8i4vc1sf7g ip=192.168.127.4 gateway=192.168.127.1 netmask=ffffff00 hosttap=tap0 nats_host=192.168.127.1 nats_port=44253
time=2024-03-18T07:17:29.741Z level=INFO msg="SetMetadata successful" firecracker=true vmmid=cnrul29t6e8i4vc1sf7g
time=2024-03-18T07:17:29.741Z level=INFO msg="Adding new VM to warm pool" ip=192.168.127.4 vmid=cnrul29t6e8i4vc1sf7g
time=2024-03-18T07:17:34.074Z level=ERROR msg="Did not receive NATS handshake from agent within timeout." vmid=cnrul21t6e8i4vc1sf6g
time=2024-03-18T07:17:34.074Z level=ERROR msg="First handshake failed, shutting down to avoid inconsistent behavior"

Problem:

time=2024-03-18T07:17:34.074Z level=ERROR msg="Did not receive NATS handshake from agent within timeout." vmid=cnrul21t6e8i4vc1sf6g
time=2024-03-18T07:17:34.074Z level=ERROR msg="First handshake failed, shutting down to avoid inconsistent behavior"

Expected behavior

I followed the same steps on a linux-amd64 (x86_64) machine and the node started successfully, with every agent completing its handshake.

admin@ip-x-x-x-x:~$ sudo nex node up
time=2024-03-18T07:32:19.250Z level=INFO msg="Generated keypair for node" public_key=NAAFGJURU5O2A7Y3LW4ADW46C3FGO2HQFQXR4ZDVUW4QOPCL5DSZ65D7
time=2024-03-18T07:32:19.251Z level=INFO msg="Loaded node configuration" config_path=./config.json
time=2024-03-18T07:32:19.251Z level=INFO msg="Initialized telemetry"
time=2024-03-18T07:32:19.252Z level=INFO msg="Established node NATS connection" servers=""
time=2024-03-18T07:32:19.462Z level=INFO msg="Internal NATS server started" client_url=nats://0.0.0.0:33653
time=2024-03-18T07:32:19.462Z level=INFO msg="Use this key as the recipient for encrypted run requests" public_xkey=XDSYXBPAWDSXFRNJWNLNCE3IGU2ZRCYZAMRDJETPVR5YNICZVBNQ5NDX
time=2024-03-18T07:32:19.462Z level=INFO msg="Virtual machine manager starting"
time=2024-03-18T07:32:19.462Z level=INFO msg="NATS execution engine awaiting commands" id=NAAFGJURU5O2A7Y3LW4ADW46C3FGO2HQFQXR4ZDVUW4QOPCL5DSZ65D7 version=0.1.5
time=2024-03-18T07:32:19.462Z level=INFO msg="Resetting network"
time=2024-03-18T07:32:19.462Z level=INFO msg="Publishing node started event"
time=2024-03-18T07:32:20.311Z level=INFO msg="Called startVMM(), setting up a VMM" firecracker=true vmmid=cnrus0v76f30gvd23590 socket_path=/tmp/.firecracker.sock-1113-cnrus0v76f30gvd23590
time=2024-03-18T07:32:20.358Z level=INFO msg="VMM metrics disabled." firecracker=true vmmid=cnrus0v76f30gvd23590
time=2024-03-18T07:32:20.359Z level=INFO msg=refreshMachineConfiguration firecracker=true vmmid=cnrus0v76f30gvd23590 err="[GET /machine-config][200] getMachineConfigurationOK  &{CPUTemplate: MemSizeMib:0xc00040cb38 Smt:0xc00040cb43 TrackDirtyPages:0xc00040cb44 VcpuCount:0xc00040cb30}"
time=2024-03-18T07:32:20.359Z level=INFO msg=PutGuestBootSource firecracker=true vmmid=cnrus0v76f30gvd23590 err="[PUT /boot-source][204] putGuestBootSourceNoContent "
time=2024-03-18T07:32:20.359Z level=INFO msg="Attaching drive" firecracker=true vmmid=cnrus0v76f30gvd23590 drive_path=/tmp/rootfs-cnrus0v76f30gvd23590.ext4 slot=1 root=true
time=2024-03-18T07:32:20.360Z level=INFO msg="Attached drive" firecracker=true vmmid=cnrus0v76f30gvd23590 drive_path=/tmp/rootfs-cnrus0v76f30gvd23590.ext4 err="[PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "
time=2024-03-18T07:32:20.360Z level=INFO msg="Attaching NIC at index" firecracker=true vmmid=cnrus0v76f30gvd23590 device_name=tap0 mac_addr=16:ee:e2:d1:5f:37 interface_id=1
time=2024-03-18T07:32:20.432Z level=INFO msg="startInstance successful" firecracker=true vmmid=cnrus0v76f30gvd23590 err="[PUT /actions][204] createSyncActionNoContent "
time=2024-03-18T07:32:20.432Z level=INFO msg="Machine started" vmid=cnrus0v76f30gvd23590 ip=192.168.127.2 gateway=192.168.127.1 netmask=ffffff00 hosttap=tap0 nats_host=192.168.127.1 nats_port=33653
time=2024-03-18T07:32:20.433Z level=INFO msg="SetMetadata successful" firecracker=true vmmid=cnrus0v76f30gvd23590
time=2024-03-18T07:32:20.433Z level=INFO msg="Adding new VM to warm pool" ip=192.168.127.2 vmid=cnrus0v76f30gvd23590
time=2024-03-18T07:32:20.678Z level=INFO msg="Called startVMM(), setting up a VMM" firecracker=true vmmid=cnrus1776f30gvd2359g socket_path=/tmp/.firecracker.sock-1113-cnrus1776f30gvd2359g
time=2024-03-18T07:32:20.690Z level=INFO msg="VMM metrics disabled." firecracker=true vmmid=cnrus1776f30gvd2359g
time=2024-03-18T07:32:20.690Z level=INFO msg=refreshMachineConfiguration firecracker=true vmmid=cnrus1776f30gvd2359g err="[GET /machine-config][200] getMachineConfigurationOK  &{CPUTemplate: MemSizeMib:0xc000015068 Smt:0xc000015073 TrackDirtyPages:0xc000015074 VcpuCount:0xc000015060}"
time=2024-03-18T07:32:20.691Z level=INFO msg=PutGuestBootSource firecracker=true vmmid=cnrus1776f30gvd2359g err="[PUT /boot-source][204] putGuestBootSourceNoContent "
time=2024-03-18T07:32:20.691Z level=INFO msg="Attaching drive" firecracker=true vmmid=cnrus1776f30gvd2359g drive_path=/tmp/rootfs-cnrus1776f30gvd2359g.ext4 slot=1 root=true
time=2024-03-18T07:32:20.691Z level=INFO msg="Attached drive" firecracker=true vmmid=cnrus1776f30gvd2359g drive_path=/tmp/rootfs-cnrus1776f30gvd2359g.ext4 err="[PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "
time=2024-03-18T07:32:20.691Z level=INFO msg="Attaching NIC at index" firecracker=true vmmid=cnrus1776f30gvd2359g device_name=tap0 mac_addr=26:48:5a:18:8f:de interface_id=1
time=2024-03-18T07:32:20.765Z level=INFO msg="startInstance successful" firecracker=true vmmid=cnrus1776f30gvd2359g err="[PUT /actions][204] createSyncActionNoContent "
time=2024-03-18T07:32:20.765Z level=INFO msg="Machine started" vmid=cnrus1776f30gvd2359g ip=192.168.127.3 gateway=192.168.127.1 netmask=ffffff00 hosttap=tap0 nats_host=192.168.127.1 nats_port=33653
time=2024-03-18T07:32:20.765Z level=INFO msg="SetMetadata successful" firecracker=true vmmid=cnrus1776f30gvd2359g
time=2024-03-18T07:32:20.765Z level=INFO msg="Adding new VM to warm pool" ip=192.168.127.3 vmid=cnrus1776f30gvd2359g
time=2024-03-18T07:32:21.018Z level=INFO msg="Called startVMM(), setting up a VMM" firecracker=true vmmid=cnrus1776f30gvd235a0 socket_path=/tmp/.firecracker.sock-1113-cnrus1776f30gvd235a0
time=2024-03-18T07:32:21.030Z level=INFO msg="VMM metrics disabled." firecracker=true vmmid=cnrus1776f30gvd235a0
time=2024-03-18T07:32:21.031Z level=INFO msg=refreshMachineConfiguration firecracker=true vmmid=cnrus1776f30gvd235a0 err="[GET /machine-config][200] getMachineConfigurationOK  &{CPUTemplate: MemSizeMib:0xc0001222c8 Smt:0xc0001222d3 TrackDirtyPages:0xc0001222d4 VcpuCount:0xc0001222c0}"
time=2024-03-18T07:32:21.031Z level=INFO msg=PutGuestBootSource firecracker=true vmmid=cnrus1776f30gvd235a0 err="[PUT /boot-source][204] putGuestBootSourceNoContent "
time=2024-03-18T07:32:21.031Z level=INFO msg="Attaching drive" firecracker=true vmmid=cnrus1776f30gvd235a0 drive_path=/tmp/rootfs-cnrus1776f30gvd235a0.ext4 slot=1 root=true
time=2024-03-18T07:32:21.032Z level=INFO msg="Attached drive" firecracker=true vmmid=cnrus1776f30gvd235a0 drive_path=/tmp/rootfs-cnrus1776f30gvd235a0.ext4 err="[PUT /drives/{drive_id}][204] putGuestDriveByIdNoContent "
time=2024-03-18T07:32:21.032Z level=INFO msg="Attaching NIC at index" firecracker=true vmmid=cnrus1776f30gvd235a0 device_name=tap0 mac_addr=76:a8:99:c5:0c:e6 interface_id=1
time=2024-03-18T07:32:21.095Z level=INFO msg="startInstance successful" firecracker=true vmmid=cnrus1776f30gvd235a0 err="[PUT /actions][204] createSyncActionNoContent "
time=2024-03-18T07:32:21.096Z level=INFO msg="Machine started" vmid=cnrus1776f30gvd235a0 ip=192.168.127.4 gateway=192.168.127.1 netmask=ffffff00 hosttap=tap0 nats_host=192.168.127.1 nats_port=33653
time=2024-03-18T07:32:21.096Z level=INFO msg="SetMetadata successful" firecracker=true vmmid=cnrus1776f30gvd235a0
time=2024-03-18T07:32:21.096Z level=INFO msg="Adding new VM to warm pool" ip=192.168.127.4 vmid=cnrus1776f30gvd235a0
time=2024-03-18T07:32:22.195Z level=INFO msg="Received agent handshake" vmid=cnrus0v76f30gvd23590 message="Host-supplied metadata"
time=2024-03-18T07:32:22.554Z level=INFO msg="Received agent handshake" vmid=cnrus1776f30gvd2359g message="Host-supplied metadata"
time=2024-03-18T07:32:22.857Z level=INFO msg="Received agent handshake" vmid=cnrus1776f30gvd235a0 message="Host-supplied metadata"

Nex and NATS version

admin@ip-x-x-x-x:~$ nats-server --version
nats-server: v2.10.12
admin@ip-x-x-x-x:~$ nex --version
v0.1.5 [567f45b355e71d0a186961a8aaca95d7c3a7fd3b] | Built-on: 2024-03-14T19:45:23Z

Host environment

Host: AWS a1.metal vCPUs: 16 Memory: 32GiB OS: debian-12 CPU Arch: arm64

admin@ip-x-x-x-x:~$ lsb_release -a

No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm
admin@ip-x-x-x-x:~$ lscpu

Architecture:           aarch64
  CPU op-mode(s):       32-bit, 64-bit
  Byte Order:           Little Endian
CPU(s):                 16
  On-line CPU(s) list:  0-15
Vendor ID:              ARM
  Model name:           Cortex-A72
    Model:              3
    Thread(s) per core: 1
    Core(s) per socket: 4
    Socket(s):          4
    Stepping:           r0p3
    BogoMIPS:           166.66
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
Caches (sum of all):
  L1d:                  512 KiB (16 instances)
  L1i:                  768 KiB (16 instances)
  L2:                   8 MiB (4 instances)
NUMA:
  NUMA node(s):         1
  NUMA node0 CPU(s):    0-15
Vulnerabilities:
  Gather data sampling: Not affected
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Mmio stale data:      Not affected
  Retbleed:             Not affected
  Spec rstack overflow: Not affected
  Spec store bypass:    Not affected
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Mitigation; Branch predictor hardening, BHB
  Srbds:                Not affected
  Tsx async abort:      Not affected

Steps to reproduce

No response

NoahHahm commented 5 months ago

Hi @jordan-rash,

This looks related to issue 139.

jordan-rash commented 5 months ago

Sorry, we are at KubeCon, so this didn’t get documented very clearly (it is on the todo list), but you will need to use a different rootfs. I have added one to the 0.1.5 release artifacts, and next week I will include the code needed to build it on your own.

If you run from main, preflight has been updated to pull the correct rootfs for your architecture.
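As a rough illustration of what choosing a rootfs by architecture could look like, here is a minimal Go sketch. It is not the actual preflight code: the function `rootfsArtifact` and both file names are placeholders invented for this example, and the real release artifact names may differ.

```go
package main

import (
	"fmt"
	"runtime"
)

// rootfsArtifact maps a Go architecture name to a rootfs file name.
// The file names are hypothetical placeholders, not real release artifacts.
func rootfsArtifact(goarch string) (string, error) {
	switch goarch {
	case "amd64":
		return "rootfs-amd64.ext4", nil
	case "arm64":
		return "rootfs-arm64.ext4", nil
	default:
		return "", fmt.Errorf("no rootfs available for architecture %q", goarch)
	}
}

func main() {
	// runtime.GOARCH reports the architecture the binary was built for.
	name, err := rootfsArtifact(runtime.GOARCH)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("would download:", name)
}
```

Returning an error for unknown architectures, rather than falling back to a default image, avoids booting a microVM with a rootfs whose agent binary cannot run.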

Disclaimer: due to some of our dependencies, the rootfs is much larger (Debian-based). We will be looking to optimize this in the coming weeks, but wanted to get the feature out there for you all.

Let me know if this doesn’t solve your problem.

gain620 commented 5 months ago

@jordan-rash Using the different rootfs for arm64 resolved the handshake issue.

Thank you for taking the time to help in the middle of your busy KubeCon schedule!

jordan-rash commented 5 months ago

AWESOME! I look forward to hearing about your experiments. After KubeCon you will see the code showing how that rootfs is built; I just forgot to push it.

gain620 commented 5 months ago

@jordan-rash Hello! I am trying to build a custom rootfs image based on fc-image/scripts.go for my arm64 (aarch64) system. Unfortunately, when I boot up my nex node, the agent handshake keeps failing.

I am not sure, but I think I am missing some dependencies required for the arm64 environment. Is the rootfs build script in the fc-image directory applicable to arm64 systems?

ref) https://github.com/synadia-io/nex/issues/142#issuecomment-2006566255

NoahHahm commented 5 months ago

Hi @jordan-rash, I have the same problem.

jordan-rash commented 5 months ago

Hi all. Rebuilding the rootfs doesn't currently work on ARM. We are working on moving the whole rootfs generation into the nex CLI so that users can create a custom rootfs much more easily. For now, I will close this in favor of https://github.com/synadia-io/nex/issues/159