lf-edge / eden

Eden is where EVE and Adam get tried and tested:
https://projecteve.dev
Apache License 2.0
49 stars 47 forks source link

Fix user apps test suite #944

Closed uncleDecart closed 2 weeks ago

uncleDecart commented 7 months ago

What: Test Bug

Currently in master user app test suite fails. Probably it has to do something with port forwarding, because test fails on retrieving curl from a website which is actually available, so link is not broken.

Happens on master: yes Reproducable locally: yes

uncleDecart commented 7 months ago

@milan-zededa could it be because of eden SDN?

milan-zededa commented 7 months ago

@uncleDecart Networking is OK but the nodered application halts shortly after booting:

[    0.638198][    T1] loop: module loaded
[    0.642915][    T1] lpc_ich 0000:00:1f.0: I/O space for GPIO uninitialized
[    0.643066][    T1] Microchip SmartPQI Driver (v2.1.18-045)
[    0.643159][    T1] megasas: 07.719.03.00-rc1
[    0.643245][    T1] mpt3sas version 43.100.00.00 loaded
[    0.643388][    T1] VMware PVSCSI driver - version 1.0.7.0-k
[    0.643482][    T1] hv_vmbus: registering driver hv_storvsc
[    0.643834][    T1] ACPI: \_SB_.GSIA: Enabled at IRQ 16
[    0.645230][    T1] ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 6 ports 1.5 Gbps 0x3f impl SATA mode
[    0.645341][    T1] ahci 0000:00:1f.2: flags: 64bit ncq only
[    0.646913][    T1] scsi host0: ahci
[    0.647030][    T1] scsi host1: ahci
[    0.647129][    T1] scsi host2: ahci
[    0.647539][    T1] scsi host3: ahci
[    0.647653][    T1] scsi host4: ahci
[    0.647757][    T1] scsi host5: ahci
[    0.647849][    T1] ata1: SATA max UDMA/133 abar m4096@0xfe404000 port 0xfe404100 irq 29
[    0.647944][    T1] ata2: SATA max UDMA/133 abar m4096@0xfe404000 port 0xfe404180 irq 29
[    0.648033][    T1] ata3: SATA max UDMA/133 abar m4096@0xfe404000 port 0xfe404200 irq 29
[    0.648133][    T1] ata4: SATA max UDMA/133 abar m4096@0xfe404000 port 0xfe404280 irq 29
[    0.648231][    T1] ata5: SATA max UDMA/133 abar m4096@0xfe404000 port 0xfe404300 irq 29
[    0.648351][    T1] ata6: SATA max UDMA/133 abar m4096@0xfe404000 port 0xfe404380 irq 29
[    0.648874][    T1] Rounding down aligned max_sectors from 4294967295 to 4294967288
[    0.649198][    T1] db_root: cannot open: /etc/target
[    0.649355][    T1] tun: Universal TUN/TAP device driver, 1.6
[    0.651785][    T1] VMware vmxnet3 virtual NIC driver - version 1.7.0.0-k-NAPI
[    0.651886][    T1] hv_vmbus: registering driver hv_netvsc
[    0.651951][    T1] Fusion MPT base driver 3.04.20
[    0.652010][    T1] Copyright (c) 1999-2008 LSI Corporation
[    0.652066][    T1] Fusion MPT SPI Host driver 3.04.20
[    0.652161][    T1] VFIO - User Level meta-driver version: 0.3
[    0.653223][    T1] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    0.653720][    T1] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[    0.654483][    T1] xhci_hcd 0000:01:00.0: hcc params 0x00087001 hci version 0x100 quirks 0x0000000000000010
[    0.655734][    T1] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    0.655808][    T1] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[    0.655923][    T1] xhci_hcd 0000:01:00.0: Host supports USB 3.0 SuperSpeed
[    0.656270][    T1] hub 1-0:1.0: USB hub found
[    0.656493][    T1] hub 1-0:1.0: 15 ports detected
[    0.657526][    T1] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    0.657729][    T1] hub 2-0:1.0: USB hub found
[    0.657945][    T1] hub 2-0:1.0: 15 ports detected
[    0.658959][    T1] usbcore: registered new interface driver usb-storage
[    0.659096][    T1] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[    0.660502][    T1] serio: i8042 KBD port at 0x60,0x64 irq 1
[    0.660643][    T1] serio: i8042 AUX port at 0x60,0x64 irq 12
[    0.660781][    T1] hv_vmbus: registering driver hyperv_keyboard
[    0.660892][    T1] mousedev: PS/2 mouse device common for all mice
[    0.661407][   T70] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    0.661778][    T1] input: PC Speaker as /devices/platform/pcspkr/input/input3
[    0.661904][    T1] rtc_cmos 00:03: RTC can wake from S4
[    0.662875][    T1] rtc_cmos 00:03: registered as rtc0
[    0.663100][    T1] rtc_cmos 00:03: setting system clock to 2024-01-26T13:01:28 UTC (1706274088)
[    0.663297][    T1] rtc_cmos 00:03: alarms up to one day, y3k, 242 bytes nvram
[    0.663384][    T1] i2c_dev: i2c /dev entries driver
[    0.663499][    T1] device-mapper: ioctl: 4.47.0-ioctl (2022-07-28) initialised: dm-devel@redhat.com
[    0.663683][    T1] intel_pstate: CPU model not supported
[    0.663798][    T1] sdhci: Secure Digital Host Controller Interface driver
[    0.663880][    T1] sdhci: Copyright(c) Pierre Ossman
[    0.663946][    T1] sdhci-pltfm: SDHCI platform and OF driver helper
[    0.664048][    T1] ledtrig-cpu: registered to indicate activity on CPUs
[    0.664256][    T1] hv_utils: Registering HyperV Utility Driver
[    0.664344][    T1] hv_vmbus: registering driver hv_utils
[    0.664399][    T1] hv_vmbus: registering driver hv_balloon
[    0.664467][    T1] GACT probability on
[    0.664511][    T1] Mirror/redirect action on
[    0.664571][    T1] Simple TC action Loaded
[    0.664620][    T1] u32 classifier
[    0.664666][    T1]     Performance counters on
[    0.664721][    T1]     input device check on
[    0.664775][    T1]     Actions configured
[    0.664919][    T1] xt_time: kernel timezone is -0000
[    0.664986][    T1] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[    0.665064][    T1] IPVS: Connection hash table configured (size=4096, memory=32Kbytes)
[    0.665180][    T1] IPVS: ipvs loaded.
[    0.665226][    T1] IPVS: [rr] scheduler registered.
[    0.665283][    T1] IPVS: [wrr] scheduler registered.
[    0.665339][    T1] IPVS: [lc] scheduler registered.
[    0.665393][    T1] IPVS: [wlc] scheduler registered.
[    0.665448][    T1] IPVS: [fo] scheduler registered.
[    0.665506][    T1] IPVS: [ovf] scheduler registered.
[    0.665561][    T1] IPVS: [lblc] scheduler registered.
[    0.665627][    T1] IPVS: [lblcr] scheduler registered.
[    0.665687][    T1] IPVS: [dh] scheduler registered.
[    0.665741][    T1] IPVS: [sh] scheduler registered.
[    0.665794][    T1] IPVS: [sed] scheduler registered.
[    0.665851][    T1] IPVS: [nq] scheduler registered.
[    0.665940][    T1] ipip: IPv4 and MPLS over IPv4 tunneling driver
[    0.666080][    T1] gre: GRE over IPv4 demultiplexor driver
[    0.666148][    T1] ipt_CLUSTERIP: ClusterIP Version 0.8 loaded successfully
[    0.666255][    T1] Initializing XFRM netlink socket
[    0.666343][    T1] NET: Registered PF_INET6 protocol family
[    0.666607][    T1] Segment Routing with IPv6
[    0.666679][    T1] In-situ OAM (IOAM) with IPv6
[    0.666873][    T1] NET: Registered PF_PACKET protocol family
[    0.666973][    T1] NET: Registered PF_KEY protocol family
[    0.667046][    T1] Bridge firewalling registered
[    0.667113][    T1] 8021q: 802.1Q VLAN Support v1.8
[    0.667194][    T1] 9pnet: Installing 9P2000 support
[    0.668580][    T1] Key type dns_resolver registered
[    0.668689][    T1] NET: Registered PF_VSOCK protocol family
[    0.668827][    T1] IPI shorthand broadcast: enabled
[    0.668892][    T1] AVX2 version of gcm_enc/dec engaged.
[    0.668965][    T1] AES CTR mode by8 optimization enabled
[    0.669230][    T1] sched_clock: Marking stable (660058303, 6179334)->(695053167, -28815530)
[    0.669387][    T1] registered taskstats version 1
[    0.669462][    T1] Loading compiled-in X.509 certificates
[    0.670003][    T1] Loaded X.509 cert 'Build time autogenerated kernel key: 989af4ddff6e95c30e190bb4f6fcd858240d3816'
[    0.670183][    T1] Key type .fscrypt registered
[    0.670248][    T1] Key type fscrypt-provisioning registered
[    0.670452][    T1] Key type encrypted registered
[    0.670572][    T1] AppArmor: AppArmor sha1 policy hashing enabled
[    0.946625][   T70] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[    0.978824][   T97] ata1: SATA link down (SStatus 0 SControl 300)
[    0.980357][  T103] ata4: SATA link down (SStatus 0 SControl 300)
[    0.981526][  T101] ata3: SATA link down (SStatus 0 SControl 300)
[    0.982596][  T105] ata5: SATA link down (SStatus 0 SControl 300)
[    0.983559][   T99] ata2: SATA link down (SStatus 0 SControl 300)
[    0.991018][  T107] ata6: SATA link down (SStatus 0 SControl 300)
[    1.001977][    T1] Freeing unused kernel image (initmem) memory: 2144K
[    1.002190][    T1] Write protecting the kernel read-only data: 24576k
[    1.002722][    T1] Freeing unused kernel image (text/rodata gap) memory: 2032K
[    1.003133][    T1] Freeing unused kernel image (rodata/data gap) memory: 1544K
[    1.003237][    T1] Run /init as init process
[    1.134934][  T354] 8021q: adding VLAN 0 to HW filter on device eth0
udhcpc: started, v1.35.0
udhcpc op deconfig interface eth0
udhcpc: broadcasting discover
udhcpc: broadcasting select for 10.11.12.2, server 10.11.12.1
udhcpc: lease of 10.11.12.2 obtained from 10.11.12.1, lease time 3600
udhcpc op bound interface eth0
Mount /mnt/modules as /lib/modules, result 0
[    1.455983][  T379] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[    1.456143][  T379] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
Modprobe wireguard, result 0
Executing /mount_disk.sh
Run acpid daemon
Executing agetty

9b1d2cf9-8093-4ac3-bb97-dcb454cab11e login: root (automatic login)

login[394]: root login on 'hvc1'
~ # Executing "npm" "--no-update-notifier" "--no-fund" "start" "--cache" "/data/.npm" "--" "--userDir" "/data"
Executing with uid gid: 1000 1000
chroot2: getpwuid(1000) failed:: No error information
chroot2 exited with 255, the init-initrd is about to quit by calling /sbin/poweroff, which shuts the whole VM down
sh: can't kill pid 394: No such process
[    1.726865][   T16] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
umount: proc busy - remounted read-only
umount: can't unmount /: Invalid argument
[   31.477419][  T402] kvm: exiting hardware virtualization
[   31.489738][  T402] ACPI: PM: Preparing to enter system sleep state S5
[   31.489990][  T402] reboot: Power down
milan-zededa commented 7 months ago

For comparison, this is how nodered logs look like when everything is OK (run on 9.4.10 EVE):

...
[    1.019931][    T1] Run /init as init process
qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might lead to file ID collisions and severe misbehaviours on guest! You should either use a separate export for each device shared from host or use virtfs option 'multidevs=remap'!
[    1.130547][  T369] 8021q: adding VLAN 0 to HW filter on device eth0
udhcpc: started, v1.35.0
udhcpc op deconfig interface eth0
udhcpc: broadcasting discover
udhcpc: broadcasting select for 10.11.12.2, server 10.11.12.1
udhcpc: lease of 10.11.12.2 obtained from 10.11.12.1, lease time 3600
udhcpc op bound interface eth0
Executing /mount_disk.sh
Run acpid daemon
Executing "npm" "--no-update-notifier" "--no-fund" "start" "--cache" "/data/.npm" "--" "--userDir" "/data"
Executing with uid gid: 1000 1000
[    1.735525][    T5] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    5.165493][    C0] random: crng init done

> node-red-docker@2.1.3 start /usr/src/node-red
> node $NODE_OPTIONS node_modules/node-red/red.js $FLOWS "--userDir" "/data"

26 Jan 13:31:36 - [info]

Welcome to Node-RED
===================

26 Jan 13:31:39 - [info] Node-RED version: v2.1.3
26 Jan 13:31:39 - [info] Node.js  version: v14.18.1
26 Jan 13:31:39 - [info] Linux 5.10.186-linuxkit x64 LE
26 Jan 13:31:43 - [info] Loading palette nodes
26 Jan 13:31:54 - [info] Settings file  : /data/settings.js
26 Jan 13:31:54 - [info] Context store  : 'default' [module=memory]
26 Jan 13:31:54 - [info] User directory : /data
26 Jan 13:31:54 - [warn] Projects disabled : editorTheme.projects.enabled=false
26 Jan 13:31:54 - [info] Flows file     : /data/flows.json
26 Jan 13:31:54 - [warn]

---------------------------------------------------------------------
Your flow credentials file is encrypted using a system-generated key.

If the system-generated key is lost for any reason, your credentials
file will not be recoverable, you will have to delete it and re-enter
your credentials.

You should set your own key using the 'credentialSecret' option in
your settings file. Node-RED will then re-encrypt your credentials
file using your chosen key the next time you deploy a change.
---------------------------------------------------------------------

26 Jan 13:31:54 - [info] Server now running at http://127.0.0.1:1880/
26 Jan 13:31:54 - [info] Starting flows
26 Jan 13:31:54 - [info] Started flows
giggsoff commented 7 months ago

So, the error (chroot2: getpwuid(1000) failed:: No error information) come from here. But it is odd, line for uid 1000 is available in image:

$ docker run --rm -it --entrypoint /bin/bash nodered/node-red:2.1.3 -c "grep 1000 /etc/passwd"
node-red:x:1000:1000:Linux User,,,:/usr/src/node-red:/bin/ash

Probably we try to get it from somewhere else...

milan-zededa commented 7 months ago

@shjala Could do you please check if this issue is related to your changes in PR https://github.com/lf-edge/eve/pull/3521 ? This nodered test broke between 11.2.0 and 11.3.0

You can run it with:

make clean && make build-tests
./eden config add default
./eden config set default --key eve.log-level --value debug
./eden config set default --key eve.tag --value 11.3.0
./eden setup
./dist/bin/eden+ports.sh 2223:2223
./eden start
./eden eve onboard
./eden test tests/eclient -e nodered -v debug
shjala commented 7 months ago

@milan-zededa let me check it.

shjala commented 7 months ago

@milan-zededa @giggsoff you need to have that entry in the VM's /etc/passwd, not the container.

giggsoff commented 7 months ago

@milan-zededa @giggsoff you need to have that entry in the VM's /etc/passwd, not the container.

But in case of USER defined in Dockerfile we will have it only inside of container's rootfs. Assume that VM's /etc/passwd is out of user (and user's app) control. Looks like we should see into container's rootfs.

shjala commented 7 months ago

@giggsoff but we want to run the container as the host's user 1000, not the container's user 1000.

shjala commented 7 months ago

we can create the user in host if doesn't exist with some default permissions.

uncleDecart commented 7 months ago

@giggsoff but we want to run the container as the host's user 1000, not the container's user 1000.

@shjala by host here you mean VM where container is running in EVE and by user 1000 you mean first created user, right? In that case, how should we handle potential case where we run multiple containers in same VM to reduce overhead (https://github.com/lf-edge/eve/pull/3658)? And if we say one container per one VM, does it mean we should change something in EVE to make norered test work?

shjala commented 7 months ago

@uncleDecart Yes, user exist on the VM we run container in, container is basically a process on the host vm.

how should we handle potential case where we run multiple containers in same VM to reduce overhead

It is OK to run multiple containers under the same user, but that has security concerns which I need to address.

And if we say one container per one VM, does it mean we should change something in EVE to make norered test work?

We should change the init script, if user is not root and does not exist on the vm, create it with default permissions. I'm busy right now, but I have plan submit PR next week to fix that, feel free to beat me to it.

eriknordmark commented 7 months ago

After a re-run of https://github.com/lf-edge/eve/pull/3742 I see that the User apps test suite passed.

uncleDecart commented 7 months ago

After a re-run of https://github.com/lf-edge/eve/pull/3742 I see that the User apps test suite passed.

That's great, I guess we can close this issue once we release EVE with a fix and bump version in Eden repo, so that there will be red test on eden as well

uncleDecart commented 2 weeks ago

Bumped and fixed