kairos-io / kairos

:penguin: The immutable Linux meta-distribution for edge Kubernetes.
https://kairos.io
Apache License 2.0

jetson orin not booting up with 3.0.9 and 3.0.10 #2559

Open nianyush opened 1 month ago

nianyush commented 1 month ago

I tried with two different boxes. One gets stuck here:

EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from conf10fd421]
[    0.000000] Linux version 5.10.104-tegra (buildbrail (Buildroot 2020.08) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #1 SMmemory scan node memory@80000000, reg size 16,
[    0.000000] OF: fdt:  - 80000000 ,  c0000000
[    0.000000] Machine model: J98 MEMRESERVE=0x10075b9f98 
[    0.000000] efi: seeding entropy created CMA memory pool at 0x000000100a800000, size 512 MiB
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] Zone ranges:
[    0.00   0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x[    0.000000]   node   0: [mem 0x0000000080000000-0x00000000fffdffff]
[    0.000000]   node   0: [mem 0x00000000fffe0000-0x0000x00000010061c0000-0x0000001006821fff]
[    0.000000]   node   0: [mem 0x0000001006822000-0x00000010069befff]
[    0.000000]  00-0x00000010075cffff]
[    0.000000]   node   0: [mem 0x0000000x00000010075f0000-0x000000100767ffff]
[    0.000000]   node   0ffff]
[    0.000000]   node   0: [mem 0x0000001007910000-0x0001007a00000-0x0000001007d1ffff]
[    0.000000]   node   0: [mem 0x0000001007d20000-0x0000001007d5ffff]
[    0.000000]   node   00000]   node   0: [mem 0x0000001008380000-0x000000100841ffff]
1ffff]
[    0.000000]   node   0: [mem 0x0000001008520000-0x0000x0000001008710000-0x000000100901ffff]
[    0.000000]   node   0: [mem 0x0000001009020000-0x00000010091fffff]
[    0.000000]  [    0.000000]   node   0: [mem 0x000000102aa00000-0x000000102e7 0x0000000080000000-0x0000001037ffffff]
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1t required
[    0.000000] psci: SMC Calling Convention v1.2
[    0.000000] percpu: Embedded 32 pages/cpu s90200 r8192 d32680 u131072
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: GIC system register CPU interface
forced ON by KASLR
[    0.000000] CPU features: detected: Kernected: Spectre-v4
[    0.000000] alternatives: patching kernel code
[    0.000000] Built 1 zonelists, mobility grouping on.  ToAGE=(loop0)/boot/vmlinuz console=ttyTCU0,115200 console=tty1 netEL=COS_ACTIVE cos-img/filename=/cOS/active.img selinux=0 rd.emer  0.000000] Memory: 63731440K/65810432K available (18496K kernelRCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=ks RCU enabled.
[    0.000000]  Rude variant of Tasks RCU enablu_fanout_leaf=16, nr_cpu_ids=12
[    0.000000] NR_IRQS: 64, nr_ (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xe6a171046, max_idle_ns: 881590405314 ns
[    0.000004] sched_clock: 56 bits at 31MHz, resolution 3 dummy device 80x25
[    0.001078] printk: console [tty1] enablSM: Security Framework initializing
[    0.001229] Yama: becomi83] Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.001470] Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.004073] rcu: X Orin, BIOS 3.1-32827747 03/19/2023
[    0.393296] NET: Regist
[    5.660649] arm-smmu 10000000.iommu:         stage 2 translation

device1.log

The other one didn't show the GRUB menu, only a login prompt, and I cannot log in to it. output.log

ci-robbot commented 1 month ago

Please provide more information about the issue and how to reproduce it in order to properly categorize and address it. We have labeled it with the 'question' label to indicate that we need more information. If you need any assistance, feel free to ask. Additionally, this is a bot experiment from @mudler and @jimmykarily.

mudler commented 1 month ago

@nianyush we need the serial logs. The second one looks like it is booting correctly. Are you sure all the cloud configs are laid down correctly in the partitions?

nianyush commented 1 month ago

@mudler I am only following this. I didn't add any custom cloud configs. image

mudler commented 1 month ago

@nianyush you have to follow https://kairos.io/docs/installation/nvidia_agx_orin/#default-configuration

Just for reference, the first logs (device 1) look like the ones in https://github.com/kairos-io/kairos/issues/2467 (that's why we need more logs).

ci-robbot commented 1 month ago

Hello nianyush, it seems like you've opened a Github issue with insufficient information for the project's requirements. Please provide a detailed description of the issue and steps to reproduce it, if it's a bug. Additionally, kindly mention the versions of the artifacts being used.

To meet project requirements, please consider revising your issue and providing the necessary information. Once the issue meets the criteria, it will be labeled as 'triage' to indicate that it was reviewed. If you have any further questions, feel free to ask.

I am a bot, an experiment of @mudler and @jimmykarily, here to provide guidance on Github issues for the kairos-io project.

mudler commented 4 weeks ago

@nianyush any news here?

nianyush commented 3 weeks ago

I was able to build the image with the updated partition size and debug options in the cmdline, but I am not able to see much difference in the screen logs.

nianyush commented 3 weeks ago

screen.log

mudler commented 3 weeks ago

@nianyush this seems quite weird to me. The boot stops pretty early; however, almost at the end of the log there is this line:

[   14.310980] nvidia: loading out-of-tree module taints kernel.

I wonder if that causes issues and the kernel is to blame.

Can you check whether you can boot properly with https://github.com/kairos-io/kairos/releases/tag/v3.0.6, using the same process you are using now? Note that this image was confirmed to be booting by QA (https://github.com/kairos-io/kairos/issues/2467#issuecomment-2063524196).

This looks more like a kernel issue than a Kairos issue at this point.
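
For anyone comparing captures between versions, here is a minimal Python sketch (not part of Kairos; the function name `summarize_boot_log` is made up for illustration) that reports the last timestamped kernel line in a saved console log, plus any "taints kernel" messages. It assumes the log uses the usual `[   14.310980] message` prefix and simply skips lines that are too garbled to parse.

```python
import re

# Matches the "[   14.310980] message" prefix printed by the kernel.
TS_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def summarize_boot_log(text):
    """Return (last_timestamp, last_message, taint_lines) for a console log.

    Lines without a parseable timestamp prefix are ignored, so a partly
    garbled serial capture still yields the last readable kernel line.
    """
    last_ts, last_msg, taints = None, None, []
    for raw in text.splitlines():
        line = raw.strip()
        match = TS_LINE.match(line)
        if not match:
            continue
        last_ts, last_msg = float(match.group(1)), match.group(2)
        if "taints kernel" in last_msg:
            taints.append(line)
    return last_ts, last_msg, taints
```

Running it over the text of the captures from 3.0.6 and 3.0.10 would show whether both stop at the same point or whether only the newer image stalls around the module load.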

nianyush commented 3 weeks ago

Sure, I will try that.