pyavitz / debian-image-builder

Debian image builder for single board computers

post build user specific configuration #9

Closed 0n3man closed 2 years ago

0n3man commented 2 years ago

This is a feature request. Would you consider allowing a user-specific script that could do some configuration and pull in other packages before the image file is created?

pyavitz commented 2 years ago

Do you have anything specific in mind? Like what exactly do you want to add.

0n3man commented 2 years ago

I'd be looking to build an image with Home Assistant. I run it on the Odroid N2. It requires Docker and then all of its containers. I'm not sure if you include the GPIO packages, as I just did my first build with your script and at this point it's failing to boot. These are the notes I use every time I build a Home Assistant controller using Armbian:

Docker install based on https://docs.docker.com/engine/install/debian/#install-using-the-repository
Prep for docker-ce install: apt-get install ca-certificates curl gnupg lsb-release

Add key: sudo curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

Set to use stable docker repo:
echo \
 "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg]\
 https://download.docker.com/linux/debian \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Install docker:
apt-get update
apt-get install docker-ce docker-ce-cli containerd.io

Install and enable apparmor as per: https://tadeubento.com/2019/armbian-enable-apparmor/

apt install apparmor
echo "extraargs=apparmor=1 security=apparmor" >> /boot/armbianEnv.txt
update-initramfs -u

Install HA supervisor via the instructions here: https://www.home-assistant.io/installation/linux

Prep: apt install jq libglib2.0-bin 
Clean things up: apt autoremove
apt --fix-broken install
apt install udisks2

Pull HA os-agent: 
wget \
https://github.com/home-assistant/os-agent/releases/download/1.2.2/os-agent_1.2.2_linux_aarch64.deb

Install HA os-agent: dpkg -i os-agent_1.2.2_linux_aarch64.deb

From the web page above, click the link for the supervised installer:
     https://github.com/home-assistant/supervised-installer
We’ve already installed docker-ce and the other dependencies, so we just need to install the Home Assistant supervised package:

wget https://github.com/home-assistant/supervised-installer/releases/latest/download/homeassistant-supervised.deb
dpkg -i homeassistant-supervised.deb
Select hardware: odroid-n2

GPIO install: I utilize the GPIO pins on the N2 to connect to a few wired contact sensors. As such I need to have the GPIO packages installed.  

apt install gpiod python3-libgpiod
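
For reference, the package steps in these notes could be collected into one reviewable list. This is only a sketch under assumptions: `ha_prereq_steps` is a made-up helper name, `-y` is added so apt does not prompt, and the Armbian-specific `armbianEnv.txt` line and the interactive Home Assistant `.deb` installs are left out on purpose. It prints the commands and runs nothing itself.

```shell
#!/bin/sh
# Print, in order, the package-related commands from the notes above.
# ha_prereq_steps is a hypothetical helper name, not part of any project.
# The quoted heredoc keeps $(...) literal so it is evaluated on the board.
ha_prereq_steps() {
    cat <<'EOF'
apt-get install -y ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" > /etc/apt/sources.list.d/docker.list
apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io
apt-get install -y apparmor jq libglib2.0-bin udisks2 gpiod python3-libgpiod
EOF
}

# Review the list:      ha_prereq_steps
# Save it to a file:    ha_prereq_steps > ha-prereqs.sh
# Then run it as root:  sudo sh -e ha-prereqs.sh
```

Writing the list to a file before running it keeps the review step separate from the install step.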

pyavitz commented 2 years ago

Well, let's diagnose the failing-to-boot part first. By default the Amlogic imgs are created to be booted from sd, unless marked otherwise. Are you attempting to boot from sd or emmc?

No, it doesn't appear I include that package, but I could add it to the list.

0n3man commented 2 years ago

I've actually tried both from SD and emmc. Neither is booting. Watching the console, it gets to the point where it says starting kernel and prints out a uboot time. Based on the blue light flashing on the N2 it appears the kernel could be up. However, I never see any additional output in the console or over the HDMI. Also I don't get any DHCP request from the network. I tried the latest armbian and it also fails to boot on the N2 in the same fashion. There is a version of debian bullseye available on the hardkernel forum that does boot. I know I had booted a slightly older version of armbian bullseye two months ago that had a slightly older version of the kernel. I suspect something in the kernel may have changed, but I don't know. If you have any suggestions on things to try to get it to boot I'd appreciate your input. Thanks.

0n3man commented 2 years ago

I saw the emmc flag in the configuration file and also tried enabling that when I tried the emmc boot.

pyavitz commented 2 years ago

Odd. I don't personally have the original N2, I have the N2+. But I know someone who does and just the other day I had to make an img to test his, because he apparently fried it somehow. He did say the img booted fine though.

I'm gonna assume you are not using Petitboot? I don't think the extlinux loading screen would have been seen in that case? But in any case, that can't be used with these imgs.

What kernel revision did you use? 5.17-rc6 is kind of messed up currently. I'm on 5.16.12 at the moment.

pyavitz commented 2 years ago

If all else fails, I can attempt to make you one and see if I have better luck? One thing you might wanna check is whether the PARTUUID in /boot/extlinux/extlinux.conf matches what's actually on the img.
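
A rough sketch of that PARTUUID check. It assumes the `append` line in extlinux.conf carries a `root=PARTUUID=...` argument, which is not confirmed anywhere in this thread; the helper names and the `/dev/sdX1` device are placeholders.

```shell
#!/bin/sh
# Print the PARTUUID referenced by root=PARTUUID=... in an extlinux.conf.
# Usage: conf_partuuid /boot/extlinux/extlinux.conf
conf_partuuid() {
    sed -n 's/.*root=PARTUUID=\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Succeed only if the config value equals the PARTUUID passed in.
# Usage: partuuid_matches CONF_FILE ACTUAL_PARTUUID
partuuid_matches() {
    [ "$(conf_partuuid "$1")" = "$2" ]
}

# With the card attached to another machine (device name is a placeholder):
#   actual=$(sudo blkid -s PARTUUID -o value /dev/sdX1)
#   partuuid_matches /mnt/boot/extlinux/extlinux.conf "$actual" \
#       || echo "PARTUUID mismatch: update extlinux.conf"
```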

0n3man commented 2 years ago

I just used your image as it built, so I think it used uboot. I'm out right now and will check the kernel version number when I get home.



pyavitz commented 2 years ago

The img is marked for eMMC: g12b-odroid-n2-debian-bullseye-5.16.12-2022-03-05.img.xz

username: on3man
password: board

Let me know how it goes.

0n3man commented 2 years ago

I just grabbed your image, decompressed it with xz, used balenaEtcher to write it to a 32G hardkernel emmc card, put it on the board and booted it while connected with minicom to the console port and with hdmi connected to a monitor. A bunch of messages printed out on the console and then it said "Starting kernel", followed by a couple of pages of kernel level messages. Nothing ever showed on the hdmi output. Maybe it was doing a resize or something. I thought this was good as it got further. It seemed like it was hung, as I couldn't ping it via the network and got no response on the console or hdmi. I cycled the power to see what would happen on the second attempt and damn, it's up and running. Maybe I should have waited a little longer before the power cycle, as I probably only waited a minute or two. Does it make sense it would have been doing some initial configuration on the first boot? Can you share the userdata.txt file you used to build it, as I'd like to see if I can reproduce the image?

0n3man commented 2 years ago

FYI for my build of the image I did the following:
1) build a vm based on latest bullseye with 10 cores and 32G memory
2) git clone your repo
3) run the install script in the repo
4) make config
5) make menu

Sound about right?

pyavitz commented 2 years ago

It has two boot services. Firstboot which grows the ROOTFS partition and then it runs Credentials which checks for wifi and eth0 (can sometimes take a minute). Beyond that it has some other misc. ones, but nothing that should stop the progress of boot. Odd ur not getting any HDMI output past initial boot? Do you have a different display?

NOTE: After first boot those two services are removed

Here is the user data file minus ur name: (Sounds like you would wanna check crosscompile)

### USER INFORMATION
NAME=""
user="on3man"
passwd="board"
MOTD="Odroid N2"
HOSTNAME="odroidn2"
### DISTRIBUTION AND RELEASE
DISTRO="debian"
DISTRO_VERSION="bullseye"
### UBOOT AND LINUX KERNEL
UBOOT_VERSION="v2022.01"
VERSION="5.16.12"
BUILD_VERSION="1"
rc=0
menuconfig=0
COMPILER="gcc-10"
crosscompile=0
ccache=0
### WIRELESS
rtl8812au=1
rtl88x2bu=0
rtl8811cu=0
rtl8188eu=0
### CUSTOM
custom_defconfig=0
MYCONFIG="_defconfig"
emmc=1
verbose=0
auto=1
aircrack=0
### COMPILER TUNING
CORES=`nproc`
CFLAGS=""
### WHOAMI AND HOST
KBUSER="marvin"
KBHOST="martian"
### LOCALES
set_locales(){
apt install -y locales
export LANGUAGE=en_US.UTF-8
export LANG=en_US.UTF-8
export LC_ALL=C.UTF-8
locale-gen en_US.UTF-8
}
### TIMEZONE
set_timezone(){
ln -snf /usr/share/zoneinfo/America/New_York /etc/localtime
echo -n 'America/New_York' > /etc/timezone
}
### NAME SERVER
NAMESERVER1="8.8.8.8"
NAMESERVER2="8.8.4.4"
### DO NOT EDIT BELOW THIS LINE
builder=3.4

I would say let it run again and let it finish the boot process. I don't know if ur wired or not, but I added RTL88XXAU wifi support to it.

As an aside, it sounds like it's booting and ur just not getting a display or networking.


And yes. What you did all sounds right.

0n3man commented 2 years ago

Your image is working fine. It was just the first boot that was strange. The kernel is slightly newer than what I had. You have auto enabled and I didn't. I had crosscompile enabled. So I'm guessing you built the image on an odroid n2+? I also didn't have the locales stuff, but wouldn't expect that to have an impact. Running my build again with those mods to see what happens.
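
The differences called out above could be listed mechanically from two copies of userdata.txt. A minimal sketch, assuming plain `KEY=value` lines like the ones in the file posted earlier; `userdata_get`, `userdata_diff`, and the example file names are made up for illustration, and only four keys are compared.

```shell
#!/bin/sh
# Print the value of a simple KEY=value line, with double quotes stripped.
# Usage: userdata_get FILE KEY
userdata_get() {
    sed -n "s/^$2=//p" "$1" | head -n 1 | tr -d '"'
}

# Print "KEY: mine -> theirs" for each compared key whose value differs.
# Usage: userdata_diff MY_FILE THEIR_FILE
userdata_diff() (
    for key in VERSION crosscompile auto emmc; do
        a=$(userdata_get "$1" "$key")
        b=$(userdata_get "$2" "$key")
        [ "$a" = "$b" ] || printf '%s: %s -> %s\n' "$key" "$a" "$b"
    done
)

# Example (file names are placeholders):
#   userdata_diff my-userdata.txt posted-userdata.txt
```

For a full comparison, plain `diff` on the two files works as well; the helper only narrows the output to the settings discussed here.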

pyavitz commented 2 years ago

So the image is working for you now? That locales, time zone and nameserver business was added by request. It's just in the file in case someone needs to change it.

Yeah I built the img on my N2.

0n3man commented 2 years ago

Yeah it's working fine for the short period I've had it running. After I try generating the image on my VM, I'll let you know how that goes. If it fails I'll try and run it on the N2 without the crosscompile to see if that works. I really appreciate your assistance. If I get Home Assistant working and everything is stable I'd like to post instructions on the Home Assistant forum pointing over to your repo if you're good with that.

0n3man commented 2 years ago

I'm curious when it comes to doing a kernel update on your operational system, do you just run the kernel step from your script and then expand the tar file on your operational system or do you have a different technique?

pyavitz commented 2 years ago

I have three diff boxes I build on, so it's not always native. S922X, RK3399 and an X86_64 i7. As far as kernel builds, it depends how much I care about it getting done quickly.

My operational system? As in the stuff I have about, I'm using as a daily? I build kernels on my buildbox(s) and then copy them over to my dailies and run: sudo dpkg -i *.deb

I don't use or run VM's, so I'm not gonna be much help there.

As for posting instructions on/about Home Assistant. Absolutely.

0n3man commented 2 years ago

Great news. The image I just built with the newer kernel you used in your image booted 1st time without issue. Very exciting. I see you drop .deb files for the kernel image and headers in one of the output directories. I also see you have some patch files, which I'm guessing you apply in the kernel build, along with kernel build configuration files. Do you pull these from the vendor site or are you making the defconfig and patches yourself? How much effort is it to add a different platform? How did you arrive at the platforms you're currently supporting? This is really great work.

0n3man commented 2 years ago

When I log in I see the message "Governor: performance"; I assume this is clock speed. Is there a command on the box that allows you to change this, or possibly in the build script?

pyavitz commented 2 years ago

Most of the defconfigs came from either Armbian or Archarm. I then modify them, which usually means pulling a bunch of stuff out. The Raspberry Pi 4 defconfig came from the rpi/linux github and was then modified.

I pull the patches from a lot of different sources. The main ones would be: tobetter, chewitt, armbian and lore.kernel.org. I also add my own and edit patches as it goes along throughout revs. I end up recreating patches to keep them up to date "or remove fuzz". That's why they are no longer signed.

In the case of tobetter "Odroid" I pull his source and create the patch set from it. This enables users to move forward in kernel revisions and not worry about where "tobetter is" in the revision cycle. When I find a break, I fix it or wait for one.

As for the governor: governor -h

Whatever you set, the governor service should set upon reboots. The README explains pretty much everything about services.
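
Independent of the builder's own `governor` command (its options are not shown in this thread), the kernel exposes the active governor through the standard cpufreq sysfs files. A small sketch; the helper names and the optional sysfs-root argument are assumptions added here for illustration.

```shell
#!/bin/sh
# Print the active cpufreq governor for one CPU (default: cpu0).
# The optional second argument overrides the sysfs root, e.g. for testing.
current_governor() {
    cat "${2:-/sys/devices/system/cpu}/cpu${1:-0}/cpufreq/scaling_governor"
}

# Print the governors this kernel offers for that CPU.
available_governors() {
    cat "${2:-/sys/devices/system/cpu}/cpu${1:-0}/cpufreq/scaling_available_governors"
}

# Changing it by hand needs root and does not persist across reboots, e.g.:
#   echo schedutil | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
```

On boards with separate CPU clusters each cluster may have its own policy, so it is worth checking more than one CPU number. A manual change like this is likely to be replaced by whatever the governor service applies at the next boot.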

The output directory is where your kernel *.deb(s) end up. These can be installed on your running system. Same goes for the u-boot binaries, you just need to know how to flash them.

pyavitz commented 2 years ago

If the board being added is related in some way to other boards in the builder and aarch64, it's not hard.

0n3man commented 2 years ago

The board I'd love to get an os update for is the odroid C1.

I've loaded Home Assistant up on your image and so far everything is looking good. In reality your image seems to be a lot more lightweight compared to armbian. I've gained a few hundred megs of free memory using your image.

The last thing I need to verify is the way I enable cellular backup on home assistant. If that all works I'll be good.

I'd still love the option to have a script file that runs inside the build image during construction. I'd build a HA script that fully installs HA. I actually have a couple of other arm boards/applications I use around the house that I'd also build script files for, to regenerate them. Thus the desire to have the odroid C1 image.

I did come across this post on the HA forum that calls out your effort.

I greatly appreciate all the assistance. Thanks

0n3man commented 2 years ago

I have both odroid N2 and N2+ boards. What's the negative aspect of using an N2 image on an N2+ board?

pyavitz commented 2 years ago

I don't have a C1 and plus it's Armv7 isn't it? It would be a lot of work to insert that into the builder. As for the imgs being lighter than Armbian. That's because they aren't Debian :) and the imgs made with this builder are basically vanilla.


> What's the negative aspect of using an N2 image on an N2+ board?

It won't boot.

But the kernel patching is identical and I'm pretty sure the img scripting is too. Basically you could just run make odroidn2+-uboot, copy the odroid kernel into the output/odroidn2plus directory and create a new img. make odroidn2+-image

In theory you could flash the N2+ u-boot to a written N2 img as well. You just need to be able to mount the sd or emmc. The PARTUUID may change after flashing; you can check it against the extlinux.conf file with sudo blkid. If changed, input the new one.

eMMC
sudo dd if=u-boot.bin of=/dev/sdX bs=512 seek=1

SDCARD
sudo dd if=u-boot.bin.sd.bin of=/dev/sdX bs=1 count=442 conv=fsync
sudo dd if=u-boot.bin.sd.bin of=/dev/sdX bs=512 skip=1 seek=1 conv=fsync
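
Since a wrong `of=` target with dd is destructive, one cautious option is to print the commands for review before running them. A sketch only: `uboot_flash_cmds` is a hypothetical helper, and the dd arguments are copied from the commands above without change.

```shell
#!/bin/sh
# Print the dd command(s) for flashing u-boot to the given media type.
# Usage: uboot_flash_cmds emmc|sd /dev/sdX
# Nothing is written; the output is meant to be read, then run with sudo.
uboot_flash_cmds() {
    case "$2" in
        /dev/?*) ;;
        *) echo "expected a /dev/... device, got: $2" >&2; return 1 ;;
    esac
    case "$1" in
        emmc)
            echo "dd if=u-boot.bin of=$2 bs=512 seek=1" ;;
        sd)
            # First 442 bytes only, which stops short of the MBR
            # partition table at byte offset 446.
            echo "dd if=u-boot.bin.sd.bin of=$2 bs=1 count=442 conv=fsync"
            # Then everything after the first 512-byte sector.
            echo "dd if=u-boot.bin.sd.bin of=$2 bs=512 skip=1 seek=1 conv=fsync" ;;
        *) echo "media must be 'emmc' or 'sd'" >&2; return 1 ;;
    esac
}

# Example: uboot_flash_cmds sd /dev/sdX
```

The 442-byte write does overlap the 4-byte MBR disk signature at offset 440, which is likely why the PARTUUID can change after flashing an SD card.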

> I'd still love the option to have a script file that runs inside the build image during construction.

It's possible, but currently the img size is set to 2.8GB. If what was being added exceeded that, the img would blow up. I'll think on it and try to figure something out. What I personally do is create a script to run on the board itself. Example

> I greatly appreciate all the assistance. Thanks

You are welcome.

0n3man commented 2 years ago

Adding the containers for HA would blow past the space limit you mentioned. I guess the ideal thing for running the HA installer would be to put the installer in place and then run it on the boot immediately after the disk partition has been expanded. So I guess one solution would be to drop a final product install script into the image, then have it run once up front. So your music box image could be specified as needing to be run when building the image and then it would get executed. Just a thought.
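
The "run once up front" part could be as small as a marker-file guard. A minimal sketch, not how the builder does it: `run_once` and both paths in the example are hypothetical, and how this would be ordered after the partition resize is left open.

```shell
#!/bin/sh
# Run a command once. If it succeeds, create the marker file so that
# later calls become no-ops; if it fails, no marker is written and the
# command is retried on the next call.
# Usage: run_once MARKER_FILE COMMAND [ARGS...]
run_once() (
    marker=$1
    shift
    if [ -e "$marker" ]; then
        exit 0
    fi
    "$@" && : > "$marker"
)

# Example, called from a boot-time service (paths are placeholders):
#   run_once /var/lib/ha-install.done /usr/local/sbin/install-ha.sh
```

Writing the marker only on success means a boot that loses network halfway through the install does not leave the board permanently half-configured.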

pyavitz commented 2 years ago

Inserting a custom user script function would be no problem. I actually already have that feature in the rpi-img-builder. It's only half baked though, as I never added anything past its inclusion into the image made.

arakeen commented 2 years ago

0n3man: Do you use either IRC or Discord? If you do we can work with you on some more potential ways to add those customizations to an image.

0n3man commented 2 years ago

I am on discord. CPmentor#0472.

arakeen commented 2 years ago

Here's a link for the new text channel I created for generic image builder discussion. https://discord.gg/53Ny5pFQ

I'll be online for a while

0n3man commented 2 years ago

So I ran through my testing using the cellular backup network connection. That works. During the testing I had two cases where the system hung on reboot. I know that armbian had issues with the N2 systems hanging on reboot when using emmc modules. I'm curious if you use micro sd or emmc? I'm going to do some testing with micro sd to see if I get the same hanging issue. On one of the hangs I got some messages printed on the serial console port. This is exactly what happened on the first boot I'd mentioned earlier in the thread. Not sure if this is useful, but these are the messages:

Starting kernel ...

[    2.717332] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[    2.717367] Modules linked in: videodev ir_nec_decoder panfrost(+) mc meson_gxbb_wdt(+) meson_ir(+) gpu_sched meson_e
[    2.759640] [drm] Initialized panfrost 1.2.0 20180908 for ffe40000.gpu on minor 0
[    2.760477] CPU: 2 PID: 466 Comm: modprobe Not tainted 5.16.12 #1
[    2.773932] Hardware name: Hardkernel ODROID-N2 (DT)
[    2.778848] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    2.785746] pc : blk_mq_poll_stats_fn+0x3c/0x60
[    2.790231] lr : bsearch+0x50/0xc0
[    2.793595] sp : ffff80000a2d3b20
[    2.796873] x29: ffff80000a2d3b20 x28: ffff800000fff028 x27: ffff800000fff0a8
[    2.803945] x26: ffff800001001400 x25: ffff800008141b90 x24: ffff800000fff8ff
[    2.811017] x23: ffff8000093e0798 x22: 000000000000000c x21: 0000000000000d6b
[    2.818090] x20: ffff8000093e078c x19: 0000000000000d6b x18: 0000000000000002
[    2.825162] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaae967fd98
[    2.832235] x14: 0000000000000001 x13: 65746174735f7773 x12: 0000000000000000
[    2.839308] x11: 000000000000000b x10: 0101010101010101 x9 : 0000000000000000
[    2.846380] x8 : 7f7f7f7f7f7f7f7f x7 : 6f6974656c706d6f x6 : 635f616d645f6273
[    2.853452] x5 : 0c023a0902361800 x4 : ffff800008141b90 x3 : 000000000000006d
[    2.860525] x2 : 0000000000000072 x1 : ffff80000941459f x0 : 0000000000000005
[    2.867598] Call trace:
[    2.870014]  blk_mq_poll_stats_fn+0x3c/0x60
[    2.874153]  find_exported_symbol_in_section+0x4c/0xd0
[    2.879241]  find_symbol+0x48/0x190
[    2.882691]  load_module+0x1e90/0x2550
[    2.886400]  __do_sys_finit_module+0xac/0x100
[    2.890206] random: crng init done
[    2.890712]  __arm64_sys_finit_module+0x24/0x30
[    2.890717]  invoke_syscall+0x48/0x114
[    2.894084] random: 7 urandom warning(s) missed due to ratelimiting
[    2.898561]  el0_svc_common.constprop.0+0x44/0xec
[    2.898566]  do_el0_svc+0x24/0x90
[    2.898569]  el0_svc+0x20/0x60
[    2.919437]  el0t_64_sync_handler+0x1a4/0x1b0
[    2.919440]  el0t_64_sync+0x1a0/0x1a4
[    2.919447] Code: 340000e4 a9401c26 a9001c66 a9411c26 (a9011c66) 
[    2.919450] ---[ end trace c9f10d82901eecce ]---
[    2.938588] rc rc0: meson-ir as /devices/platform/soc/ff800000.bus/ff808000.ir/rc/rc0
[    2.946198] rc rc0: lirc_dev: driver meson-ir registered at minor = 0, raw IR receiver, no transmitter
[    2.955195] input: meson-ir as /devices/platform/soc/ff800000.bus/ff808000.ir/rc/rc0/input1
[    2.963820] meson-ir ff808000.ir: receiver initialized
[    2.976801] Adding 1048572k swap on /dev/zram0.  Priority:100 extents:1 across:1048572k SSFS
[    6.066062] rc rc0: two consecutive events of type space
[   31.710075] USB_PWR_EN: disabling
[   31.710115] TF_IO: disabling

I know the OS was at least partially up because I have keyboard and mouse connected via a USB switch. If I switched the keyboard between my desktop and the N2, messages were displayed on the console.

0n3man commented 2 years ago

Tried a power cycle after the hang and got a kernel panic this time. I do have a phone connected to the USB now, which I'm using as a backup network via cellular. Not sure if that's having an impact or not. Will test some more.

Starting kernel ...

[    2.552262] Insufficient stack space to handle exception!
[    2.552271] ESR: 0x9a000000 -- SP Alignment
[    2.552277] FAR: 0xffff000003187f08
[    2.552279] Task stack:     [0xffff800009e00000..0xffff800009e04000]
[    2.552280] IRQ stack:      [0xffff8000099d8000..0xffff8000099dc000]
[    2.552282] Overflow stack: [0xffff0000479980a0..0xffff0000479990a0]
[    2.552286] CPU: 2 PID: 373 Comm: haveged Not tainted 5.16.12 #1
[    2.552290] Hardware name: Hardkernel ODROID-N2 (DT)
[    2.552292] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    2.552296] pc : __arm64_sys_clock_gettime+0x4/0xd0
[    2.552307] lr : invoke_syscall+0x48/0x114
[    2.552313] sp : ffff000003187f38
[    2.552314] x29: ffff800009e03e10 x28: ffff000003188000 x27: 0000000000000000
[    2.552320] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[    2.552324] x23: 0000000000000000 x22: 0000ffff82d8438c x21: 00000000ffffffff
[    2.552328] x20: ffff000003188000 x19: ffff800009e03eb0 x18: 0000000000000000
[    2.552331] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[    2.552334] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[    2.552337] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[    2.552340] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000ffff82d82000
[    2.552343] x5 : 0000000000000001 x4 : 0000fffff93b16e8 x3 : ffff800008e709a8
[    2.552346] x2 : ffffffffffffffff x1 : ffff80000812ff50 x0 : ffff800009e03eb0
[    2.552351] Kernel panic - not syncing: kernel stack overflow
[    2.552353] CPU: 2 PID: 373 Comm: haveged Not tainted 5.16.12 #1
[    2.552356] Hardware name: Hardkernel ODROID-N2 (DT)
[    2.552358] Call trace:
[    2.552359]  dump_backtrace+0x0/0x1a0
[    2.552365]  show_stack+0x18/0x24
[    2.552369]  dump_stack_lvl+0x68/0x84
[    2.552374]  dump_stack+0x18/0x34
[    2.552376]  panic+0x14c/0x314
[    2.552380]  nmi_panic+0x8c/0x90
[    2.552384]  panic_bad_stack+0x104/0x120
[    2.552389]  handle_bad_stack+0x34/0x50
[    2.552393]  __bad_stack+0x8c/0x90
[    2.552395]  __arm64_sys_clock_gettime+0x4/0xd0
[    2.552398]  el0_svc_common.constprop.0+0xcc/0xec
[    2.552401]  do_el0_svc+0x24/0x90
[    2.552403]  el0_svc+0x20/0x60
[    2.552405]  el0t_64_sync_handler+0x1a4/0x1b0
[    2.552407]  el0t_64_sync+0x1a0/0x1a4
[    2.552411] SMP: stopping secondary CPUs
[    2.552419] Kernel Offset: disabled
[    2.552419] CPU features: 0x00,0000c142,00000846
[    2.552422] Memory Limit: none
[    2.767643] ---[ end Kernel panic - not syncing: kernel stack overflow ]---

0n3man commented 2 years ago

Woke up to find another kernel panic:

Debian GNU/Linux 11 haController ttyAML0

haController login: [   11.962582] audit: type=1400 audit(1646630299.364:13): apparmor="STATUS" operation="profile_repl"
[   11.976458] audit: type=1400 audit(1646630299.364:14): apparmor="STATUS" operation="profile_replace" info="same as c"
[   11.997142] audit: type=1400 audit(1646630299.364:15): apparmor="STATUS" operation="profile_replace" info="same as c"
[   12.129893] hassio: port 3(vethcc4431a) entered blocking state
[   12.130130] hassio: port 3(vethcc4431a) entered disabled state
[   12.136070] device vethcc4431a entered promiscuous mode
[   12.515158] eth0: renamed from vethdd27d3c
[   12.547029] IPv6: ADDRCONF(NETDEV_CHANGE): vethcc4431a: link becomes ready
[   12.548439] hassio: port 3(vethcc4431a) entered blocking state
[   12.554168] hassio: port 3(vethcc4431a) entered forwarding state
[   12.788341] hassio: port 4(veth9d1f0f0) entered blocking state
[   12.788626] hassio: port 4(veth9d1f0f0) entered disabled state
[   12.794668] device veth9d1f0f0 entered promiscuous mode
[   12.799975] hassio: port 4(veth9d1f0f0) entered blocking state
[   12.805391] hassio: port 4(veth9d1f0f0) entered forwarding state
[   13.150400] hassio: port 4(veth9d1f0f0) entered disabled state
[   13.198797] eth0: renamed from veth43787a4
[   13.242740] IPv6: ADDRCONF(NETDEV_CHANGE): veth9d1f0f0: link becomes ready
[   13.244079] hassio: port 4(veth9d1f0f0) entered blocking state
[   13.249823] hassio: port 4(veth9d1f0f0) entered forwarding state
[   13.538146] hassio: port 5(veth4904bd2) entered blocking state
[   13.538485] hassio: port 5(veth4904bd2) entered disabled state
[   13.544479] device veth4904bd2 entered promiscuous mode
[   13.550027] hassio: port 5(veth4904bd2) entered blocking state
[   13.555218] hassio: port 5(veth4904bd2) entered forwarding state
[   13.963150] eth0: renamed from veth28af0d9
[   13.995096] IPv6: ADDRCONF(NETDEV_CHANGE): veth4904bd2: link becomes ready
[   15.227263] hdmi-audio-codec hdmi-audio-codec.3.auto: Only one simultaneous stream supported!
[   15.231409] hdmi-audio-codec hdmi-audio-codec.3.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -22
[   15.240955] axg-sound-card sound: ASoC: PRE_PMU: be.dai-link-6-playback event failed: -22
[   15.665995] hassio: port 6(veth9da7c0e) entered blocking state
[   15.667936] hassio: port 6(veth9da7c0e) entered disabled state
[   15.673828] device veth9da7c0e entered promiscuous mode
[   16.155043] eth0: renamed from vethdd805e2
[   16.183085] IPv6: ADDRCONF(NETDEV_CHANGE): veth9da7c0e: link becomes ready
[   16.187027] hassio: port 6(veth9da7c0e) entered blocking state
[   16.192691] hassio: port 6(veth9da7c0e) entered forwarding state
[   21.757939] hassio: port 7(vethb6806af) entered blocking state
[   21.759494] hassio: port 7(vethb6806af) entered disabled state
[   21.765456] device vethb6806af entered promiscuous mode
[   22.215051] eth0: renamed from veth1130afc
[   22.234865] IPv6: ADDRCONF(NETDEV_CHANGE): vethb6806af: link becomes ready
[   22.237429] hassio: port 7(vethb6806af) entered blocking state
[   22.243135] hassio: port 7(vethb6806af) entered forwarding state
[   31.710258] USB_PWR_EN: disabling
[   31.712692] TF_IO: disabling
[   61.205357] hassio: port 8(veth0de2a67) entered blocking state
[   61.206638] hassio: port 8(veth0de2a67) entered disabled state
[   61.212585] device veth0de2a67 entered promiscuous mode
[   61.635689] eth0: renamed from vethd4f2dfc
[   61.675775] IPv6: ADDRCONF(NETDEV_CHANGE): veth0de2a67: link becomes ready
[   61.678248] hassio: port 8(veth0de2a67) entered blocking state
[   61.683988] hassio: port 8(veth0de2a67) entered forwarding state
[ 1740.639665] ------------[ cut here ]------------
[ 1740.640066] kernel BUG at arch/arm64/kernel/traps.c:498!
[ 1740.644283] Internal error: Oops - BUG: 0 [#2] SMP
[ 1740.649027] Modules linked in: xt_nat veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype br_netfilter e
[ 1740.649113]  dwmac_meson8b stmmac_platform
[ 1740.739331] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D  C        5.16.12 #1
[ 1740.746661] Hardware name: Hardkernel ODROID-N2 (DT)
[ 1740.751578] pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1740.758477] pc : do_undefinstr+0x2e0/0x2f4
[ 1740.762531] lr : do_undefinstr+0xbc/0x2f4
[ 1740.766498] sp : ffff800009abbb90
[ 1740.769776] x29: ffff800009abbb90 x28: ffff000000140000 x27: ffff00004799d040
[ 1740.776848] x26: ffff00004799d040 x25: 0000000000000002 x24: 0000000000000002
[ 1740.783921] x23: 00000000800000c5 x22: ffff8000081ecb84 x21: ffff800009abbd60
[ 1740.790993] x20: 00000000d538d082 x19: ffff800009abbc10 x18: 0000000000000000
[ 1740.798066] x17: ffff80003e3f1000 x16: ffff800009abc000 x15: 0000000000004000
[ 1740.805138] x14: 0000000000000278 x13: 0000000000000001 x12: 000000000000028d
[ 1740.812211] x11: 0000000000000004 x10: 0000000000000000 x9 : ffff0000479a1fc0
[ 1740.819283] x8 : ffff0000479a1480 x7 : 000000000000028d x6 : 0000000000000000
[ 1740.826356] x5 : 000000000000001f x4 : ffff800009804c50 x3 : 0000000000000000
[ 1740.833428] x2 : 0000000000000005 x1 : ffff800009a070f8 x0 : 00000000800000c5
[ 1740.840502] Call trace:
[ 1740.842917]  do_undefinstr+0x2e0/0x2f4
[ 1740.846625]  el1_undef+0x2c/0x4c
[ 1740.849816]  el1h_64_sync_handler+0x80/0xd0
[ 1740.853956]  el1h_64_sync+0x78/0x7c
[ 1740.857406]  perf_event_task_tick+0x40/0x310
[ 1740.861632]  scheduler_tick+0xf4/0x2a0
[ 1740.865341]  update_process_times+0xd0/0xec
[ 1740.869481]  tick_sched_handle+0x30/0x70
[ 1740.873362]  tick_sched_timer+0x4c/0xa4
[ 1740.877157]  __hrtimer_run_queues+0x110/0x250
[ 1740.881470]  hrtimer_interrupt+0x114/0x304
[ 1740.885524]  arch_timer_handler_phys+0x30/0x50
[ 1740.889922]  handle_percpu_devid_irq+0x84/0x140
[ 1740.894407]  generic_handle_domain_irq+0x3c/0x60
[ 1740.898978]  gic_handle_irq+0x5c/0x90
[ 1740.902601]  call_on_irq_stack+0x2c/0x38
[ 1740.906482]  do_interrupt_handler+0x80/0x84
[ 1740.910622]  el1_interrupt+0x34/0x54
[ 1740.914158]  el1h_64_irq_handler+0x18/0x24
[ 1740.918212]  el1h_64_irq+0x78/0x7c
[ 1740.921576]  arch_cpu_idle+0x18/0x2c
[ 1740.925112]  default_idle_call+0x24/0x6c
[ 1740.928993]  do_idle+0x204/0x26c
[ 1740.932185]  cpu_startup_entry+0x28/0x80
[ 1740.936066]  secondary_start_kernel+0x148/0x170
[ 1740.940551]  __secondary_switched+0x94/0x98
[ 1740.944693] Code: 33103e80 2a0003f4 17ffff78 a9025bf5 (d4210000) 
[ 1740.950731] ---[ end trace 445c74d1445ba281 ]---
[ 1740.955301] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
[ 1740.962631] SMP: stopping secondary CPUs
[ 1740.966516] Kernel Offset: disabled
[ 1740.969962] CPU features: 0x00,0000c142,00000846
[ 1740.974534] Memory Limit: none
[ 1740.977556] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt ]---
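
To compare the three traces side by side, the key lines can be pulled out of saved console captures. A rough sketch, assuming the minicom output was saved to text files; `oops_summary` and the file names are made up for illustration.

```shell
#!/bin/sh
# Print the error headline and faulting pc line from kernel console logs,
# with the leading [timestamp] removed. Reads the named files, or stdin
# when no file is given.
oops_summary() {
    grep -hE 'Internal error:|Kernel panic - not syncing|Insufficient stack space| pc : ' "$@" \
        | grep -v 'end Kernel panic' \
        | sed 's/^\[ *[0-9.]*\] *//'
}

# Example (file names are placeholders):
#   oops_summary boot1.log boot2.log boot3.log
```

Run against the logs above, this would show each crash faulting at a different pc (`blk_mq_poll_stats_fn`, `__arm64_sys_clock_gettime`, `do_undefinstr`).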

pyavitz commented 2 years ago

This resembles the same bug I encountered on my N2+, which is described here: https://lore.kernel.org/linux-amlogic/1c71a7c1e58590811094ebdd0e1abc54@agner.ch/T/#t Although in ur case, it is coming back as a kernel bug. Which makes me think the two aren't related.

Out of curiosity is anything degraded when you run: systemctl status
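
For a quicker answer than reading the whole status tree, systemd can report the overall state and list only the failed units. A small sketch; `failed_unit_names` is a hypothetical helper that just keeps the first column of that listing.

```shell
#!/bin/sh
# Read `systemctl list-units --state=failed --no-legend --plain` output
# on stdin and print only the unit names (first column).
failed_unit_names() {
    awk 'NF { print $1 }'
}

# On the board:
#   systemctl is-system-running    # prints e.g. "running" or "degraded"
#   systemctl list-units --state=failed --no-legend --plain | failed_unit_names
```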

One thing we could try, would be to downgrade to 5.15.26 LTS and see if the problem persists.

0n3man commented 2 years ago

I had to power cycle the box. It didn't come back with a quick power cycle. Had to leave power out for a minute then it came back. Only thing strange in systemctl status now is this init-stage2 failed, not sure what that's about:

             ├─docker-0c9aeb709b05afd2890e467ae8cbf90960c4a5e5608efa82640482ee468899f5.scope …
             │ ├─2058 s6-svscan -t0 /var/run/s6/services
             │ ├─2139 foreground  if   /etc/s6/init/init-stage2-redirfd   foreground    if     if      s6-echo      -n      --      [s6-init] making user provi>
!!!!!
 init-stage2 failed.
!!!!!
             │ ├─2140 s6-supervise s6-fdholderd
             │ ├─2151 foreground  s6-setsid  -gq  --  with-contenv  backtick  -D  0  -n  S6_LOGGING   printcontenv   S6_LOGGING    importas  S6_LOGGING  S6_LOG>
             │ └─2292 sleep infinity

I'll build the 5.15.26 kernel and give it a shot. I just need to include the version number in the userdata.txt file; I don't need to indicate LTS in some way, do I?

pyavitz commented 2 years ago

Just 5.15.26

Running systemctl may be able to pinpoint the fail.

EDIT: Looks like a Docker / Networking issue on that fail. I'm not much of a Docker fella though.

pyavitz commented 2 years ago

This issue appeared to be hardware related and to my knowledge resolved when using a new N2+. If there is anything further to add, please reopen the issue or create a new one.

Thanx.

pyavitz commented 1 year ago

One million years later...

This was actually caused by a patch that was overclocking the original N2 beyond its capability.