Open sihil opened 7 years ago
I'm using a Raspberry Pi 2A PSU that I had to hand so I'm reasonably confident that power is not an issue. Also, the issue only occurs on subsequent boots if an HDMI display was attached on the first boot - and it doesn't sound like it should be generating keys on subsequent boots.
My testing setup has been brutally simple: have it plugged into an ethernet port. My criterion for whether it has booted or not is whether the interface comes up and I see traffic on the port. I've been leaving my laptop pinging the IP address. Crude, but effective and reproducible many times.
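The ping-based check described above could be automated roughly as follows. This is only a minimal sketch, not the script anyone in this thread used: the function names, the timeout values, and the example address are assumptions, and the `ping` flags are those of Linux iputils.

```python
import subprocess
import time
from typing import Callable


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request and report whether a reply arrived.

    Assumes Linux iputils ping: -c is the packet count, -W the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_boot(
    probe: Callable[[], bool],
    deadline_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns True or `deadline_s` seconds have elapsed."""
    start = clock()
    while clock() - start < deadline_s:
        if probe():
            return True
        sleep(interval_s)
    return False


# Example use (the address is hypothetical):
#   booted = wait_for_boot(lambda: ping_once("192.168.1.50"), deadline_s=120)
```

Passing the probe in as a callable keeps the polling loop independent of ICMP, so the same loop could just as well watch for an SSH port if ping is filtered.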
I was looking at dmesg output and noticed that the sunxi disp2 is initialised once on first boot and twice on subsequent boots. I have no idea if that's connected.
Sadly it's impossible to tell where it is stuck without a display or console attached. I've just ordered a USB/UART cable so I can do that (been regretting not buying the Pine64 adaptor in the first place). I might try seeing if I can connect it to the serial port of a raspberry pi tonight rather than waiting for that delivery.
I'd be intrigued to know if anyone else was able to re-produce it (or not able to re-produce it) - would give me more confidence that this is actually a thing rather than it being something silly that I've done or my particular board.
I'll write more when I discover anything new.
Well, just to be clear: I have flashed my images many times and usually do not have HDMI connected at all, ever. I guess the issue is specific to your particular setup.
Yes, and that works. Unfortunately I built a machine that happened to be connected to HDMI on first boot and now I can't unplug the display to hide it in a cupboard as it won't boot :(
The simplest answer for me is to rebuild it and start over (which is now my plan for tonight), but that won't solve it for future users and violates the principle of least surprise.
I'll test that tonight, as I can't say with certainty I've done exactly that... connected with HDMI in the first instance, and then run the pine64 headless afterwards. I have mostly run it with HDMI connected all the time as it was a GUI image, or with no HDMI connected right from the start as I have run it with a console cable connected for the initial configuration.
btw, you should be able to post 1 message per day during the settling in period. If not, please send me a PM (same handle on the forum), as it means something has been misconfigured.
@pfeerick I am able to post again. It would be really helpful if you could add another line of text to the error page that indicates that rate limiting might be the reason.
I'm really interested to hear what your results are :)
I wasn't able to reproduce that behaviour. Here was my test methodology so we can verify we are on the same page.
I have booted a fresh image of Ubuntu (https://www.stdin.xyz/downloads/people/longsleep/pine64-images/ubuntu/xenial-pine64-bspkernel-20161218-1.img.xz). I plugged in a wireless USB keyboard/mouse dongle, ethernet, and HDMI. Powered up the pine64, let it boot up, logged in, rebooted. I pulled the HDMI as the pine64 was shutting down. Watched the ethernet lights, the pine64 came back up again, and I was able to log in via SSH.
So it has booted up with HDMI in the first instance, and had no problems. Booting up without the HDMI also appears to be fine. I tried powering the pine64 up and down a few times, and it continued to start up flawlessly, so it wasn't a one off brought about by rebooting it.
My power supply is a 5A capable 12v to quad-usb converter, and it is tuned to the slightly higher voltage of 5.2v. Hopefully that will start to determine what is the cause of the problem. If you have a similar setup bar the power supply, then it does start sounding like it is power related.
Hmmm, curious. That does sound similar - except I have not plugged in a mouse or keyboard, just HDMI (that sounds ridiculous now I'm writing it down, but none the less).
I'll have another go tonight.
Thanks for testing this. I am very interested in getting this resolved. @sihil do you have an alternative power supply which you could try? Preferably power via the PINs on the Euler connector.
Also, connecting any extra USB devices like a keyboard or mouse requires even more power, unless they are connected via a powered USB hub, which then might in turn feed power to the Pine64.
Doesn't sound too ridiculous... you can always plug in the keyboard/mouse after the pine64 has booted and you can see stuff on the screen... or you might have the screen connected just to see boot messages ;)
Another thing to consider is kernel/uboot updates. If you had done that on the first boot, and something went wrong (it can happen, but it is likely to be power or sd card corruption related), that could be the cause, not the first boot with HDMI. In other words, don't do it (just in case that is the issue). And as longsleep said, alternate power supply to the euler pins would be great also, as that will provide more reliable power to the pine64.
I experienced the same issue again. I'll see if I can borrow a workbench PSU and do as you suggest.
I am seeing behaviour similar to yours, @sihil, after flashing xenial-pine64-bspkernel-20161218-1.img. In my case my goal is to run headless and only access the board by ssh.
After flashing the board, I did not connect any cables except power (5V 2A) and ethernet. The board sometimes would come up though other times it would not. I read your post on the forum that it had some success when connecting an HDMI display so I tried that. And to my luck it came up just fine. I then unplugged the HDMI cable and used it headless.
However, if I reboot the board or power is lost, there is a good chance it won't come back up unless I connect an HDMI monitor and power cycle it a few times.
Note about power draw [1]:
On the 1GB and 2GB Pine64+ variants a DC5V/BAT POWER switch can be used to bypass the MT3608 boost converter (input voltage to 5V). If the board is powered from DC-IN (micro-USB or Euler connector), the DC5V setting connects the input voltage to the USB power supply rails, in BAT setting 5V is generated from any of the connected power sources (e.g. battery or DC-IN). The USB ports are current-limited to about 650mA per port in either setting.
Please be aware that when using the jumper in DC5V position an insufficient supply voltage is directly visible on the USB ports. If the Pine64+ is running on battery, the USB ports are only powered when the BAT setting is used.
@RyanRamchandar - so far i have seen no indication that there is a general issue with my image. I strongly suggest you get a better power supply or a lower AWG cable as i still think you guys suffer from a voltage drop which makes things go sideways on boot and HDMI just gives the extra juice to cope with that.
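To put rough numbers on the cable argument, here is a small worked sketch of the resistive drop over a supply cable. The resistance figures are approximate textbook values for copper wire at room temperature, and the 1 m length and 1 A load are illustrative assumptions, not measurements from anyone's setup. Connector contact resistance is ignored.

```python
# Approximate resistance of copper wire at about 20 degrees C, in ohms per metre.
# These are rounded textbook values, used here for illustration only.
OHMS_PER_METRE = {28: 0.213, 24: 0.0842, 20: 0.0333}


def voltage_at_board(supply_v: float, current_a: float, cable_length_m: float, awg: int) -> float:
    """Estimate the voltage that reaches the board after the cable's resistive drop."""
    # Current flows out on the 5V conductor and back on the ground conductor,
    # so the resistive path is twice the cable length.
    loop_resistance = 2 * cable_length_m * OHMS_PER_METRE[awg]
    return supply_v - current_a * loop_resistance


if __name__ == "__main__":
    # Assumed scenario: 5.0 V supply, 1 A load, 1 m cable.
    for awg in (28, 24, 20):
        print(f"AWG {awg}: {voltage_at_board(5.0, 1.0, 1.0, awg):.3f} V at the board")
```

Under these assumptions a thin AWG 28 cable loses several hundred millivolts while AWG 20 loses well under a tenth of a volt, which is the reason a lower AWG number (thicker wire) is being suggested.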
I didn't want to think it was a power supply issue either, but when running off a bench power supply (5A, good filtering), my previously 100% repro crash went away.
Possible solution: A 10µF tantalum (low ESR) capacitor soldered between the DC IN and GND pins of the Euler connector (via a 2x3 female header). Result: It's not 100% successful, but I've had 4 successful boots out of 5 now. Maybe a bigger cap will do it.
So what are you saying. It does not crash with your bench PSU? What is the reason for the capacitor? Did you try to slightly increase voltage with the bench PSU to 5.1V or 5.2V?
Yes, with my bench supply (set at 5.00v as exactly as possible) no crash. With all my other power supplies it crashed. Didn't try a higher voltage on the bench supply, because it works fine.
Adding a capacitor between DC IN and GND on the Euler connector gets booting working on several of those supplies... most of the time (roughly 80%).
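As a side note on how much weight a figure like "4 out of 5" can carry, the following sketch computes a Wilson score interval for a boot success rate. The choice of the Wilson interval and the 95% level (z = 1.96) are assumptions made for illustration; nobody in this thread did this calculation.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Return the (lower, upper) Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    low, high = wilson_interval(4, 5)
    print(f"4/5 successful boots: plausible success rate {low:.0%} to {high:.0%}")
```

For 4 successes in 5 boots the interval spans approximately 38% to 96%, so a handful of boots cannot distinguish a real improvement from luck; a few dozen power cycles per configuration would give a much tighter estimate.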
Hi, i do encounter the same issue using headless image with kernel 3.10.105. However, it is not caused by HDMI but the ethernet. It cannot boot up at all and shows "BUG: soft lockup - CPU#0 stuck for 22s! " without ethernet plugged in but it sometimes can boot up successfully with ethernet plugged in. So, is it related to power supply issue too?
@whongx yes - Ethernet draws quite some power and Gigabit Ethernet even more.
@longsleep ok! But it cannot boot up when the ethernet is not plugged in. And I forgot to mention that it does not encounter the issue when using kernel 3.10.104.
@whongx what does it mean "cannot boot up" ? Do you have logs or at least an error message?
@longsleep Most likely related: a similar issue can be reproduced with Armbian builds (your BSP kernel source with a slightly different configuration). The kernel randomly stalls on boot, with a different stall-to-success rate depending on connected/disconnected Ethernet, connected/disconnected HDMI display, etc., but there is no clear connection between these factors. Dmesg logs with stack traces can be found in attachments in this thread, I'm attaching one of them here: BOOTFail_2017-04-15-C1.txt
According to my understanding it locks up somewhere here when setting up IRQ for the DE2 HDMI driver:
[ 45.232803] [<ffffffc000083dc0>] el1_irq+0x80/0xe4
[ 45.241520] [<ffffffc000125844>] __setup_irq+0x318/0x3e0
[ 45.250792] [<ffffffc000125a84>] request_threaded_irq+0xe0/0x124
[ 45.260858] [<ffffffc00041280c>] disp_sys_register_irq+0x88/0x98
[ 45.270936] [<ffffffc000420610>] disp_hdmi_enable+0x1d4/0x278
[ 45.280724] [<ffffffc000414540>] disp_device_attached_and_enable+0x1bc/0x1d4
[ 45.291985] [<ffffffc0004146f8>] bsp_disp_device_switch+0xbc/0xe4
[ 45.302194] [<ffffffc00040b50c>] start_work+0x174/0x1f0
[ 45.311445] [<ffffffc0000cb788>] process_one_work+0x27c/0x42c
[ 45.321274] [<ffffffc0000cc76c>] worker_thread+0x208/0x320
[ 45.330810] [<ffffffc0000d27ec>] kthread+0xb4/0xbc
The part of the stack trace above this must be related to the watchdog that detects the lockup, but if it is not, it may be related to the arch timer bug referenced in https://github.com/longsleep/linux-pine64/issues/44
I am using modified ATX power supply for tests connected to the pin header, so underpowering should not be an issue in my setup.
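For comparing many boot logs, the frames in a trace like the one above can be pulled out mechanically. The sketch below is an assumption-laden helper, not an existing tool: the regular expression is written against the line format shown in this thread (`[ time] [<address>] function+0xoffset/0xsize`), and the function names `parse_frames` and `mentions` are made up for the example.

```python
import re

# Matches lines such as:
#   [ 45.280724] [<ffffffc000420610>] disp_hdmi_enable+0x1d4/0x278
FRAME_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+"
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text: str) -> list[tuple[str, int, int]]:
    """Return (function, offset, size) for every stack frame line found in the log."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (match.group("func"), int(match.group("off"), 16), int(match.group("size"), 16))
            )
    return frames


def mentions(frames: list[tuple[str, int, int]], name: str) -> bool:
    """Report whether any parsed frame belongs to the function `name`."""
    return any(func == name for func, _, _ in frames)


# Example use with a saved serial console log (the file name is hypothetical):
#   frames = parse_frames(open("boot.log").read())
#   print(mentions(frames, "disp_hdmi_enable"))
```

Running this over a directory of failed-boot logs would show quickly whether `disp_hdmi_enable` appears in every lockup or only in some of them.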
I was able to reproduce a boot-up panic with a specific USB device connected. PR https://github.com/longsleep/linux-pine64/pull/56 seems to fix that. If you can please try if that change also fixes your particular issue.
I'm getting these lockups with no USB devices connected (even got one today with another good power supply when I was testing u-boot changes). While the problem can be power related, the stack traces look too strange to me. Also, one time I got this log, pine64-lockup-debug3.txt: it didn't happen in initrd as usual but much later in the boot process.
Anyway I'll try to test the PR changes later.
Yes - i doubt that the USB change does fix lock-ups which happen later. I will also merge your backport-fsl-errata.patch now after reading up on the issue. But as you probably use a Kernel with that patch already this also does not fix every issue. That FSL fix might resolve https://github.com/longsleep/linux-pine64/issues/44 though.
The stack traces for the "stuck" kworker look too similar in both cases, so it looks like the same issue. And since I enabled a lot of debugging options for spinlocks and mutexes, each time HDMI lock was still held by disp_hdmi_enable() function.
Unfortunately it's still not clear what IRQs correspond to lines like el1_irq+0x84/0xec.
longsleep/linux-pine64#56 makes USB crash less often but it still crashes a lot on boot with "MOSART Semi. Rapoo 2.4G Wireless Touch Desktop" plugged in. Also the FSL fix does not help.
Btw, on Pinebook with exactly same Kernel - it works just fine every time.
@longsleep Are you getting lockups with stack traces similar to posted previously with disp2 HDMI functions in them?
@zador-blood-stained - Yes, very similar to pine64-lockup-debug3.txt - it has
[ 39.838477] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:1:30]
[ 39.851912] Modules linked in:
[ 39.861726]
[ 39.869831] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 3.10.105-- #35
[ 39.883727] Workqueue: events start_work
[ 39.894722] task: ffffffc078b52f80 ti: ffffffc078b54000 task.ti: ffffffc078b54000
[ 39.909764] PC is at __do_softirq+0xb4/0x2d8
[ 39.921341] LR is at __do_softirq+0x30/0x2d8
and
[ 44.313504] [<ffffffc000083dc0>] el1_irq+0x80/0xe4
[ 44.323414] [<ffffffc00012584c>] __setup_irq+0x318/0x3e0
[ 44.333885] [<ffffffc000125a8c>] request_threaded_irq+0xe0/0x124
[ 44.345147] [<ffffffc00040f004>] disp_sys_register_irq+0x88/0x98
[ 44.356431] [<ffffffc00041cf9c>] disp_hdmi_enable+0x1d4/0x278
[ 44.367423] [<ffffffc000410d38>] disp_device_attached_and_enable+0x1bc/0x1d4
[ 44.379876] [<ffffffc000410ef0>] bsp_disp_device_switch+0xbc/0xe4
[ 44.391253] [<ffffffc000407d04>] start_work+0x174/0x1f0
[ 44.401655] [<ffffffc0000cb784>] process_one_work+0x27c/0x42c
[ 44.412623] [<ffffffc0000cc768>] worker_thread+0x208/0x320
[ 44.423315] [<ffffffc0000d27f0>] kthread+0xb4/0xbc
[ 44.433240] kworker/1:1 S ffffffc0000853b8 0
and
[ 45.225365] [<ffffffc0000853b8>] __switch_to+0x7c/0x88
[ 45.235455] [<ffffffc0007244f4>] __schedule+0x4fc/0x714
[ 45.245628] [<ffffffc000724780>] schedule+0x74/0x7c
[ 45.255409] [<ffffffc000722564>] schedule_timeout+0x34/0x27c
[ 45.266012] [<ffffffc000723cbc>] wait_for_common+0x118/0x158
[ 45.276588] [<ffffffc000723d24>] wait_for_completion+0x28/0x34
[ 45.287325] [<ffffffc0000cb108>] flush_work+0xf8/0x11c
[ 45.297312] [<ffffffc0000cccd4>] schedule_on_each_cpu+0xf8/0x124
[ 45.308281] [<ffffffc00016c5f0>] lru_add_drain_all+0x1c/0x24
[ 45.318875] [<ffffffc0001a4d54>] migrate_prep+0x14/0x20
[ 45.328979] [<ffffffc000167d78>] alloc_contig_range+0xb8/0x26c
[ 45.339729] [<ffffffc000493884>] dma_alloc_from_contiguous+0xa4/0x12c
[ 45.351152] [<ffffffc0000928cc>] __dma_alloc_coherent+0xb0/0x118
[ 45.362088] [<ffffffc000092a00>] __dma_alloc_noncoherent+0xcc/0x158
[ 45.373319] [<ffffffc00019979c>] dma_pool_alloc+0xf0/0x1c4
[ 45.383705] [<ffffffc0004ef388>] ehci_qh_alloc+0x4c/0xc4
[ 45.393894] [<ffffffc0004f1408>] ehci_init+0x13c/0x3b8
[ 45.403875] [<ffffffc0004f16a4>] sunxi_ehci_setup+0x20/0x38
[ 45.414303] [<ffffffc0004de7a8>] usb_add_hcd+0x1c8/0x5a8
[ 45.424417] [<ffffffc0004f5560>] sunxi_insmod_ehci+0x118/0x218
[ 45.435096] [<ffffffc0004f56d8>] sunxi_usb_enable_ehci+0x78/0x88
[ 45.445982] [<ffffffc00051144c>] usb_msg_center+0x88/0x104
[ 45.456307] [<ffffffc00051057c>] usb_host_scan_thread+0x54/0x68
[ 45.467110] [<ffffffc0000d27f0>] kthread+0xb4/0xbc
and
[ 47.357995] [<ffffffc0000853b8>] __switch_to+0x7c/0x88
[ 47.368085] [<ffffffc0007244f4>] __schedule+0x4fc/0x714
[ 47.378228] [<ffffffc000724780>] schedule+0x74/0x7c
[ 47.387959] [<ffffffc000722564>] schedule_timeout+0x34/0x27c
[ 47.398562] [<ffffffc000723cbc>] wait_for_common+0x118/0x158
[ 47.409169] [<ffffffc000723d24>] wait_for_completion+0x28/0x34
[ 47.419962] [<ffffffc0000cb108>] flush_work+0xf8/0x11c
[ 47.429992] [<ffffffc0000cccd4>] schedule_on_each_cpu+0xf8/0x124
[ 47.440953] [<ffffffc00016c5f0>] lru_add_drain_all+0x1c/0x24
[ 47.451515] [<ffffffc0001e5b24>] invalidate_bdev+0x30/0x4c
[ 47.461872] [<ffffffc0002453b4>] ext4_put_super+0x264/0x2ec
[ 47.472336] [<ffffffc0001b24d8>] generic_shutdown_super+0x68/0xd4
[ 47.483396] [<ffffffc0001b27c0>] kill_block_super+0x30/0x7c
[ 47.493872] [<ffffffc0001b2b44>] deactivate_locked_super+0x44/0x74
[ 47.505016] [<ffffffc0001b2fb4>] deactivate_super+0x68/0x74
[ 47.515443] [<ffffffc0001cdbd0>] mntput_no_expire+0x158/0x168
[ 47.526039] [<ffffffc0001cef48>] SyS_umount+0x34c/0x36c
I have a rather reliable setup to reproduce this. With the new USB drivers it is less likely to trigger. I boot to initrd only (have simpleimage without rootfs). It just booted 4 times in a row without issue and then crashed twice in a row like this.
I am powering through euler and have HDMI connected (but that does not seem to matter). When i disconnect the USB Keyboard/Mouse dongle it never crashes. Also i can connect the dongle at any time later and it also does not crash.
I tested this in detail yesterday. It can still crash in exactly the same way even when powered at 5.2V via Euler. It never draws more than 400mA during bootup either.
I did some more tests and compiled the kernel with debug info. Looks like it's actually stuck in a softirq, but it's relatively hard to debug since the stack trace is incomplete in this case and I'm not sure if the info I got after applying an extra patch is correct:
[ 42.584359] Last softirq was rcu_process_callbacks+0x0/0x3f8
P.S. it seems that this behavior also occurred on my SoPine w/ Baseboard, running mainline kernel w/ HDMI driver patched. Strange.
I am experiencing a HDMI bug too - if a HDMI cable is plugged in to the HDMI port, the A64 boots fine after a power cycle. If there is no HDMI cable, it may or may not boot.
There is nothing connected at the other end of the HDMI cable. I am running Xenial with Longsleep kernel.
Workaround: Keep a HDMI cable plugged in.
Most likely the HDMI cable feeds enough extra power to the device that the voltage does not drop on load. Means your power supply solution is to blame and not sufficient.
Unlikely, since there's nothing plugged in at the other end of the HDMI cable.
The power supply is the model recommended in the Pine64 store at the time of purchase.
I don't think this is a power supply issue -- I see this happening on two of my boards (bought from separate lots) with about a 30% successful boot rate sometimes. Both boards exhibit this behavior while running off a bench supply powered through the Euler bus at as high as 6 volts (I've not risked going any higher). The crashes happen on all the images I've tried though the behavior is different on each one. Sometimes I can get things to boot more reliably on an image and it will stay about 80% reliable once it boots successfully a few times. I can post output from the serial console if there is any interest.
Well this still is an issue - so feel free to post your findings here in case someone is willing to take a detailed look. If it is HDMI related it might be an idea to get rid of this driver and all related to it.
Let me try some experiments and see what I come up with. Is there a way to turn off the HDMI driver completely? The most reliable boot image has been Android, but I've been using debian and xubuntu since I want to run a headless server with these units. I have successfully upgraded one unit to bionic beaver (haven't tried with the other one) but the /boot partition has to be enlarged for the do-release-upgrade to work (I can open another issue to cover that if you like). The bionic beaver image also exhibits this behavior.
I have the same problem. Is there any solution?
I think I may have taken care of the problem on my two boards by manually setting the monitor resolution to a valid value using the Mate desktop app. I was always seeing an error message from the HDMI driver about invalid resolution right before the boot would hang. Now that I have set the resolution value I don't see the error message anymore and my boards have been booting ok -- I THINK -- I have that caveat because my boards have been up and running continuously over last few weeks so I have not done much testing yet.
Apologies for duplicating my post on the Pine64 forum. Unfortunately I'm unable to reply further due to an anti-spam measure that they have introduced on the forums (according to my IRC conversation, as a new user I have to wait three days before I can make my second post).
For completeness I'm going to include my original text:
@longsleep kindly replied thus: