plan44 / plan44-feed

OpenWrt feed containing plan44 packages

led-chain – Flickering with Omega2 and WS2812 #3

Closed bennigraf closed 5 years ago

bennigraf commented 5 years ago

Hey Lukas,

first off, thanks for open sourcing all this stuff!

I'm trying to run some WS2812 LEDs with an Onion Omega2, but have some trouble controlling them. Especially when writing data to /dev/ledchain0 at higher frame rates (> 2fps), I get a lot of flickering/flashes (apparently wrong data pushed to the LEDs, since they keep the wrong color until the next update). Often they're at the correct color but fully turned on instead of dimmed, sometimes the color's completely off.

I'm trying to write to that file via python right now, but the same issues happen when using echo -en … directly from CLI.
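For reference, a minimal Python sketch of the kind of write described above. It assumes three bytes per LED, as in the echo test later in this thread; the helper names build_frame and write_frame are hypothetical, and which byte means which colour depends on the driver and LED type.

```python
def build_frame(colors):
    """Pack one 3-tuple per LED into a flat byte string (3 bytes per LED assumed)."""
    frame = bytearray()
    for color in colors:
        if len(color) != 3:
            raise ValueError("expected 3 components per LED")
        frame.extend(color)  # raises ValueError for values outside 0..255
    return bytes(frame)


def write_frame(frame, device="/dev/ledchain0"):
    """Send one complete frame to the driver in a single unbuffered write."""
    with open(device, "wb", buffering=0) as dev:
        dev.write(frame)


# Usage on the device (two LEDs), not executed here:
#   write_frame(build_frame([(255, 0, 0), (0, 255, 0)]))
```

Writing the whole frame in one call avoids handing the driver partial updates, which seems the safer choice when debugging flicker.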

Strangely the last LED always lights up correctly – no matter how many LEDs I use in total. I tested with between 1 and 30 LEDs on two different strands.

Is that a known problem with the apparently demanding timing of the 2812 LEDs or could there be a different issue with how I use the driver? Do you have any ideas or maybe encountered a similar situation?

I'm using kmod-p44-ledchain_4.4.61+0.9-2_mipsel_24kc that I downloaded from plan44.ch (found a link in some forum thread...)

I'd appreciate any ideas on how to debug this. Best, Benjamin.

plan44 commented 5 years ago

Hi Benjamin,

if you are using more than one ledchain device, then the version you are using is definitely outdated - I found and fixed a bug in the interrupt handling that causes massive flickering when more than one chain is used at the same time. But when using only a single chain, I am not aware of a bug that would cause these problems.

How did you connect the chain? Directly or via a level shifter? Without level shifter, flickering or not is a matter of luck and tiny differences in supply voltage (I had a case like: 4.99V works, 5.03V it doesn't). I really recommend a level shifter.

Have you checked the driver's statistics (cat /dev/ledchain0)? Does it show a lot of errors or overruns? Basically, 2fps should be no problem at all. I have an application with chains of 518 LEDs (ws2813, though) which works up to 50fps.
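To put those frame rates in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the nominal 800 kHz bit rate (1250 ns per bit) and 24 bits per LED, and by default ignores the reset pause between frames; these figures are assumptions taken from typical WS281x timing, not from the driver.

```python
BIT_TIME_NS = 1250   # assumed nominal bit time (800 kHz)
BITS_PER_LED = 24    # assumed 3 colour bytes per LED


def frame_time_ns(num_leds, bit_time_ns=BIT_TIME_NS, bits_per_led=BITS_PER_LED):
    """Time on the wire for one full update of the chain."""
    return num_leds * bits_per_led * bit_time_ns


def max_fps(num_leds, reset_ns=0):
    """Upper bound on the frame rate, optionally including a reset pause."""
    return 1e9 / (frame_time_ns(num_leds) + reset_ns)


for n in (30, 518):
    print(f"{n} LEDs: {frame_time_ns(n) / 1e6:.2f} ms per frame, "
          f"at most {max_fps(n):.1f} fps")
```

Under these assumptions a 30-LED chain needs less than a millisecond per update, so 2fps is nowhere near a bandwidth limit.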

BTW: Onion includes p44-ledchain in their Omega2 PRO firmware, and now that OpenWrt v18.06 is also available for the other Omegas, I guess p44-ledchain should be there as well.

Lukas

bennigraf commented 5 years ago

Hey,

thanks for catching up so fast.

I always only used one actual "device" (as in /dev/ledchainN) with only one LED strip connected with 1-30 LEDs. Sorry for the confusion.

Thanks for pointing out the new release – I did upgrade the firmware when starting with this project, but had to google around a bit to find out that I needed to upgrade to latest and would then get v18.06. p44-ledchain is indeed in /lib/modules for that new 4.14 kernel. Nice! (Only that by default it seems to load /dev/ledchain3 which isn't available on a non-plus Omega2, so I had to rmmod and insmod again with the correct device).

I tried that, but still no luck unfortunately:

The statistics show a lot of "retries" and some errors, but I don't know how to interpret these numbers:

root@Omega-4CED:~# cat /dev/ledchain0 
Ready
Last update: 1 repeats, last timeout=10001 nS, max irq=5847 nS
Totals: updates=6556, overruns=0, retries=3195, errors=13, irqs=19628
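A small Python sketch for turning the "Totals:" line into numbers that can be compared between runs. It only parses the output format shown above and makes no claim about what the driver considers a retry or an error; the function name parse_totals is hypothetical.

```python
import re


def parse_totals(text):
    """Return the counters from the 'Totals:' line as a dict of ints."""
    for line in text.splitlines():
        if line.startswith("Totals:"):
            return {key: int(val) for key, val in re.findall(r"(\w+)=(\d+)", line)}
    raise ValueError("no 'Totals:' line found")


sample = """Ready
Last update: 1 repeats, last timeout=10001 nS, max irq=5847 nS
Totals: updates=6556, overruns=0, retries=3195, errors=13, irqs=19628
"""

totals = parse_totals(sample)
print(f"retries per update: {totals['retries'] / totals['updates']:.2f}")
print(f"errors per update:  {totals['errors'] / totals['updates']:.4f}")
```

On the device, the text would come from reading /dev/ledchain0 instead of the sample string.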

I tried it both without and with a level shifter and even with running the LED strip directly on 3v3 (don't know if that should work, but at least logic level shouldn't be a problem then ;-) ). Funny enough it's always the same strange behaviour.

I'll still order an actual 74AHCT125 that you suggested in some other forum thread, since you mentioned that other shifters sometimes cannot cope with those HF signals. Let's see if that helps…

Thanks again, Benjamin.

plan44 commented 5 years ago

Hi Benjamin,

I dug in my LED box to find some old WS2812 strips to do some experiments myself.

It turns out that the maximum pause between bits that a WS2812 tolerates before interpreting it as a chain reset can be shorter than the 10µS the driver assumed. When this happens, the update flickers badly, because the chain resets in the middle of the update.

To fix that, I added an optional parameter maxTpassive, which can set a different maximal bit pause time than the default for the LED type. I got my stone age WS2812 chain working with maxTpassive=5100(nS).
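The effect can be illustrated with a toy model in Python. This is not how the driver works internally; the pause values and the 6000 ns LED threshold are made up for illustration, and only the 10000 ns default and the 5100 ns setting come from this thread.

```python
def first_premature_reset(pauses_ns, reset_threshold_ns):
    """Index of the first inter-bit pause the LED would treat as a reset, or None."""
    for index, pause in enumerate(pauses_ns):
        if pause >= reset_threshold_ns:
            return index
    return None


# Hypothetical pauses between bits during one update, e.g. stretched by an interrupt.
pauses = [1000, 1000, 7000, 1000]

# If the LED really tolerated 10000 ns, this update would go through intact.
print(first_premature_reset(pauses, 10000))

# If the LED resets after 6000 ns (hypothetical), the update breaks at the third pause.
print(first_premature_reset(pauses, 6000))

# A driver limit of 5100 ns would flag the 7000 ns pause as too long for this update.
print(any(pause > 5100 for pause in pauses))
```

The point of a lower maxTpassive is that the driver's idea of "too long" stays below the LED's actual reset threshold.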

Have a look at the updated README.md, it explains the new settings and also the meaning of the statistics info.

Lukas

plan44 commented 5 years ago

BTW: you can find a built version here

bennigraf commented 5 years ago

Hey Lukas,

thanks a lot for that huge effort of digging into this!

I tried to load the module you provided directly (from the link above). insmod works, but the omega crashes when I try to write data to /dev/ledchain…. I guess it's because of conflicting kernels:

root@Omega-4CED:~# uname -a
Linux Omega-4CED 4.14.81 #0 Thu Feb 21 20:59:23 2019 mips GNU/Linux

…while your module build apparently requires 4.14.95. (I used --force-depends when installing the package…)
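A quick way to spot this kind of mismatch before installing is sketched below in Python. It assumes kmod package names follow the pattern seen in this thread (name, kernel version, "+", package version, architecture); the helper names are hypothetical.

```python
import platform
import re


def kernel_version_from_kmod(package_name):
    """Extract the kernel version from a name like kmod-foo_4.14.81+2.0-7_arch."""
    match = re.search(r"_(\d+\.\d+\.\d+)\+", package_name)
    return match.group(1) if match else None


def matches_running_kernel(package_name, running=None):
    """Compare the package's kernel version with the running one (uname -r)."""
    running = running or platform.release()
    return kernel_version_from_kmod(package_name) == running


print(kernel_version_from_kmod("kmod-p44-ledchain_4.4.61+0.9-2_mipsel_24kc"))
print(matches_running_kernel("kmod-p44-ledchain_4.14.81+2.0-7_mipsel_24kc.ipk",
                             running="4.14.81"))
```

A matching version string is only a necessary check; it does not by itself guarantee that the module was built against a compatible kernel configuration.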

I'm not sure how to go on from here… – Is there a way for me to upgrade my Omega2 to 4.14.95? It seems even the official Omega build environment (https://github.com/OnionIoT/source/) is still at 4.14.81 – I wonder how you got to .95 in the first place ;-) – If I installed their build environment, would I be able to build your module against 4.14.81? – Or do you happen to have a build for 4.14.81 lying around as well?

Btw when getting device info right after insmod before writing data to it, I get strange data:

root@Omega-4CED:~# cat /dev/ledchain0 
Ready
Last update: 0 retries, last timeout=0 nS, min..max irq=0..0 nS
Totals: updates=2177551360, overruns=2206473472, retries=2177551360, errors=2177681792, irqs=0

Best, Benjamin.

plan44 commented 5 years ago

hmm. strange. I don't think it's the kernel version, I've tested it yesterday on a 4.14.63 build with --force-depends. Do you see something related (maybe the cause of the crash) when typing dmesg?

About where 4.14.95 comes from - I'm not using Onion's firmware, but my own OpenWrt builds, and these are now on OpenWrt v18.06.2 with 4.14.95 kernel (whereas Onion is currently on v18.06.1 AFAIK).

bennigraf commented 5 years ago

Hey,

dmesg doesn't show anything meaningful to me. (I can only run it after the Omega rebooted, since it becomes unresponsive instantly when running echo -en '\xFF\x00\x00\x00\xFF\x00' > /dev/ledchain0.) I uploaded it here.

Running logread -f in a parallel screen session also doesn't show any messages before becoming unresponsive.

It does show the following when doing insmod …, but that seems all right to me:

root@Omega-4CED:~# logread -f
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.675021] ledchain: pwm_base=0xB0005000
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.679393] ledchain: v2 - Device: /dev/ledchain0
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.684463] ledchain: - PWM channel    : 0
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.688620] ledchain: - PWM buffer size: 132
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.692987] ledchain: - Number of LEDs : 10
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.697226] ledchain: - Inverted       : 0
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.701388] ledchain: - LED type       : WS2812
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.705978] ledchain: - Max retries    : 3
Tue Mar  5 19:48:24 2019 kern.info kernel: [  398.710139] ledchain: - Max Tpassive   : 5100 nS

I don't know if there are any other logs available on the Omega. Next thing would probably be to try to read the serial/uart console somehow, but I'll only get to try that next weekend or so.

Uninstalling your package and re-installing the original one from the onion sources makes the ledchain device basically work again, but with these flashes/false updates.

bennigraf commented 5 years ago

Hello again,

Today I managed to connect to the serial console and read its output. There is an error happening when doing echo … after loading your v2 module: Unhandled kernel unaligned access[#1]…

I uploaded the full output here: https://gist.github.com/bennigraf/3d4207e0178f1b48b9c9152a11b55f6f

In the call trace it points to update_leds, but I don't really know how to read this output. Can you get some information out of this?

Best and thanks, Benjamin.

plan44 commented 5 years ago

Thanks for the console log.

This looks very much like a memory corruption issue, especially given your earlier observation about the strange data readouts when doing cat /dev/ledchain0.

I looked into the code and I see no way how these variables could be anything but zero right after initialisation. The device struct where they reside is allocated with kzalloc(), which zeroes the entire area. So getting anything but zero for these statistics before doing any write to /dev/ledchain0 means that the memory area gets overwritten by something else.
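One way to look at the odd counter values reported earlier is to print them in hexadecimal, as in this Python sketch. It checks them against the MIPS32 kseg0 range (0x80000000 to 0x9FFFFFFF), where kernel addresses usually live; reading the values as pointers is an assumption, not something the driver output confirms.

```python
KSEG0_START = 0x80000000  # MIPS32 unmapped, cached kernel segment
KSEG0_END = 0x9FFFFFFF

# Counter values from the cat /dev/ledchain0 output right after insmod.
counters = {
    "updates": 2177551360,
    "overruns": 2206473472,
    "retries": 2177551360,
    "errors": 2177681792,
}


def looks_like_kseg0_pointer(value):
    """True if the value falls inside the kseg0 address range."""
    return KSEG0_START <= value <= KSEG0_END


for name, value in counters.items():
    print(f"{name:9s} {value:#010x} kseg0-like: {looks_like_kseg0_pointer(value)}")
```

All four values fall inside that range, which would be consistent with, but does not prove, something else's pointers occupying the memory where the counters are expected.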

I see no way how p44-ledchain's own code could be doing that. I also have a hard time believing that this could be a specific kernel dependency on 4.14.81. I rather suspect that the current Onion kernel (or libraries with direct physical memory access in the Onion firmware) contains something that interferes with p44-ledchain.

So to narrow down the problem:

bennigraf commented 5 years ago

Hello again,

TLDR: I've built an ipk of the updated version of plan44/p44-ledchain (which provides kmod-p44-ledchain) matching the current 4.14.81 kernel using the OnionIoT/source repo on the openwrt-18.06 branch. I've successfully loaded this module on both an Omega2 and an Omega2+ and used it to drive a WS2813 and a (previously horribly flickering) WS2812 LED chain. Here's the ipk, in case someone else needs it: kmod-p44-ledchain_4.14.81+2.0-7_mipsel_24kc.ipk.zip

Regarding your questions:

Since you went out of your way to provide an updated version of p44-ledchain that successfully drives my shabby WS2812 and I have a working build of it now, we can close this issue in my opinion. But I'll leave it up to you, in case you have further questions or want to dig deeper into this issue above.

Best and thanks a lot, Benjamin.

plan44 commented 5 years ago

Thanks a lot for your work! And thanks for providing the build for current OnionOS!

I think we can safely assume now that p44-ledchain code is ok, but running a kmod with another kernel version than the one it was built for is not. Not that we didn't know that already ;-)

But still, I was surprised about such strange side effects. Probably I was misguided by thinking only about the kernel APIs as such (none of those used by p44-ledchain have changed from 4.4.61 to 4.14.81 to 4.14.95). However, what is very likely to cause memory corruption is a kernel data struct changing its size, as kmod code uses sizeof() and offsetof() a lot. In particular, a change in size of struct cdev would obviously kill p44-ledchain. Now, this apparently hasn't happened between my 4.4.61 and 4.14.95 builds, but something is different in the Onion 4.14.81 build. Conclusion: most probably it's not the kernel version number, but a different set of kernel build options between my mostly vanilla OpenWrt builds and the current Onion build.
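The layout argument can be made concrete with a small Python sketch using ctypes. The struct names and the 40 and 48 byte sizes are purely hypothetical stand-ins; they are not the real sizes of struct cdev in any kernel build.

```python
import ctypes


class DeviceBuildA(ctypes.Structure):
    # 'embedded' stands in for a kernel struct embedded in a driver's device struct.
    _fields_ = [("embedded", ctypes.c_uint8 * 40),
                ("updates", ctypes.c_uint32)]


class DeviceBuildB(ctypes.Structure):
    # Same source code, but a build option made the embedded struct 8 bytes larger.
    _fields_ = [("embedded", ctypes.c_uint8 * 48),
                ("updates", ctypes.c_uint32)]


for build in (DeviceBuildA, DeviceBuildB):
    print(build.__name__, "offset of updates:", build.updates.offset,
          "size:", ctypes.sizeof(build))
```

A module compiled with layout A but running on a kernel that uses layout B would read and write "updates" at the wrong offset, which is the kind of silent corruption described above.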

Lesson learned!

I agree that we can close this issue now - thanks to your verification.