rm-hull / luma.led_matrix

Python module to drive LED Matrices & 7-segment displays (MAX7219) and RGB NeoPixels (WS2812 / APA102)
https://luma-led-matrix.readthedocs.io
MIT License

Recommended approach to reset MAX7219 devices? #150

Closed drspangle closed 5 years ago

drspangle commented 6 years ago

I'm in the process of trying to figure out why my large (>100 matrices) MAX7219 display board is encountering glitches. I've spent a lot of time experimenting with clock speeds, bus transfer rates, and I've got something very close to a stable state. Despite being stable for minutes or hours, the most common glitch is that single matrices seem to shut off and on, get stuck pixels, or change contrast for some reason, and it's not clear why. I'm fairly certain this is not a memory leak or something on the software side causing corrupted signals. That is to say, I don't think the software is the problem, but that doesn't mean that there isn't a software solution to my problem.

Resetting the display fixes the problem. I want to do some debugging to investigate what about the reset process fixes it, but aside from running the high-level get_device() (from demo_opts), the only methods that I can find to reset the display are the max7219 class's show() and clear() methods. I'm really not sure which of the two, or both one after the other, is responsible for resetting the device to a stable state again. It would be great if I could figure out what mechanism is actually "fixing" the devices and cut away any extra cruft that slows down this process, so that it could potentially be run repeatedly with minimal latency added to the rendering of subsequent frames.

So, if I can find a way to meet this requirement, my hypothesis is that if I'm rendering frames on the display at a constant rate, I could inject the command that successfully cleans up the glitches every second (framerate+1 frames) or so, and it would hopefully be fast enough to prevent the glitched matrices from appearing glitched to the human eye, with minimal impact on performance in terms of overall average framerate.
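The periodic-reset idea above could be sketched roughly as follows. This is only an illustration: `render_with_periodic_reset`, `draw` and `reset` are hypothetical names, not part of luma.led_matrix, and frame pacing is left out so the loop can run without hardware.

```python
from typing import Callable, Iterable


def render_with_periodic_reset(
    frames: Iterable[object],
    draw: Callable[[object], None],
    reset: Callable[[], None],
    fps: int,
) -> int:
    """Draw every frame, calling reset() once per (fps + 1) frames.

    `draw` and `reset` are placeholders for whatever actually talks to the
    display. Returns the number of resets that were performed.
    """
    interval = fps + 1  # "every second (framerate+1 frames) or so"
    resets = 0
    for index, frame in enumerate(frames, start=1):
        if index % interval == 0:
            reset()  # injected just before the frame that follows it
            resets += 1
        draw(frame)
    return resets
```

Whether this helps depends entirely on how long `reset()` takes compared with one frame period, which is the part that has to be measured on the real display.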

I'm also analyzing the signals using a protocol analyzer and oscilloscope to see if there's some other reason why the MAX7219 modules are behaving this strangely. From this perspective, the hypothesis is that the devices affected by the glitch are missing a clock cycle or something similar, and that's making their shift registers go out of sync with the others in the cascade. Not sure about this yet. Power reads clean on the oscilloscope, signals are 5V0 using a boost converter from the Pi's 3V3 outputs, and everything is wired in parallel except the DOUT to DIN pins on each module, which are wired in series. I'm using PCBs with 4x MAX7219 chips each, the typical ones found on Amazon and eBay with 8x8 LED dot matrices. There's no significant ringing or any other analog weirdness on any of the signal pulses as far as I can see. At this point there's nothing I can really change with respect to wiring, the clock speed, or the SPI MTU chunk size that seems to make the display glitch less, so I'm hoping I might be able to get closer to a sustainable solution by resetting the devices as quickly as possible on a regular interval.
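To make the out-of-sync hypothesis concrete, here is a small model of what a single MAX7219 would latch if it saw one clock edge too few or too many. The register addresses and the 16-bit word layout (address in bits 11..8, data in bits 7..0, shifted in MSB first) come from the MAX7219 datasheet. Everything else is an assumption for illustration: the function names are made up, and the model looks at one chip in isolation rather than a full cascade.

```python
REGISTER_NAMES = {
    0x0: "NOOP",
    0x1: "DIGIT0", 0x2: "DIGIT1", 0x3: "DIGIT2", 0x4: "DIGIT3",
    0x5: "DIGIT4", 0x6: "DIGIT5", 0x7: "DIGIT6", 0x8: "DIGIT7",
    0x9: "DECODEMODE", 0xA: "INTENSITY", 0xB: "SCANLIMIT",
    0xC: "SHUTDOWN", 0xF: "DISPLAYTEST",
}


def decode(word):
    """Split a 16-bit word into (register name, data byte)."""
    address = (word >> 8) & 0x0F
    return REGISTER_NAMES.get(address, "UNUSED"), word & 0xFF


def latched_after_slip(word, extra_edge, fill_bit=0):
    """Value latched when the chip is one clock edge out of step.

    fill_bit stands in for the neighbouring bit of the adjacent word.
    """
    if extra_edge:
        # One edge too many: the word has moved one place too far and the
        # first bit of the following word has entered at the bottom.
        return ((word << 1) & 0xFFFF) | fill_bit
    # One edge too few: the word is one place short and the last bit of
    # the preceding word is still sitting at the top.
    return (fill_bit << 15) | (word >> 1)


# A plain pixel write to DIGIT5 (0x063C), seen with one extra edge.
print(decode(latched_after_slip(0x063C, extra_edge=True)))
```

In this model the pixel write 0x063C is latched as 0x0C78, a write to the shutdown register with bit 0 clear, so the matrix would blank. Likewise 0x0518 (DIGIT4) becomes 0x0A30, an intensity write. That would be consistent with single matrices switching off or changing contrast, but it is only a model of the hypothesis, not a diagnosis.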

drspangle commented 6 years ago

So, after a bunch of experimentation, it seems that the show(), hide() and clear() methods don't solve my problem. Something else getting called (perhaps during the destruction of the previous class instantiated by get_device()?) is resetting the devices and cleaning up the glitch. I'm hoping it's something that can be done rapidly because it seems that the show(), hide() and clear() methods introduce a non-trivial delay in rendering, but if they're not the necessary part of the get_device() call then perhaps my hypothesis that the display could be reset on a regular interval to fix the glitches is still correct.
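One candidate for the "something else" is the set of configuration registers that a driver normally writes once at start-up: scan limit, decode mode, display test, intensity and shutdown. The sketch below only builds the byte sequences for re-sending them to every chip in a cascade. The register addresses are from the MAX7219 datasheet. The function name `reinit_frames` is hypothetical, and whether luma's constructor sends exactly these values in this order is an assumption that has not been checked here.

```python
from typing import List

# MAX7219 register addresses, from the datasheet's register address map.
DECODEMODE = 0x09
INTENSITY = 0x0A
SCANLIMIT = 0x0B
SHUTDOWN = 0x0C
DISPLAYTEST = 0x0F


def reinit_frames(cascaded: int, intensity: int = 0x07) -> List[List[int]]:
    """Build one SPI transfer per config register, repeated for each chip."""
    if cascaded < 1:
        raise ValueError("cascaded must be at least 1")
    if not 0 <= intensity <= 0x0F:
        raise ValueError("intensity must be in the range 0..15")
    settings = [
        (SCANLIMIT, 0x07),       # scan all eight digits (rows)
        (DECODEMODE, 0x00),      # no BCD decoding, raw pixel data
        (DISPLAYTEST, 0x00),     # make sure display-test mode is off
        (INTENSITY, intensity),  # brightness, 0x00 (dim) to 0x0F (bright)
        (SHUTDOWN, 0x01),        # leave shutdown, normal operation
    ]
    # Each transfer carries the same (register, value) pair for every chip.
    return [[register, value] * cascaded for register, value in settings]
```

That is five transfers of 2 × cascaded bytes each. If the driver object exposes a raw-byte write (luma.core devices appear to have a `data()` method for this, but treat that as an assumption), each returned list could be passed to it in turn to test whether re-sending configuration alone clears a glitch.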

thijstriemstra commented 6 years ago

ping @rm-hull

drspangle commented 6 years ago

@thijstriemstra @rm-hull This is still ongoing. I don't think this is a software issue, I think there's actually a signal degradation problem with the MAX7219 devices arranged in cascades.

I'm still in the process of developing a PCB that can be used to strap together 4x4 modules so that they'll work reliably. I'm learning that PCB design takes a lot of time and effort, though, so it might not be reasonable to expect a resolution in the very short term.

drspangle commented 6 years ago

Oh, also, my hypothesis about resetting the displays rapidly turned out to be a bust. It takes too long to maintain a steady framerate, and the glitches I'm encountering can happen often enough that the display will go noticeably wonky before the refresh can fix the issue (even if you're doing it as often as every second, or even every few frames).

drspangle commented 6 years ago

I'm still waiting for the PCBs I developed to fix this problem to come back from fabrication. I'll update at that time - it'll be a couple weeks probably.

Overall, there's no software bug that I can see. If you're less concerned about the hardware solution to this signal degradation problem (as previously discussed in #108), you can close this issue now.

thijstriemstra commented 6 years ago

> you can close this issue now.

no rush, it's your ticket now ;)

isjerryxiao commented 6 years ago

I also have this problem with my 32x24 MAX7219 device. I managed to solve the issue by using an AMS1117 to provide 3.3 V VCC.

drspangle commented 6 years ago

Surely you must have meant 5v?


isjerryxiao commented 6 years ago

@drspangle 3.3 V, the same voltage as the GPIO.

drspangle commented 5 years ago

I had forgotten about this but the closed notification reminded me about it.

Turns out that the reason why the cascades were screwing up is parasitic capacitance from using Dupont connectors. Once I used a PCB (with no special hardware on it) to connect all the cascaded modules, I was able to build a 32x160 matrix which works a treat. I'll be open sourcing the design for the display pretty soon.

rm-hull commented 5 years ago

32x160 matrix ... so that's 4x20 units - nice!
Would love to see a video of that in action!

Yeah, I closed based on what you said in https://github.com/rm-hull/luma.led_matrix/issues/150#issuecomment-393332793.

Good to know what the cause was. I'm not an electrical engineer, but of course it had to be something like that.

drspangle commented 5 years ago

Check it out... [photo: 20181205_182537]

I'm pretty confident that you could daisy chain these pretty much ad infinitum. This huge 5x 4x4 matrix array was exactly what I was trying to do.