hzeller / rpi-rgb-led-matrix

Controlling up to three chains of 64x64, 32x32, 16x32 or similar RGB LED displays using Raspberry Pi GPIO
GNU General Public License v2.0

Use DPI interface as fast DMA #1218

Open B-C-Mike opened 3 years ago

B-C-Mike commented 3 years ago

The DPI interface is like VGA output, but digital: each of the 24 bits (8 per color) has its own GPIO pin. The clock speed can be set above 100 MHz in steps of about 1 kHz.

I created a proof-of-concept project using just Python and DPI output. It reaches 100 FPS on an rPi 2 for a 64x32 LED panel with a single core loaded (calculations done in NumPy). At the same time, DPI pushes data to the panel at a stable 400 Hz refresh rate. The theoretical maximum for 8-bit color is 4 kHz for a single panel and 270 Hz for the 32k-pixel data chain mentioned in the readme.
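As a rough illustration of how such refresh-rate limits can be estimated, here is a minimal sketch. It assumes that every panel clock tick shifts in one pixel column and that nothing else costs time. The function name, the 1/16 scan layout, the 20 MHz panel clock (half of the 40 MHz DPI clock discussed later in this thread) and the "one row send per bitplane" scheme are all assumptions for illustration. It does not attempt to reproduce the exact figures quoted above, which depend on details of the PoC.

```python
def max_refresh_hz(panel_clock_hz, columns, scan_rows, row_sends_per_frame):
    """Upper bound on refresh rate when each clock tick shifts one pixel column."""
    clocks_per_frame = columns * scan_rows * row_sends_per_frame
    return panel_clock_hz / clocks_per_frame


# Assumed example: 64x32 panel with 1/16 scan (16 row addresses, 64 columns),
# each row sent once per bitplane for 8 bitplanes, 20 MHz panel clock.
estimate = max_refresh_hz(20_000_000, columns=64, scan_rows=16, row_sends_per_frame=8)
print(round(estimate, 1))
```

Longer chains increase `columns`, and repeating rows for brightness weighting increases `row_sends_per_frame`; both lower the bound proportionally.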

I hope my solution helps improve your project. https://github.com/B-C-Mike/PoC-rPi-LED-matrix

hzeller commented 3 years ago

Very nice! This looks like a very promising approach, as it solves the problems of (a) not using CPU and (b) stable output frequency.

From reading the description, it looks like you generate the bitplanes by clocking in the same row multiple times, which means we don't need the output enable pulse generator, but we're also limited by the clock speed we can achieve on the serial line (or am I missing something?). We probably won't reach 11-bit BCM with this, but in particular for smaller displays with lower PWM bits, this is still a good advantage given the CPU savings and not having to deal with jittery memory bus contention.
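To make the trade-off concrete, here is a minimal sketch of bitplane generation by pure row repetition, as described in this comment. The function names are hypothetical and this is not code from either project; it only shows that weighting bitplane `b` by sending it `2**b` times costs 255 row transmissions for 8 bits, which is why the serial clock speed becomes the limit.

```python
def bitplanes(row, bits=8):
    """Split a row of integer pixel values into `bits` lists of 0/1, LSB first."""
    return [[(value >> b) & 1 for value in row] for b in range(bits)]


def transmission_schedule(bits=8):
    """List bitplane indices, repeating plane b 2**b times (binary weighting)."""
    schedule = []
    for b in range(bits):
        schedule.extend([b] * (1 << b))
    return schedule


row = [0, 1, 128, 255]
planes = bitplanes(row)
schedule = transmission_schedule(bits=8)
print(planes[0], planes[7])
print(len(schedule))  # row transmissions needed per displayed row
```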

Would you like to take a stab at a pull request demonstrating this with the current rpi-rgb-led-matrix code? Don't mind if it is not clean, just rip out the parts not needed to have a working proof of concept. I can then help fit it into the rest of the library.

Currently, the pixel setting is happening here; could /dev/fb1 even be memory-mapped and written to directly? Though to support multiple buffers (swap on vsync and stuff), we might just have a backing buffer that we then swap or copy when needed.

B-C-Mike commented 3 years ago

I use both "send the same row multiple times" and "send the row with reduced brightness" to find a balance between brightness and refresh speed.

No, I do use output enable and control it to get PWM-like brightness control. It's not really PWM, rather "set this pin high for X pixels, then turn it low for the rest". That gives good enough resolution, and the brightness is reduced for some bitplanes. The first 3 examples are limited to fixed brightness and 4-bit color depth; the next ones have proper 8-bit color depth with PWM. My code is limited to 8 bits after gamma correction, but it should be easy to modify it to 16 bits and just trim to 11 bits.
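The combination of the two techniques could look roughly like the sketch below. The split rule (a `pivot` bit above which rows are repeated and below which the output-enable window is shortened), the function names and the numbers are assumptions chosen for illustration, not the scheme the PoC actually uses. Output-enable polarity is also panel dependent, so the mask only says when the LEDs should be lit.

```python
def oe_mask(clocks_per_row, lit_clocks):
    """One flag per pixel clock: True while the LEDs should be lit."""
    lit_clocks = max(0, min(lit_clocks, clocks_per_row))
    return [i < lit_clocks for i in range(clocks_per_row)]


def plan(bits, clocks_per_row, pivot):
    """Return (repeats, lit_clocks) per bitplane, LSB first.

    Planes at or above `pivot` are repeated at full brightness; planes
    below it are sent once with a shortened output-enable window.
    """
    plans = []
    for b in range(bits):
        if b >= pivot:
            plans.append((1 << (b - pivot), clocks_per_row))
        else:
            plans.append((1, clocks_per_row >> (pivot - b)))
    return plans


plans = plan(bits=8, clocks_per_row=64, pivot=4)
total_sends = sum(repeats for repeats, _ in plans)
light = [repeats * lit for repeats, lit in plans]
print(total_sends)  # row transmissions per displayed row
print(light)        # lit clocks per bitplane, doubling with each bit
```

With these assumed numbers the row is sent 19 times instead of the 255 times pure repetition would need, while the lit time per bitplane stays binary weighted.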

Sadly, I don't have the skills for C++ code, only C. I will try to make it work, but I'm not sure about the result. Let's set the deadline as the end of the year.

It is possible to mmap it; I'm not sure about the synchronization events and double buffering. Just make sure to enable fb1 first (config.txt). By default the rPi maps all outputs to fb0 (mirroring) or doesn't map at all (black screen, just refresh pulses).
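A minimal sketch of memory-mapping a framebuffer-like file from Python follows. The helper name is hypothetical, and the demo maps a temporary file so it runs anywhere; on the Pi the path would be `/dev/fb1` and the size would have to match the configured mode (width × height × bytes per pixel), neither of which this sketch queries. Synchronization with vsync and double buffering are not addressed.

```python
import mmap
import os
import tempfile


def open_framebuffer(path, size_bytes):
    """Map `size_bytes` of a framebuffer-like file for in-place writes."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Shared, read/write mapping by default on Unix.
        return mmap.mmap(fd, size_bytes)
    finally:
        # The mapping keeps its own duplicate of the descriptor.
        os.close(fd)


# Demo on a 16-byte temporary file standing in for the device node.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(16))
    demo_path = tmp.name

fb = open_framebuffer(demo_path, 16)
fb[0:4] = b"\xff\x00\x00\x00"  # write one 32-bit pixel's worth of bytes
fb.flush()
fb.close()

with open(demo_path, "rb") as f:
    data = f.read()
os.remove(demo_path)
print(data[:4])
```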

hzeller commented 3 years ago

Ah nice, yes, using the pixel clock for timing the Output Enable sounds good.

I will play with it and see how to integrate it, but I have a bunch of other projects on my plate currently, so I can't make promises that it will happen in the next couple of weeks.

Thanks a lot for your research into getting a high-frequency, steady clock output from the Pi, this is exciting!

B-C-Mike commented 3 years ago

OK, I have to give up. I have no idea what I'm doing with that code. Here is a summary of what I learned so far:

arahasya commented 3 years ago

This library is already using DPI output, and I am successfully running a large display matrix with it. https://github.com/rjrouquette/rgb_matrix_udp

arahasya commented 3 years ago

After going through your description, it looks like you are setting the DPI clock to 40 MHz. But most RGB panels have driver ICs with a 25 MHz clock, so it will not work for larger displays.

B-C-Mike commented 3 years ago

@arahasya
Thanks for the link to that project. It is possible that other projects like my idea exist too, just complete ones. So far I never found any. By the way, the linked project requires an ATxmega microcontroller and a programmer for it. That's the main difference between these two ideas.

No, I don't use the DPI pixel clock. I generate the clock via one of the data lines. 40 MHz is the reference clock (pixel clock) for DPI, not for the panel. The panel is clocked at half of that speed, by toggling one of the data pins as fast as possible. I had problems latching panel data while the clock was still running, so I stop the clock, then latch. The DPI clock can't be stopped; a software-generated one can.
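The idea of generating the panel clock on a data line can be sketched as follows. Each panel pixel occupies two DPI pixel slots (clock low, then clock high), which is why the panel runs at half the DPI pixel clock, and the latch is pulsed only after the clock has stopped. The bit positions and the function name are hypothetical; the real wiring is defined by the PoC, not by this example.

```python
CLOCK_BIT = 1 << 0  # hypothetical: DPI data line wired to the panel clock input
LATCH_BIT = 1 << 1  # hypothetical: DPI data line wired to the panel latch input
DATA_SHIFT = 2      # hypothetical: color bits sit above the two control bits


def row_to_dpi_words(row_data):
    """Expand one row of panel pixel values into a list of DPI output words."""
    words = []
    for value in row_data:
        base = value << DATA_SHIFT
        words.append(base)              # clock low: data lines settle
        words.append(base | CLOCK_BIT)  # clock high: panel shifts the data in
    words.append(0)          # clock held low, i.e. stopped
    words.append(LATCH_BIT)  # latch pulse while the clock is idle
    words.append(0)          # release latch before the next row
    return words


words = row_to_dpi_words([0b101, 0b010])
print(words)
```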

arahasya commented 3 years ago

@B-C-Mike I just mentioned the link in case it is helpful for your implementation.

Your project looks very promising since, as you said, it doesn't need an external controller. Even though the above project enables 4 parallel chains, your project will improve this library with its 3 parallel chains and make it possible to run bigger displays smoothly. So I will be looking forward to your contribution.

bluelasers commented 2 years ago

We probably won't reach 11-bit BCM with this, but in particular for smaller displays with lower PWM bits, this is still a good advantage given the CPU savings and not having to deal with jittery memory bus contentions.

Why not? The LEDs are good for only 11-13 bits; however, this assumes a single scan panel. If you use a 32-scan panel you are only looking at 6-8 bits. You now use the GPU and L2 cache as a FIFO, instead of the CPU and L1 as a FIFO, if I am not mistaken. The L2 likely has priority over the L1s so this works out.
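One way to read the figures above, offered as an interpretation rather than something stated in the thread: if the on-time budget is shared equally by all multiplexed rows, each row loses log2(scan) bits of usable depth. The function name is hypothetical.

```python
import math


def bits_per_row(led_bits, scan):
    """Bits left per row when the on-time budget is split over `scan` rows."""
    return led_bits - math.log2(scan)


low = bits_per_row(11, 32)
high = bits_per_row(13, 32)
print(low, high)
```

Under that assumption, 11 to 13 bits on a single-scan panel correspond to 6 to 8 bits on a 32-scan panel, matching the range given in the comment.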

There are software algorithms which would support getting the full amount as long as there is enough bandwidth, so what changed? Now the CPU required for this is a tad higher, so how much CPU you get back is not completely clear to me. However, there should be some, especially for non-real-time playback using multibuffering.

Meaning this library has issues with large, high-quality displays due to bit banging. However, it should support this in non-real time? Is there much that can be done about the real-time performance without additional hardware? DPI should provide better stability. Is this library okay with consuming the entire header?

There is some overhead, but you should be able to get most of the serial bandwidth. You could still use BCM, but you need a decent number of memory operations from the CPU.