Closed gregtap closed 7 years ago
Thanks for the test, that is helpful in working on the performance.
In general, this library's first focus is the C++ API; the Python API uses it as a backend.
Python is generally slower, but this looks very slow, so it might be interesting to see if that can be improved. Maybe @Saij, who has contributed the Python API, can help figure out what the slow part is.
Maybe if wrapping C++ code is too much overhead with whatever the binding code generator does: I have recently added a simple C-API ( https://github.com/hzeller/rpi-rgb-led-matrix/blob/master/include/led-matrix-c.h ) which might be easier to wrap (probably doesn't even need the code generator but just plain manual Python C-binding).
@coulix Maybe you want to figure out faster Python bindings ? @Saij contributed the original Python binding, but he might not have the time. Alternatively, just using C++ is probably a good choice anyway.
Interesting, I will try to poke around with ctypes manual binding and see where it brings us. 👍
I am probably doing it wrong:
from ctypes import cdll
ledmatrix_lib = cdll.LoadLibrary('/home/pi/display/rpi-rgb-led-matrix/lib/librgbmatrix.so.1')
sudo python foo.py
Traceback (most recent call last):
  File "foo.py", line 2, in <module>
    ledmatrix_lib = cdll.LoadLibrary('/home/pi/display/rpi-rgb-led-matrix/lib/librgbmatrix.so.1')
  File "/usr/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/pi/display/rpi-rgb-led-matrix/lib/librgbmatrix.so.1: undefined symbol: _ZTVN10__cxxabiv117__class_type_infoE
Hmm, maybe it got lost when loading a C++ symbol. I have now changed the linking of that symbol. Can you sync to the latest version in git, run make clean and make, and try again?
Relevant commit that should fix this was https://github.com/hzeller/rpi-rgb-led-matrix/commit/744578c8bd5b8732194db66e95ae8f0be9b542e1
Great, the loading went fine. Now I need to figure out how to cast my canvas as a struct LedCanvas *canvas.
from ctypes import cdll
from ctypes import c_ushort
from ctypes import c_int
from ctypes import c_voidp
# [ .... ]
# Buffer canvas.
offsetCanvas = matrix.CreateFrameCanvas()
from numpy import random
import time
colors = random.randint(255, size=(1024, 3))
while True:
    for x in range(0, 1024):
        color = colors[random.randint(1024)]
        ledmatrix_lib.led_canvas_set_pixel(c_voidp(offsetCanvas), c_int(x%32), c_int(x/32), c_ushort(color[0]), c_ushort(color[1]), c_ushort(color[2]))
        #offsetCanvas.SetPixel(x%32, x/32, color[0], color[1], color[2])
    offsetCanvas = matrix.SwapOnVSync(offsetCanvas)
offsetCanvas is not a void pointer, I was just trying something.
From the Python ctypes documentation, it seems that we need to declare a ctypes struct looking like:
class SMB_REQUEST(ctypes.Structure):
    _fields_ = [("Address", ctypes.c_ubyte),
                ("Command", ctypes.c_ubyte),
                ("BlockLength", ctypes.c_ubyte),
                ("Data", ctypes.c_char * SMB_MAX_DATA_SIZE)]
Where can I find the `LedCanvas` struct details?
My first poke at ctypes, it's getting hairy 👯
LedCanvas is intentionally an opaque type, and Python doesn't need the details of the struct to use any of the functionality, as you only have to pass around a pointer (it would be different if it was passed by value). It is a common way in C APIs to abstract away details, and from what I heard, it should be fairly easy to wrap with Python.
For all intents and purposes, you can treat it as a void pointer. The only reason why it has a type of struct LedCanvas or struct RGBLedMatrix is that it cannot be accidentally passed to the wrong method.
I guess some useful search terms would be 'opaque struct pointer' for Python bindings.
How did you do the mapping? I notice that your names are something like matrix.CreateFrameCanvas(), while the C function (in include/led-matrix-c.h) for that would be
struct LedCanvas *led_matrix_create_offscreen_canvas(struct RGBLedMatrix *matrix);
So I would've expected that the function call in Python would be similar - or did you already make a mapping layer around this that is not shown in the code above?
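As a rough sketch of the "opaque struct pointer" approach described above, ctypes can treat both struct RGBLedMatrix * and struct LedCanvas * as plain void pointers. The helper name bind_led_matrix_c_api is made up for illustration, and the argument types are assumptions inferred from the calls that appear in this thread; they should be checked against include/led-matrix-c.h before relying on them.

```python
import ctypes


def bind_led_matrix_c_api(lib):
    """Attach pointer-sized signatures to the C functions used in this thread.

    `lib` is whatever ctypes.CDLL(...) returned for librgbmatrix. The
    prototypes below are assumptions; verify them in include/led-matrix-c.h.
    """
    # Stands in for struct RGBLedMatrix* and struct LedCanvas*.
    opaque = ctypes.c_void_p

    lib.led_matrix_create.restype = opaque
    lib.led_matrix_create.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int]

    lib.led_matrix_create_offscreen_canvas.restype = opaque
    lib.led_matrix_create_offscreen_canvas.argtypes = [opaque]

    # Assumed: x, y as int and one 8-bit value per color channel.
    lib.led_canvas_set_pixel.restype = None
    lib.led_canvas_set_pixel.argtypes = [opaque, ctypes.c_int, ctypes.c_int,
                                         ctypes.c_uint8, ctypes.c_uint8,
                                         ctypes.c_uint8]

    lib.led_matrix_swap_on_vsync.restype = opaque
    lib.led_matrix_swap_on_vsync.argtypes = [opaque, opaque]
    return lib
```

Declaring restype matters because ctypes otherwise assumes a C int return value, which can truncate a pointer on 64-bit systems. With argtypes set, plain Python integers can be passed directly instead of wrapping each one in c_int(...).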
I only changed offsetCanvas.SetPixel, thinking that the casting between matrix.CreateFrameCanvas() and what C expects would automagically work. Back to the drawing board.
I made it!
from ctypes import cdll
from ctypes import c_ushort
from ctypes import c_int
from ctypes import c_void_p
import sys
import os
from numpy import random
import time
sys.path.append(os.path.abspath(os.path.dirname(__file__) + './..'))
from rgbmatrix import RGBMatrix
ledmatrix_lib = cdll.LoadLibrary('/home/pi/display/rpi-rgb-led-matrix/lib/librgbmatrix.so.1')
# 32 Rows, 1 panel.
matrix = ledmatrix_lib.led_matrix_create(c_int(32), c_int(1), c_int(1))
# Buffer canvas.
offsetCanvas = ledmatrix_lib.led_matrix_create_offscreen_canvas(c_void_p(matrix))
colors = random.randint(255, size=(1024, 3))
while True:
    for x in range(0, 1024):
        color = colors[random.randint(1024)]
        ledmatrix_lib.led_canvas_set_pixel(c_void_p(offsetCanvas), c_int(x%32), c_int(x/32), c_ushort(color[0]), c_ushort(color[1]), c_ushort(color[2]))
    offsetCanvas = ledmatrix_lib.led_matrix_swap_on_vsync(c_void_p(matrix), c_void_p(offsetCanvas))
And it's just as slow :/
Maybe having a led_canvas_set_pixels where we pass a bytearray could help.
I think it's slow because of the way the Python objects are converted to C. That'll be the problem. Cython also adds considerable overhead. Maybe writing a small library directly in C that does the conversion between Python and C yourself would be the answer. You don't have to fiddle with the offscreen canvas in Python to draw. Just write some functions for what you want to do and then do the whole thing in C.
I think we're on a good path here at least: with the c-api, we can have a much lighter interfacing logic to Python without any huge generated Cython blob, which is good.
Regarding the speed: I don't know, but I'd probably keep the data in Python in a form that is already a uchar or similar, so that no conversion is needed (e.g. fill a bytearray with random colors and use that directly, because it should translate to bytes?)
No idea, have never done anything with Python, but looks like you guys are onto something!
The problem is that every variable in Python is an object in C (it is always a PyObject pointer). Here is the link to the page: https://docs.python.org/2/extending/extending.html#providing-a-c-api-for-an-extension-module
Interesting. To test, I will try to add a led_canvas_set_pixels with a bytearray and see if it improves fps a bit.
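A minimal sketch of the bytearray idea, assuming a batch function along the lines of the proposed led_canvas_set_pixels existed. That function is hypothetical here; nothing in the thread confirms it or its signature. The sketch only shows the Python side: packing a whole frame into one buffer and exposing it to ctypes without copying.

```python
import ctypes
import random

WIDTH, HEIGHT = 32, 32  # panel size used in the examples in this thread


def random_frame(width=WIDTH, height=HEIGHT):
    """Return a bytearray of width*height RGB triplets (3 bytes per pixel)."""
    return bytearray(random.randrange(256) for _ in range(width * height * 3))


def as_c_buffer(frame):
    """Wrap a bytearray as a ctypes uint8 array that shares its memory."""
    return (ctypes.c_uint8 * len(frame)).from_buffer(frame)


frame = random_frame()
c_frame = as_c_buffer(frame)
# A single call such as the hypothetical
#   lib.led_canvas_set_pixels(canvas, c_frame, len(frame))
# would then replace 1024 separate led_canvas_set_pixel calls per frame.
```

Because from_buffer shares memory with the bytearray, the frame can be updated in place between calls without building new ctypes objects for every pixel.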
So, weekend is coming up, so just in case you guys are bored, I encourage you to attempt if some performance improvement is possible :)
I did a lot of experiments with Python and these panels. In my experience Python just was not fast enough to get a good, stable, high frame-rate image with anything over a few bits per pixel for color. It was decent at displaying a static image on a panel but not much else. This is not to say it cannot be done, but you have your work cut out for you! Good luck!
One idea I had a while ago was a simple Unix socket service implemented in C, which could then be controlled from various scripting languages without the need for bindings. Or to implement a kernel module (thanks to the C API it should now be a bit easier ^.^)
I now got around playing with the code a bit to figure out what is slow to see what could be improved.
So I was playing with the program in your initial post @coulix and replaced the while True with a for t in range(0, 10000): to have a baseline, because then I can just measure the total execution time.
$ time sudo python ./rand-dots.py
real 1m27.921s
user 1m59.350s
sys 0m3.390s
The whole script took about 87.9 seconds to run for 10000 frames, which is a frame rate of > 113 fps, which is much higher than the 2-3fps you were seeing.
What machine were you running this on? This is a Raspberry Pi 3 with Raspbian GNU/Linux 8 (jessie). Python was 2.7.
Given that your experience is like 30x slower, are you perhaps using some kind of debug binary of Python @coulix ?
Measured also with python3: it is slower (~76 fps), but still faster than 3fps.
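For reference, the frame rate quoted for the Python 2.7 run follows directly from the `time` output above. A small sketch of that arithmetic (the function name is only for illustration):

```python
def frames_per_second(frames, seconds):
    """Average frame rate over a timed run."""
    return frames / float(seconds)


# "real 1m27.921s" for 10000 frames, taken from the timing output above.
real_seconds = 1 * 60 + 27.921
measured_fps = frames_per_second(10000, real_seconds)

# Compared with the upper end of the 2-3 fps originally reported.
speedup_vs_report = measured_fps / 3.0

print("%.1f fps, about %.0fx the reported 3 fps" % (measured_fps, speedup_vs_report))
```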
This is odd. It was a Raspberry Pi 2. I have some free time this week; I will double-check those timings.
I checked: it's a B+ with 512 MB, so the last revision of the first generation. It is still very slow => 19s for 10 frames.
I double-checked, and no, Python is not being executed in debug mode. I am going to try that on a new Raspberry Pi 3 I have around now to compare.
Interesting. I will have to look if I find an old Pi to compare; I was mostly using the newer Pi's the last year or so.
If we can narrow it down to old Pis, there are special compiler flags for compiling the Python binding that might be useful for the somewhat older ARM architecture.
Does your B+ have a current operating system, so that we can compare the same version of system libraries and compilers ?
Ok, now I tried on a Raspberry Pi 2 Model B (I thought I had the 3) with the same SD card and Adafruit Pi Cobbler+, but the wiring must be different because it shows static colorful patterns and does not animate. Frame rate is a bit better, 2s for 10 frames, but something is still really off. I am restarting from scratch now.
I pulled the latest changes from master and rebuilt.
Linux Raspberypi 4.4.1-v7+ #888 SMP
Are you overclocking the RPi? I have noticed that overclocking really messes with the timings and therefore when driving a matrix LED display I never overclock.
For better comparison, I've now made a very simple benchmark in C++ and Python.
C++
// compile with
// g++ -Iinclude -o speed-test speed-test.cc -Llib -lrgbmatrix -lpthread
#include "led-matrix.h"
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

using rgb_matrix::RGBMatrix;

typedef int64_t tmillis_t;

// Wall-clock time in milliseconds.
static tmillis_t GetTimeInMillis() {
  struct timeval tp;
  gettimeofday(&tp, NULL);
  return tp.tv_sec * 1000 + tp.tv_usec / 1000;
}

int main(int argc, char *argv[]) {
  RGBMatrix *canvas = rgb_matrix::CreateMatrixFromFlags(&argc, &argv, NULL);
  if (canvas == NULL)
    return 1;
  const int w = canvas->width();
  const int h = canvas->height();
  const int loops = 1024;
  const tmillis_t start = GetTimeInMillis();
  // Fill the whole canvas once per loop, one SetPixel() call per pixel.
  for (int i = 0; i < loops; ++i) {
    const uint8_t col = i & 0xff;
    for (int y = 0; y < h; ++y) {
      for (int x = 0; x < w; ++x) {
        canvas->SetPixel(x, y, col, 0, 0);
      }
    }
  }
  const tmillis_t duration = GetTimeInMillis() - start;
  const int pixels = w * h * loops;
  const float pixels_per_sec = 1000.0 * pixels / duration;
  printf("%d pixels, %lldms; %.1f Megapixels/s; %.1fHz frame update rate\n",
         pixels, duration, pixels_per_sec / 1e6,
         1000.0 * loops / duration);
  canvas->Clear();
  delete canvas;
  return 0;
}
And Python
from rgbmatrix import RGBMatrix
import time

canvas = RGBMatrix(32, 1, 1)
w = canvas.width
h = canvas.height
loops = 1024
start = time.time()
for i in range(0, loops):
    col = i % 256
    for y in range(0, h):
        for x in range(0, w):
            canvas.SetPixel(x, y, col, 0, 0)
duration = time.time() - start
pixels = w * h * loops
pixels_per_sec = pixels / duration
print("%d pixels, %dms; %.1f Megapixels/s; %.1fHz frame update rate"
      % (pixels, 1000.0 * duration, pixels_per_sec / 1e6,
         loops / duration))
On the Pi 3, the speed difference between C++ and Python is only about a factor of 5:
$ sudo ./speed-test
1048576 pixels, 397ms; 2.6 Megapixels/s; 2579.3Hz frame update rate
$ sudo ./speed-test.py
1048576 pixels, 2254ms; 0.5 Megapixels/s; 454.3Hz frame update rate
Versions:
$ uname -a
Linux nope 4.1.19-v7+ #853 SMP Wed Mar 9 18:09:16 GMT 2016 armv7l GNU/Linux
$ gcc -v
[...]
gcc version 4.9.2 (Raspbian 4.9.2-10)
Ok, now tested on an old Raspberry Pi 1
$ sudo ./speed-test
1048576 pixels, 3624ms; 0.3 Megapixels/s; 282.6Hz frame update rate
$ sudo ./speed-test.py
1048576 pixels, 60604ms; 0.017 Megapixels/s; 16.9Hz frame update rate
So the Pi 1 runs at about 1/9 the speed of the Pi 3 when comparing the C++ programs, and at about 1/26 the speed when comparing the Python programs, so Python is relatively even slower there.
So the Pi 1 is not only much slower, the relative speed of Python vs. C++ is also worse: while on the Pi 3 the speed factor is about 5:1 (C++:Python), on the Pi 1 it is 17:1.
So bottom line: it is not really advisable to use Python on the older Pis as the speed is just abysmal. On a Pi3, the speed of Python seems to be usable (and probably on a Pi 2, as it is just like a Pi3 but clocked a little slower).
Versions:
$ uname -a
Linux mypi 4.1.7+ #817 PREEMPT Sat Sep 19 15:25:36 BST 2015 armv6l GNU/Linux
$ gcc -v
[...]
gcc version 4.6.3 (Debian 4.6.3-14+rpi1)
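The ratios quoted for the Pi 1 and Pi 3 can be reproduced from the frame update rates printed by the benchmark runs above. A short sketch of that arithmetic; only the table layout and names are made up here, the numbers are the ones reported in the thread:

```python
# Frame update rates (Hz) reported by the benchmark runs above.
RATES_HZ = {
    ("Pi 3", "C++"): 2579.3,
    ("Pi 3", "Python"): 454.3,
    ("Pi 1", "C++"): 282.6,
    ("Pi 1", "Python"): 16.9,
}


def ratio(fast, slow):
    """How many times faster the `fast` configuration is than `slow`."""
    return RATES_HZ[fast] / RATES_HZ[slow]


# C++ vs. Python on the same board.
for board in ("Pi 3", "Pi 1"):
    print("%s: C++ is %.1fx faster than Python"
          % (board, ratio((board, "C++"), (board, "Python"))))

# Pi 3 vs. Pi 1 for the same language.
for lang in ("C++", "Python"):
    print("%s: Pi 3 is %.1fx faster than Pi 1"
          % (lang, ratio(("Pi 3", lang), ("Pi 1", lang))))
```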
I have added a section in the Python readme with the results of these tests https://github.com/hzeller/rpi-rgb-led-matrix/tree/master/python#speed
i wonder what happens if you run it with pypy, also that would require a cffi implementation of the library. but my guess is that it would give performance a significant boost.
Interesting. So I tried compiling the cython interface for pypy
sudo make install-python PYTHON=$(which pypy)
(first, I had to comment out the SetImage() implementation in python/rgbmatrix/core.pyx and the from PIL import Image, as there does not seem to be a Pillow implementation for pypy that can be installed on Debian; then I ran make once in python/rgbmatrix to build the new cython files without the SetImage() support)
With that, I can report that it is even 20x slower than with Python2 on a Raspberry Pi 3 (0.020 Megapixels/s, so more than 160x slower than using C++ directly). You can literally watch the pixels filling the screen. This might be due to a very inefficient binding of cython to pypy.
So I suspect that if someone wants to try that, a CFFI implementation needs to be done first, as mentioned by @Duality4Y (possibly using the simpler C interface provided in include/led-matrix-c.h).
Given that this is about as slow as what you described in your original post, I am wondering if you are using pypy in your tests @coulix ?
You can double-check the wiring looking at the wiring diagram @coulix . There was a change about a year ago in the wiring; not sure if your other version was doing that.
hi, i have been working on the cffi thing. https://github.com/Duality4Y/rpirgbmatrix-cffi
i am able to load the bindings! which is good, still trying to figure out how to pass things around and use them. but it's a start :)
ok i have most things implemented now! i seem to have stumbled upon a bug or two that i am trying to fix.
err, those bugs are fixed, i have the basic functions working now :)
This is very cool. What performance are you getting with that set-up using the simple benchmark from above ?
ok here is a run of 3 tests:
pi@megamatrix:~/Duality/rpirgbmatrix-cffi $ sudo pypy performance_test.py
4194304 pixels, 3950ms; 1.1 MegaPixels/s; 1036.7Hz frame update rate
pi@megamatrix:~/Duality/rpirgbmatrix-cffi $ sudo pypy performance_test.py
4194304 pixels, 3747ms; 1.1 MegaPixels/s; 1093.0Hz frame update rate
pi@megamatrix:~/Duality/rpirgbmatrix-cffi $ sudo pypy performance_test.py
4194304 pixels, 3884ms; 1.1 MegaPixels/s; 1054.4Hz frame update rate
they are run on a raspberry pi2 that is not overclocked at all.
with:
#!/usr/bin/env python
from rgbmatrix import Canvas
from rgbmatrix import Matrix
import time
matrix = Matrix(32, 1, 1)
canvas = Canvas(matrix)
w, h = canvas.get_size()
# i think better in powers of two :)
loops = 2 ** 12
start = time.time()
for i in range(0, loops):
    col = i % 0xFF
    for y in range(0, h):
        for x in range(0, w):
            canvas.set_pixel(x, y, col, 0, 0)
duration = time.time() - start
pixels = w * h * loops
pixels_per_sec = pixels / duration
print("%d pixels, %dms; %.1f MegaPixels/s; %.1fHz frame update rate" %
      (pixels, 1000.0 * duration, pixels_per_sec / 1e6, loops / duration))
matrix.close()
This sounds very promising. Does this also work with non-pypy Python? Then I am happy to accept a pull request that does that. Also, it looks a hell of a lot simpler.
We probably need to make sure that there are some compatibility features that allow old users' code to work directly (essentially that all the python/samples/*.py work). So a rename Matrix -> RGBMatrix, for instance, or making sure that the matrix can also be used as a Canvas right away.
I do like the addition of get_size() that returns a tuple (but having a width and height property is probably still needed for backward compatibility). The explicit matrix.close() is good to have. Can we make sure that it is also called when the Matrix gets garbage collected? (If someone forgets to call close, some bright LEDs might stay on.)
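One possible way to get both an explicit close() and cleanup on garbage collection is weakref.finalize (Python 3.4 and later; it is not available in Python 2.7). The class below is a hypothetical wrapper, not the Matrix class from the cffi experiment: `handle` stands for whatever the C API returned, and `close_func` for the function that releases it.

```python
import weakref


class ManagedMatrix(object):
    """Hypothetical wrapper that guarantees the native matrix is released once."""

    def __init__(self, handle, close_func):
        self._handle = handle
        # Runs close_func(handle) when this object is garbage collected or
        # at interpreter exit, whichever comes first. close_func must not
        # hold a reference to `self`, or the object would never be collected.
        self._finalizer = weakref.finalize(self, close_func, handle)

    def close(self):
        """Release the matrix now; further calls are no-ops."""
        self._finalizer()

    @property
    def closed(self):
        return not self._finalizer.alive

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions
```

With this shape, `with ManagedMatrix(handle, close_func) as m:` clears the panel on leaving the block, and a forgotten close() is still covered when the object is collected or the interpreter exits.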
it's just a test :) but specify python versions? does the one in repo right now work with 3 or 2 or both ? could be made to totally resemble the code as in the repo right now.
so here it runs with pypy:
(pypyvenv)pi@megamatrix:~/Duality/rpirgbmatrix-cffi $ pypy --version
Python 2.7.10 (c95650101a99, Sep 06 2016, 11:02:19)
[PyPy 5.4.1 with GCC 4.7.2 20120731 (prerelease)]
(pypyvenv)pi@megamatrix:~/Duality/rpirgbmatrix-cffi $ sudo pypy performance_test.py
4194304 pixels, 3686ms; 1.1 MegaPixels/s; 1111.2Hz frame update rate
and here in python2.7:
pi@megamatrix:~/Duality/rpirgbmatrix-cffi $ python --version
Python 2.7.9
pi@megamatrix:~/Duality/rpirgbmatrix-cffi $ sudo python performance_test.py
4194304 pixels, 46299ms; 0.1 MegaPixels/s; 88.5Hz frame update rate
one thing to remember: cffi bindings aren't faster in plain cpython, but they are in pypy because pypy can optimize with them.
Yes, the current code works with Python 2 and 3.
here is a test with python3:
pi@megamatrix:~/Duality/rpirgbmatrix-cffi $ sudo python3.4 performance_test.py
4194304 pixels, 46111ms; 0.1 MegaPixels/s; 88.8Hz frame update rate
so as it is now, it works on python3, python2 and pypy. not planning on doing a pull request; maybe you could add it as a module, or reference it from the readme.
I have now mentioned your experiment in the python/README.md @Duality4Y .
BTW, yesterday I changed the comments in led-matrix-c.h from // to /* ... */, which is more compatible when compiled with a C89-compatible C compiler. However, it might mess with the 'extracting the prototypes and remove all comments' part of your script.
I also added a function that can create a matrix with settings from the command line; not sure if it is possible to integrate that in the Python program or if argc/argv are sufficiently 'hidden'. At least there is a struct RGBLedMatrixOptions, which contains all configuration parameters. Maybe those can be made named parameters in the Python constructor?
yes it does indeed, but that is not hard to work around though.
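As an illustration of how such a workaround might look, here is a small sketch that strips both comment styles from header text before the prototypes are extracted. It is not the script from the cffi experiment; the function name and the sample header text are made up, and the regular expressions are deliberately naive (they would also strip comment markers that appear inside string literals).

```python
import re

# /* ... */ comments may span several lines, so "." must match newlines.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# // comments run to the end of the line.
_LINE_COMMENT = re.compile(r"//[^\n]*")


def strip_c_comments(source):
    """Remove /* ... */ and // comments from C header text."""
    without_blocks = _BLOCK_COMMENT.sub("", source)
    return _LINE_COMMENT.sub("", without_blocks)


# Sample text for illustration only; the comments are not from the real header.
SAMPLE_HEADER = """\
/* Illustrative block comment
   spanning two lines. */
struct LedCanvas *led_matrix_create_offscreen_canvas(struct RGBLedMatrix *matrix);  // old style
"""

print(strip_c_comments(SAMPLE_HEADER))
```

Block comments are removed first so that a "//" inside a /* ... */ comment (for example in a URL) cannot cut a line short.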
argv and argc are easily accessible in python. adding the options shouldn't be too hard to do. :) (cffi has been surprisingly easy to work with)
here is a test for the pypy's version of python3:
pi@megamatrix:~/Duality/rpirgbmatrix-cffi $ sudo pypy3 --version
Python 3.3.5 (40497617ae91, May 30 2016, 08:55:54)
[PyPy 5.2.0-alpha0 with GCC 4.7.2 20120731 (prerelease)]
pi@megamatrix:~/Duality/rpirgbmatrix-cffi $ sudo pypy3 performance_test.py
16777216 pixels, 18331ms; 0.9 MegaPixels/s; 893.8Hz frame update rate
Interesting, so like in the interpreted Python2.7 vs Python3, we also see here that the Python3 variant is slightly slower.
yes indeed, but that is because the pypy team is first developing pypy3 for correctness, and then after they are going to look at performance.
Makes sense.
Thanks to @chmullig, the SetImage() implementation is now fast as well: https://github.com/hzeller/rpi-rgb-led-matrix/commit/6b6d273ec1d3f9883b41f31b3ea51d0969678bbe
Hello there,
First, Great work! We can finally start playing with Led panels without much electronic knowledge.
I love Python and did an experiment where I set random colors for each pixel as fast as possible by calling SetPixel. I get the expected output at a low frame rate of ~2-3 FPS.
Then I tried the following C++ code (noob level).