martinberlin / pixels

Simple library for controlling LED chains
GNU General Public License v3.0
0 stars 0 forks source link

Try to optimize refresh speed #1

Open martinberlin opened 4 years ago

martinberlin commented 4 years ago

After the indications of Makuna ( Michael )

Just some nuances, your struct pixels, it is really the same as RgbColor and or RgbwColor. If you switch to use it, then your setpixelColor would just be...

strip.SetPixelColor(i, pixels[i]);

A way to make this generic is to typedef it like following (of course, ifdef based on Rgb or rgbw

typedef RgbColor pixel;

While compiler maybe smart enough, often it is not; the following In

    result[i].R = pyld[5+(i*4)];
    result[i].G = pyld[5+(i*4)+1];
    result[i].B = pyld[5+(i*4)+2];

might be better written as

    pixels* result = new pixel[cnt];
    uint8_t* resultPtr = (uint8_t*)result;
    memcpy(resultPtr, pyId, sizeof(pixel) * cnt);

Martin the Creator @martinfasani_twitter 09:50 Thanks Michael, that is a wise advice! Will probably do it that way. After measuring Frames in a 4 minute video, removing the javascript reverse array, will gain only 3% speed. So it's not a huge gain, but anyways will like to give it a try, since maybe javascript then has more room to focus in Zlib/Brotli compression (We are using WebAssembly to compress in the WebView)

Michael Miller @Makuna 09:56 Another minor thing, what LEDs are you actually using? Ws2812 and Ws2812x have a nuance in the protocol, the x versions require a long reset signal (300us) while the other only needs around 50us (or 80us for sk6812). This is why I expose the different models, if it still works for you, you might eeck out a little more perf by changing to one that has the smaller reset signal. Further, allocating the pixels array every packet and releasing isn't great either. Either do it once and re-use it; OR just point it directly into the pyId packet and never allocate at all. that will get rid of at least one mem alloc and copy.

I'm actually using in this led matrixes Ws2812B was not aware of that at all. CC @iotpanic

Makuna commented 4 years ago

Try this instead, remove the allocation for result, remove the delete of that allocated memory, and just return the pointer directly

return (RgbColor*)(pyld+5);

I suspect the difference might be the overhead of the call into memcpy and special checks in it for the different memory types that the esp8266 has.

martinberlin commented 4 years ago

Thanks Michael, That did the trick. Still need to run tests 10 times and get averages. But when I disable the array.reverse lines in Android side and let javacript just compress and send but no any array reverse work, this optimized take has the winning side:

Compressed zlib, sending 968 RGB pix 5137 Optimized 5232 No optimize -equal - this is just coincidence

Disabling mirroring lines and sending "as is" Frames total/ Avg FPS 5564 / AVG 28 Optimized - NO reverse lines

5215 / AVG 26 -> Non optimized (For...loop is loosing) 2 FPS more per second is already something!

In total received 349 Frames more in a 3:11 minutes video. Bottom-line of our Framerate (FPS) To send one bit 0 takes 1,2 us (microsecs) and to send a bit 1 takes 1,25 us sending at the end a reset that as at least 50 us. Calculating this, sending 24 bits for a pixels takey 28,8 (241,2) and for 484 (2222 matrix) takes 13989 micro seconds (28,8*484+50) Mentioned by Michael in chat also confirmed by father @carlosfasani calculations

That 14 ms is the baseline for our framerate. Considering I'm sending to 968 (44*22) that means that no matter how optimized it is will take 28ms to push the data. It pushes and the program keeps on, but won't push again unless the first push is finished. That gives a bottomline that is 1000/28 = 35 FPS minus network and processing stays on 27/28 FPS