pixelmatix / SmartMatrix

SmartMatrix Library for Teensy 3, Teensy 4, and ESP32
http://docs.pixelmatix.com/SmartMatrix
612 stars 161 forks source link

Major Refactoring - SmartMatrix 3.x #25

Closed embedded-creations closed 8 years ago

embedded-creations commented 9 years ago

https://github.com/pixelmatix/SmartMatrix/tree/sm3.0

I'm working on a major revision to the SmartMatrix Library that simplifies the SmartMatrix Class into just refreshing the display, and moves the foreground and background layers into separate Layer classes. Instead of using a unique header file to describe the hardware, the configuration is set in the user's sketch, so you can have one version of the library and use it with different sizes of displays. You can create your own Layer outside of the library - for example I made a Fadecandy Layer class that I used to convert the buffers from a port of Fadecandy to SmartMatrix - and you can make your own stackup of layers, with multiple Foreground layers for example.

This requires a lot more code to set up the classes in the sketch, which I was trying to minimize with the initial release of the library, but I think this more flexible approach supports more use cases. The code is hidden away by using a couple #defines that support the most common use cases, and we can add more. I chose to declare buffers in the sketch instead of using malloc in the classes as I wanted the large buffers to show up in memory usage at the end of compilation, and there's no malloc support for creating DMAMEM buffers.

After separating into classes, refactoring to be more generic (no dependencies on MATRIX_WIDTH/HEIGHT defines) and testing on both 16x32 and 32x32 panels, I found the classes worked with 32x64 with barely any changes. I didn't get a chance to use any code from the >32x32 size pull requests from @ncortot, @GaryBoone, and @mrwastl, though maybe I will when adding support for C-shape chaining.

I have the library in a decent state, and FeatureDemo has been tested on 16x32, 32x32, and 32x64 panels. There's an occasional flicker in the 32x64 I haven't tracked down yet. Only the FeatureDemo example is updated to use the new library, though it should be easy to update the rest. (Except for FastLED_Controller, which will need an update to the FastLED Library with SmartMatrix 3.x support)

It's probably a month or more away from an official release as I'm going to be traveling without a laptop for a couple weeks, and want to add support for more display configurations (especially C-shape paneling), to do more testing, and make some efficiency improvements after I get back.

If there are any other features you'd like to see in the next major release, let's start discussing.

Not yet supported:

mrwastl commented 9 years ago

24-bit or fewer: yes, it is essential because of the limited memory that is available when using a teensy 3.1. 3x8bit per pixel (including foreground-functionality) is rather on the brink when using 96x64. 2x8bit or even 1x8bit (with or without foreground-functionality) is for testing sketches that include other memory consuming libraries (these are compromises with lower performance anyways - but at least 3x8bit should be supported).

maybe the whole colour stuff could be made a little bit more colour-depth agnostic.

c-shape vs. z-shape: the difference is very small: see SmartMatrix::convertToHardwareXY() at MatrixGraphics.cpp at my fork. the question is: do you want to integrate this seemlessly (for the user) or like in my fork (that originates from GaryBoone's work) that uses an extra call to convert real x/y to internal x/y coordinates?

embedded-creations commented 9 years ago

@mrwastl Long term I definitely want to add in support for lower color depth, I'm just on the fence as to if it's a requirement for the initial 3.0 release.

maybe the whole colour stuff could be made a little bit more colour-depth agnostic.

Agreed, but I think the right solution to that is to use C++ templates, which I've never used before, so this could be a big project for me. If you have ideas of an alternate way, please let me know.

Thinking right now... Maybe it could be done using different constructors for the SmartMatrix and Layer classes. You could pass in a buffer with rgb24 or rgb565 type for the Layer class and that would change behavior and obviously the buffer would be smaller. That doesn't seem too bad. The classes would still need code to handle all types of color data, so it wouldn't be the most efficient, and (I think) templates would be an improvement as it would only compile the code needed for the data type chosen by the user, but that's an improvement that could be added later.

the question is: do you want to integrate this seemlessly (for the user)

I was thinking the Layer classes would work on the actual coordinates e.g. 96x64, and the user would have to tell the SmartMatrix class (through a parameter in the constructor or a function call in setup()) to translate the data before shifting it out to the panels. It makes more sense to me that the SmartMatrix class which deals with the hardware should take care of the translation, not the Layer class which should be hardware agnostic. Once they setup the classes, the user shouldn't need to do any translation in their sketch.

mrwastl commented 9 years ago

i just did some tests with my panels (i've ordered some cheap panels from china so now i can easily fool around with different sizes): 'old' big display: 96x64, and a quick and dirty assembled one with 96x32. cpu speed: 144 mhz. 24-bit colour-depth doesn't seem to work at all: feature demo: after some seconds: hang up, some random leds, game over (cpu set to other speeds: no change), reset teensy: same again 36/48-bits: works fine with 96x32 (but it crashes when using height=64). flickering: yes, i seem to have this problem too (or, it is more a 'glitch' than flickering), but only very, very seldom.

a 'wish' for the feature demo: could you make the durations of the scroll texts depending on the display width? because the test is ended before it has finished in many cases (when using width=96).

do you plan to make foreground-layer optional (eg: by using a #define) - to save memory? (in your code i see SMARTMATRIX_SETUP_DEFAULT_LAYERS (allocates back/foreground layers) and SMARTMATRIX_ALLOCATE_FOREGROUND_LAYER, but no 'SMARTMATRIX_SETUP_BACKGROUND_LAYER or so). additionally, the foreground-specific functions could be made unavailable).

a question: memory usage seems to be noticable lower when using the new version of your library (tested with FeatureDemo (old lib vs. new lib, same width/height, same colour-depth)). but i don't understand why (foreground layers are already using optmised size in my fork) - what do i miss?

EDIT1: c++-templates: well, i'm not very familiar with these, but i'm always willing to learn ...

seemless conversion of coordinates: this sounds very good indeed.

EDIT2: tested width width > 3_32: crash (height=32). so it seems that using > 3 panels crashes the library (memory should be no problem according to compile note (eg. compiling for width=32_4: "Global variables use 43,864 bytes (66%) of dynamic memory")

embedded-creations commented 9 years ago

I think I broke 24-bit color as there wasn't a quick hack to make it work after the refactoring . I'll work on that before the release, and maybe getting 24-bit color will open the doors to the other lower color options.

could you make the durations of the scroll texts depending on the display width?

Yes, I noticed that as well on a 64-bit display.

do you plan to make foreground-layer optional (eg: by using a #define) - to save memory?

Yes, I only made SETUP() #defines for the minimum use cases I needed for my examples. We can add more, and you don't need to use the defines, just put the code in your sketch if you have a use case that's not covered. If you want to use just the background layer, I think you can just delete the three lines related to the foreground in SMARTMATRIX_SETUP_DEFAULT_LAYERS(), and I think useDefaultLayers() should work as expected though I didn't intend for useDefaultLayers() to be used with just the background layer. I'll fix it if it's broken.

additionally, the foreground-specific functions could be made unavailable).

I believe the linker should remove them if the sketch doesn't call those function, I could be wrong.

memory usage seems to be noticable lower

That's surprising, I expected the opposite, though not a significant change. I didn't compare memory usage myself.

tested width width > 332: crash (height=32)

OK, I'll test this myself at some point. You might try increasing this #define in hardware.h, and if you have a scope, check out the timing of the data relative to latch. It could be something completely different though. MIN_BLOCK_PERIOD_PER_PIXEL_NS

Try not use asterisks to indicate multiplication in your comments as they are interpreted as italics by the GitHub Markdown parser. e.g. it looks like you said width "332" when you said 3*32. Not a big deal but it keeps confusing me until I realize what happened.

Thanks for testing out the latest code! I'm unfortunately running out of time to work on things before my trip, so I won't be able to make any changes or do any testing for a few weeks, but I made a note of your feedback and test results.

mrwastl commented 9 years ago

arrrgh. i overlooked the markdown-effect on "x * 32". but it's easier if you are testing different widths (and i'm a lazy arse :)

have fun at your travel! i'll try to find the > 3 * 32 problem in the meantime.

mrwastl commented 8 years ago

Some short 'interim report':

when fooling around i've made an observation that i don't understand (maybe someone has an idea):

when i write

const uint8_t height = 32; 
const uint8_t width = (32*6);
const uint8_t depth = 36;
const uint8_t rows = 4;
static DMAMEM uint32_t matrixUpdateData[rows * width * (depth/3 / sizeof(uint32_t)) * 2];
static DMAMEM uint8_t matrixUpdateBlocks[(sizeof(matrixUpdateBlock) * rows * depth/3) + (sizeof(addresspair) * height/2) + (sizeof(timerpair) * depth/3)];

in the global variables definition part and than in setup():

matrix = new SmartMatrix(width, height, depth, rows, matrixUpdateData, matrixUpdateBlocks);
/* some initialisation stuff */
matrix->begin();

initialisation of SmartMatrix only works once (than I've to call another working sketch, eg Louis' FeatureDemo, than i can successfully call my sketch - but only once. and so on - even recompilation and disconnect/reconnect doesn't change that)

but if i write:

const uint8_t height = 32; 
const uint8_t width = (32*6);
const uint8_t depth = 36;
const uint8_t rows = 4;
static DMAMEM uint32_t matrixUpdateData[rows * width * (depth/3 / sizeof(uint32_t)) * 2];
static DMAMEM uint8_t matrixUpdateBlocks[(sizeof(matrixUpdateBlock) * rows * depth/3) + (sizeof(addresspair) * height/2) + (sizeof(timerpair) * depth/3)];
SmartMatrix matrix_inst(width, height, depth, rows, matrixUpdateData, matrixUpdateBlocks);

and in setup():

matrix = &matrix_inst;
/* some initialisation stuff */
matrix->begin();

everything works as expected. race-condition? or s'thing else that i don't see?

embedded-creations commented 8 years ago

@mrwastl Thanks for testing and the report! I'm back from my trip but probably won't have time to get back into code until next week.

36 bit colour seems to be considerably slower than with v2

What do you mean by "slower", and how are you measuring?

initialisation of SmartMatrix only works once

Interesting, I haven't tried using new with Arduino/Teensy. It will be using malloc to allocate memory, and I'm not sure of the side effects of if that's a problem. I'll think about this more when I look at the code later. Why do you want to use new in setup() instead of using matrix_inst as a global variable?

mrwastl commented 8 years ago

it seems that i've found better refreshrate / MIN_BLOCK_PERIOD_PER_PIXEL_NS combinations. the result are much better now. but i will observe this furthermore.

my measuring: my sketch which gets the data via my protocol contains a simple fps-calculation that may be enabled if required. the result (fps) is painted directly into the display content. calculation: ten frames are taken for an FPS-calculation. the time required for these frame updates (swapBuffers) is measured and divided by 10.

testing is done by streaming a video to the teensy (see prev. posting).

here you can have a look at the sketch (this one is for SmartMatrix v2): https://github.com/mrwastl/VSSDCP/blob/master/VSSDCP_base/examples/VSSDCP_SmartMatrix.ino

DEBUG_FPS enables the fps-calculation.

this sketch also shows why i wanted matrix to be an instance reference (because it fits better into the OO-structure of my VSSDCP-protocol stuff. but yes, it isn't really necessary. but more beautiful (well. on the other side: the memory stuff is initialised outside of any class anyways ...)

mrwastl commented 8 years ago

i've just seen that you're enhancing sm3 step by step (waiting for hardware mapping :)

my erraneous assumption that sm3 is slower than sm2 seems to be based on this interesting discovery that i've just made and that i don't understand at all: MIN_BLOCK_PERIOD_PER_PIXEL_NS is set to: (10000/40)

setRefreshRate(X) -> FPS

X FPS
50 26.7 (but unusable, flickering as hell)
60 22.9
65 24.2
68 25.0
69 25.3
70 25.5 (best compromise between flickering and FPS)
71 20.6 (why the drop?)
72 20.8
73 21.0
74 21.2
75 21.4
76 21.6
77 21.7
78 21.9
79 18.4 (drop again)
80 18.6

the measured FPS-values seem plausible to what i 'detect' with my eyes. and the values in question have been measured more than once (to exclude CPU peeks a.s.o.)

maybe you have any idea why there're these drops between 70, 71 and 78,79? i expected the FPS values to be linear or a curve.

EDIT: i forgot the display-size: 96x64

embedded-creations commented 8 years ago

@mrwastl Thanks for the data, that's really useful! I don't have any idea why there would be sudden drops. I'll look into it and let you know.

The actual FPS will be some percentage lower than what you pass into setRefreshRate() because there are some delays added to the refresh code to make sure each bit meets MIN_BLOCK_PERIOD_PER_PIXEL_NS. I think it's around 10-20% off now. That's on the list to fix.

I'm still finishing up all the support for layers, there are still some bugs, then I will move on to supporting the code that refreshes and supports larger displays. Thanks for your patience!

embedded-creations commented 8 years ago

Just a hunch, but those small FPS changes could be straddling the threshold for where another bit of the color depth needs to be padded to meet the minimum MIN_BLOCK_PERIOD_PER_PIXEL_NS. I'll look into it.

embedded-creations commented 8 years ago

SM3 is slower than SM2, I added a countFPS() function to FastLED_Functions and saw 62fps with SM2, 48fps for SM3 with a 32x32 panel. 64x32 only got 5 fps. I traced the biggest slowdown to the code translating from layers to the refresh buffer on each pixel, they're taking 40% longer. I just changed the refresh code to read in a full row of pixels instead of one at a time, and got the frame rate up to 55 fps. I'll continue on this (and check in my code) on Monday.

mrwastl commented 8 years ago

@slower sm3: i'm curious about the improvements on monday :)

@dithering: i'm very curious about what dithering can improve in a true-colour environment? maybe it blurs colour-step effects or so that are created by our visual centre?

@different colour-spaces support: i'm a little bit confused: your class SMLayerBackground uses rgb24-background buffers - no matter what SmartMatrix(w,h,depth, ...) is set to (the background layer is created outside of SmartMatrix()). so i assume this colour depth setting is only relevant for colour-mixing? if i want to save memory i've to touch SMLayerBackground and play around with the background buffer and how to read from / write to it. if i want to gain speed i've to play around with colour mixing (rgb48 vs. rgb24). am i getting it right or do i understand it all wrong?

embedded-creations commented 8 years ago

@dithering - it's unnecessary, Greg realized there was 48-bit color support and is using that now.

@different - the bitmap is 24-bit, but when after color correction, the lowest bits drop out unless using 36 or 48-bit color. Also there's an option to dim the background layer where color will be lost unless using a higher color depth. I want to allow the user to select a color depth for each layer as well as for refreshing. I'll be working on this next week and it should go quick after Greg's help with #27.

mrwastl commented 8 years ago

wow. that really sounds very impressive and promising. can't wait to play around with these additions. i guess that after the addition of hardware-coordinate correction all my 'hacks' will no longer be required (and your rotation-code is already better working than mine :)

i will continue with my serdisplib-protocol stuff. maybe i'll finally get network-based communication working in a useful way. one 'milestone': being able to stream video-output to my rgb matrix-panel via network.

embedded-creations commented 8 years ago

@mrwastl not sure if you know, but there are two other streaming protocols that work with SmartMatrix: TPM2 and Fadecandy. TPM2 is included in our Aurora firmware, and there are details on compatible software on the Aurora wiki. Fadecandy works just like the Fadecandy project and has interpolation and a very efficient USB protocol. You can find two repos for Fadecandy under Pixematix in GitHub. The firmware is a separate repo.

I'm going to be focusing on the Fadecandy USB protocol and OPC for projects in the future. I'm not happy with the currently available software that uses TPM2.

mrwastl commented 8 years ago

@embedded-creations: no, didn't know these. but i think that they have a very different focus than my protocol has / will have: as far as i've seen by having a short look at these protocols they support small pixel amounts and one direction, main focus on intelligent LEDs (be it octo/neopixel-stuff) and rgb matrix led panels. but i'm thinking about streaming the content of my library (serdisplib) to an arbitrary display module/device (via full frame updates but also region updates) AND also including support for GPIs/GPOs (general purpose inputs/outputs like touchscreens, buttons, ...) - controlling a 480x320x24 display that is controlled by a teensy 3.1 is already working using my protocol (but only via USB - and without support for the touchpanel :(

another usage for the protcol: in the (far) future it should completely replace the existing remote-driver found in my library (i'm not happy with this driver).

what this protocol will NOT support: different layers (because my library doesn't support it :) - that's why i only need the background layer when combining this protocol and SmartMatrix.

embedded-creations commented 8 years ago

@mrwastl Yeah, these protocols don't seem suited for a larger display. I did a project using the RFB protocol used in VNC to send updates over USB to a larger display but the project is incomplete. Here are some links in case anything might be useful to you: http://www.embedded-creations.com/projects/microvnc-revisited https://github.com/embedded-creations/Networked-Display

embedded-creations commented 8 years ago

@mrwastl Take a look at the updates that Greg made that I merged into sm3.0, you can now independently select color depths for the layers and refresh. I think the templates can be expanded to include the <24 bit color that you want. Please take a look and let me know what you think. I spent all my time today working on this, and I'll get back to efficiency updates later in the week. At Greg's suggestion I merged his pull request first and I'm glad I did because there were changes to almost every file in the repo, and a merge conflict would be a nightmare.

mrwastl commented 8 years ago

@embedded-creations before the backgroundbrightness-patch there was a big problem with colours when using COLOR_DEPTH 24 (colours seemed to be extremely 'overdriven'). but now everything seems to be fine again (i've testet COLOR_DEPTH 24 and REFRESH_DEPTH 36, cc24 and cc48). refreshrate seems to be much better now, FPS-tests will follow (seems that some side-effect broke my FPS-test/display code - i've adapted your countFPS to display the result on-screen (and not via serial)).

embedded-creations commented 8 years ago

I've yet to commit the efficiency improvements, so expect major FPS increases tomorrow after I do that

embedded-creations commented 8 years ago

OK, first major efficiency improvement is committed, still slower than SmartMatrix 2.x.

The color correction was broken, adding background brightness control didn't in itself fix it, but hid the problem. Now fixed.

I'm thinking about color correction in general and how it relates to different depths for storage in the layer and the color depth of refresh. Right now you can choose several options for ccmode including none, but is on/off good enough? The color correction table could be automatically selected based on the refresh color depth. I can't think of any real use for applying a 12-bit color correction table to a rgb24 image except to demonstrate how much better the 24-bit or 48-bit color correction table looks. Conversely, applying a 24-bit or 48-bit table when refreshing with 12-bit color isn't going to improve the image.

New to this release is support for 48-bit image storage in layers, which @gregfriedland added for his use case of sending already (heavily) gamma/color corrected video to the display. There's no need to apply color correction to rgb48 pixels, and there's no good way to do it anyway as the lookup table would be huge and computing on the fly would be too slow.

I plan on simplifying color correction to either enabled/disabled, and it will be automatically disabled for rgb48 layers. Let me know if this seems ok, I won't start on this until tomorrow at the earliest.

I'm also thinking about how to better handle refresh color depth. Currently 24, 36, and 48-bit color depths are supported, but an rgb48 array is used to collect data from the layers regardless. 24-bit color should use a rgb24 array, and maybe in the future supporting 12-bit or lower color depths with a smaller data type might make sense. Refresh color depth is set at compile time, but there are still comparisons and shifting done as each pixel is loaded into the refresh buffer, which could be eliminated if there were unique loadMatrixBuffers() functions for each supported refresh color depth. I'll look into how this could be done using templates as I think it's a good improvement for both memory and CPU usage.

I'm hoping that the templates Greg used to enable 48-bit color also will work to enable RGB565 and RGB332, please let me know if you take a look.

mrwastl commented 8 years ago

slower maybe (have to do comparable measurements yet), but already very, verrry usable.

one note to FeatureDemo: maybe you could add a define at the beginning (MAX_REFRESH_RATE or so), set the initial rr to this and also maxRefreshRate in DEMO_REFRESH_RATE to this value.

when using a resolution that is too big my teensy crashes (well, teensy and driving 6 panels is always a little bit borderline :)

adding other colour depths: to be honest: i must first understand the new code (the whole template stuff). but then it shouldnt't be that hard i guess. in my sm2-fork 16-bit colour-depth looked quite ok, 8-bit not so good ... but i added this to sm2 to be able to test some examples that needed a lot of space - and the support wasn't speed-efficient at all. i guess when using greg's template-base this should no longer be the case.

colour-correction: unfortunately i can't test it with my 6 panel display (out of mem), but i can test the other combinations. maybe i'll set up a 4 panel display with my other panels. than i should be able to test 48-bit col.depth.

btw: a perfect video (in my opinion) for testing colours and colour correction is 'Kylie Minogue - Put Yourself in My Place' (https://www.youtube.com/watch?v=q9t6wef5xlE) because it contains very 'shiny' colours. it's my favourite testvideo to reveal misconfiguration or colour problems in general (pink vs. blue vs. orange, scenes with fading from dark to glowing white, ...).

embedded-creations commented 8 years ago

For your setup, do you need to set refresh rate before calling matrix.begin()? Maybe I should make the refresh rate part of the constructor instead of hardcoding a default.

If you just want to test something with a different resolution, you can run it on a larger (or smaller) physical setup. I've been testing 32x32 resolution on my 32x64 panel that happens to be plugged in at the moment. The second panel will show something that's a bit off, and it probably gets worse down the chain, but it's not going to break anything.

Packing and unpacking bits and/or shifting data into place with 8-bit and 16-bit color may be inefficient.

I'll give that video a try, thanks!

mrwastl commented 8 years ago

i also thought that this could be a problem but it seems not to be. my usual setup is:

yep, testing with different resolutions works fine (doing that quite often), but it looks crappy if you simulate a 2x2 panel display on a 3x2 one (because the chain-ordering stands in the way). but for quick tests and a little imagination it's ok :) i can also rewire the chaining to get a perfect 2x2 panel output.

(2x edit because of wrong method-name / description)

embedded-creations commented 8 years ago

Maybe I misread your message but is the refresh rate demo in FeatureDemo causing your Teensy to crash? Where do you set refresh rate to change it from the default?

mrwastl commented 8 years ago

sorry, i've written the wrong method-name (already corrected my message)

in FeatureDemo i've inserted a matrix.setRefreshRate(MAX_REFRESH_RATE) after matrix.setColorCorrection(cc48) and changed const int maxRefreshRate = 120; to const int maxRefreshRate = MAX_REFRESH_RATE;

mrwastl commented 8 years ago

addition: if i only set the refreshrate at the beginning and leave const int maxRefreshRate = 120; unchanged, all demos work except the last refreshrate demo - this one crashes the teensy. if i limit maxRefreshRate, even this demo works fine.

embedded-creations commented 8 years ago

Some good progress today. FastLED_Functions is now just as fast in SM3 as it was in SM2 (62fps with 32x32 36-bit color on Teensy 3.1 @ 96MHz).

I replaced ccMode with ccEnabled. colorCorrection() chooses a lookup table based on the size of out.

loadMatrixBuffers() checks for the refresh_depth once per pixel and shifts the data if the refresh_depth is less than 48. I made unique functions for each supported depth, so the check is only done once per row, and that gave a few FPS boost.

With the new loadMatrixBuffers24() function, I can use type rgb24 to get pixels from the layers, and now use the proper 24-bit color correction table.

Duplicating similar code for each refresh_depth isn't the best way to handle this, I'm going to look into using non-type template parameters so the SmartMatrix class can use refresh_depth as a const instead of a variable, as it never changes after compilation.

embedded-creations commented 8 years ago

The background layer uses a lookup table in RAM to do color correction, to enable the background brightness feature. This is unnecessary with a rgb48 layer, or on a memory-constrained system. I should make this feature optional.

gregfriedland commented 8 years ago

Regarding color correction at rgb48, wouldn't this actually be useful to adjust for relative differences in luminance that the eye perceives at low vs high brightness? I don't need it for my current project because I'm sending in color corrected values, but I think this would be useful to people in general. There's not enough RAM for such a big table of course but if we can store it in flash, it would take 64kB out of the 256kB available on the Teensy 3.1 which doesn't seem too bad for most applications. The option to have it available could be a compile time switch so it wouldn't affect people not wanting it, and it could still be switched on/off at runtime. Just a thought.

On Thu, Aug 20, 2015 at 7:33 AM, Louis Beaudoin notifications@github.com wrote:

The background layer uses a lookup table in RAM to do color correction, to enable the background brightness feature. This is unnecessary with a rgb48 layer, or on a memory-constrained system. I should make this feature optional.

— Reply to this email directly or view it on GitHub https://github.com/pixelmatix/SmartMatrix/issues/25#issuecomment-133031229 .

gregfriedland commented 8 years ago

I thought for a second that we could even use the compiler to generate the table with a custom gamma value using template metaprogramming ( http://stackoverflow.com/questions/2226291/is-it-possible-to-create-and-initialize-an-array-of-values-using-template-metapr) but unfortunately the arduino IDE limits template recursion depth to 900 so unless we override this flag, we can't use this approach. Bummer...

On Thu, Aug 20, 2015 at 10:23 AM, Greg Friedland greg.friedland@gmail.com wrote:

Regarding color correction at rgb48, wouldn't this actually be useful to adjust for relative differences in luminance that the eye perceives at low vs high brightness? I don't need it for my current project because I'm sending in color corrected values, but I think this would be useful to people in general. There's not enough RAM for such a big table of course but if we can store it in flash, it would take 64kB out of the 256kB available on the Teensy 3.1 which doesn't seem too bad for most applications. The option to have it available could be a compile time switch so it wouldn't affect people not wanting it, and it could still be switched on/off at runtime. Just a thought.

On Thu, Aug 20, 2015 at 7:33 AM, Louis Beaudoin notifications@github.com wrote:

The background layer uses a lookup table in RAM to do color correction, to enable the background brightness feature. This is unnecessary with a rgb48 layer, or on a memory-constrained system. I should make this feature optional.

— Reply to this email directly or view it on GitHub https://github.com/pixelmatix/SmartMatrix/issues/25#issuecomment-133031229 .

mrwastl commented 8 years ago

WOW. the improvement is amazing. i'm currently driving the display with refreshrate = 95 and FPS = 30.9. this gives a really nice picture without flickering. but: there are similar divergences between refreshrate and FPS as some postings above:

the unexpected FPS-values reach far into the lower refreshrates, but i've set focus to the higher ones:

refreshrate FPS
89 29.8
90 30.0
91 30.2
92 30.3
93 30.5
94 30.7
95 30.9 (best compromise)
96 24.9 (big drop)
97 25.0
98 25.2
99 25.3
100 25.4
101 25.6
102 25.7
103 25.9
104 26.0
105 26.1
106 26.3
107 26.4
108 26.5
109 22.2 (big drop)
110 22.3
111 22.4
112 22.6
115 22.9
118 23.2
119 19.9 (drop)
120 20.0

notes:

FeatureDemo works now with much higher refreshrates: 120 seems to be ok (some ghost lines occasionally), 125 crashes the teensy.

edit: added values for 101++ + FeatureDemo

embedded-creations commented 8 years ago

@gregfriedland 64kB is a big chunk, but you're right, very possible to fit in most sketches. Compile time switches aren't always easy with an Arduino library, it will probably have to be done through using templates, I was thinking about setting flags for options which get passed to the class as a non-type template argument, so the compiler can optimize out code that won't get used. Does this seem reasonable?

embedded-creations commented 8 years ago

@mrwastl I wasn't able to reproduce this FPS stepping with the normal FastLED_Functions, going from refresh 200Hz to zero, same if I extended out MIN_BLOCK_PERIOD_PER_PIXEL_NS, but when I changed the argument of swapBuffers() from false to true, I got the stepping.

swapBuffers(true) copies the bitmap from the new refresh buffer to the new drawing buffer, so the function call will have to wait at least a little bit each time. I believe these big drops are when the time between calls to swapBuffers() crosses a boundary between integer multiples of refresh frames. e.g. from your table above, at 95Hz refresh, your time between swapBuffers calls might take 2.9 refresh cycles but it actually takes 3.0 refresh cycles because you have to wait for the swap to complete. At 96Hz refresh, your time between swapBuffers calls might take 3.1 refresh cycles, but you have to wait for four cycles for the swap to complete.

Hope that's coherent. I could hook up the logic analyzer and toggle some pins to verify, but I'm pretty sure this is the reason.

gregfriedland commented 8 years ago

Yeah, I think that's a good way to go. Then you can add the refresh type and the color correction mode to the template definition if you're so inclined.

On Thu, Aug 20, 2015 at 2:38 PM, Louis Beaudoin notifications@github.com wrote:

@gregfriedland https://github.com/gregfriedland 64kB is a big chunk, but you're right, very possible to fit in most sketches. Compile time switches aren't always easy with an Arduino library, it will probably have to be done through using templates, I was thinking about setting flags for options which get passed to the class as a non-type template argument, so the compiler can optimize out code that won't get used. Does this seem reasonable?

— Reply to this email directly or view it on GitHub https://github.com/pixelmatix/SmartMatrix/issues/25#issuecomment-133186111 .

embedded-creations commented 8 years ago

I added refreshDepth to the SmartMatrix3 template, but it's on a separate branch as I chose to make some major changes to the templates to simplify my coding (if there's an mistake when setting up templates, a ton of confusing errors generated). https://github.com/pixelmatix/SmartMatrix/tree/sm3.0-templatetest

I was getting confused by why all the classes needed to use the template. It comes from Layer using - which isn't really necessary, as the base class itself doesn't use RGB in any way - and the wrapper functions for backwards compatibility with SmartMatrix 2.x, so matrix.someFGorBGmethod() will work. I removed the template from Layer, then removed the wrapper functions from SmartMatrix3, then replaced with for SmartMatrix3. Now I can set latchesPerRow to a const, and the compiler will better optimize loadMatrixBuffers(). The change saves about 3.8k of flash (and 4 bytes of RAM!).

I'll either need two functions for loadMatrixBuffers24() and loadMatrixBuffers48() to deal with the two types, or I can pass a typedef REFRESH_RGB to the template in addition to int refreshDepth - @gregfriedland maybe you know a better way.

Now that I know this template technique works, I'll add another argument to set flags for options.

I'm on the fence about adding back the backwards compatibility wrapper functions. It would be nice to keep existing SmartMatrix users from having to do a bunch of find-replace on their code when they upgrade to SmartMatrix 3.0, but in the long term, it's probably better to separate the Layer classes from SmartMatrix. I'd like to hear your thoughts, and I'll think about this myself over the weekend.

jasoncoon commented 8 years ago

I'm not terribly worried about breaking changes to the API. Even Aurora, the largest SmartMatrix application I know of, shouldn't be difficult to update.

gregfriedland commented 8 years ago

Layer doesn't need to be templated but the way around that is to have two copies of some functions like fillRefreshRow() for the different types, so it's a matter of taste I guess.

Not sure why you need to pass a type for the refresh depth to the template in addition to the refreshDepth number. You can use the macro for creating an rgb24/48 type from the depth number. Maybe I'm misunderstanding though.

It doesn't sound like it will be to difficult to port old code to a new structure, so I'd say go with a clean break if you prefer the new interface you're considering.

On Fri, Aug 21, 2015 at 2:49 PM, Louis Beaudoin notifications@github.com wrote:

I added refreshDepth to the SmartMatrix3 template, but it's on a separate branch as I chose to make some major changes to the templates to simplify my coding (if there's an mistake when setting up templates, a ton of confusing errors generated). https://github.com/pixelmatix/SmartMatrix/tree/sm3.0-templatetest

I was getting confused by why all the classes needed to use the template. It comes from Layer using - which isn't really necessary, as the base class itself doesn't use RGB in any way - and the wrapper functions for backwards compatibility with SmartMatrix 2.x, so matrix.someFGorBGmethod() will work. I removed the template from Layer, then removed the wrapper functions from SmartMatrix3, then replaced with for SmartMatrix3. Now I can set latchesPerRow to a const, and the compiler will better optimize loadMatrixBuffers(). The change saves about 3.8k of flash (and 4 bytes of RAM!).

I'll either need two functions for loadMatrixBuffers24() and loadMatrixBuffers48() to deal with the two types, or I can pass a typedef REFRESH_RGB to the template in addition to int refreshDepth - @gregfriedland https://github.com/gregfriedland maybe you know a better way.

Now that I know this template technique works, I'll add another argument to set flags for options.

I'm on the fence about adding back the backwards compatibility wrapper functions. It would be nice to keep existing SmartMatrix users from having to do a bunch of find-replace on their code when they upgrade to SmartMatrix 3.0, but in the long term, it's probably better to separate the Layer classes from SmartMatrix. I'd like to hear your thoughts, and I'll think about this myself over the weekend.

— Reply to this email directly or view it on GitHub https://github.com/pixelmatix/SmartMatrix/issues/25#issuecomment-133573133 .

embedded-creations commented 8 years ago

@gregfriedland You're right, I didn't realize you could use template arguments with preprocessor macros, it's pretty obvious how to handle this now.

embedded-creations commented 8 years ago

I just pushed some major updates to the SM3.0 branch. I have the API about where I want it, but I'm open to feedback. I'd like to get the API finalized so I can get the library out to some more people in a pre-release (hopefully by this weekend) while I continue to work on updates/features that are either behind the scenes or only add to and don't change the existing API.

Feedback is welcome!

mrwastl commented 8 years ago

@embedded-creations could it be that the new version needs more (dynamically allocated) memory? when configuring for 6 panels (kMatrixWidth = 32 * 6) my teensy crashes (so that i have to press the program mode button - which was hardly the case with your library before). after compilation there are still 2916 bytes left according to arduino-IDE. if i reduce to 5 panels everything seems to work fine. (colour depth is set to 24, kRefreshDepth is set to 36).

embedded-creations commented 8 years ago

@mrwastl Not that I know of, there aren't actually many changes made to the background layer or refresh code since the last version I pushed. Are you using just background, or other layers as well?

mrwastl commented 8 years ago

@embedded-creations my usual initial test: your FeatureDemo.ino with the following changes: initial refreshrate set, kMatrixWidth changed, Serial.begin() commented out.

embedded-creations commented 8 years ago

@mrwastl I can get 5x panels working at 60Hz, not 90, 6x panels at 40Hz. This tells me its not (only) a memory issue, but the refresh routine not being able to keep up. I'll investigate later today.

mrwastl commented 8 years ago

@embedded-creations memory was a quick guess. as far as i remember you're not overclocking your teensy, am i right? i'm using 144mhz, that might explain why i get higher refreshrates. as far as i remember i got 6 panels working with a very low refreshrate, but only for some seconds. i couldn't test with my protocol library (streaming via USB), because it's currently filled with Serial.println()'s :) (fooling around with network-based streaming - you start with 1 Serial.println(), cursing, more println()'s, more cursing, a.s.o.).

embedded-creations commented 8 years ago

@mrwastl I'm out of time for today but I think you're right, it's a memory issue. The Bitmaps example at 32x6 panels wide, 96MHz CPU, 40Hz refresh works with 95% of the time in matrixCalculations(), but only if I lower kDmaBufferRows to 2 from default 4. Drop the refresh to 20Hz, it's spending 51% of the time in matrixCalculations(). Increase kDmaBufferRows from 2 to 3, and it breaks. I'll investigate more tomorrow.

embedded-creations commented 8 years ago

@mrwastl Tried one more thing: I set tempRow0/tempRow1 inside loadMatrixBuffers36() to static, so it's not 1.5kB on the stack. It allowed Bitmaps to work with kDmaBufferRows set to 3 or 4. This is a quick hack you could try, but the real fix might be to pass matrixWidth as a template argument so it can be set as const and the buffers moved off the stack.

    //rgb48 tempRow0[matrixWidth];
    //rgb48 tempRow1[matrixWidth];
    static rgb48 tempRow0[32*6];
    static rgb48 tempRow1[32*6];

Failures from being out of memory and failures from having the refresh rate set too high to keep up look very similar so it's hard to troubleshoot.

embedded-creations commented 8 years ago

I added matrixWidth to the template arguments, and made the tempRow buffers static. @mrwastl see if this works better for you.

I'm trying to figure out how to handle the case where the low priority ISR running matrixCalculations() can't keep up with the refresh rate. I'd like to keep the refresh going after this failure, by either blanking the screen briefly or by automatically lowering the refresh rate. It's not so easy.

mrwastl commented 8 years ago

memory usage is even more (6 panels, kRefreshDepth = 36) -> 99% dyn. memory, leaving 604 bytes (teensy immediately crashes). if i reduce kRefreshDepth to 24 FeatureDemo works with the following settings: refreshrate = 80, LATCH_TO_CLK_DELAY_NS = 800. if i set refreshrate to 90 -> boom.