pimoroni / pimoroni-pico

Libraries and examples to support Pimoroni Pico add-ons in C++ and MicroPython.
https://shop.pimoroni.com/collections/pico
MIT License
1.31k stars 496 forks source link

Dark brightness values don't seem to increase monotonically #880

Closed nlw0 closed 11 months ago

nlw0 commented 11 months ago

I got myself a Pico Unicorn, really great stuff! Excited to start playing around with it.

I liked to read about the gamma correction and the PWM brightness control, because I'm interested in using very dark values. I want to make a clock to leave in the bedroom at night, try to make something that is barely visible in the darkness.

My first surprise was that in the gamma table the first value above 0 is actually 4. I would expect to make the curve linear at the beginning. But never mind that, I guess it's just the result from a simple expression.

If I plot a gradient of values from eg 0 to 15 over the columns, it seems fine, hard to tell if it's not as it should be. But If you change the whole screen into increasing and decreasing brightness levels, you can notice sometimes it's going a little darker while it should be increasing. So it looks like sometimes neighboring values of brightness are flipped, or something like that.

In this branch here I modified the plasma demo to make a "pulsing light", using a triangle wave. Is it just me, or can anybody notice that in the first few steps it doesn't seem to be increasing continuously? Could this be a bug somewhere?...

Notice that I modified the library to also allow for 16 bit input values directly, and it still looks like that.

Regardless of this issue, I would also like to have even darker and intermediate values in this range. Would it be possible to do modifying the library, or is this the limit of the driver?

Pulsating lights MWE: https://github.com/nlw0/pimoroni-pico/tree/nic/darkpulse

Gadgetoid commented 11 months ago

Anything between 0-4 is probably "off" since we're talking counts of single PIO instruction execution times.

That said, I had to combine gamma tables across several products to save flash space so it's probably not optimal.

Everything else is enormously complicated by the fact it's all matrix scanned. The drawbacks of this approach are probably a lot more noticeable at lower brightnesses and it's a very fine balancing act between performance and exposing those drawbacks. One example is charge remaining in the circuit as the rows are scanned through, causing pixels which should be "off" to illuminate- this could potentially be what you're seeing. There's a final empty frame in the output data to try and avoid this, but that doesn't necessarily account for everything. Unfortunately it's more of an art than a science so I can only really guess at what might be happening.

Brightness control happens using BCD - which assigns a power of two delay to each significant bit such that the total on time of the LEDs is the sum of all the delays.

In case this means nothing to you (or onlookers to this issue), it's really difficult to grok how this works since the PIO output "driver" needs a continuous, specially ordered bitstream with the delays interleaved.

Calculating and inserting the delays into the bitstream happens here:

https://github.com/pimoroni/pimoroni-pico/blob/b4451c3bdc06235a1358a5a8aabd008647ed9f8a/libraries/pico_unicorn/pico_unicorn.cpp#L119-L139

Every row of pixels starts with a row offset, which the PIO output driver uses to select the correct row on the display, followed by a packed set of on/off states for each LED in that row, followed by a number of ticks determining how long that state will be persisted for, so it goes:

Gadgetoid commented 11 months ago

Of course while explaining this, I've had a thought. If we assume that low values are effectively off, or so tightly timed as to be marginal, and take this into account with gamma correction for values < 4 then it stands to reason this would also apply to values that use any of those bits- ie: 64 might be effectively identical to 60.

It could be worth modifying bcd_ticks to add a nominal fixed value to give the lower brightness values slightly more on-time.

IE: you might change:

https://github.com/pimoroni/pimoroni-pico/blob/b4451c3bdc06235a1358a5a8aabd008647ed9f8a/libraries/pico_unicorn/pico_unicorn.cpp#L134

uint16_t bcd_ticks = 1 << frame;

to be:

uint16_t bcd_ticks = (1 << frame) + 1;

BCD frame 0 is the least significant bit, so 1 << frame is just a delay of one PIO instruction time.

The problem we get here is that BCD timings are very specifically powers of two so that different combinations of bits cannot result in the same timing value. Intuitively you'd want to just use the raw 14-bit pixel value as the "on time" but that becomes impossible when setting and timing multiple pixels simultaneously (which is why we use BCD, or Binary Coded Decimal).

The above would give the following bit timings:

0 2
1 3
2 5
3 9
4 17
5 33
6 65
7 129
8 257
9 513
10 1025
11 2049
12 4097
13 8193

So 0b11 would have the same on time as 0b100 and 0b1110 (3 + 5 + 9) would have the same on time as 0b10000 (17).

But this means that 0b1111 (2 + 3 + 5 + 9 = 19) would have MORE on time than 0b10000 (17) and - assuming it's even visible at all - would cause exactly the sort of brightness issue you describe.

Which then leads me to think- maybe the whole PIO output clock setup adds an implicit delay to every BCD timing frame which we are not accounting for. Effectively doing what I described above by accident. This shouldn't have a huge impact because the LEDs are only turned on for the actual BCD tick count, or timing period:

https://github.com/pimoroni/pimoroni-pico/blob/b4451c3bdc06235a1358a5a8aabd008647ed9f8a/libraries/pico_unicorn/pico_unicorn.pio#L71C1-L85

But even a potential single instruction delay could have a significant, outsized impact at low brightnesses.

In other words, when we flash a pixel on for bits 0b011 the overhead of flashing it twice means its actual on time is greater, rather than less than 0b100.

So maybe the real answer is:

uint16_t bcd_ticks = (1 << frame) - 1;

To account for the extra instruction of latency we might get from:

https://github.com/pimoroni/pimoroni-pico/blob/b4451c3bdc06235a1358a5a8aabd008647ed9f8a/libraries/pico_unicorn/pico_unicorn.pio#L81C10-L81C10

Gadgetoid commented 11 months ago

Anyway TLDR is between the additional delay in PIO instructions, plus some extra off time delay in the hardware, plus the probable time it takes for LEDs to discharge the output FETs it's very possible there is enough extra on-time to cause the effect you're observing.

Assuming (at a guess) two extra instruction times of on-time per bit, the first part of our gamma table expressed as on-time is:

0, 0, 0, 3, 4, 7, 6, 10, 10, 14, 19, 18, 24, 31, 34, 34, 43, 48, 52, 62, 71, 73, 83, 88, 98, 107, 116, 127, 141, 143, 154, 171, 178

You can see some very minor steps backwards there, (7 down to 6, 19 down to 18) but probably nothing that would be visible in actual use.

This table was generated using this code:

# Amount of extra delay in PIO instructions that occurs per bit
# We know there's at least *one* extra instruction, but depending upon how
# long it takes PIO instructions to propagate changes to GPIO, how long the
# hardware shift-registers we use take to turn "off" and how long LEDs take
# to discharge the activated FETS there may be more time, in theory.
# Two is... a guess.
EXTRA_DELAY = 2

# Gamma table pulled from Pimoroni Pico
GAMMA_14BIT = [
      0, 0, 0, 1, 2, 3, 4, 6, 8, 10, 13, 16, 20, 23, 28, 32,
      37, 42, 48, 54, 61, 67, 75, 82, 90, 99, 108, 117, 127, 137, 148, 159,
      170, 182, 195, 207, 221, 234, 249, 263, 278, 294, 310, 326, 343, 361, 379, 397,
      416, 435, 455, 475, 496, 517, 539, 561, 583, 607, 630, 654, 679, 704, 730, 756,
      783, 810, 838, 866, 894, 924, 953, 983, 1014, 1045, 1077, 1110, 1142, 1176, 1210, 1244,
      1279, 1314, 1350, 1387, 1424, 1461, 1499, 1538, 1577, 1617, 1657, 1698, 1739, 1781, 1823, 1866,
      1910, 1954, 1998, 2044, 2089, 2136, 2182, 2230, 2278, 2326, 2375, 2425, 2475, 2525, 2577, 2629,
      2681, 2734, 2787, 2841, 2896, 2951, 3007, 3063, 3120, 3178, 3236, 3295, 3354, 3414, 3474, 3535,
      3596, 3658, 3721, 3784, 3848, 3913, 3978, 4043, 4110, 4176, 4244, 4312, 4380, 4449, 4519, 4589,
      4660, 4732, 4804, 4876, 4950, 5024, 5098, 5173, 5249, 5325, 5402, 5479, 5557, 5636, 5715, 5795,
      5876, 5957, 6039, 6121, 6204, 6287, 6372, 6456, 6542, 6628, 6714, 6801, 6889, 6978, 7067, 7156,
      7247, 7337, 7429, 7521, 7614, 7707, 7801, 7896, 7991, 8087, 8183, 8281, 8378, 8477, 8576, 8675,
      8775, 8876, 8978, 9080, 9183, 9286, 9390, 9495, 9600, 9706, 9812, 9920, 10027, 10136, 10245, 10355,
      10465, 10576, 10688, 10800, 10913, 11027, 11141, 11256, 11371, 11487, 11604, 11721, 11840, 11958, 12078, 12198,
      12318, 12440, 12562, 12684, 12807, 12931, 13056, 13181, 13307, 13433, 13561, 13688, 13817, 13946, 14076, 14206,
      14337, 14469, 14602, 14735, 14868, 15003, 15138, 15273, 15410, 15547, 15685, 15823, 15962, 16102, 16242, 16383]

# Work out the BCD timings for each bit
timings = [(1 << n) + EXTRA_DELAY for n in range(14)]

# Function to calculate the delay time from a 14-bit colour value
def delay_time(x):
    result = 0
    for bit in range(14):
        mask = 1 << bit
        if x & mask:
            result += timings[bit]
    return result

values = []

# Walk through our gamma table and calculate delay times for every entry
for x in range(256):
    r = delay_time(GAMMA_14BIT[x])
    values.append(r)
    print(x, GAMMA_14BIT[x], r)

# Print out all the timings
print(values)

So yes there is a bug in the driver layer where we assume we'll get the delay times we actually want, but this is mostly occluded by our gamma correction and it's probably not something I'd worry about fixing.

For your use-case you might be able to compensate by adjusting the bcd_ticks values as mentioned above, but since this starts at 1 you don't really have much wiggle room.

Another potential approach would be to clock the PIO faster (this requires overclocking the Pico I believe) and shifting the BCD timings left so you don't have any bits with timings that clash with the overhead.

nlw0 commented 11 months ago

@Gadgetoid thanks so much for all the info! I understand it much better now. I definitely want to investigate this because I really like the topic, and there's a lot I need to learn about this kind of led panels and PWM brightness control.

What exactly do you mean by BCD? Aren't you simply using the binary representation of the values as a sum of powers of 2? As far as I remember, BCD is when you use groups of 4 bits to represent digits from 0-9, like for 7-segment displays and digital clocks.

One thing I'm still trying to understand is how are you able to control a total of 3716=336 leds. Isn't the IC able to do only up to 144? Are there multiple chips in the board?

Regarding what I'm seeing, I suspect it might relate to timing errors, like you said. Maybe the time in one iteration isn't exactly what we expect, something like that. I'll try to play around with this and see what happens!

I was actually thinking one cool project might be to get some photodiodes or whatever, and make a kind of color calibration of the display, with the pico on the loop. Then we can objectively measure the power output, and come up with an accurate correction for the timings or whatever. And then any other necessary gamma on top of that related to brightness perception.

nlw0 commented 11 months ago

Anyway TLDR is between the additional delay in PIO instructions, plus some extra off time delay in the hardware, plus the probable time it takes for LEDs to discharge the output FETs it's very possible there is enough extra on-time to cause the effect you're observing.

OK, I hadn't seen this before. This makes a lot of sense to me, that sequence you computed looks a lot like I observed: it goes up, down then up, then constant and then up again... This is probably it then, there's an extra delay proportional to the number of bits, you think?

For my purposes, I'm happy to be able to play around with a set_rgb that takes uint16 inputs straight to the display (as in my patch), and maybe with the timings. But another possibility might be just computing values for GAMMA_14BIT that minimize the error. At the extreme, this can be done even based on a calibration with a sensor, like I said before.

If this is about the number of bits, there might even be a decrease between 0x01ff and 0x2000 for instance, it would look like 0x2009 or something. Unlikely to happen in the gamma table, though.

The absolute minimum time the LED can be on, though, is it already what the board does? Then I suppose it's also influenced by what's the current used. If I really need to make it darker, I guess I should try maybe dithering the color before drawing.

nlw0 commented 11 months ago

I'll go ahead and close this issue since I believe you've figured it out. If I come up with something interesting related to this, I'll just make a PR. Thanks for the help!

Gadgetoid commented 11 months ago

Okay it's not "BCD" it's "BCM" I mix up Binary-coded Decimal and Binary Code Modulation every single time 😆 😭

The crux is using powers of two on-times for each successive bit, so that we can output multiple chains of bits simultaneously and- like binary- no two bit timings can add up to any other bit timings. At least in theory. There are clearly practical gotchas with this!

I can't even remember what IC we're using here, but IIRC it's just a chain of shift registers or a single register responsible for driving one row of LEDs and probably an inverting buffer or similar to handle connecting the low side (cathode) of each LED to ground.

If you ignore the IC and imagine the Pico has 48 spare pins to drive the RGB elements of each led, plus another 7 pins to handle each row then all we're doing is turning on one of the 7 row pins (to connect to ground) and a mix of the 48 RGB channel pins depending on whether a specific LED needs to be illuminated for a specific BCM phase.

We shift out all the on/off bits for a single BCM phase of a single row, flash that row on for the required time, and then move to the next phase.

Once all 14 phases are done for a single row we advance to the next row, and the next, so in practice only one row of LEDs is flashed on at a time.

But the TLDR answer is- we're not limited by any specific matrix driver because we're not using one.

ali1234 commented 11 months ago

https://github.com/pimoroni/pimoroni-pico/blob/b4451c3bdc06235a1358a5a8aabd008647ed9f8a/libraries/pico_unicorn/pico_unicorn.cpp#L17-L29

So if I understand this right it works like:

data:                                       action:

64 bits on/off data                         shift out to columns register
8 bits padding                              ignore
8 bits row select                           shift out to row register
16 bits tick count                          wait this many cycles

The above is referred to as a frame by the source code, but a phase by your description above. 14 frames are sent for each row, then a dummy frame is sent which turns everything off (no row is selected).

The tick counts are set up once at construction. For frames 0 to 13 the tick count is 1 << frame:

https://github.com/pimoroni/pimoroni-pico/blob/b4451c3bdc06235a1358a5a8aabd008647ed9f8a/libraries/pico_unicorn/pico_unicorn.cpp#L134-L134

For frame 14 the tick count is 0xffff:

https://github.com/pimoroni/pimoroni-pico/blob/b4451c3bdc06235a1358a5a8aabd008647ed9f8a/libraries/pico_unicorn/pico_unicorn.cpp#L123

14 frames are sent because the gamma table contains values up to 14 bits long.

So if I understand all that correctly, then your extra cycle of latency comes from here:

https://github.com/pimoroni/pimoroni-pico/blob/b4451c3bdc06235a1358a5a8aabd008647ed9f8a/libraries/pico_unicorn/pico_unicorn.pio#L77-L78

And the reason is described on the comment here:

https://github.com/pimoroni/pimoroni-pico/blob/b4451c3bdc06235a1358a5a8aabd008647ed9f8a/libraries/pico_unicorn/pico_unicorn.pio#L23-L24

This means your 1 cycle frame waits 2 cycles. Your 2 cycle frame waits 3 cycles. Your 4 cycle frame waits 5 cycles, etc. This has a worse effect on the shorter cycles. You can send (1 << frame) - 1 to fix this.

The delay loop is in addition to however many cycles the rest of the PIO code spends shifting out the data. In practice the 1 cycle frame is going to be 10s of cycles long and the 2 cycle frame, which should be twice as long, will actually be only 1 cycle longer, and the 4 cycle frame will only be 3 cycles longer. This would result in eg 1, 2, and 4 being roughly the same brightness, with 3 being noticeably brighter because it turns on for two frame, incurring two sets of main-loop overhead.

So to fix this, I think you need to make the bcd_count loop in the PIO take approximately as long as the rest of the PIO code (eg using side-delay). Then send (1 << frame) - 1 as the count, to negate the overhead of running the main loop once per phase. And then possibly increase the PIO clock so that it still updates at a reasonable speed.

Except there is a problem with this: you already need to send (1 << frame) - 1 because the loop test is post-decrement. For frame 0 you need to run the delay loop zero times, which is impossible with a post-decrement loop! If you send (1 << frame) - 2 then the first frame runs for 65535 cycles!

So in fact to keep all the frames the correct length you need to send: (1 << (frame+1)) - 2

ali1234 commented 11 months ago

Okay I edited that comment a lot.

tl;dr your frames/phases are supposed to double in duration each time, but they don't for two reasons: you forgot that loop tests are post decrement and you didn't account for time spent outside the delay loop when the LEDs will be turned on. Both of these extend the length of every frame by a constant time. This means each frame is less than double the duration of the previous, which causes eg brightness levels 1, 2, 4 to look similar while 3 is brighter because it is turned on for two frames and therefore experiences double the overhead.

nlw0 commented 11 months ago

Ok so perhaps we could try something like this?

uint16_t framedelay = 1; //maybe 2 or 3
uint16_t frameperiod = 1<<frame;
uint16_t bcd_ticks = (frameperiod < framedelay) ? 0 : frameperiod - framedelay;

If framedelay is 1, the resulting time would be just right. If it's >1, then maybe we don't really have control over it with this bit-depth, at least not with this specific technique.