raspberrypi / pico-feedback

25 stars 2 forks source link

WS2812 Diver code run through in raspberry-pi-pico-c-sdk doc. #121

Open Bleep42 opened 3 years ago

Bleep42 commented 3 years ago

In raspberry-pi-pico-c-sdk doc, Section 3.2.2 Pg 37 on. The Times specified in the code and on page 37, are T1=2, T2=5, T3=3 however the timing diagrams further down Fig 4 to 7 imply timings of T1=4 T2=3 T3=3 assuming I am reading them correctly. Also Fig 6 mentions T1 twice, I think the second mention should be T2. Assuming I understand how the code works, I believe the actual timing values should be T1=2, T2=2, T3=3 which I believe should give 358ns for a '0' bit, 716ns for a '1' bit and a minimum inter bit spacing of 537ns. My understanding is that the '1' bit should be roughly twice the length of the '0' where as in the example it is over 3 times? (2clk + 5clk) for a '1' or just 2clk for a'0' Though I understand that the original timings will work fine, (they do) it doesn't help to make how the code works very clear, especially when looking at the timing specifications for the WS2812, I also understand that the specification timings can vary quite a bit depending on where you look, or 2,4,2 potentially if a WS2812B.

psitech commented 3 years ago

After scrutinizing the raspberry-pi-pico-c-sdk document and measuring with a logic analyzer, it boils down to the following:

So 1 bit time of the LED equals 10 clocks of the statemachine. These 10 SM clocks are used for generating either a low or high LED bit. An SM clock is indicated as a vertical dotted line in the image below.

WS2812 timing

As you might have guessed, (T1 + T2 + T3) must be 10. From the WS2812B (link) datasheet: WS2812 timing 2 WS2812 timing 3

With the original T1=2, T2=5, T3=3:

Since the "0 code" timing is just on the edge of the specification, I changed T1 to 3 and T2 to 4. so now a "0 code" is 0.375 usec high/0.875 usec low. Now the timing is within spec. I verified this to be correct using a logic analyzer running at a resolution of 2.5 nsec:

DSView-210223-115617

Regards, Paul

Bleep42 commented 3 years ago

Hi Paul, Firstly, my main comment is that the diagrams Fig 4 to 7, one of which you reproduce do not reflect the values used in the code, which to me is very confusing. In the code T1=2, T2=5, T3=3 however the timing diagrams further down Fig 4 to 7 imply timings of T1=4 T2=3 T3=3, therefore the diagrams need to be corrected, to be the same as the code.

You imply that the total of T1+T2+T3 has to be 10 "As you might have guessed, (T1 + T2 + T3) must be 10." as far as I am aware this is not the case and I have used numerous values for T1,T2,T3 which do not add up to 10 and everything works perfectly; there is the possibility that the exact 800kHz won't be met, depending on what clocks/dividers are abe to be used, but it will usually be close enough for all practical purposes, for example I am using T1=2, T2=2, T3=3, which gives a 178nS split, giving 358ns for a '0' bit, 716ns for a '1' bit and a minimum inter bit spacing of 535ns. This works fine. However the values used in the code of T1=2, T2=5, T3=3 are I suspect actually intended for the WS2812B, which has different timing requirements, so something needs correcting in the document and or the code. Regards, Kevin.

psitech commented 3 years ago

Hi Kevin, Agree, the graphs got me confused too initially. And, as you suggest, a documentation update is indeed needed.

Interesting to hear from you that not-adding-up-to-10 for T1+T2+T3 is also working! I got to the must-add-up-to-10 conclusion by looking at the Python code, which sets the SM clock to 8MHz: _sm = rp2.StateMachine(0, ws2812, freq=8_000_000, sideset_base=Pin(PINNUM))

Anyway, later today I will play with different Tx values [not necessarily adding up to 10] and see what the output shows on the logic analyzer. My guess is that these LEDs are somewhat forgiving with respect to timing. Actually, the header Data transfer time( TH+TL=1.25µs±600ns) above the timing table is questionable. It sounds as if the bit time can be anywhere between 0.65 usec and 1.85 usec? This violates the subsequent values and tolerances...

Regards, Paul

lurch commented 3 years ago

However the values used in the code of T1=2, T2=5, T3=3 are I suspect actually intended for the WS2812B, which has different timing requirements, so something needs correcting in the document and or the code.

Oh!! I can't speak for the author of the code you're talking about here, but personally I didn't realise that WS2812 and WS2812B were actually different, nor that they had different timing requirements. Confusingly, https://github.com/raspberrypi/pico-micropython-examples/tree/master/pio/neopixel_ring talks about both "NeoPixel" and "WS2812", whereas the example projects at the bottom of https://www.raspberrypi.org/documentation/pico/getting-started/ reference "NeoPixel" and "WS2812b" :confused:

EDIT: And https://datasheets.raspberrypi.org/pico/raspberry-pi-pico-c-sdk.pdf also says "The WS2812 LED (sometimes sold as NeoPixel) is an addressable RGB LED." Although https://learn.adafruit.com/adafruit-neopixel-uberguide/the-magic-of-neopixels doesn't really clarify whether "NeoPixel" is WS2812-only, WS2812B-only or both WS2812 and WS2812B? .... ah, https://learn.adafruit.com/adafruit-neopixel-uberguide/downloads implies that "NeoPixel" covers all of WS2812, WS2812B and SK6812. The particular datasheets linked from there say that (converting everything to microseconds for simplicity) :

WS2812 WS2812B SK6812
T0H 0.35us +/- 0.15us 0.4us +/- 0.15us 0.3us +/- 0.15us
T0L 0.8us +/- 0.15us 0.85us +/- 0.15us 0.9us +/- 0.15us
T0 total 1.15us +/- 0.3us 1.25us +/- 0.3us 1.2us +/- 0.3us
T1H 0.7us +/- 0.15us 0.8us +/- 0.15us 0.6us +/- 0.15us
T1L 0.6us +/- 0.15us 0.45us +/- 0.15us 0.6us +/- 0.15us
T1 total 1.3us +/- 0.3us 1.25us +/- 0.3us 1.2us +/- 0.3us
RES Above 50us Above 50us 80us
psitech commented 3 years ago

Hi Andrew, thanks for compiling the table. As you can see the tolerances are quite wide so one set of T1-T2-T3 should work on all of them. Even the older WS2811 should work with that setting. More on the T1+T2+T3 != 10 later today.

Regards, Paul

psitech commented 3 years ago

Allright, have been playing with different combinations of T1-T2-T3. For your information: I'm running this Python code: NeoPixelRing_Rainbow.zip and using the Adafruit 24-LED neopixel ring (link).

TL;DR if you want 800kHz bit rate, T1+T2+T3 has to be 10 [with state machine clock of 8MHz]. TL;DR these LEDs are quite forgiving on the timing.

My conclusion is that 3-4-3 is the most safe T1-T2-T3 setting specification-wise for these Neopixels.

Regards, Paul

Here are the logic analyzer outputs for different combinations of T1-T2-T3:

1-4-1 [ring did not display] 141

1-5-1 [ring did not display] 151

1-6-1 [ring displayed running rainbow] 161

1-8-1 [ring displayed running rainbow] 181

2-2-2 [ring did not display] 222

2-3-2 [ring did not display] 232

2-4-2[ring displayed running rainbow] 242

2-6-2 [ring displayed running rainbow] 262

3-4-3 [ring displayed running rainbow] 343

4-2-4 [ring displayed running rainbow] 424

1-10-1 [ring displayed running rainbow] 1101

1-12-1 [ring displayed running rainbow] 1121

1-14-1 [ring displayed running rainbow] 1141

1-16-1 [ring displayed running rainbow] 1161

1-17-1 [did not compile, error PIOASMError: delay too large]

lurch commented 3 years ago

Thanks @psitech

So combining your 3-4-3 timings with my table above (and calculating PIO's value's as T0H = T1, T0L = T2 + T3, T1H = T1 + T2, T1L = T3), and a PIO frequency of 8MHz, I guess that gives us:

WS2812 WS2812B SK6812 3-4-3 timing original 2-5-3 timing 3-3-4 timing
T0H 0.35us +/- 0.15us 0.4us +/- 0.15us 0.3us +/- 0.15us 0.375us 0.25us 0.375us
T0L 0.8us +/- 0.15us 0.85us +/- 0.15us 0.9us +/- 0.15us 0.875us 1.0us 0.875us
T1H 0.7us +/- 0.15us 0.8us +/- 0.15us 0.6us +/- 0.15us 0.875us 0.875us 0.750us
T1L 0.6us +/- 0.15us 0.45us +/- 0.15us 0.6us +/- 0.15us 0.375us 0.375us 0.500us

(Assuming I've got my maths correct, of course! :laughing: )

EDIT: Added @psitech 's 3-3-4 timing from his comment below

psitech commented 3 years ago

Hi Andrew, your extended table is correct.

psitech commented 3 years ago

If I'm correct, a 3-3-4 timing fits all 3 LEDs: T0H = 0.375us T0L = 0.875us T1H = 0.750us T1L = 0.500us

lurch commented 3 years ago

@ladyada @ptorrone @tannewt You probably have access to more different types of NeoPixel than anybody else :grin: Would you mind trying the T1=3 T2=3 T3=4 timings suggested above (for https://github.com/raspberrypi/pico-examples/tree/master/pio/ws2812 ) on a selection of different types, to see if they all work reliably?

psitech commented 3 years ago

For completeness, here is the logic analyzer output for the 3-3-4 timing:

334 334-2

tannewt commented 3 years ago

@ladyada @ptorrone @tannewt You probably have access to more different types of NeoPixel than anybody else grin Would you mind trying the T1=3 T2=3 T3=4 timings suggested above (for https://github.com/raspberrypi/pico-examples/tree/master/pio/ws2812 ) on a selection of different types, to see if they all work reliably?

So I hit a similar question when doing Neopixel for CircuitPython. All of the datasheets have different values. I asked @paintyourdragon, who is our resident neopixel expert, and he suggested to simply think of it as a 800khz signal with 1/3 duty cycle for zero and 2/3 for one.

I implemented it in circuitpython with:

bitloop:
  out x 1        side 0 [1]; Side-set still takes place before instruction stalls
  jmp !x do_zero side 1 [1]; Branch on the bit we shifted out after 1/3 duty delay. Positive pulse
do_one:
  jmp  bitloop   side 1 [1]; Continue driving high, for a long pulse
do_zero:
  nop            side 0 [1]; Or drive low, for a short pulse

So its 2-2-2 timing and we clock the state machine at 800khz * 6.

I haven't heard any complaints about our implementation's timing so I assume it's pretty good. We've had two bugs but neither were timing related. One was that we deinit too early and therefore didn't transmit all the bits and the other was due to us floating the pin while not transmitting.

psitech commented 3 years ago

Hi Scott, I implemented your 1/3 & 2/3 duty cycle in MicroPython and that worked fine on the Neopixel ring as well. Had to change the statemachine clock to 4.8MHz. Here is the code:

@rp2.asm_pio(sideset_init=rp2.PIO.OUT_LOW, out_shiftdir=rp2.PIO.SHIFT_LEFT, autopull=True, pull_thresh=24)
def ws2812():
    T1 = 2
    T2 = 2
    T3 = 2
    wrap_target()
    label("bitloop")
    out(x, 1)               .side(0)    [T3 - 1]
    jmp(not_x, "do_zero")   .side(1)    [T1 - 1]
    jmp("bitloop")          .side(1)    [T2 - 1]
    label("do_zero")
    nop()                   .side(0)    [T2 - 1]
    wrap()

# Create the StateMachine with the ws2812 program, outputting on pin
sm = rp2.StateMachine(0, ws2812, freq=4_800_000, sideset_base=Pin(PIN_NUM))

# Start the StateMachine, it will wait for data on its FIFO.
sm.active(1)

Here is the logic analyzer output: 222at4 8Mhz 222at4 8Mhz-2

One remark: this 2-2-2 timing theoretically violates the T1H & T1L spec of the SK6812...

Regards, Paul

tannewt commented 3 years ago

One remark: this 2-2-2 timing theoretically violates the T1H & T1L spec of the SK6812...

I believe most modern NeoPixels we sell are the SK6812 so I wouldn't worry about it. Many products say they may be WS2812B or SK6812 and used interchangeably. I bet SK6812 are one of the best tested.

lurch commented 3 years ago

One remark: this 2-2-2 timing theoretically violates the T1H & T1L spec of the SK6812...

I'm not entirely convinced by the timing values in the SK6812 datasheet, since the maximum T0H (0.3us + 0.15us) is the same as the minimum T1H (0.6us - 0.15us), and the minimum T0L (0.9us - 0.15us) is the same as the maximum T1L (0.6us + 0.15us) ? :thinking: :confused:

psitech commented 3 years ago

Yeah, the specs of all 3 LEDs are ambiguous. I believe the 1/3-2/3 rule is safe to go for. Next question is which actual settings we aim for? 2-2-2 @ 4.8MHz, 1-1-1 @ 2.4MHz, 3-3-3 @ 7.2MHz or 4-4-4 @ 9.6MHz? All these 4 settings compile in MicroPython and look identical on the logic analyzer and work on the Neopixel ring. I can't fully oversee the impact of a lower/higher state machine clock though. My gut feeling says to go for the lowest clock speed that accomplishes the job, in this case 2.4MHz. Perhaps one of the RP2040 chip designers can chime in?

lurch commented 3 years ago

I'm not one of the chip designers, but personally I'd be tempted to go with the 2-2-2 @ 4.8MHz that @tannewt has confirmed works well for CircuitPython. Hard to argue against real-world testing! :wink:

lurch commented 3 years ago

Ping also @benevpi since I believe he's also done quite a bit of stuff with NeoPixels on the Raspberry Pi Pico.

psitech commented 3 years ago

I'm fine with the 2-2-2 @ 4.8MHz timing.

benevpi commented 3 years ago

I've only just seen this as it seems my spam filter's decided it doesn't like Github. I'll dig out my big pile of test WS2812B-alikes and test them out with the 2-2-2 timings.

aallan commented 3 years ago

@Wren6991 can you take a look? We've gone off into the weeds here, but the thread started with a specific question about Figure 4 - 7 in the SDK book.

henrygab commented 3 years ago

First, I am impressed with the analysis and learnings shown in this thread.

As you may know, FastLED is a well-known, heavily-tested and optimized library for controlling these LEDs ... some would say they are the "gold standard". Thus, for real-world use, I would point to the FastLED WS2812 timings as the timings I would recommend for use in examples.

As to the WS2812 vs WS2812B, they have the same timing.

psitech commented 3 years ago

Right, FastLED is probably where the initial Pico SDK "2-5-3" timing originated from:

// WS2812 - 250ns, 625ns, 375ns template <uint8_t DATA_PIN, EOrder RGB_ORDER = RGB> class WS2812Controller800Khz : public ClocklessController<DATA_PIN, C_NS(250), C_NS(625), C_NS(375), RGB_ORDER> {};

And although that timing seems to work in general, it's formally violating the LED specs. But then again, the LED specs are also ambiguous. I do have a preference for the 1/3-2/3 timing, since you can achieve that timing with smallest PIO code:

@rp2.asm_pio(sideset_init=rp2.PIO.OUT_LOW, out_shiftdir=rp2.PIO.SHIFT_LEFT, autopull=True, pull_thresh=24)
def ws2812():
    wrap_target()
    label("bitloop")
    out(x, 1)               .side(0)
    jmp(not_x, "do_zero")   .side(1)
    jmp("bitloop")          .side(1)
    label("do_zero")
    nop()                   .side(0)
    wrap()

sm = rp2.StateMachine(0, ws2812, freq=2_400_000, sideset_base=Pin(PIN_NUM))

But the final decision is not up to me :-)

aallan commented 3 years ago

I'm somewhat unclear as to what the conclusions from this thread actually were? Is there still a documentation issue?

Bleep42 commented 3 years ago

Nope. nothing has changed. As I said in the very first post. "In raspberry-pi-pico-c-sdk doc, Section 3.2.2 Pg 37 onwards The times specified in the code and on page 37, are T1=2, T2=5, T3=3 however the timing diagrams further down Fig 4 to 7 imply timings of T1=4 T2=3 T3=3 assuming I am reading them correctly. Also Fig 6 mentions T1 twice, I think the second mention should be T2." Personally I think the timing diagrams are making things confusing, by being different to the code. The remainder of the comment is open to debate. Regards, Kevin.

psitech commented 3 years ago

Yes, Kevin is right. Figure 3, page 35, [C SDK, rev 1.4.1] does not match the actual timing in the example code on page 36. Then the thread went into the details of what the timing should be, mea culpa. So the documentation is not 100% correct.

Regards, Paul

Bleep42 commented 3 years ago

Hi, Actually the Fig. 3 page 36 is basically ok, it's just an approximate, this is the shape of the waveform, type diagram. However Fig. 4 to 7, Page 38 to 40 inclusive all show fairly specific timing information, T1=4, T2=3, T3=3. This does not agree with the code, which specifies that T1=2, T2=5, T3=3.

10 .define public T1 2
11 .define public T2 5
12 .define public T3 3

Regards, Kevin.

lurch commented 3 years ago

Also, the caption for Figure 4. talks about T1 and T2, whereas the diagram itself shows T3 and T1.

Bleep42 commented 3 years ago

Hi Andrew, You are quite right, I hadn't noticed that. Should probably be something similar to this "The statemachine drives the line low for time T3 as it shifts out one data bit from the OSR, and then high for time T1 whilst branching on the value of the bit."? Similarly as I have already pointed out in Fig. 6 T1 is mentioned twice, the second mention should infact be T2. Regards, Kevin.

lurch commented 3 years ago

I'd need to read and understand the whole section to determine whether it was the caption that was wrong or the image that was wrong :wink: And unfortunately I'm far too busy with other stuff to have time for that.

Bleep42 commented 3 years ago

I have, it looks correct to me. ;-) Regards, Kevin.