NS108/16bit support feasible?

ccoenen commented 3 years ago

One of the more interesting RGB LED chips, to me at least, is the NS108 / HD108 style LED, which supports 16 bits per color channel and uses a protocol somewhat similar to APA102-style LEDs. Would it be possible to support those, too?

I have a very simple piece of demo code that just pushes the correct few bits out to the LEDs, but in the end it's just a bunch of shiftOut calls.

Here's the datasheet (the protocol is described on page 7): https://addressableledstrip.com/uploads/20200605/3cc6b3c4b37d1544ed0c4a320947d9d5.pdf
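For readers who want to see the frame layout without opening the PDF, here is a minimal sketch of packing one pixel. It follows the layout used by the adapter code later in this thread (one start bit, three 5-bit gain fields, then three 16-bit components, all MSB-first). The function name `packNs108Frame` is made up for illustration, and the order of the three components on the wire is an assumption left to the caller, since it depends on the strip.

```cpp
#include <array>
#include <cstdint>

// Pack one NS108/HD108-style pixel into its 64-bit wire frame.
// Byte 0 of the result is the first byte to be clocked out (MSB first).
// Layout: 1 start bit | 5-bit gain | 5-bit gain | 5-bit gain | 3 x 16-bit component
// (hypothetical helper; the same gain is applied to all three channels here)
std::array<uint8_t, 8> packNs108Frame(uint8_t gain, uint16_t c0, uint16_t c1, uint16_t c2) {
    gain &= 0x1f;  // gain fields are 5 bits wide
    const uint16_t header = static_cast<uint16_t>(
        0x8000u | (gain << 10) | (gain << 5) | gain);
    return {
        static_cast<uint8_t>(header >> 8), static_cast<uint8_t>(header & 0xff),
        static_cast<uint8_t>(c0 >> 8),     static_cast<uint8_t>(c0 & 0xff),
        static_cast<uint8_t>(c1 >> 8),     static_cast<uint8_t>(c1 & 0xff),
        static_cast<uint8_t>(c2 >> 8),     static_cast<uint8_t>(c2 & 0xff),
    };
}
```

With a gain of 1, the two header bytes come out as 0x84 0x21, which matches what the byte-wise packing in the adapter code produces for the same gain.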

simap commented 3 years ago

Yes! I have some NS108 on hand, and some WS2816s as well, which are likewise 16 bits per element. I hope to support these new chipsets soon.

jonct commented 2 years ago

Following up from the forum: here's my first quick stab at code until hardware arrives. If you set up a repository near the others I'll keep at it there. Much appreciated!

//
// Created by Ben Hencke on 2/10/17.
// Adapted by Jon C. Thomason on 3/16/22.
//

#ifndef NS108ADAPTER_HPP
#define NS108ADAPTER_HPP

#include <Arduino.h>
#include <SPI.h>
#include <functional>

#ifdef ESP32
#include "soc/spi_struct.h"
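//const pointer to the SPI3 register block (integer-to-pointer casts are not portable in constexpr)
static volatile spi_dev_t * const spiDev = (volatile spi_dev_t *)(DR_REG_SPI3_BASE);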
#endif

// LED component order
enum { rComponent, gComponent, bComponent };
enum outputOrder {
    NS108_RGB = ((rComponent << 0) | (gComponent << 2) | (bComponent << 4)),
    NS108_RBG = ((rComponent << 0) | (bComponent << 2) | (gComponent << 4)),
    NS108_GRB = ((gComponent << 0) | (rComponent << 2) | (bComponent << 4)),
    NS108_GBR = ((gComponent << 0) | (bComponent << 2) | (rComponent << 4)),
    NS108_BRG = ((bComponent << 0) | (rComponent << 2) | (gComponent << 4)),
    NS108_BGR = ((bComponent << 0) | (gComponent << 2) | (rComponent << 4)),
};

typedef struct {
    uint8_t R;
    uint8_t G;
    uint8_t B;
    uint8_t V;
} source_pixel;

typedef union {
    source_pixel s;
    uint32_t frame;
    uint8_t b[4];
} source_pixel_t;

typedef struct {
    uint16_t startBit  : 1;
    uint16_t gainR     : 5;
    uint16_t gainG     : 5;
    uint16_t gainB     : 5;
    uint16_t component0;
    uint16_t component1;
    uint16_t component2;
} output_frame;

typedef union {
    output_frame s;
    uint64_t frame;
    uint8_t b[8];
} output_frame_t;

typedef std::function<void(uint16_t index, uint8_t rgbv[])> ApaPixelFunction;

class NS108Adapter {
public:
    NS108Adapter(outputOrder order = NS108_BGR) {
        setColorOrder(order);
    }

    ~NS108Adapter() {
        end();
    }

    void begin(uint32_t spiFrequency = 2000000L) {
        SPI.begin();
        SPI.setFrequency(spiFrequency);
        SPI.setBitOrder(MSBFIRST);
        SPI.setDataMode(SPI_MODE0);

#ifdef ESP8266
        //borrowed from SPI.cpp, set registers for a 64bit transfer buffer
        uint16_t bits = 64;
        const uint32_t mask = ~((SPIMMOSI << SPILMOSI) | (SPIMMISO << SPILMISO));
        bits--;
        SPI1U1 = ((SPI1U1 & mask) | ((bits << SPILMOSI) | (bits << SPILMISO)));
#endif

#ifdef ESP32
        spiDev->user.usr_miso = 0; //disable input
        spiDev->user.doutdin = 0; //half duplex

        //config for 64 bit xfers
        spiDev->mosi_dlen.usr_mosi_dbitlen = 63;
        spiDev->miso_dlen.usr_miso_dbitlen = 63;
#endif
    }

    void end() {
        SPI.end();
    }

    void setSpiFrequency(uint32_t spiFrequency) {
        SPI.setFrequency(spiFrequency);
    }

    void setColorOrder(outputOrder order) {
        //offsets of the first, second, third output components within the source pixel bytes {R, G, B, V}
        componentOffset[0] = ((order >> 0) & 3);
        componentOffset[1] = ((order >> 2) & 3);
        componentOffset[2] = ((order >> 4) & 3);
    }

    void show(uint16_t numPixels, ApaPixelFunction cb) {
        int curPixel;
        source_pixel_t pixel;
        output_frame_t output;
        uint8_t gain;

        //start frame (128 bits low)
        write64(0);
        write64(0);

        //pixels, sourced from callback
        for (curPixel = 0; curPixel < numPixels; curPixel++) {
            pixel.frame = 0x1f000000; //default to brightest black
            cb(curPixel, pixel.b);
            gain = pixel.s.V & 0x1f;

            //scale each 8-bit component to 16-bit by stuttering
            output.s = {1, gain, gain, gain,
                (pixel.b[componentOffset[0]] * 0x101),
                (pixel.b[componentOffset[1]] * 0x101),
                (pixel.b[componentOffset[2]] * 0x101),
            };

            write64(output.frame);
        }

        //end frame (at least one additional bit per LED, high)
        //uint16_t counter: (numPixels >> 6) + 1 can exceed 255 on long strips
        for (uint16_t drain = (numPixels >> 6) + 1; drain > 0; drain--) {
            write64(0xffffffffffffffffULL);
        }
    }

private:
    inline void write64(uint64_t v) {
        //the SPI data registers are 32 bits wide, so a 64-bit frame spans two of them
#ifdef ESP8266
        while(SPI1CMD & SPIBUSY) {}
        SPI1W0 = (uint32_t) v;
        SPI1W1 = (uint32_t) (v >> 32);
        SPI1CMD |= SPIBUSY;
#endif
#ifdef ESP32

        //as usual, default transfer blocks for sending, and has a lot of redundancies
//        SPI.transfer32(v);

        while(spiDev->cmd.usr);
        spiDev->data_buf[0] = (uint32_t) v;
        spiDev->data_buf[1] = (uint32_t) (v >> 32);
        spiDev->cmd.usr = 1;
        //don't do this since I turned off MISO and full duplex
        //data = spi->dev->data_buf[0];
#endif

    }
    uint8_t componentOffset[3];
};

#endif //NS108ADAPTER_HPP
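
A side note on the "stuttering" trick used above to scale 8-bit components up to 16 bits: multiplying by 0x101 repeats the byte in both halves of the result, so 0x00 maps to 0x0000 and 0xFF maps to 0xFFFF. A plain left shift by 8 would top out at 0xFF00 and never reach full scale. A minimal sketch, with `expand8to16` as a made-up name for illustration:

```cpp
#include <cstdint>

// Expand an 8-bit component to 16 bits by repeating the byte (0xAB -> 0xABAB).
// Hypothetical helper; equivalent to (v << 8) | v.
constexpr uint16_t expand8to16(uint8_t v) {
    return static_cast<uint16_t>(v * 0x101u);
}

// The endpoints of the 8-bit range land on the endpoints of the 16-bit range.
static_assert(expand8to16(0x00) == 0x0000, "black stays black");
static_assert(expand8to16(0xFF) == 0xFFFF, "full scale stays full scale");
static_assert(expand8to16(0xAB) == 0xABAB, "byte is repeated in both halves");
```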

jonct commented 2 years ago

Sorry for the delay; I wanted to get my kid involved in the breadboarding.

This is working for me on a SparkFun ESP8266 Thing Dev, with a 5 m, 150-pixel strip.

Since SPI1W0 was uint32_t anyway, I went back to your write32 exactly as-is. So I don't expect any surprises on ESP32.

Either way, I haven't begun any formal measured optimizations.

But I do now have me a case of the blinkies… 🤩

//
// Created by Ben Hencke on 2/10/17.
// Modified by Jon C. Thomason on 3/19/22.
//

#ifndef NS108ADAPTER_HPP
#define NS108ADAPTER_HPP

#include <Arduino.h>
#include <SPI.h>
#include <functional>

#ifdef ESP32
#include "soc/spi_struct.h"
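//const pointer to the SPI3 register block (integer-to-pointer casts are not portable in constexpr)
static volatile spi_dev_t * const spiDev = (volatile spi_dev_t *)(DR_REG_SPI3_BASE);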
#endif

// LED component order
enum { rComponent, gComponent, bComponent };
enum outputOrder {
    NS108_RGB = ((rComponent << 0) | (gComponent << 2) | (bComponent << 4)),
    NS108_RBG = ((rComponent << 0) | (bComponent << 2) | (gComponent << 4)),
    NS108_GRB = ((gComponent << 0) | (rComponent << 2) | (bComponent << 4)),
    NS108_GBR = ((gComponent << 0) | (bComponent << 2) | (rComponent << 4)),
    NS108_BRG = ((bComponent << 0) | (rComponent << 2) | (gComponent << 4)),
    NS108_BGR = ((bComponent << 0) | (gComponent << 2) | (rComponent << 4)),
};

typedef std::function<void(uint16_t index, uint8_t rgbv[])> ApaPixelFunction;

class NS108Adapter {
public:
    NS108Adapter(outputOrder order = NS108_RGB) {
        setColorOrder(order);
    }

    ~NS108Adapter() {
        end();
    }

    void begin(uint32_t spiFrequency = 16000000L) {
        SPI.begin();
        SPI.setFrequency(spiFrequency);
        SPI.setBitOrder(MSBFIRST);
        SPI.setDataMode(SPI_MODE0);

#ifdef ESP8266
        //borrowed from SPI.cpp, set registers for a 32bit transfer buffer
        uint16_t bits = 32;
        const uint32_t mask = ~((SPIMMOSI << SPILMOSI) | (SPIMMISO << SPILMISO));
        bits--;
        SPI1U1 = ((SPI1U1 & mask) | ((bits << SPILMOSI) | (bits << SPILMISO)));
#endif

#ifdef ESP32
        spiDev->user.usr_miso = 0; //disable input
        spiDev->user.doutdin = 0; //half duplex

        //config for 32 bit xfers
        spiDev->mosi_dlen.usr_mosi_dbitlen = 31;
        spiDev->miso_dlen.usr_miso_dbitlen = 31;
#endif
    }

    void end() {
        SPI.end();
    }

    void setSpiFrequency(uint32_t spiFrequency) {
        SPI.setFrequency(spiFrequency);
    }

    void setColorOrder(outputOrder order) {
        //first, second, third output component offsets from within source pixel
        //e.g. NS108_RBG -> { rComponent=0, bComponent=2, gComponent=1 }
        srcComponentOffset[0] = ((order >> 0) & 3);
        srcComponentOffset[1] = ((order >> 2) & 3);
        srcComponentOffset[2] = ((order >> 4) & 3);
    }

    void show(uint16_t numPixels, ApaPixelFunction cb) {
        int pixelIndex;
        uint8_t gain;

        // 32-bit source pixel
        union {
            uint32_t frame;
            struct {
                uint8_t R;
                uint8_t G;
                uint8_t B;
                uint8_t V;
            } s;
            uint8_t b[4];
        } srcPixel;

        // 64-bit destination pixel
        union {
            uint64_t frame;
            uint32_t chunk[2];
            uint16_t w[4];
            uint8_t b[8];
        } outPixel;

        //start frame (64 bits low)
        write32(0);
        write32(0);

        //pixel sequence
        for (pixelIndex = 0; pixelIndex < numPixels; pixelIndex++) {

            // solicit from callback
            srcPixel.s = {0, 0, 0, 31}; //default to brightest black
            cb(pixelIndex, srcPixel.b);
            gain = srcPixel.s.V & 0x1f;

            //start bit, then three 5-bit component gain values, MSB-first
            uint8_t *p = &outPixel.b[0];
            *p++ = 0x80 | (gain << 2) | (gain >> 3);
            *p++ = (gain << 5) | gain;

            //expand each 8-bit component to 16-bit by stuttering (write the byte twice)
            //two separate stores: "*p++ = *p++ = x" modifies p twice in one expression
            for (uint8_t i = 0; i < 3; i++) {
                uint8_t c = srcPixel.b[srcComponentOffset[i]];
                *p++ = c;
                *p++ = c;
            }

            write32(outPixel.chunk[0]);
            write32(outPixel.chunk[1]);
        }

        //end frame (at least one additional bit per LED, high)
        //uint16_t counter: (numPixels >> 5) + 1 can exceed 255 on long strips
        for (uint16_t drain = (numPixels >> 5) + 1; drain > 0; drain--) {
            write32(0xffffffff);
        }
    }

private:
    inline void write32(uint32_t v) {
#ifdef ESP8266
        while(SPI1CMD & SPIBUSY) {}
        SPI1W0 = v;
        SPI1CMD |= SPIBUSY;
#endif
#ifdef ESP32

        //as usual, default transfer blocks for sending, and has a lot of redundancies
//        SPI.transfer32(v);

        while(spiDev->cmd.usr);
        spiDev->data_buf[0] = v;
        spiDev->cmd.usr = 1;
        //don't do this since I turned off MISO and full duplex
        //data = spi->dev->data_buf[0];
#endif

    }
    uint8_t srcComponentOffset[3];   //see setColorOrder(outputOrder)
};

#endif //NS108ADAPTER_HPP
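
For anyone puzzling over the `outputOrder` enum: each value packs three 2-bit component indices (first, second, third on the wire), and `setColorOrder` unpacks them again with shifts and masks. Here is a small host-side sketch of that round trip. The names `packOrder`, `unpackOrder`, and `reorder` are invented for illustration and are not part of the adapter.

```cpp
#include <array>
#include <cstdint>

// Component indices within an {R, G, B, V} source pixel.
enum Component : uint8_t { kRed = 0, kGreen = 1, kBlue = 2 };

// Pack the wire order into one byte, two bits per slot (same scheme as outputOrder).
constexpr uint8_t packOrder(Component first, Component second, Component third) {
    return static_cast<uint8_t>(first | (second << 2) | (third << 4));
}

// Recover the three source offsets, as setColorOrder does.
constexpr std::array<uint8_t, 3> unpackOrder(uint8_t order) {
    return { static_cast<uint8_t>(order & 3),
             static_cast<uint8_t>((order >> 2) & 3),
             static_cast<uint8_t>((order >> 4) & 3) };
}

// Pick the three wire-order components out of an {R, G, B, V} source pixel.
std::array<uint8_t, 3> reorder(const uint8_t rgbv[4], uint8_t order) {
    const auto idx = unpackOrder(order);
    return { rgbv[idx[0]], rgbv[idx[1]], rgbv[idx[2]] };
}
```

For example, an R, B, G order unpacks to the offsets {0, 2, 1}, as in the comment inside `setColorOrder`.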

jonct commented 2 years ago

These should have been gists all along. 🧐

I now have write64 working in one shot (though only tested on ESP8266 so far) so it should get more asynchronous work done between frames.
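
In case it helps anyone following along, this is roughly how an 8-byte frame maps onto two 32-bit data words. It is only a host-side sketch, not the code from the gist. It assumes the peripheral shifts out the lowest-addressed byte of each word first, which is what overlaying `uint32_t chunk[2]` on `uint8_t b[8]` relies on for the little-endian ESP parts. `WordPair` and `toFifoWords` are made-up names.

```cpp
#include <cstdint>

// Two 32-bit words, as they would be written to consecutive SPI data registers.
struct WordPair {
    uint32_t w0;
    uint32_t w1;
};

// bytes[0] is the first byte that should go out on the wire.
// Assemble each word little-endian so byte 0 sits in the low 8 bits of w0
// (assumption: the hardware sends the low-addressed byte of each word first).
WordPair toFifoWords(const uint8_t bytes[8]) {
    WordPair out;
    out.w0 = static_cast<uint32_t>(bytes[0])
           | static_cast<uint32_t>(bytes[1]) << 8
           | static_cast<uint32_t>(bytes[2]) << 16
           | static_cast<uint32_t>(bytes[3]) << 24;
    out.w1 = static_cast<uint32_t>(bytes[4])
           | static_cast<uint32_t>(bytes[5]) << 8
           | static_cast<uint32_t>(bytes[6]) << 16
           | static_cast<uint32_t>(bytes[7]) << 24;
    return out;
}
```

Building the words with explicit shifts gives the same result on any host, whereas the union overlay only matches on a little-endian CPU.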

I've factored the platform-specific #ifdefs into a SemiAsyncSPI.h and reduced NS108Adapter.h and Apa102Adapter.h more-or-less to munching pixel data. There's still a lot of duplication between the two to DRY up if both are to be built into one larger aggregate image.

This sketch amuses my family while the NS108 strip is almost entirely still on its spool. 🌀

#include "NS108Adapter.h"

#define NUM_PIXELS 150

NS108Adapter strip;

void setup() {
    strip.begin(24000000L);
}

void loop() {
    for (int counter = 0; counter < NUM_PIXELS + 50; counter++) {
        strip.show(NUM_PIXELS, [counter](uint16_t index, uint8_t rgbv[]) {
            //if color order is set right, this should show a bright red pixel on dim green background
            if (index % 100 == counter % 100) {
                rgbv[0] = 100-index;
            } else {
                rgbv[1] = 1;
            }
            rgbv[3] = 0;  //nighttime; low power; low heat
        });
        delay(20);
    }
}

simap commented 1 year ago

It was added! WS2816 as well. See this post for more details: https://forum.electromage.com/t/release-v3-47-new-hdr-led-drivers-faster-ws2812-and-editor-improvements/3045

simap / pixelblaze

NS108/16bit support feasible? #13