greiman / SdFat

Arduino FAT16/FAT32 exFAT Library
MIT License

Raspberry Pi Pico Arduino Core support #313

Closed savejeff closed 3 years ago

savejeff commented 3 years ago

I was able to use the SdFat lib with the Arduino core for the Raspberry Pi Pico after one small modification to SdFatConfig.h that sets USE_SIMPLE_LITTLE_ENDIAN to 0:

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ && !defined(__SAMD21G18A__)\
  && !defined(__MKL26Z64__) && !defined(ESP8266) && !defined(__RP2040_H__)
#define USE_SIMPLE_LITTLE_ENDIAN 1
#else  // __BYTE_ORDER_
#define USE_SIMPLE_LITTLE_ENDIAN 0
#endif  // __BYTE_ORDER_

With this change and the SPI clock at 12 MHz, the card initializes with no problem.

Pins: SCK - 2, MOSI - 3, MISO - 4, CS - 5

Would it be possible to change the if-define structure to support the official Arduino core for the Raspberry Pi Pico? My code is based on the external SPI driver example with almost no modifications:

// An example of an external SPI driver.
//
#include "SdFat.h"
#include "SPI.h"  // Only required if you use features in the SPI library.

#if SPI_DRIVER_SELECT == 3  // Must be set in SdFat/SdFatConfig.h

// SD chip select pin.
#define SD_CS_PIN 5

// This is a simple driver based on the standard SPI.h library.
// You can write a driver entirely independent of SPI.h.
// It can be optimized for your board or a different SPI port can be used.
// The driver must be derived from SdSpiBaseClass.
// See: SdFat/src/SpiDriver/SdSpiBaseClass.h
class MySpiClass : public SdSpiBaseClass {

public:
    // Activate SPI hardware with correct speed and mode.
    void activate() {
        SPI.beginTransaction(m_spiSettings);
    }
    // Initialize the SPI bus.
    void begin(SdSpiConfig config) {
        (void)config;
        SPI.begin();
    }
    // Deactivate SPI hardware.
    void deactivate() {
        SPI.endTransaction();
    }
    // Receive a byte.
    uint8_t receive() {
        return SPI.transfer(0XFF);
    }
    // Receive multiple bytes.
    // Replace this function if your board has multiple byte receive.
    uint8_t receive(uint8_t* buf, size_t count) {
        for (size_t i = 0; i < count; i++) {
            buf[i] = SPI.transfer(0XFF);
        }
        return 0;
    }
    // Send a byte.
    void send(uint8_t data) {
        SPI.transfer(data);
    }
    // Send multiple bytes.
    // Replace this function if your board has multiple byte send.
    void send(const uint8_t* buf, size_t count) {
        for (size_t i = 0; i < count; i++) {
            SPI.transfer(buf[i]);
        }
    }
    // Save SPISettings for new max SCK frequency
    void setSckSpeed(uint32_t maxSck) {
        m_spiSettings = SPISettings(maxSck, MSBFIRST, SPI_MODE0);
    }

private:
    SPISettings m_spiSettings;
} mySpi;

#define SD_CONFIG SdSpiConfig(SD_CS_PIN, DEDICATED_SPI, SD_SCK_MHZ(12), &mySpi)
SdFat sd;

//------------------------------------------------------------------------------
void setup() {
    // Open serial communications and wait for port to open:
    Serial.begin(115200);
    delay(1000);
    while(!Serial.available())
    {
        delay(500);
        Serial.print(".");
    }

    pinMode(SD_CS_PIN, OUTPUT);
    digitalWrite(SD_CS_PIN, HIGH);

    SPI.begin();

    Serial.println("Starting");
    if (!sd.begin(SD_CONFIG)) {
        Serial.println("SD Failed");
        sd.initErrorHalt(&Serial);
    }
    Serial.println("Init done ");
    sd.ls(&Serial, LS_SIZE);
    Serial.println("ls done ");
}
//------------------------------------------------------------------------------
void loop() {}
#else  // SPI_DRIVER_SELECT
#error SPI_DRIVER_SELECT must be three in SdFat/SdFatConfig.h
#endif  // SPI_DRIVER_SELECT
greiman commented 3 years ago

You are not using the current version of SdFat. The USE_SIMPLE_LITTLE_ENDIAN part of SdFatConfig.h should work.

Please try version 2.0.7. It has this new section.

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__\
  && (defined(__AVR__) || defined(__ARM_FEATURE_UNALIGNED))
#define USE_SIMPLE_LITTLE_ENDIAN 1
#else  // __BYTE_ORDER_
#define USE_SIMPLE_LITTLE_ENDIAN 0
#endif  // __BYTE_ORDER_
savejeff commented 3 years ago

Ah okay, I started this project a while ago and I didn't see any mention of added support for the Raspberry Pi Pico in the latest release notes.

greiman commented 3 years ago

This mod is not specific to the Pico. It fixes all boards with processors that require aligned memory access, including all Cortex M0 boards.

The release was labeled:

Fix Cortex-M0 Hard Fault
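
For context on why alignment matters here: FAT structures store multi-byte fields at arbitrary byte offsets, and reading them through a wider pointer is only safe on CPUs that tolerate unaligned access. The following is a minimal sketch of the alignment-safe alternative, which assembles the value from single bytes. The names `getLe16` and `getLe32` are illustrative assumptions and are not taken from the SdFat source.

```cpp
#include <cstdint>

// Assemble a little-endian 16-bit value one byte at a time.
// Byte loads are always aligned, so this cannot trigger an alignment fault.
inline uint16_t getLe16(const uint8_t* p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Same idea for a 32-bit little-endian field at any byte offset.
inline uint32_t getLe32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}
```

The byte-wise form is slower than a single word load, which is presumably why the fast path is kept for AVR and for ARM cores that define __ARM_FEATURE_UNALIGNED.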

savejeff commented 3 years ago

Ah, I understand. Good work with the library in general. SD card implementations from individual vendors and HALs are often substandard. I regularly struggle with the ESP32 SPI SD card lib. I'll use this lib from now on if possible.

greiman commented 3 years ago

You can get a huge speed increase by using the SdFat SPI driver and editing SdFatConfig.h as shown below. You will get a variable dimension array warning; I will fix this in the next release.

/**
 * If USE_SPI_ARRAY_TRANSFER is non-zero and the standard SPI library is
 * used, the array transfer function, transfer(buf, size), will be used.
 * This option will allocate up to a 512 byte temporary buffer for send.
 * This may be faster for some boards.  Do not use this with AVR boards.
 */
#ifndef USE_SPI_ARRAY_TRANSFER
#define USE_SPI_ARRAY_TRANSFER  1  //0
#endif  // USE_SPI_ARRAY_TRANSFER

bench example before:

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
143.51,3580,3560,3563
143.50,3581,3561,3563

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
143.07,3584,3572,3574
143.07,3576,3572,3574

after:

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1933.49,278,258,260
1931.25,278,259,260

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1934.98,265,258,259
1934.98,265,258,259
savejeff commented 3 years ago

Ah, that's a great tip. At 2 MB/s, that's in the region of MMC SD card connections. I'll benchmark this in the next day. I also haven't tried to increase the SPI clock speed yet.

greiman commented 3 years ago

I did some tests with a logic analyzer and the data sheet for the Pico. The SPI clock speeds are 31.25/(1 + n) MHz where 0 <= n <= 255.

The reason is that the RP2040 clock is 125 MHz and the SPI prescaler is four, so the max rate is 125/4 = 31.25 MHz.
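
As a worked illustration of the formula above, here is a small sketch that picks the fastest rate of the form 31.25/(1 + n) MHz that does not exceed a requested clock. It assumes the 125 MHz system clock and the divide-by-four prescaler described in this comment. `picoSpiActualHz` is a hypothetical helper, not part of SdFat or the Arduino core, and the round-down policy is an assumption.

```cpp
#include <cstdint>

// Assumption from the measurements above: SCK = 31.25 MHz / (1 + n),
// 0 <= n <= 255, with a 125 MHz system clock and a prescaler of 4.
constexpr uint32_t kMaxSckHz = 125000000u / 4u;  // 31,250,000 Hz

// Return the fastest achievable SCK that does not exceed requestedHz.
// Requests below the slowest rate (n = 255) are clamped to that rate.
inline uint32_t picoSpiActualHz(uint32_t requestedHz) {
  if (requestedHz >= kMaxSckHz) return kMaxSckHz;
  if (requestedHz == 0) requestedHz = 1;
  // Smallest divisor d = 1 + n with kMaxSckHz / d <= requestedHz
  // (ceiling division).
  uint32_t d = (kMaxSckHz + requestedHz - 1) / requestedHz;
  if (d > 256) d = 256;
  return kMaxSckHz / d;
}
```

Under these assumptions, a 12 MHz request such as SD_SCK_MHZ(12) would land on n = 2, or roughly 10.4 MHz, and any request of 31.25 MHz or more would run at 31.25 MHz.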

The ARM PrimeCell SSP controller used in the RP2040 is far better than the controller in other Cortex M0+ chips like SAMD21.

savejeff commented 3 years ago

Thanks!

I just tried what you recommended and it worked flawlessly.


4923: Write: i=0, len=4096, dt=57, rate=71.86
4927: Write: i=1, len=4096, dt=3, rate=1365.33
4933: Write: i=2, len=4096, dt=6, rate=682.67
4937: Write: i=3, len=4096, dt=3, rate=1365.33
4941: Write: i=4, len=4096, dt=2, rate=2048.00
4945: Write: i=5, len=4096, dt=3, rate=1365.33
4948: Write: i=6, len=4096, dt=2, rate=2048.00
4952: Write: i=7, len=4096, dt=3, rate=1365.33
4955: Write: i=8, len=4096, dt=3, rate=1365.33
4959: Write: i=9, len=4096, dt=4, rate=1024.00
4962: Write: i=10, len=4096, dt=3, rate=1365.33
4966: Write: i=11, len=4096, dt=4, rate=1024.00
4969: Write: i=12, len=4096, dt=3, rate=1365.33
4973: Write: i=13, len=4096, dt=4, rate=1024.00
4976: Write: i=14, len=4096, dt=3, rate=1365.33
4980: Write: i=15, len=4096, dt=3, rate=1365.33
4983: Write: i=16, len=4096, dt=2, rate=2048.00
4987: Write: i=17, len=4096, dt=3, rate=1365.33
4991: Write: i=18, len=4096, dt=2, rate=2048.00
4994: Write: i=19, len=4096, dt=3, rate=1365.33
4998: Write: i=20, len=4096, dt=2, rate=2048.00
5002: Write: i=21, len=4096, dt=3, rate=1365.33
5005: Write: i=22, len=4096, dt=3, rate=1365.33
5009: Write: i=23, len=4096, dt=3, rate=1365.33
5012: Write: i=24, len=4096, dt=3, rate=1365.33

Is there a specific reason why SPI array transfer is limited to the standard library implementation? The SdSpiBaseClass has functions for buffer send and receive. It would be nice to make use of the hugely improved speed when using an external SPI driver. I'm working on a cross-platform sensor node framework and I already have a standardized SPI class similar to the SdSpiBaseClass, so I would prefer to write a wrapper that connects my SPI class with the SdFat lib. That would make integration of SdFat much easier for me.

greiman commented 3 years ago

You can do anything you want with your external driver. Put your SPI class in place of the standard library SPI. You just need to provide the member functions in SdSpiBaseClass.

I used the standard SPI library in the example since it works with all boards.

I even mix my code with standard library code. See my Due DMA driver.

savejeff commented 3 years ago

Do you mean forking the lib, setting SPI_DRIVER_SELECT = 0, and implementing an "optimized custom SPI driver"? I'll look into that. I assume the Due DMA driver is an example of an optimized custom SPI driver for the Arduino Due?

I tried the external SPI driver with SPI_DRIVER_SELECT = 3. Here, setting USE_SPI_ARRAY_TRANSFER to 1 did not change the transfer rate (as expected and documented in the code comment in SdFatConfig.h). Is there a specific reason why USE_SPI_ARRAY_TRANSFER is only effective with the standard SPI library?

While benchmarking write performance I noticed frequent jumps in write time to about 400 ms between the regular 2-5 ms writes. These are nothing new to me. They vary between SD card models, and I think they are caused by the SD card's internal controller changing pages. Is there a way to detect that, do some other work, and come back later when the SD card is ready again? On the ESP32 I could avoid this by using the two cores to split logging and sensor data collection, but I'm still unable to read sensor data consistently at higher sample rates on single-core chips like SAMDs and AVRs. I would love some insight into what exactly happens there and possible solutions.

Regardless, thanks for the help so far!

greiman commented 3 years ago

I tried the external SPI driver with SPI_DRIVER_SELECT = 3. Here, setting USE_SPI_ARRAY_TRANSFER to 1 did not change the transfer rate (as expected and documented in the code comment in SdFatConfig.h). Is there a specific reason why USE_SPI_ARRAY_TRANSFER is only effective with the standard SPI library?

Just modify your external driver to use array transfer. None of the SdFat SPI drivers or wrappers will be called; only the code in your external driver will be used. Just copy my array transfer stuff into your external driver.
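
A rough sketch of what array transfer in an external driver might look like is given below. It is not a copy of SdFat's code. So that it compiles off-target, it uses a stand-in `FakeSpi` object; the assumption is that, like Arduino's SPI.transfer(buf, count), the transfer overwrites the buffer in place with the received bytes. `receiveArray` and `sendArray` are hypothetical names; in MySpiClass their bodies would replace the byte loops in receive(buf, count) and send(buf, count).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for the Arduino SPI object (assumption, for host-side testing).
// It records what was sent and overwrites the buffer with fake received data.
struct FakeSpi {
  uint8_t lastSent[512];
  size_t sentCount = 0;
  void transfer(void* buf, size_t count) {
    sentCount = count < sizeof(lastSent) ? count : sizeof(lastSent);
    std::memcpy(lastSent, buf, sentCount);  // bytes that went out on MOSI
    std::memset(buf, 0xA5, count);          // pretend the card answered 0xA5
  }
};

FakeSpi spi;  // on a real board this would be the SPI object

// Receive: fill the buffer with 0xFF filler, then let the in-place
// transfer replace it with the bytes read from the card.
uint8_t receiveArray(uint8_t* buf, size_t count) {
  std::memset(buf, 0xFF, count);
  spi.transfer(buf, count);
  return 0;
}

// Send: copy to a scratch buffer first, because the in-place transfer
// would otherwise clobber the caller's const data.
void sendArray(const uint8_t* buf, size_t count) {
  uint8_t tmp[512];
  while (count > 0) {
    size_t n = count < sizeof(tmp) ? count : sizeof(tmp);
    std::memcpy(tmp, buf, n);
    spi.transfer(tmp, n);
    buf += n;
    count -= n;
  }
}
```

The scratch copy in `sendArray` corresponds to the "up to a 512 byte temporary buffer for send" mentioned in the SdFatConfig.h comment quoted earlier in the thread.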

You can avoid SD busy by calling isBusy(). You are only guaranteed 512 bytes of write when an SD returns not busy.

You can call file.isBusy() or sd.card()->isBusy(). See about line 368 of ExFatLogger:

    if (!sd.card()->isBusy()) {
      size_t nw = fifoHead > fifoTail ? fifoCount : FIFO_DIM - fifoTail;
      // Limit write time by not writing more than 512 bytes.
      const size_t MAX_WRITE = 512/sizeof(data_t);
      if (nw > MAX_WRITE) nw = MAX_WRITE;

See line 71 of TeensySdioLogger:

    if (n >= 512 && !file.isBusy()) {
      // Not busy only allows one sector before possible busy wait.
      // Write one sector from RingBuf to file.
      if (512 != rb.writeOut(512)) {
        Serial.println("writeOut failed");
        break;
      }
    }
greiman commented 3 years ago

You can use threads with Arduino RP2040 since it's based on mbed. I am experimenting with a low priority thread to write to the SD and a higher priority thread to read sensors. I use the SdFat RingBuf class to buffer data between threads.

savejeff commented 3 years ago

Just modify your external driver to use array transfer. None of the SdFat SPI drivers or wrappers will be called; only the code in your external driver will be used. Just copy my array transfer stuff into your external driver.

Whoops, that was an obvious oversight on my side.

Thanks, I'll test the 512-byte method with busy checking. Very helpful.

With respect to the RP2040 thread method, I'm not yet very familiar with mbed. Is it possible to attach threads directly to one of the cores of the RP2040? On the ESP32 I implemented something similar. I use two tasks, one assigned to each hardware core. One task reads sensors and writes to a global ring buffer guarded by a mutex. The other task reads from the ring buffer and writes to a file. It works quite well, but not everything works as expected. For example, the hardware I2C bus stalls during SD card communication. As far as I've heard, mbed is not exactly designed for multicore applications and has few or no thread-safe components, but that might be wrong.

greiman commented 3 years ago

Raspberry Pi and ARM developed the RP2040 using ARM designed cores and peripherals for SPI, I2C, Serial, etc.

mbed is a true full-featured, preemptive, priority-based RTOS developed by ARM. Arduino is working with ARM on the port to RP2040, and it runs the core of the Arduino board package.

I think the Arduino runs on one core for users and the second core may be for the system things on the Arduino Nano 2040.

You should not need a mutex for a ring buffer on the 32-bit RP2040 core or ESP32. You just need a volatile 32-bit variable. Look for a thread safe implementation of a single producer single consumer non-blocking queue.

I disable interrupts for access to the 32-bit variable in my RingBuf since AVR does not have a true 32-bit fetch/store. With true 32-bit variables it would be thread safe with multi-processors.
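
To make the single producer, single consumer idea concrete, here is a minimal lock-free queue sketch. Note the substitution: it uses std::atomic with acquire/release ordering in place of the volatile variable mentioned above, since that is how standard C++ expresses cross-thread visibility. `SpscQueue` is an illustrative name and is not SdFat's RingBuf.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Single-producer/single-consumer queue. N must be a power of two so that
// free-running 32-bit counters can be masked into buffer indices.
template <typename T, uint32_t N>
class SpscQueue {
  static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  // Call only from the producer side.
  bool push(const T& item) {
    const uint32_t head = head_.load(std::memory_order_relaxed);
    const uint32_t tail = tail_.load(std::memory_order_acquire);
    if (head - tail == N) return false;  // full
    buf_[head & (N - 1)] = item;
    head_.store(head + 1, std::memory_order_release);  // publish the item
    return true;
  }

  // Call only from the consumer side.
  bool pop(T& item) {
    const uint32_t tail = tail_.load(std::memory_order_relaxed);
    const uint32_t head = head_.load(std::memory_order_acquire);
    if (head == tail) return false;  // empty
    item = buf_[tail & (N - 1)];
    tail_.store(tail + 1, std::memory_order_release);  // free the slot
    return true;
  }

 private:
  T buf_[N];
  std::atomic<uint32_t> head_{0};  // written only by the producer
  std::atomic<uint32_t> tail_{0};  // written only by the consumer
};
```

Because each counter has exactly one writer, neither side needs a mutex; the sketch is only valid for exactly one producer and one consumer.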

greiman commented 3 years ago

RTOS support for multi-core MPUs is difficult because most, like the ESP32, have poor separation of I/O and interrupts from cores. That's probably why you have glitches on the ESP32.

However, many processors (including the ESP32) require that the interrupt service routine (ISR) runs in the core that sets up the interrupt.

So if you pin a task to a core you will get glitches.

greiman commented 3 years ago

This makes the Arduino Core plug-and-play, and an easy choice for getting your devices up and running quickly. We provide two cores; one for our Nano RP2040 Connect board, and one for other RP2040-based boards, including the Raspberry Pi Pico. As the core is based on Mbed OS you can choose between using Arduino or Mbed’s API.

greiman commented 3 years ago

Looks like you can use the Pi Pico SDK to mix multicore with mbed. Who knows how to safely use it.

#include "pico/multicore.h"
#include <atomic>
std::atomic<uint16_t> data;
std::atomic<bool> flag {false};

void core_entry() {
  uint16_t n = 0;
  while (true) {
    while (flag.load()) {}
    data.store(n++);
    flag.store(true);
  }
}

void setup() {
  Serial.begin(9600);
  while (!Serial) {}
  multicore_launch_core1(core_entry);
}

void loop() {
  while (!flag.load()) {}
  Serial.println(data.load());
  flag.store(false);
  delay(500);
}

output:

0
1
2
savejeff commented 3 years ago

RTOS support for multi-core MPUs is difficult because most, like the ESP32, have poor separation of I/O and interrupts from cores. That's probably why you have glitches on the ESP32.

I try to avoid interrupts and limit myself to a single task per hardware core. I was expecting that SPI and I2C would operate independently, but that turned out to be false. I solved it by implementing I2C in software. Individual pin control seems to be unaffected by SPI communication on another core.

I only use process synchronization when I change the read and write markers of the ring buffer. Write and read work concurrently.

When I find the time I'll look into mbed. It seems to be a promising project.

savejeff commented 3 years ago

I already tried "multicore_launch_core1(core_entry);" but the process got stuck when using the delay function.

There is an alternative Arduino core on GitHub: earlephilhower/arduino-pico. It supports multitasking by just adding setup1() and loop1(). It looks good, but I would like to avoid non-official cores as long-term support is not guaranteed. I hope multicore support will be improved in the official core without mixing too much Pico SDK, mbed and Arduino core.

greiman commented 3 years ago

Delay does an mbed sleep. Don't use any mbed calls in core1 code.

I am playing with core1 reading the adc at a regular interval and queuing data to core0. I use the Pi Pico SDK function time_us_32() to delay. I have a tight loop waiting for the next time to read the adc. The Pi Pico adc_read only takes about 2 usec. Almost 500,000 sps.

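#include "hardware/adc.h"    // adc_init, adc_gpio_init, adc_select_input, adc_read
#include "hardware/timer.h"  // time_us_32

void core_entry() {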
  adc_init();
  // Make sure GPIO is high-impedance, no pullups etc
  adc_gpio_init(26);
  // Select ADC input 0 (GPIO26)
  adc_select_input(0);
  uint32_t m = time_us_32();
  while (true) {
    m += 50;    // 20,000 sps
    while (m > time_us_32()) {}  
    uint16_t tmp = adc_read();
    // queue tmp here.
  }
}   

The above delay needs to be fixed for roll-over of time_us_32(). I should have used a signed difference or used the 64-bit microsecond counter.
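
The signed-difference fix mentioned here can be sketched as follows. `timeReached` is a hypothetical helper name; the technique is the usual wrap-safe comparison of two readings of a free-running 32-bit counter.

```cpp
#include <cstdint>

// True once `now` has reached or passed `target`, even if the 32-bit
// microsecond counter wrapped in between. Valid while the two values are
// less than 2^31 us (about 35.8 minutes) apart.
inline bool timeReached(uint32_t now, uint32_t target) {
  return static_cast<int32_t>(now - target) >= 0;
}
```

In the loop above, the wait would then read `while (!timeReached(time_us_32(), m)) {}` instead of comparing the two values directly.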

I think I will use 512 byte blocks for the adc data and queue blocks. Should be able to do more than 100k sps to SD.
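
As a back-of-the-envelope sketch of the block idea: with 16-bit samples, as returned by adc_read() in the loop above, one 512-byte block holds 256 samples. The type and helper names below are illustrative assumptions, not code from this thread.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kBlockBytes = 512;  // one SD sector
constexpr size_t kSamplesPerBlock = kBlockBytes / sizeof(uint16_t);  // 256

// One queue element: a full sector of raw ADC samples.
struct AdcBlock {
  uint16_t sample[kSamplesPerBlock];
};
static_assert(sizeof(AdcBlock) == kBlockBytes, "block must fill one sector");

// Blocks per second the SD writer must sustain for a given sample rate.
constexpr double blocksPerSecond(double samplesPerSecond) {
  return samplesPerSecond / kSamplesPerBlock;
}
```

At 100,000 samples per second this works out to about 391 blocks per second, or 200,000 bytes per second, roughly a tenth of the write speed in the benchmark posted earlier in the thread.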

The Pi guys are too good to ignore. The Arduino analogRead takes over 20 usec.

savejeff commented 2 years ago

I know it's a little bit late, I had a lot of stuff going on.

About the wait, there is the busy_wait_us function. It works on both cores. It's a little bit wasteful to busy wait, but better than a crash I guess.

Have you made progress with the analog read on the second core? I found there is some queue code in the Pico SDK multicore_runner.

greiman commented 2 years ago

I have given up on use of the Pico ADC. The RP2040 ADC is useless for many apps since it has a fundamental design problem. The caps in the SAR ADC were designed with the wrong values. See this section of the data sheet.

[Image: RP2040 ADC errata]

savejeff commented 2 years ago

That's unfortunate. I was waiting for the first problem with the chip. No bugs in the first iteration was a little too good to be true.