greiman / SdFat

Arduino FAT16/FAT32 exFAT Library
MIT License
1.08k stars 503 forks source link

ESP32 fails with dedicated SPI and Task #349

Open nseidle opened 2 years ago

nseidle commented 2 years ago

Hi Bill,

I greatly appreciate all your work on SdFat over the past decade. It's been great to watch and a pleasure to use.

I've got an ESP32 problem that I can't quite crack. When using dedicated SPI with SdFat v2.1.2, the ESP32 reboots with a curious failure mode:

assertion "pxQueue->pcHead != ((void *)0) || pxQueue->u.xSemaphore.xMutexHolder == ((void *)0) || pxQueue->u.xSemaphore.xMutexHolder == xTaskGetCurrentTaskHandle()" failed: file "IDF/components/freertos/queue.c", line 782, function: xQueueGenericSend

Decoding the stack trace shows the FatFile is actively trying to write right before things blow up.

image

When using SHARED_SPI, everything works great but the transfer rates are much slower, hence why I'm here.

sd.begin(SdSpiConfig(pin_microSD_CS, DEDICATED_SPI, SD_SCK_MHZ(16))); //Fails
//sd.begin(SdSpiConfig(pin_microSD_CS, SHARED_SPI, SD_SCK_MHZ(16))); //Works

The code to reproduce is below. I've tried to reduce it as much as possible.

Thoughts:

If you don't have any guesses or thoughts, please let me know. I'll go ask the ESP32 folks but I wanted to see if you have any insights first.

Cheers, -Nathan

int pin_microSD_CS = 25;

#include <SPI.h>
#include "SdFat.h"

SdFat sd;
SdFile mainFile;
char mainFileName[40] = "mainFile.txt";

TaskHandle_t writeTaskHandle = NULL;
const uint8_t writeTaskPriority = 0; //3 being the highest, and 0 being the lowest
const int writeTaskStackSize = 10000;

void setup()
{
  Serial.begin(115200);
  delay(500);
  Serial.println("Go");

  sd.begin(SdSpiConfig(pin_microSD_CS, DEDICATED_SPI, SD_SCK_MHZ(16))); //Fails
  //sd.begin(SdSpiConfig(pin_microSD_CS, SHARED_SPI, SD_SCK_MHZ(16))); //Works

  Serial.print("Creating main file - ");

  if (sd.exists(mainFileName))
    sd.remove(mainFileName);

  if (mainFile.open(mainFileName, O_RDWR | O_CREAT | O_TRUNC) == true)
    Serial.println("Success!");
  else
    Serial.println("Fail");

  if (writeTaskHandle == NULL)
    xTaskCreatePinnedToCore(
      writeToSDTask,
      "WriteSD", //Just for humans
      writeTaskStackSize, //Stack Size
      NULL, //Task input parameter
      writeTaskPriority, //Priority
      &writeTaskHandle, //Task handle
      //0); //Core where task should run, 0=FreeRTOS, 1=Arduino
      1); //Core where task should run, 0=FreeRTOS, 1=Arduino
}

void loop()
{
  delay(10);
}

void writeToSDTask(void *e)
{
  while (true)
  {
    mainFile.write("File write\r\n");

    delay(1);
    taskYIELD();
  }
}
greiman commented 2 years ago

I suspect the panic is due to a an attempt to unlock a mutex that has not been locked. in ESP32 V2.0.

The newest version I get with the board manger is 1.06. That version has calls to lock and unlock the SPI mutex but doesn't crash.

void spiEndTransaction(spi_t * spi)
{
    if(!spi) {
        return;
    }
    SPI_MUTEX_UNLOCK();  <<---- place crash appears to happen in V2.00.
}

I modified your example like this to write 10,000 lines to the file.

void writeToSDTask(void *e)
{
  int i = 0;
  while (true)
  {
    if (i++ == 10000) {
      Serial.println("done");
      mainFile.close();
    }
    mainFile.write("File write\r\n");

    delay(1);
    taskYIELD();
  }

and it runs on V 1.06.

Go
Creating main file - Success!
done

I suspect there could be a problem in SdFat since I did a major rewrite of how shared/dedicated SPI works.

I will add code to verify that calls to beginTransaction() match calls to endTransaction() and do some tests.

greiman commented 2 years ago

I tried V2.0.1 and it has a problem. Try running this test. It won't run on my ESP32 with SHOW_BUG set to one but will with SHOW_BUG set zero.

This means that dedicated SPI is not practical on ESP32 V2.

#include <SPI.h>
#define SHOW_BUG 1
TaskHandle_t writeTaskHandle = NULL;
const uint8_t writeTaskPriority = 0; //3 being the highest, and 0 being the lowest
const int writeTaskStackSize = 10000;

void setup()
{
  Serial.begin(115200);
  delay(500);
  Serial.println("Go");
  delay(500);
  SPI.begin();
  SPI.beginTransaction(SPISettings(16000000, MSBFIRST, SPI_MODE0));
  SPI.transfer(0XFF);
  #if !SHOW_BUG
  SPI.endTransaction();
  #endif

  if (writeTaskHandle == NULL)
    xTaskCreatePinnedToCore(
      writeToSDTask,
      "WriteSD", //Just for humans
      writeTaskStackSize, //Stack Size
      NULL, //Task input parameter
      writeTaskPriority, //Priority
      &writeTaskHandle, //Task handle
      //0); //Core where task should run, 0=FreeRTOS, 1=Arduino
      1); //Core where task should run, 0=FreeRTOS, 1=Arduino
}

void loop()
{
  delay(10);
}

void writeToSDTask(void *e)
{
  while (true)
  {
#if !SHOW_BUG
    SPI.beginTransaction(SPISettings(16000000, MSBFIRST, SPI_MODE0));
#endif
    SPI.transfer(0XFF);
    SPI.endTransaction();
#if SHOW_BUG
    SPI.beginTransaction(SPISettings(16000000, MSBFIRST, SPI_MODE0));
#endif    
    delay(1);
    taskYIELD();
  }
}

Your example will run on V2.0.1 if I put this line mainFile.sync(); after the open call and after mainFile.write("File write\r\n");

The sync() call ends the SPI transaction.

nseidle commented 2 years ago

Ah, got it. Thanks. I suspect they dramatically changed the SPI driver between v1.6 and v2, similar to how their new I2C driver broke code. I'll report it and see if their new driver can support dedicated transfers.

Nanowires commented 2 years ago

Hi, I just want to add my two cents to this thread:

I have two concurrent task using the sd card. One as an FtpServer, for reading writing the sd card for the ftp clients and another task reading the sd card (for playing music).

Now I've encountered two problems: The first is the same as mentioned in this thread. While using the card in dedicated SPI mode (even defined the ENABLE_DEDICATED_SPI to 0 in SdFatConfig.h) I still get sometimes the assert error:

assert failed: xQueueGenericSend queue.c:832 (pxQueue->pcHead != ((void )0) || pxQueue->u.xSemaphore.xMutexHolder == ((void )0) || pxQueue->u.xSemaphore.xMutexHolder == xTaskGetCurrentTaskHandle())

The other problem is, that sometimes during the concurrent accessing of the card, I encounter a deadloop, until the watchdog triggers

I'm using an ESP32 as hardware.

Here's a snipped for reproduction:

#define SPI_CLOCK SD_SCK_MHZ(16)
#define SD_CONFIG SdSpiConfig(SS, SHARED_SPI, SPI_CLOCK)

static SdFs sd;

static void testTask( void *pData )
{
    Serial.printf( "Delay testTask\n" );
    delay( 100 );
    Serial.printf( "Starting testTask\n" );
    for (int i = 0; i < 1000; i++ )
    {
        FsFile root;
        root.open( "/", O_RDONLY );
        if (i == 0 && !root ){
            Serial.printf( "Couldn't open root\n" );
        }
        FsFile file;
        do {
            file = root.openNextFile( );
            file.close( );
            Serial.printf( "Test testTask\n" );
        } while ( file );
        delay(10);
        root.close();
    }
    Serial.printf( "Test testTask done\n" );
    vTaskDelete( NULL );
}

static void testSd()
{
      if ( !sd.begin( SD_CONFIG ) ) {
          Serial.printf( "Card Mount Failed\n" );
      }

    FsFile file;

    file.open( "/someFilePath", O_RDONLY );
    if ( !file ) {
        Serial.printf( "Couldn't open file\n" );
    } else {
        if ( pdPASS != xTaskCreate( testTask, /* Task function. */
            "TaskTest", /* String with name of task. */
            3000, /* Stack size in bytes. */
            NULL, /* Parameter passed as input of the task */
            1, /* Priority of the task. */
            NULL ) /* TaskHandle */
            ) {
                DebugFatal( );
            }
        for ( int i = 0; i < 1000; i++ ) {
            char buf[2048];
            file.read( buf, sizeof(buf) );
            file.seek( 0 );
            delay( 10 );
            Serial.printf( "Test mainTask\n" );
        }
        file.close( );
        Serial.printf( "Test mainTask done\n" );
    }
}
greiman commented 2 years ago

I have two concurrent task using the sd card. One as an FtpServer, for reading writing the sd card for the ftp clients and another task reading the sd card (for playing music).

You can't access the SD concurrently from two tasks unless you protect access with a mutex.

See this for an example of a FreeRTOS mutex that is used with a shared resource. Any access to the SD must be done while the task has acquired the mutex protecting access to the SD.

Nanowires commented 2 years ago

Ah, thanks for the clarification.

LeedanJohnson commented 2 years ago

Can I get some further clarification on this? Am I able to safely have two File32 objects referencing open files concurrently if I am careful to never write/read concurrently, or must I open/close them between writes/reads to ensure that I never have two files open at the same time.

Thanks for your time, and all your work in building/maintaining.

nseidle commented 2 years ago

I don't want to speak for Bill but...

Am I able to safely have two File32 objects referencing open files concurrently if I am careful to never write/read concurrently, or must I open/close them between writes/reads to ensure that I never have two files open at the same time.

In short, yes. Be sure to checkout what a mutex is. Using one will allow you to have many files open. The trick is avoiding two tasks from attempting to utilize the SPI hardware at the same time. One task will corrupt the other task's SPI transaction and everything will be bad. Not that my code is any gold standard, but you can see an example of using a semaphore here.

LeedanJohnson commented 2 years ago

Perfect @nseidle, I'm familiar with mutex/semaphores, just needed to know at what level I needed to be restricting sd access. Thanks for your input!

skobkars commented 2 years ago

I am having the same crashes on ESP32-S2 while accessing SdFat files through SdFat32 and providing access to it through TinyUSB as Mass Storage Device. I am using SdFat32->card() to read and write data on the USB host request and get

assert failed: xQueueGenericSend queue.c:832 (pxQueue->pcHead != ((void *)0) || pxQueue->u.xSemaphore.xMutexHolder == ((void *)0) || pxQueue->u.xSemaphore.xMutexHolder == xTaskGetCurrentTaskHandle())

on collisions (?)

The only more or less working solution is to disable HAL locks in ESP32-S2 core (CONFIG_DISABLE_HAL_LOCKS=1). Then there are no crashes, but eventually I am getting a dead lock or something, as neither my ESP32-S2 device, nor USB host get any data from SdFat, but loops() are still running.

I tried using SHARED_SPI fix, but it changes nothing.

So, to summarize, there is only one device on SPI and only SdFat has access to it, but both on SD card (sector-base) and filesystem-base levels.

greiman commented 2 years ago

Could be a bug where SdFat accesses the SPI bus when not in a "SPI transaction".

I have a macro that checks for this in src/SdCard/SdSpiCard.h at about line 36:

/** Verify correct SPI active if non-zero. */
#define CHECK_SPI_ACTIVE 0
#if CHECK_SPI_ACTIVE
/** Check SPI active. */
#define SPI_ASSERT_ACTIVE {if (!m_spiActive) {\
  Serial.print(F("SPI_ASSERTACTIVE"));\
  Serial.println(__LINE__);}}
#else  // CHECK_SPI_ACTIVE
/** Do not check SPI active. */
#define SPI_ASSERT_ACTIVE
#endif  // CHECK_SPI_ACTIVE

Set #define CHECK_SPI_ACTIVE to one and see if SPI_ASSERT_ACTIVE finds a problem.

It will print to Serial.

skobkars commented 2 years ago

Thank you Bill,

It crashes in spiSend(). My code is intentionally quite busy for the SdFat where I am listing root directory every second while USB is getting connected (the code is below).

I am currently using a fork of EspTinyUSB adapted for SdFat 2.x: EspTinyUSB, but the exact same issue was present when I started playing with Adafruit TinyUSB + Adafruit SdFat fork, and also later, when I reworked Adafruit TinyUSB to work with the current version of SdFat. So, the issue has been around for quite a long time.

CONFIG_DISABLE_HAL_LOCKS helps with ESP32-S2, but I think this is only because there is a single core. I'm not sure if it would help with other ESP32's

Here, it crashed after the second loop:

ESP-ROM:esp32s2-rc4-20191025
Build:Oct 25 2019
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3ffe6100,len:0x524
load:0x4004c000,len:0xa70
load:0x40050000,len:0x292c
SHA-256 comparison failed:
Calculated: 8dce998efc3eb6dc43f3799a489dbcfbd5ada9828beadf71017c05c4a8d32bc3
Expected: ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
Attempting to boot anyway...
entry 0x4004c18c
[  1130][I][esp32-hal-psram.c:96] psramInit(): PSRAM enabled
[  1376][D][usb_descriptors.cpp:45] getConfigurationDescriptor(): descriptor length: 32

[  1376][D][esptinyusb.cpp:53] esptinyusbtask(): [esptinyusbtask] USB tud_task created
[  1380][D][msc_usb.ino:26] setup(): MSC lun 1 begin
[  1385][D][msc_usb.ino:34] setup(): CONFIG_DISABLE_HAL_LOCKS configured
[  2391][D][msc_usb.ino:47] loop(): ==============================================================================================
[  2392][D][msc_usb.ino:67] loop(): test.txt                                                        :        0 2022-07-18 17:05:24
[  2403][D][msc_usb.ino:67] loop(): test2.txt                                                       :        0 2022-07-18 17:05:30
[  2414][D][msc_usb.ino:67] loop(): test3.txt                                                       :        0 2022-07-18 17:05:32
[  3443][D][msc_usb.ino:47] loop(): ==============================================================================================
[  3444][D][msc_usb.ino:67] loop(): test.txt                                                        :        0 2022-07-18 17:05:24
[  3456][D][msc_usb.ino:67] loop(): test2.txt                                                       :        0 2022-07-18 17:05:30
[  3467][D][msc_usb.ino:67] loop(): test3.txt                                                       :        0 2022-07-18 17:05:32
[  3480][D][SdSpiCard.h:326] spiSend(): SPI_ASSERTACTIVE
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x40007e30  PS      : 0x00060330  A0      : 0x80099d1c  A1      : 0x3ffca8b0  
A2      : 0x00000146  A3      : 0x00000144  A4      : 0x000000ff  A5      : 0x0000ff00  
A6      : 0x00ff0000  A7      : 0xff000000  A8      : 0x00000000  A9      : 0x00000000  
A10     : 0x00000000  A11     : 0x3f000a5c  A12     : 0x00000000  A13     : 0x00000000  
A14     : 0x30000000  A15     : 0x00000000  SAR     : 0x00000004  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000144  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x40027898  

Backtrace:0x40007e2d:0x3ffca8b00x40099d19:0x3ffca8c0 0x4009e906:0x3ffcabd0 0x4009e942:0x3ffcac60 0x40086a96:0x3ffcaca0 0x40083ea9:0x3ffcad00 0x400841c4:0x3ffcad30 0x400844d0:0x3ffcad60 0x400845f0:0x3ffcad80 0x4008463e:0x3ffcada0 0x400847f7:0x3ffcadc0 0x40082c4c:0x3ffcade0 0x40082d06:0x3ffcae10 0x40082f2a:0x3ffcae40 0x4008187c:0x3ffcae70 0x40087211:0x3ffcaf10 

Sketch:

/**
 * Simple MSC device with SdFat32USB file system on SD card
 */

#include "sdfatusb.h"

#if CFG_TUD_MSC

#define CS 15

SDFat2USB SdFat32USB;

File32 root;
File32 file;

void setup() {
  if( SdFat32USB.initSD( CS ) ) {
    if( SdFat32USB.begin() ) {
      log_d("MSC lun 1 begin");

    } else log_e("LUN 1 failed");

  } else log_d("Failed to init SD");

#if CONFIG_DISABLE_HAL_LOCKS
  log_d("CONFIG_DISABLE_HAL_LOCKS configured");
#else
  log_d("CONFIG_DISABLE_HAL_LOCKS is not configured");
#endif

}

void loop()
{
  delay(1000);

  if( SdFat32USB.isReady() ) {
      log_d("==============================================================================================");

      uint16_t mdate, mtime;
      char nm[64];
      uint32_t sz;

      root.open("/");
      while( file.openNext(&root, O_RDONLY) ) {

        if( file.isFile() && !file.isHidden() && file.getModifyDateTime(&mdate, &mtime) ) {
          file.getName(nm, 64);

          byte yy =   mdate >> 9;
          byte mm = ( mdate & 0b0000000111100000 ) >> 5;
          byte dd =   mdate & 0b11111;

          byte hr =    mtime >> 11;
          byte mn = (  mtime & 0b0000011111100000 ) >> 5;
          byte sc = (( mtime & 0b11111 ) << 1); // + dir.creationTimeTenths / 100.0;

          log_d("%-64s: %8u %4d-%02d-%02d %02d:%02d:%02d", nm, file.fileSize(), 1980 + yy, mm, dd, hr, mn, sc);

        }
        file.close();
      }
      root.close();
  }

}

#endif
greiman commented 2 years ago

I suspect SdFat will not work with EspTinyUSB. SdFat is not thread safe.

The ESP32 port of FatFs depends on a mutex in SPI.beginTransaction() and SPI.endTransaction() at the diskio level and another mutex at the file system level.

greiman commented 2 years ago

The idea of accessing files simultaneously on an SD by another computer as an msc device and on the ESP32 with the system SD library or SdFat won't work reliably.

Android, IOS, Linux, macOS, and Windows all cache sections of the SD. Even if you don't write to the files, directory entries get modified with last access times so if a another file is modified on the ESP32 and it's directory entry is in a page cached on the other computer, changes will be overwritten. There are many other conflicts like the File Allocation Table.

I was slightly involved with the BSD Unix NFS, Network File System, during development at UCB and Sun Microsystems around 1984.

NFS and all following systems like Windows Server Message Block (SMB) protocol use remote procedure calls. There is a single process accessing the disk and clients send requests to open files, read/write and other operations.

The the first proposed standard for NFS was published in 1989 and outlines the protocol in simplified form:

https://datatracker.ietf.org/doc/html/rfc1094

skobkars commented 2 years ago

Yes, I understand limited usability of the "simultaneous" access.

Thankfully, the project I am working on is a bit different and I do not really need to access files at the same time and can wait a bit instead.

I am dealing with a flight equipment that allows downloading flight tracks to a plugged in mass storage USB device. As I need it do be done on multiple aircrafts multiple times a day, it is a bit of a hassle to do that and then to transfer files to a computer, so the idea is that an ESP32 device simulating MSC is plugged in, files are transferred by the equipment and then automatically copied to a network storage over WiFi. So, between "files transferred" and "files beamed up" can be a reasonably long time delay.

It just that I did not expect these crashes and deadlocks, and had to try to find a workaround. I think I can make it work, but understand that this is a very special scenario.

greiman commented 2 years ago

MSC will probably work for your use. MSC was used on Android phones prior to V4.3. These phones had a data volume for sharing and dismounted it when the phone was accessed over a USB cable with MSC.

The modern way in Android and other smart devices is to use MTP, Media Transfer Protocol. It would be great if TinyUSB supported it. MTP is file oriented instead of block oriented so it is safe. The MTP spec is here. It was designed for what you want to do.

skobkars commented 2 years ago

MSC is the only way to go for me unfortunately, as I have no control over the flight recorders and they only share data through MSC, and even that is limited to no-composite devices, only FAT16/FAT32, and storage size no greater than 32GB.

skobkars commented 2 years ago

Or did you mean as MSC over network without storing files on an intermediate devices first? That would be great and that was my first thought, but unfortunately network delays make it impossible. MSC timing is too tight to allow for network latencies I'm afraid. At least I failed to make it work.