greiman / SdFat

Arduino FAT16/FAT32 exFAT Library
MIT License

How to use SDFat with STM32 SDMMC? #250

Open Bambofy opened 3 years ago

Bambofy commented 3 years ago

Hi, is it possible to use SdFat with the STM32 SDMMC interface?

greiman commented 3 years ago

It would be possible to use the STM32 SDMMC drivers since SdFat supports general block devices. Unless you do huge transfers, performance would be poor.

The problem is that modern SD cards have huge flash pages and RAM buffers and require huge writes and reads for good performance. The STM32 drivers are not designed for the Arduino environment.

Here is performance vs transfer size for Teensy 4.1 DMA with the 4-bit SDIO bus. You need 16KiB writes to do better than SPI.

DMA SDIO mode
size,write,read
bytes,KB/sec,KB/sec
512,604.44,2171.22
1024,1137.83,3173.37
2048,2144.13,5453.00
4096,3737.19,8353.10
8192,4329.72,9235.91
16384,7695.38,13993.51
32768,13741.52,16139.17

Here is performance for dedicated SPI mode on a second port of a Teensy 4.1. This provides good but not exceptional performance for all transfers, even those smaller than 512 bytes.

Dedicated SPI mode.
size,write,read
bytes,KB/sec,KB/sec
512,5166.12,5213.76
1024,5183.89,5220.43
2048,5188.14,5224.18
4096,5177.25,5230.04
8192,5192.19,5225.09
16384,5190.17,5225.83
32768,5193.14,5226.18
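To make the comparison between the two tables above concrete, here is a minimal sketch that finds the smallest transfer size at which DMA SDIO overtakes dedicated SPI. The numbers are copied from the tables; the struct and function names are made up for this illustration and are not part of SdFat.

```cpp
#include <cstddef>
#include <cstdint>

// One row of a benchmark table: transfer size and throughput in KB/sec.
struct BenchRow {
  uint32_t sizeBytes;
  double writeKBps;
  double readKBps;
};

// Teensy 4.1, DMA SDIO mode (values copied from the first table).
const BenchRow kDmaSdio[] = {
    {512, 604.44, 2171.22},     {1024, 1137.83, 3173.37},
    {2048, 2144.13, 5453.00},   {4096, 3737.19, 8353.10},
    {8192, 4329.72, 9235.91},   {16384, 7695.38, 13993.51},
    {32768, 13741.52, 16139.17}};

// Teensy 4.1, dedicated SPI mode (values copied from the second table).
const BenchRow kDedicatedSpi[] = {
    {512, 5166.12, 5213.76},   {1024, 5183.89, 5220.43},
    {2048, 5188.14, 5224.18},  {4096, 5177.25, 5230.04},
    {8192, 5192.19, 5225.09},  {16384, 5190.17, 5225.83},
    {32768, 5193.14, 5226.18}};

const size_t kRows = sizeof(kDmaSdio) / sizeof(kDmaSdio[0]);

// Smallest size at which DMA SDIO writes beat dedicated SPI writes, or 0.
uint32_t writeCrossoverBytes() {
  for (size_t i = 0; i < kRows; i++) {
    if (kDmaSdio[i].writeKBps > kDedicatedSpi[i].writeKBps) {
      return kDmaSdio[i].sizeBytes;
    }
  }
  return 0;
}

// Smallest size at which DMA SDIO reads beat dedicated SPI reads, or 0.
uint32_t readCrossoverBytes() {
  for (size_t i = 0; i < kRows; i++) {
    if (kDmaSdio[i].readKBps > kDedicatedSpi[i].readKBps) {
      return kDmaSdio[i].sizeBytes;
    }
  }
  return 0;
}
```

With these tables the write crossover is 16384 bytes, which matches the 16KiB figure quoted above, while reads cross over earlier, at 2048 bytes.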

The Teensy 4.1 has a 512 byte FIFO in its SDIO controller that allowed me to write a custom driver. STM32 only has a 128 byte FIFO so I can't use the same method. Here are Teensy 4.1 results with my custom SDIO driver. It's fast, and by using a large ring buffer I have recorded DMA ADC data at 6 MB/sec or 3 million samples per second.

FIFO SDIO mode.
size,write,read
bytes,KB/sec,KB/sec
512,22241.57,22702.96
1024,22518.24,22735.45
2048,22546.08,22814.16
4096,22236.62,22934.11
8192,22560.51,22824.28
16384,22563.73,22767.48
32768,22569.25,22836.09

I like STM32, but the ST-provided SDMMC drivers are impossible to use in a mode like dedicated SPI, and the STM32 SDMMC controller's small FIFO and lack of other features prevent me from using the Teensy 4.1 trick.

Bambofy commented 3 years ago

Yes, I'm looking to do high-speed transfers, so SDMMC might actually be slower?

greiman commented 3 years ago

I forgot one other disadvantage. The SDMMC controller causes the SD card to copy data from partially filled flash pages, so there are occasional writes with many milliseconds of latency while the card erases new flash pages. This kills fast data logging, since Cortex M chips don't have huge amounts of RAM for buffers or a multi-threaded OS like phones do. Modern SD cards are designed for phones running Android or iOS.

Bambofy commented 3 years ago

Thank you for the help, it's really appreciated. I'm just creating a test sketch and I'll post the speeds I find too :)

greiman commented 3 years ago

The ST board package has a really slow SPI driver. Here is what I get with a F446RE Nucleo with the bench example and dedicated SPI.

FILE_SIZE_MB = 5
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
741.57,706,688,689
742.01,706,688,689

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
745.00,690,686,686
745.00,690,686,686

stevstrong commented 3 years ago

For the F4x family, a larger buffer is suggested, e.g. 8kB.

greiman commented 3 years ago

8KB won't help. The ST driver does byte-at-a-time SPI, and when you look at the signals on a scope there is a huge idle time between bytes.

Here is a 32KB buffer with F446RE Nucleo:

FILE_SIZE_MB = 5
BUF_SIZE = 32768 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
745.84,43983,43909,43924
745.73,43928,43909,43924

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
748.08,43864,43798,43801
748.08,43800,43798,43801
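One way to read the two F446RE runs above is to convert the average per-call latency into a cost per byte. The sketch below does that arithmetic; the function names are made up for this illustration, and KB is taken as 1000 bytes, which is an assumption about the units.

```cpp
#include <cstdint>

// Average time per byte in nanoseconds, given the buffer size and the
// average latency of one write or read call from the bench output.
double nsPerByte(uint32_t bufBytes, uint32_t avgLatencyUs) {
  return 1000.0 * avgLatencyUs / bufBytes;
}

// Throughput in KB/sec (1 KB = 1000 bytes, an assumption) implied by
// moving bufBytes in avgLatencyUs microseconds.
double kbPerSec(uint32_t bufBytes, uint32_t avgLatencyUs) {
  return 1000.0 * bufBytes / avgLatencyUs;
}
```

For the write runs, `nsPerByte(512, 689)` and `nsPerByte(32768, 43924)` both come out a little over 1340 ns per byte. The per-byte cost is essentially the same at both buffer sizes, which is consistent with the point that a bigger buffer does not help when the driver moves one byte at a time.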

The Arduino SPI wrapper from ST is really bad! I might try to use their multi-byte transfer again. In the past it wouldn't work, maybe it's fixed.

Edit: the multi-byte SPI still goes through a slow single-byte API, not a fast DMA transfer. See the LL_SPI calls.

  while (size--) {
#if defined(STM32H7xx) || defined(STM32MP1xx)
    while (!LL_SPI_IsActiveFlag_TXP(_SPI));
#else
    while (!LL_SPI_IsActiveFlag_TXE(_SPI));
#endif
    LL_SPI_TransmitData8(_SPI, *tx_buffer++);

    if (!skipReceive) {
#if defined(STM32H7xx) || defined(STM32MP1xx)
      while (!LL_SPI_IsActiveFlag_RXP(_SPI));
#else
      while (!LL_SPI_IsActiveFlag_RXNE(_SPI));
#endif
      *rx_buffer++ = LL_SPI_ReceiveData8(_SPI);
    }
    if ((Timeout != HAL_MAX_DELAY) && (HAL_GetTick() - tickstart >= Timeout)) {
      ret = SPI_TIMEOUT;
      break;
    }
  }

greiman commented 3 years ago

Here is an Arduino Due with a 512 byte buffer.

FILE_SIZE_MB = 5
BUF_SIZE = 512 bytes
Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
4314.06,12590,110,116
4074.98,11986,110,123

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
4549.59,342,109,110
4549.59,343,109,110
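The maximum latency column matters for logging because data keeps arriving while a write call is blocked. As a rough sketch, the buffer needed to ride out one worst-case stall is the incoming data rate multiplied by that latency, rounded up to whole 512 byte sectors. The function name and the 1 MB/sec rate in the note below are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>

const uint32_t kSectorBytes = 512;

// Bytes that arrive while one write call blocks for maxLatencyUs,
// rounded up to a whole number of 512 byte sectors. This is a lower
// bound for a single stall, not a full sizing rule.
uint32_t ringBytesForLatency(uint32_t bytesPerSec, uint32_t maxLatencyUs) {
  uint64_t bytes =
      (static_cast<uint64_t>(bytesPerSec) * maxLatencyUs + 999999ULL) /
      1000000ULL;
  uint64_t sectors = (bytes + kSectorBytes - 1) / kSectorBytes;
  return static_cast<uint32_t>(sectors * kSectorBytes);
}
```

With the 12590 usec maximum write latency shown above and an assumed input rate of 1,000,000 bytes per second, `ringBytesForLatency(1000000, 12590)` gives 12800 bytes, or 25 sectors.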

Bambofy commented 3 years ago

Have you tried a device with built in SDMMC hardware? https://www.st.com/content/ccc/resource/training/technical/product_training/0d/2d/cc/55/55/a6/42/12/STM32L4_Peripheral_SDMMC.pdf/files/STM32L4_Peripheral_SDMMC.pdf/jcr:content/translations/en.STM32L4_Peripheral_SDMMC.pdf

greiman commented 3 years ago

The first line of my reply was:

It would be possible to use the STM32 SDMMC drivers since SdFat supports general block devices. Unless you do huge transfers, performance would be poor.

There are lots of problems with the STM32 SDMMC controllers. Many variants have serious errata. Even the newest versions in STM32F7 and STM32H7 are way out of date.

The drivers are not enabled in the ST Arduino board support package. See stm32yyxx_hal_conf.h in the STM32 Arduino Core support.

/*
 * Not defined by default
 */
#if !defined(HAL_SD_MODULE_DISABLED)
  /*#define HAL_SD_MODULE_ENABLED*/
#else
  #undef HAL_SD_MODULE_ENABLED
#endif

I have no interest in trying to fix support for SDMMC in the ST Arduino package.

I have lots of STM32 boards and use ChibiOS, which has suitable support in its STM32 HAL for SDMMC. You still can't get reasonable SDIO performance, not even close to Teensy 4.1 with the hot NXP controller.

stevstrong commented 3 years ago

I suggested 8kB because it was tested with the Libmaple (Roger Clark's) core using DMA.

greiman commented 3 years ago

I remember the tests with Roger Clark's board package. Requiring users to manage 8KiB buffers and ensure writes are multiples of four bytes is not a solution. Users like to use Print.

stevstrong commented 3 years ago

There is no restriction to use 4 bytes, the DMA works on byte level. And the non-DMA SPI driver is also highly optimized so it does not (always) have gaps between bytes.

greiman commented 3 years ago

It's not the DMA count, it's the DMA alignment needed to match the 32-bit data register in the SDMMC.

If you ever write other than a multiple of four bytes to a file, or the user gives you an unaligned buffer, you must copy data for alignment. Most STM32 SDMMC drivers then resort to single block transfers with a memcpy, so it's very slow.

Here is that part of a STM32 SDMMC driver I have used.

#if STM32_SDC_SDMMC_UNALIGNED_SUPPORT
  if (((unsigned)buf & 3) != 0) {
    uint32_t i;
    for (i = 0; i < blocks; i++) {
      memcpy(sdcp->buf, buf, MMCSD_BLOCK_SIZE);
      buf += MMCSD_BLOCK_SIZE;
      if (sdc_lld_write_aligned(sdcp, startblk, sdcp->buf, 1))
        return HAL_FAILED;
      startblk++;
    }
    return HAL_SUCCESS;
  }
#else /* !STM32_SDC_SDIO_UNALIGNED_SUPPORT */
  osalDbgAssert((((unsigned)buf & 3) == 0), "unaligned buffer");
#endif /* !STM32_SDC_SDIO_UNALIGNED_SUPPORT */
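The cost of the unaligned path in the driver excerpt above can be illustrated on a host machine. The sketch below is a simplified model, not the ChibiOS driver: `TransferStats` and `writeBlocks` are hypothetical names, and no hardware is touched. It only counts how many transfers each case would issue.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

const size_t kBlockSize = 512;

// Counts the transfers a hypothetical driver would issue.
struct TransferStats {
  unsigned multiBlock = 0;   // whole request sent in one transfer
  unsigned singleBlock = 0;  // one transfer per 512 byte block
};

// True when the address is a multiple of four, as required by a DMA
// engine feeding a 32-bit data register.
bool isWordAligned(const void* p) {
  return (reinterpret_cast<std::uintptr_t>(p) & 3u) == 0;
}

// Models the driver's choice: an aligned buffer goes out as one
// multi-block transfer, an unaligned buffer is copied block by block
// through an aligned bounce buffer.
void writeBlocks(const uint8_t* buf, size_t blocks, TransferStats& stats) {
  alignas(4) static uint8_t bounce[kBlockSize];
  if (isWordAligned(buf)) {
    stats.multiBlock++;
    return;
  }
  for (size_t i = 0; i < blocks; i++) {
    std::memcpy(bounce, buf, kBlockSize);  // extra copy for every block
    buf += kBlockSize;
    stats.singleBlock++;
  }
}
```

In this model, an eight-block write from an aligned buffer counts as one transfer, while the same write from a buffer offset by one byte counts as eight single-block transfers plus eight copies.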

The block nature of the SD card versus the bucket-of-bytes model for files, plus the fact that Cortex M SD/MMC controllers are such old designs, causes lots of performance problems.

Modern SD cards are designed for huge transfers. The STM32 SDMMC is designed for the 2006-2008 SD standard.

The NXP controller in the Teensy 4.1 is designed for modern cards with 512KiB Record Units. I write any size file with a single write command. I test with an 8GiB write to a file. That's how I achieve over 22 MB/sec with a 512 byte buffer on Teensy.

Edit: I am able to put the NXP controller in write mode and as long as the user does write calls I can continue the write. The user can make single byte write calls and the SD sees a stream of bytes. The SD has huge RAM buffers but the STM32 causes it to program flash. The SD copies the partial flash page for the next write call with the STM32. Data can move at 300MB/sec in a UHS-II card but you see KB/sec through the STM32 controller.

greiman commented 3 years ago

The only reason the NXP chip is limited to 22+ MB/sec on Teensy 4.1 is that Paul didn't design Teensy 4.1 for UHS-I 4-bit 208 MHz transfers. The controller can support UHS-II cards for up to 200 MB/sec.

Paul probably didn't have space to implement 1.8V signaling and the tuning required to hit 208 MHz.

How about UHS-III cards at 600 MB/sec?

https://www.sdcard.org/developers/sd-standard-overview/bus-speed-default-speed-high-speed-uhs-sd-express/

greiman commented 3 years ago

stevstrong

I tried the "optimized multi byte SPI" wrapper for Arduino. It is slow and almost unusable. You must specify both an RX buffer and a TX buffer. So if you want to read from the card, you must allocate a TX buffer and fill it with 0XFF, since the card looks for commands on its input. You might be able to hit a MB/sec, a bit better than the 750 KB/sec above.

The multi-byte transfer does not return status so you never know what happened.

void SPIClass::transfer(byte _pin, void *_bufout, void *_bufin, size_t _count, SPITransferMode _mode);

It just returns for a number of errors. I put print statements in it to discover what happened when it returned in 1 microsecond with no signal on my scope.

This is the underlying driver called by the wrapper. It uses 8-bit polled transfers. See the code I posted above after your suggestion to use an 8KiB buffer.

spi_status_e spi_transfer(spi_t *obj, uint8_t *tx_buffer, uint8_t *rx_buffer,
                          uint16_t len, uint32_t Timeout, bool skipReceive);

I don't think ST has the major league team working on the Arduino Board support package.

greiman commented 3 years ago

I suspect you could go directly in STM32Cube and make a faster SPI driver for SdFat. Here is an example, UserSPIDriver.ino, that shows how.

If you want to go into STM32Cube and use the SDMMC driver, write a class derived from BlockDeviceInterface. It's then easy to use it with SdFat. I have an example for a USB key.

You init your driver and then there is a form of SdFat that accepts a pointer to it in the begin call. usbKey is a class with parent BlockDeviceInterface.

  // init usbKey driver.
  if (!initUSB(&usb)) {
    Serial.println("initUSB failed");
    while(1){}
  }
  //not really an SD card
  if (!sd.begin(&usbKey)) {
    Serial.println("sd.begin failed");
    while(1) {}
  }
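The idea of a class derived from a block device interface can be sketched on a host machine with a RAM-backed device. This is only an illustration: `BlockDeviceSketch` is a hypothetical stand-in defined here, not SdFat's actual interface, which has more methods and whose exact signatures should be checked in the header of the SdFat version in use.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a sector-based block device interface.
// It is NOT the SdFat header; it only shows the general shape.
class BlockDeviceSketch {
 public:
  virtual ~BlockDeviceSketch() {}
  virtual bool readSector(uint32_t sector, uint8_t* dst) = 0;
  virtual bool writeSector(uint32_t sector, const uint8_t* src) = 0;
  virtual uint32_t sectorCount() = 0;
};

// RAM-backed device. A real driver would call into the SDMMC or USB
// stack in these methods instead of copying memory.
class RamDisk : public BlockDeviceSketch {
 public:
  explicit RamDisk(uint32_t sectors)
      : m_data(static_cast<size_t>(sectors) * 512, 0) {}

  bool readSector(uint32_t sector, uint8_t* dst) override {
    if (sector >= sectorCount()) return false;
    std::memcpy(dst, &m_data[static_cast<size_t>(sector) * 512], 512);
    return true;
  }

  bool writeSector(uint32_t sector, const uint8_t* src) override {
    if (sector >= sectorCount()) return false;
    std::memcpy(&m_data[static_cast<size_t>(sector) * 512], src, 512);
    return true;
  }

  uint32_t sectorCount() override {
    return static_cast<uint32_t>(m_data.size() / 512);
  }

 private:
  std::vector<uint8_t> m_data;
};
```

The file system layer only needs sector reads, sector writes and a size, so the storage behind the interface can be anything that provides those.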

Bambofy commented 3 years ago

I'd love to write the driver but it's way over my head. I'm using the STM32SD library made by the guys at STM32Duino (https://github.com/stm32duino/STM32SD). I'm going to test the speed of the SDMMC tomorrow with it configured to 16MHz, but the preliminary results (as you showed before) are not significant, around 40kb/s, though this was before any clock "tuning". If the SDMMC readings aren't great, I'm going to use SdFat SPI mode again, because I got some wicked fast speeds and the hardware support is top.

Bambofy commented 3 years ago

Using normal .write() with the SDMMC interface, I can get speeds of 33192B/s. I am just integrating it with the buffers now to see what it is like.

greiman commented 3 years ago

I tried another idea for a better SPI driver. I was able to get 1946 KB/sec on a Nucleo F446RE.

Here are the two files. Put both files in a folder named UserStm32SPIDriver in the Arduino folder with other sketches.

Here is the test sketch UserStm32SPIDriver.ino

// An example of an external SPI driver.
//
#include "SdFat.h"
#include "Stm32SpiDriver.h"

#if SPI_DRIVER_SELECT == 3  // Must be set in SdFat/SdFatConfig.h

// 2 MiB file
const uint32_t FILE_SIZE = 2 << 20;

// SD chip select pin.
#define SD_CS_PIN SS

MySpiClass mySpi;

#define SD_CONFIG SdSpiConfig(SD_CS_PIN, DEDICATED_SPI, SD_SCK_MHZ(50), &mySpi)
SdFs sd;
FsFile file;
uint8_t buf[512];
//------------------------------------------------------------------------------
void setup() {
  Serial.begin(9600);
  while (!Serial) {}
  Serial.println("Type any");
  while(!Serial.available()) {}

  if (!sd.begin(SD_CONFIG)) {
    sd.initErrorHalt(&Serial);
  }
  if (!file.open("Test.bin", O_RDWR|O_CREAT|O_TRUNC)) {
    sd.errorHalt("open");
  }
  if (!file.preAllocate(FILE_SIZE)) {
    sd.errorHalt("preAllocate");
  }
  uint32_t m = micros();
  for (int i = 0; i < 2048; i++) {
    if (512 != file.write(buf, 512)) {
      sd.errorHalt("write");
    }
  }
  m = micros() - m;
  file.truncate();
  Serial.print(file.size()/(0.001*m));
  Serial.println(" KB/sec");
  file.close();  
  sd.ls(&Serial, LS_SIZE);
}
//------------------------------------------------------------------------------
void loop() {}
#else  // SPI_DRIVER_SELECT
#error SPI_DRIVER_SELECT must be three in SdFat/SdFatConfig.h
#endif  // SPI_DRIVER_SELECT

Here is the driver wrapper Stm32SpiDriver.h

#ifndef Stm32SpiDriver_h
#define Stm32SpiDriver_h

#include "SdFat.h"
#include "SPI.h"  // Only required if you use features in the SPI library.

#if SPI_DRIVER_SELECT != 3  // Must be set in SdFat/SdFatConfig.h
#error SPI_DRIVER_SELECT must be three in SdFat/SdFatConfig.h
#endif  // SPI_DRIVER_SELECT

// This is a simple driver based on the standard SPI.h library.
// You can write a driver entirely independent of SPI.h.
// It can be optimized for your board or a different SPI port can be used.
// The driver must be derived from SdSpiBaseClass.
// See: SdFat/src/SpiDriver/SdSpiBaseClass.h
class MySpiClass : public SdSpiBaseClass {
 public:
  // Activate SPI hardware with correct speed and mode.
  void activate() {
    SPI.beginTransaction(m_spiSettings);
  }
  // Initialize the SPI bus.
  void begin(SdSpiConfig config) {
    (void)config; 
    SPI.begin();
  }
  // Deactivate SPI hardware.
  void deactivate() {
    SPI.endTransaction();
  }
  // Receive a byte.
  uint8_t receive() {
    return SPI.transfer(0XFF);
  }
  // Receive multiple bytes.  
  // Replace this function if your board has multiple byte receive.
  uint8_t receive(uint8_t* buf, size_t count) {
    memset(buf, 0XFF, count);
    SPI.transfer((void*)buf, count);
    return 0;
  }
  // Send a byte.
  void send(uint8_t data) {
    SPI.transfer(data);
  }
  // Send multiple bytes.
  // Replace this function if your board has multiple byte send.
  void send(const uint8_t* buf, size_t count) {
    size_t n;
    if (count > 512) {
      return;
    }
    n = count;
    uint8_t tmp[n];
    SPI.transfer((void*)buf, (void*)tmp, count);
    return;
  }
  // Save SPISettings for new max SCK frequency
  void setSckSpeed(uint32_t maxSck) {
    m_spiSettings = SPISettings(maxSck, MSBFIRST, SPI_MODE0);
  }

 private:
  SPISettings m_spiSettings;
};
#endif  // Stm32SpiDriver_h

Here is the output:


Type any
1946.68 KB/sec
   1048576 Test.bin

You can use this by adding it to other sketches or put the .h file in a library folder named Stm32SpiDriver.

This is still slow but I just don't have time to fix drivers for the many board support packages. My fixes keep breaking when new versions are released.
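One detail of the multi-byte send() in the header above is that it returns without sending anything when count exceeds 512. A possible variation, shown here only as a host-side sketch, splits a large request into chunks that fit a fixed scratch buffer. `SpiTransferFn`, `sendChunked` and `mockTransfer` are hypothetical names for this illustration; on a board the transfer function would wrap the SPI library call.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a full-duplex SPI transfer: shifts out
// count bytes from tx and stores the bytes received into rx.
typedef void (*SpiTransferFn)(const uint8_t* tx, uint8_t* rx, size_t count);

// Sends any number of bytes in chunks of at most 512, discarding the
// received bytes. The static scratch buffer avoids a variable length
// array but makes the function non-reentrant.
void sendChunked(const uint8_t* buf, size_t count, SpiTransferFn transfer) {
  static uint8_t scratch[512];
  while (count > 0) {
    size_t n = count < sizeof(scratch) ? count : sizeof(scratch);
    transfer(buf, scratch, n);
    buf += n;
    count -= n;
  }
}

// Counting mock used only to exercise sendChunked on a host machine.
static size_t g_mockBytes = 0;
static unsigned g_mockCalls = 0;

void mockTransfer(const uint8_t* tx, uint8_t* rx, size_t count) {
  (void)tx;
  (void)rx;
  g_mockBytes += count;
  g_mockCalls++;
}
```

With the mock, a 1300 byte request is split into three transfers of 512, 512 and 276 bytes.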

Bambofy commented 3 years ago

Ah this looks great! Problem is it isn't the only thing on the SPI!

greiman commented 3 years ago

You can use it as shared SPI, but that will require large writes to get speed. A better solution is to use a second SPI port. Almost all STM32 boards have more than one SPI port.

Here is the result with DEDICATED_SPI replaced by SHARED_SPI and 512 byte writes.

Type any
418.75 KB/sec
   1048576 Test.bin

Here is 2048 byte writes.

Type any
1019.12 KB/sec
   1048576 Test.bin

Once again ST lags. They don't make it easy to use other ports. I have used ChibiOS/RT to log ADC data from STM32 boards at a million samples per second. There is no way to get the performance STM32 is capable of with the ST board package.

The Nucleo F446RE has three SPI ports. I can easily use these on MBED and ChibiOS.

greiman commented 3 years ago

Here is a 10 kHz sine logged at 100 points per cycle, or 1,000,000 samples per second, with ChibiOS. I used the ChibiOS DMA ADC driver and a port of SdFat to log to an SD card. I never posted the SdFat port for ChibiOS.

[Image: "million"]

I have logged six pins at 200,000 samples per second for each pin. The STM32 ADC allows a very general setup with a sequence of up to 18 measurements per cycle.

greiman commented 3 years ago

Looks like it is possible to use other STM32 SPI ports with the ST Arduino board package. I will play with it and add an example so other users can customize the example for their STM32 processor.

greiman commented 3 years ago

It works to specify other SPI ports. I modified the .h file and used SPI port 3 on the F446RE Nucleo, since the pins were handy. Port 3 has a slower max clock speed, but I still get 1206 KB/sec with a 512 byte write using DEDICATED_SPI. Larger buffers don't help; the SPI clock and the ST driver are the limiting factors.

To define another SPI object you look at the processor reference manual to find suitable pins for one of the ports then do this:

#define SD_CS_PIN D8
#define MOSI3 PC12
#define MISO3 PC11
#define SCK3  PC10
SPIClass SD_SPI(MOSI3, MISO3, SCK3);
MySpiClass mySpi(&SD_SPI);

I didn't check the max speed for SPI port 2, I suspect it will also be slower than port 1.

I will post the edited files if you have any interest in using ports other than the standard SPI object.

Bambofy commented 3 years ago

Wow, those are awesome results. I don't fully understand how to configure the two SPI ports but I will definitely look into it. I'll need to reconfigure the USART things, which is a bit complicated. What frequency and clock were you running the SPI port at? I couldn't get my SD card to work above 20MHz. Is that the limit for SPI transfers too?

greiman commented 3 years ago

All data rates are for DEDICATED_SPI. The second and third ports on the STM32F446RE run at 22.5 MHz max. I get 1206 KB/sec on those ports. The first port runs at 45 MHz. I get 1946.68 KB/sec on port 1. This is common for STM32: the first port is on the fast bus, and the second and third run half as fast.

If the SPI drivers were DMA I would expect nearly 5MB/sec on the first port.
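The clock rates above set a hard ceiling on SPI throughput, since each byte needs eight SCK cycles. The sketch below computes that ceiling and the fraction of it that the measured rates reach. The function names are made up for this illustration, and KB is taken as 1000 bytes, which is an assumption about the units used in the measurements.

```cpp
#include <cstdint>

// Upper bound on SPI throughput in KB/sec (1 KB = 1000 bytes): one bit
// per SCK cycle, eight bits per byte, no gaps and no protocol overhead.
double spiCeilingKBps(double sckHz) {
  return sckHz / 8.0 / 1000.0;
}

// Fraction of the ceiling that a measured rate achieves.
double busUtilization(double measuredKBps, double sckHz) {
  return measuredKBps / spiCeilingKBps(sckHz);
}
```

At 45 MHz the ceiling is 5625 KB/sec, so 1946.68 KB/sec is roughly 35% of what the bus could carry. At 22.5 MHz the ceiling is 2812.5 KB/sec and 1206 KB/sec is roughly 43%. The estimate of nearly 5 MB/sec with a DMA driver sits just under the 45 MHz ceiling.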

I tried the STM32SD library from ST that you pointed out. I could not use it on the STM32F446RE. Even the latest version of the board package, 19.0, is missing PinMap_SD for STM32F446RE, and I didn't feel like trying to define it.

I used an STM32F411RE and got 605 KB/sec for 512 byte writes. This is not bad for the ST SDMMC controller. It uses the FIFO in polled I/O to avoid the DMA alignment problem.

Bambofy commented 3 years ago

Awesome, i wish i could get >50KB/s speeds!! Just out of interest, what SD cards do you use to test with?

greiman commented 3 years ago

With the ST SDMMC controller I use the highest-end 32GB card that I have, a Samsung Pro+. This compensates for the ST controller. A lot of cards don't work. This is because the driver is polled and starts and stops the SD clock for flow control. It may do this up to 16 times for a 512 byte sector transfer, since it does bursts of 32 bytes.

With dedicated SPI my driver can leave the card in read or write mode, so most cards do well. The Samsung 32GB Pro+ does 1945 KB/sec on the STM32F446RE SPI port 1; an old cheap Amazon Basic 4GB card that I bought in 2012 does 1925-1940 KB/sec.

The Amazon card fails with the SDMMC example.

Bambofy commented 3 years ago

Hmm, I'm using a 32,768 byte FIFO buffer and a 16,384 byte cache buffer for writing to the SD card.

Samples are gathered at 16 kHz and put into the FIFO buffer. When there are enough samples, they are copied from the FIFO buffer into the cache and written to the SD card.

I can record 425,984 bytes over 10 seconds, but I need 480KB, since 16 kHz at 24-bit depth for 10 seconds is 480KB.

The SD card is running at 8MHz because it won't write data at speeds above 8MHz. I think I must go back to the SPI driver?
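The arithmetic behind the 480KB figure can be written out as a small sketch. A single audio channel is assumed here, since that is what makes the numbers above agree; the function names are made up for this illustration.

```cpp
#include <cstdint>

// Sustained data rate in bytes per second for uncompressed PCM samples.
uint32_t pcmBytesPerSec(uint32_t sampleRateHz, uint32_t bitsPerSample,
                        uint32_t channels) {
  return sampleRateHz * (bitsPerSample / 8) * channels;
}

// Bytes missing from a recording, or 0 if nothing is missing.
uint32_t shortfallBytes(uint32_t requiredBytes, uint32_t recordedBytes) {
  return requiredBytes > recordedBytes ? requiredBytes - recordedBytes : 0;
}
```

`pcmBytesPerSec(16000, 24, 1)` is 48,000 bytes per second, so ten seconds need 480,000 bytes and the recording above is 54,016 bytes short. As a side observation from the numbers alone, 425,984 is exactly 13 times 32,768, the FIFO buffer size, which might hint that data is being lost in whole buffers. That is not confirmed anywhere in the thread.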

greiman commented 3 years ago

I suspect the SDMMC controller/driver with STM32SD library will never work.

Which STM32 chip are you using? What device are you collecting samples from? How are you collecting the samples? Are you using an interrupt routine? Is it on SPI? Are you using a library with the device?

Bambofy commented 3 years ago

STM32L452RE, collecting samples at 16 kHz from an SAI audio device connected to the dedicated SAI port. It's collecting the samples via the ISR.

I think I've found the problem. I can easily manage 2MB/s transfers in my clean sketch, so it must be something in my bigger project that's bottlenecking it. Maybe the FIFO buffer is throttling it somehow?

greiman commented 3 years ago

That should be really simple. I would use a ring buffer to write directly to an SD on one of the SPI ports. SdFat has a ring buffer in the new beta that can be used in an ISR.

Here is an example that logs from an ADC ISR at 6 MB/sec.

Here is the 400 sector ring buffer. You can use a much smaller ring buffer. It should be a multiple of 512 bytes.

#include "RingBuf.h"
const size_t RING_BUF_SIZE = 400*512;
// RingBuf for 512 byte sectors.
RingBuf<FsFile, RING_BUF_SIZE> rb;

Here is the ISR part. You can use any size transfer into the ring buffer. I used a 1024 byte ping-pong buffer with the ADC since I needed to collect 6MB/sec. My rate is about 12,000 interrupts per second.

//ISR.
static void isr() {
  if (rb.bytesFreeIsr() >= 512 && !overrun) {
    rb.memcpyIn(dmaBuf[dmaCount & 1], 512);
    dmaCount++;
  } else {
    overrun = true;
  }
}

Here is the SD writer. I write a single sector at a time since SdFat SPI takes advantage of the RAM in the SD card. I preallocate an 8GiB file. You won't need anything like that.

  while (!overrun && !Serial.available()) {
    size_t n = rb.bytesUsed();
    if ((n + file.curPosition()) >= (PRE_ALLOCATE_SIZE - 512)) {
      Serial.println("File full - stopping");
      break;
    }
    if (n >= 512) {
      if (rb.writeOut(512) != 512) {
        Serial.println("writeOut() failed");
        file.close();
        return;
      }
    }
  }

I have tested this example at slower rates with SPI SdFat. It should easily reach your rates.

Bambofy commented 3 years ago

I solved it with a double buffer: a back buffer and a front buffer, both sized at 32KB. The sampler always writes to the front buffer. When it is full, the front buffer is copied to the back buffer and then written to the SD card.

greiman commented 3 years ago

Sorry, I should have asked what you were doing earlier. It's such a simple problem, so something had to be wrong.

I have been working with Paul Stoffregen on features in SdFat for his Audio Adapters and library. He has been recording 44.1kHz 16-bit stereo for a while. He sent me various hardware.

He is now interested in mixing four or more recordings on the fly. This is a hard problem - streaming and mixing more than four stereo recordings really stresses a SD.

Now users are asking for 24-bit @ 48kHz (or 96kHz) since CD quality audio is dead for professional audio projects.

Bambofy commented 3 years ago

Yes, audio really is very data intensive! I've not tried reading at high speeds, but according to this document, file:///C:/Users/Richa/AppData/Local/Temp/dm00525510-getting-started-with-stm32h7-series-sdmmc-host-controller-stmicroelectronics.pdf , if you pay for a really expensive SD card and a proper reader you should be able to get some ridiculous speeds. Edit: I rewrote the double buffer into a ring buffer, hopefully I can get some steadier rates :)

greiman commented 3 years ago

Recording audio at even 16-bit 44.1kHz stereo is not data intensive. It's under 200 kB/sec and has been done for years. People have done your rates with an Uno.

What is data intensive is mixing five or six streams of audio and running the streams through digital filters.

Click on this to see state of the art audio for a micro-controller.

Check out this: Receive 8 channel audio from three I2S devices, using I2S master mode. This is CD quality 16-bit 44.1kHz 8-channel.