greiman / SdFat

Arduino FAT16/FAT32 exFAT Library
MIT License

Achieving high write speeds #434

Open besmarsh opened 1 year ago

besmarsh commented 1 year ago

Hi, thanks for your great work on this library! I'm using this library as it's much faster than the stock SD Arduino library, and I'm trying to achieve high write speeds.

I'm running your bench example to test write speeds and comparing to the speeds you reported in the main README. Using shared SPI mode I get a write speed of around 190 KB/sec. Using dedicated SPI mode I get a write speed of around 470 KB/sec and an average latency of around 1000 usec, compared to your reported nearly 4000 KB/sec and 127 usec average latency using dedicated SPI.

Test details:

Do you have any insight as to why the speeds I'm seeing are 10x slower than those you have achieved? Thanks in advance!

greiman commented 1 year ago

Edit: The SPI driver on the MKR Zero only uses a max clock of 12 MHz and the driver sends/receives a byte at a time with huge gaps between bytes.

The MKR Zero is a dog. There is no hope with the Arduino SPI Driver.
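As a rough check on that ceiling, the arithmetic below estimates the best-case time to clock one 512-byte block. It is a sketch with three assumptions: 8 SCK cycles per byte, no gaps between bytes, and KB = 1000 bytes (which is consistent with how the bench speeds relate to their latencies). `idealBlockMicros` and `kbPerSec` are illustrative names, not SdFat functions.

```cpp
#include <cstdint>

// Best-case time in microseconds to clock `bytes` bytes at `sckHz`,
// assuming 8 SCK cycles per byte and no gaps between bytes.
double idealBlockMicros(uint32_t bytes, double sckHz) {
  return bytes * 8.0 / sckHz * 1e6;
}

// Throughput in KB/sec (1 KB = 1000 bytes) for `bytes` moved in
// `elapsedMicros` microseconds.
double kbPerSec(uint32_t bytes, double elapsedMicros) {
  return bytes / elapsedMicros * 1000.0;
}
```

At 12 MHz this gives about 341 usec per block, or roughly 1500 KB/sec, before any command or file-system overhead. The 1089 usec average latency in the MKR Zero results below works out to about 470 KB/sec. Most of each block time is therefore spent outside the clocked bits.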

Here is what I get with a Samsung 32 GB microSD:

FreeStack: 27744 Type is FAT32 Card size: 32.01 GB (GB = 1E9 bytes)

Manufacturer ID: 0X1B OEM ID: SM Product: 00000 Revision: 1.0 Serial number: 0XE30F5501 Manufacturing date: 10/2015

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
468.78,1111,1087,1089
468.74,1112,1087,1089

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
464.81,1108,1096,1098
464.86,1107,1096,1098

Same card on a Due:

FreeStack: 92552 Type is FAT32 Card size: 32.01 GB (GB = 1E9 bytes)

Manufacturer ID: 0X1B OEM ID: SM Product: 00000 Revision: 1.0 Serial number: 0XE30F5501 Manufacturing date: 10/2015

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
4528.99,128,110,111
4533.09,127,110,111

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
4528.99,114,110,111
4541.33,113,110,110

Teensy 4.1 SDIO same card:

FreeStack: 450016 Type is FAT32 Card size: 32.01 GB (GB = 1E9 bytes)

Manufacturer ID: 0X1B OEM ID: SM Product: 00000 Revision: 1.0 Serial number: 0XE30F5501 Manufacturing date: 10/2015

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
22321.43,53,22,22
22222.22,52,22,22

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
22421.53,1088,22,22
22624.43,133,22,22

greiman commented 1 year ago

Even an Uno is faster:

FreeStack: 560 Type is FAT32 Card size: 32.01 GB (GB = 1E9 bytes)

Manufacturer ID: 0X1B OEM ID: SM Product: 00000 Revision: 1.0 Serial number: 0XE30F5501 Manufacturing date: 10/2015

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
689.27,764,728,736
689.18,760,728,736

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
663.75,776,760,765
663.83,776,760,765

greiman commented 1 year ago

I checked MKR Zero SPI again. I ran this program:

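#include <SPI.h>

uint8_t buf[512];  // one 512-byte block, the same size as an SD sector

void setup() {
  Serial.begin(9600);
  while (!Serial) {}  // wait for the native USB serial port to open
  SPI.begin();
  // Request 50 MHz; the SAMD21 core limits this to its 12 MHz maximum.
  SPI.beginTransaction(SPISettings(50000000, MSBFIRST, SPI_MODE0));
  uint32_t m = micros();
  for (int i = 0; i < 512; i++) {
    SPI.transfer(buf[i]);  // one byte per call
  }
  m = micros() - m;
  SPI.endTransaction();
  Serial.println(m);  // elapsed microseconds for 512 bytes
}

void loop() {}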

It prints 1077, i.e. 1077 usec to send 512 bytes, or about 475 KB/sec. So you will never reach 500 KB/sec.
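The same measurement can be turned into a per-byte budget. This sketch makes two assumptions: the 12 MHz clock limit mentioned earlier applies (the 50 MHz request is clamped), and each byte takes 8 clocks. `gapPerByteMicros` is a hypothetical helper.

```cpp
#include <cstdint>

// Average idle time per byte, in usec, when `bytes` single-byte transfers
// take `measuredMicros` in total at an SCK frequency of `sckHz`.
double gapPerByteMicros(uint32_t bytes, double measuredMicros, double sckHz) {
  double perByte = measuredMicros / bytes;  // wall time per byte
  double shifting = 8.0 / sckHz * 1e6;      // time spent clocking 8 bits
  return perByte - shifting;
}
```

For 1077 usec over 512 bytes, this comes to about 2.1 usec per byte. Only about 0.67 usec of that is spent shifting bits, which leaves roughly 1.4 usec of idle time per byte. That is consistent with the large gaps between bytes seen on the SCK capture.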

Here is a capture of SPI SCK. I didn't record other signals. Note the big gaps between bytes. transfer(buf, count) is no faster.


greiman commented 1 year ago

One last point about SAMD21 boards. Arduino could improve the SPI driver by implementing DMA. It's sad since many people use SAMD21 boards.

Adafruit has a DMA driver for SAMD21. It has some problems but is about 2.5 times faster. Here is the result for a Feather M0 Express at 12 MHz:

FreeStack: 26960 Type is FAT32 Card size: 32.01 GB (GB = 1E9 bytes)

Manufacturer ID: 0X1B OEM ID: SM Product: 00000 Revision: 1.0 Serial number: 0XE30F5501 Manufacturing date: 10/2015

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1117.07,477,455,456
1117.07,474,454,456

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1111.61,463,457,457
1111.85,463,457,457

Same driver on Feather Express M4 SAMD51

FreeStack: 190036 Type is FAT32 Card size: 32.01 GB (GB = 1E9 bytes)

Manufacturer ID: 0X1B OEM ID: SM Product: 00000 Revision: 1.0 Serial number: 0XE30F5501 Manufacturing date: 10/2015

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2791.74,199,182,182
2790.18,198,182,182

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
2813.73,183,181,181
2812.15,183,181,181

besmarsh commented 1 year ago

Thanks for all the info and test results!

I guess there's a limit to how fast it will go on the MKR Zero then with the 12MHz SPI clock limit.

I captured my SPI SCK and observed the same thing with the big gaps between the bytes.

Have you seen this thread? https://forum.arduino.cc/t/faster-spi-on-the-zero/345296/ The user scswift tests out an optimised block transfer function. The gaps between bytes that we have observed are so big even using transfer(buf, count) because the SPI driver does not have an optimised block transfer but just loops through count transferring a byte at a time. I did some quick tests using the optimisations from scswift in this thread, and it makes a reasonable difference.

If this was implemented in the SPI driver, much higher speed could be achieved on the MKR Zero even without using DMA.

greiman commented 1 year ago

I can no longer support custom hacks to system drivers. There are about 100 Arduino-like boards with many board support packages.

Yes, I have seen that hack, but it can't be put in the official Arduino driver since many apps depend on full-duplex send/receive.

I started putting custom hacked drivers in SdFat. I am now removing them when they fail due to system mods. Hacks also conflict with other SPI use.

The answer is in many board support packages and I support the common new APIs like this:

/**
 * If USE_SPI_ARRAY_TRANSFER is one and the standard SPI library is
 * used, the array transfer function, transfer(buf, count), will be used.
 * This option will allocate a 512 byte temporary buffer for send.
 * This may be faster for some boards.  Do not use this with AVR boards.
 *
 * Warning: the next options are often fastest but only available for some
 * non-Arduino board packages.
 *
 * If USE_SPI_ARRAY_TRANSFER is two use transfer(nullptr, buf, count) for
 * receive and transfer(buf, nullptr, count) for send.
 *
 * If USE_SPI_ARRAY_TRANSFER is three use transfer(nullptr, buf, count) for
 * receive and transfer(buf, rxTmp, count) for send. Try this with Adafruit
 * SAMD51.
 *
 * If USE_SPI_ARRAY_TRANSFER is four use transfer(txTmp, buf, count) for
 * receive and transfer(buf, rxTmp, count) for send. Try this with STM32.
 */
#ifndef USE_SPI_ARRAY_TRANSFER
#define USE_SPI_ARRAY_TRANSFER 0
#endif  // USE_SPI_ARRAY_TRANSFER
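Read literally, the comment above describes which SPI calls each option makes. The host-side sketch below mirrors that description. Several names and behaviours are assumptions for illustration and are not SdFat's actual code: `RecordingSpi`, `sendBlock`, `receiveBlock`, and the 0xFF fill used for option 1 and option 4 receives.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for an SPI class: records which overload was called
// and whether each buffer pointer was null. Not the Arduino SPIClass.
struct RecordingSpi {
  std::vector<std::string> calls;
  uint8_t transfer(uint8_t) { calls.push_back("byte"); return 0xFF; }
  void transfer(void*, size_t) { calls.push_back("inplace"); }
  void transfer(const void* tx, void* rx, size_t) {
    calls.push_back(std::string(tx ? "tx" : "null") + "," + (rx ? "rx" : "null"));
  }
};

static uint8_t txTmp[512];  // scratch buffers; count <= 512 is assumed
static uint8_t rxTmp[512];

// Receive `count` bytes into `buf` as each option is described.
void receiveBlock(RecordingSpi& spi, int option, uint8_t* buf, size_t count) {
  switch (option) {
    case 0:  // one byte per call
      for (size_t i = 0; i < count; i++) buf[i] = spi.transfer(uint8_t(0xFF));
      break;
    case 1:  // assumed: fill with 0xFF, then exchange in place
      std::memset(buf, 0xFF, count);
      spi.transfer(buf, count);
      break;
    case 2:
    case 3:
      spi.transfer(nullptr, buf, count);
      break;
    case 4:  // assumed: txTmp holds 0xFF filler
      std::memset(txTmp, 0xFF, count);
      spi.transfer(txTmp, buf, count);
      break;
  }
}

// Send `count` bytes from `buf` as each option is described.
void sendBlock(RecordingSpi& spi, int option, const uint8_t* buf, size_t count) {
  switch (option) {
    case 0:
      for (size_t i = 0; i < count; i++) spi.transfer(buf[i]);
      break;
    case 1:  // copy into the 512-byte temporary buffer, exchange in place
      std::memcpy(txTmp, buf, count);
      spi.transfer(txTmp, count);
      break;
    case 2:
      spi.transfer(buf, nullptr, count);
      break;
    case 3:
    case 4:
      spi.transfer(buf, rxTmp, count);
      break;
  }
}
```

Options 2 through 4 avoid per-byte calls entirely. Options 3 and 4 presumably exist for board drivers that do not accept a null pointer on one side of the three-argument transfer().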

I used USE_SPI_ARRAY_TRANSFER 3 on the Adafruit board.

Option one is not faster for SAMD21 on Arduino since it is not optimized.

Edit: on many boards option 1 uses DMA and is very fast. Arduino could do this and get most of the 1117 KB/sec that Adafruit has.

Once again, it is too bad Arduino has such poor support for SPI.

greiman commented 1 year ago

Have you seen this board? It has many of the features of the MKR Zero.

besmarsh commented 1 year ago

I don't see why the optimised block transfer function couldn't be put in the official Arduino driver, even the one that conforms to the existing behaviour offers some speedup. If it was, SdFat could use it using USE_SPI_ARRAY_TRANSFER 1 without having to implement any custom hacked driver.

Of course DMA would be even better, but it's a bigger change to the existing driver.

greiman commented 1 year ago

If you follow up on use of that optimization, you will see it no longer compiles. The SAMD SERCOM class no longer has the same functions.

I think the optimization revealed a problem with the underlying SERCOM class, so the SPI read and write were removed, and SAMD21 SPI now has these functions:

In SPI.cpp

void SPIClassSAMD::transfer(void *buf, size_t count)
{
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);
  for (size_t i=0; i<count; i++) {
    *buffer = transfer(*buffer);
    buffer++;
  }
}

byte SPIClassSAMD::transfer(uint8_t data)
{
  return _p_sercom->transferDataSPI(data);
}

In SERCOM.cpp

uint8_t SERCOM::transferDataSPI(uint8_t data)
{
  sercom->SPI.DATA.bit.DATA = data; // Writing data into Data register

  while( sercom->SPI.INTFLAG.bit.RXC == 0 )
  {
    // Waiting Complete Reception
  }

  return sercom->SPI.DATA.bit.DATA;  // Reading data
}

Often optimizations exploit bugs in the underlying low level drivers. As I said it is not worth hacking intermediate code. Fix the low level hardware driver.

greiman commented 1 year ago

I tried a slight optimization using the current SERCOM driver.

I get this result:

FreeStack: 28008 Type is FAT32 Card size: 32.01 GB (GB = 1E9 bytes)

Manufacturer ID: 0X1B OEM ID: SM Product: 00000 Revision: 1.0 Serial number: 0XE30F5501 Manufacturing date: 10/2015

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
612.75,854,827,833
612.67,853,825,833

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
583.98,877,866,873
584.04,877,866,873

So write is 612 KB/sec vs 468 KB/sec, a 31% improvement.

The mod is these additions:

In C:\Users\Bill\AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13\cores\arduino\api\HardwareSPI.h:

virtual void transfer(const void *tx, void *rx, size_t count) = 0;

In C:\Users\Bill\AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13\libraries\SPI\SPI.h:

void transfer(const void *txBuf, void *rxBuf, size_t count);

In C:\Users\Bill\AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13\libraries\SPI\SPI.cpp:

void SPIClassSAMD::transfer(const void *txBuf, void *rxBuf, size_t count) {
  const uint8_t *tx = reinterpret_cast<const uint8_t *>(txBuf);
  uint8_t *rx = reinterpret_cast<uint8_t *>(rxBuf);
  if (tx && rx) {
    // Full duplex: send tx[i] and store the byte received in rx[i].
    for (size_t i = 0; i < count; i++) {
      rx[i] = _p_sercom->transferDataSPI(tx[i]);
    }
  } else if (tx) {
    // Send only: received bytes are discarded.
    for (size_t i = 0; i < count; i++) {
      _p_sercom->transferDataSPI(tx[i]);
    }
  } else if (rx) {
    // Receive only: clock out 0xFF filler bytes.
    for (size_t i = 0; i < count; i++) {
      rx[i] = _p_sercom->transferDataSPI(0XFF);
    }
  }
}
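The dispatch can be exercised on a desktop by swapping the SERCOM call for a loopback stub, which makes the three cases and the 0xFF fill easy to check. `loopbackByte`, `bytesClocked`, and `transferSketch` are hypothetical. The sketch folds the three loops into one and treats a call with both pointers null as a no-op.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for _p_sercom->transferDataSPI(): a loopback that
// returns the byte it was given, as if MOSI were wired to MISO.
static size_t bytesClocked = 0;
uint8_t loopbackByte(uint8_t data) {
  bytesClocked++;
  return data;
}

// Same behaviour as the three-way transfer(txBuf, rxBuf, count), written as
// a single loop with the per-byte exchange routed through loopbackByte().
void transferSketch(const void* txBuf, void* rxBuf, size_t count) {
  const uint8_t* tx = static_cast<const uint8_t*>(txBuf);
  uint8_t* rx = static_cast<uint8_t*>(rxBuf);
  if (!tx && !rx) return;              // nothing to send or store
  for (size_t i = 0; i < count; i++) {
    uint8_t out = tx ? tx[i] : 0xFF;   // receive-only sends 0xFF filler
    uint8_t in = loopbackByte(out);
    if (rx) rx[i] = in;                // send-only discards received data
  }
}
```

The driver version keeps three separate loops, presumably so that the `tx ?` and `if (rx)` tests are not evaluated on every byte.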

I don't see much more to gain. Here is the low-level hardware routine:

uint8_t SERCOM::transferDataSPI(uint8_t data)
{
  sercom->SPI.DATA.bit.DATA = data; // Writing data into Data register

  while( sercom->SPI.INTFLAG.bit.RXC == 0 )
  {
    // Waiting Complete Reception
  }

  return sercom->SPI.DATA.bit.DATA;  // Reading data
}

You can't skip waiting for reception to complete, or you will overrun the send. The compiler already eliminates the code for the return value on write.

Edit: Maybe making transferDataSPI() inline would help.
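The point about overrunning the send can be shown with a toy model: a single transmit buffer feeding a shift register. `ToyTx`, `sendNoWait`, `sendWithWait`, and the tick counts are assumptions for illustration. They are not the SAMD21 SERCOM registers, and real timing depends on the SPI clock and CPU speed.

```cpp
#include <cstdint>
#include <vector>

// Toy model of a single-buffered SPI transmitter (an assumption, not the
// SAMD21 register interface). One tick() stands for one CPU polling step.
struct ToyTx {
  int ticksPerByte;
  bool bufFull = false, shifting = false;
  int remaining = 0;
  int lost = 0;                      // bytes overwritten before being sent
  std::vector<uint8_t> sent;
  uint8_t buf = 0, shift = 0;

  explicit ToyTx(int t) : ticksPerByte(t) {}
  void write(uint8_t b) {
    if (bufFull) lost++;             // previous byte is overwritten, never sent
    buf = b;
    bufFull = true;
  }
  void tick() {
    if (shifting && --remaining == 0) { sent.push_back(shift); shifting = false; }
    if (!shifting && bufFull) {
      shift = buf; bufFull = false; shifting = true; remaining = ticksPerByte;
    }
  }
  bool busy() const { return bufFull || shifting; }
};

// Writes every byte without waiting, one tick per write.
ToyTx sendNoWait(const std::vector<uint8_t>& data, int ticksPerByte) {
  ToyTx spi(ticksPerByte);
  for (uint8_t b : data) { spi.write(b); spi.tick(); }
  while (spi.busy()) spi.tick();
  return spi;
}

// Waits until each byte has fully shifted out before writing the next,
// which is what polling RXC achieves in transferDataSPI().
ToyTx sendWithWait(const std::vector<uint8_t>& data, int ticksPerByte) {
  ToyTx spi(ticksPerByte);
  for (uint8_t b : data) {
    spi.write(b);
    do { spi.tick(); } while (spi.busy());
  }
  return spi;
}
```

With 4 ticks per byte, sending 1, 2, 3, 4 without waiting loses bytes 2 and 3, while the waiting version sends all four.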

greiman commented 1 year ago

Got a bit more with inline.

FreeStack: 28008 Type is FAT32 Card size: 32.01 GB (GB = 1E9 bytes)

Manufacturer ID: 0X1B OEM ID: SM Product: 00000 Revision: 1.0 Serial number: 0XE30F5501 Manufacturing date: 10/2015

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
665.78,786,759,766
665.69,786,759,766

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
666.93,768,758,764
667.02,769,758,764

Move this into the SERCOM class in SERCOM.h, replacing the declaration there and the definition in SERCOM.cpp:

    [[gnu::always_inline]] inline uint8_t transferDataSPI(uint8_t data)
    {
      sercom->SPI.DATA.bit.DATA = data; // Writing data into Data register

      while( sercom->SPI.INTFLAG.bit.RXC == 0 )
      {
        // Waiting Complete Reception
      }

      return sercom->SPI.DATA.bit.DATA;  // Reading data
    }
besmarsh commented 1 year ago

I tried two optimisations:

1) AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13\cores\arduino\SERCOM.cpp Add:

void SERCOM::transferDataSPI(void *buf, uint32_t count)
{
  if (count == 0) return; // Nothing to do; also keeps --count from wrapping.
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);

  sercom->SPI.DATA.bit.DATA = *buffer; // Initiate first byte transfer.

  while(--count > 0) {
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
     *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
     sercom->SPI.DATA.bit.DATA = *buffer; // Initiate next byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
  *buffer = sercom->SPI.DATA.bit.DATA & 0xFF; // Read last received byte.
}

AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13\cores\arduino\SERCOM.h Add:

void transferDataSPI(void *buf, uint32_t count);

AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13\libraries\SPI\SPI.cpp Change:

void SPIClassSAMD::transfer(void *buf, size_t count)
{
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);
  for (size_t i=0; i<count; i++) {
    *buffer = transfer(*buffer);
    buffer++;
  }
}

to:

void SPIClassSAMD::transfer(void *buf, size_t count)
{
  _p_sercom->transferDataSPI(buf, count);
}

Use USE_SPI_ARRAY_TRANSFER 1

Results:

FreeStack: 27744 Type is FAT32 Card size: 31.91 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3 OEM ID: SD Product: SD32G Revision: 8.5 Serial number: 0X5DDAB1C1 Manufacturing date: 11/2022

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
628.22,1945,805,812
628.22,3054,805,812

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
649.86,788,778,785
649.77,789,778,785

--> ~26% speedup
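The in-place loop in optimisation 1 can be checked off-target with a mock register. `MockSercomSpi` and `transferBlock` are hypothetical. The mock makes every reply available immediately, so it tests the ordering of reads and writes (each reply lands in the slot of the byte that produced it) rather than timing. The sketch also includes a `count == 0` guard, since `--count` on an unsigned zero would wrap around.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mock of the SERCOM DATA register: a write starts a byte
// exchange, rxc reports a received byte, and reading DATA clears rxc.
// The "device" answers each byte with its bitwise complement.
struct MockSercomSpi {
  bool rxc = false;
  uint8_t rxData = 0;
  size_t writes = 0;
  void writeData(uint8_t b) { rxData = uint8_t(~b); rxc = true; writes++; }
  uint8_t readData() { rxc = false; return rxData; }
};

// In-place block exchange with the same loop shape as the
// SERCOM::transferDataSPI(void*, uint32_t) posted above.
void transferBlock(MockSercomSpi& spi, void* buf, uint32_t count) {
  if (count == 0) return;               // avoid --count wrapping to 0xFFFFFFFF
  uint8_t* buffer = static_cast<uint8_t*>(buf);
  spi.writeData(*buffer);               // start the first byte
  while (--count > 0) {
    while (!spi.rxc) {}                 // wait for the byte in flight
    *buffer++ = spi.readData();         // store its reply, advance
    spi.writeData(*buffer);             // start the next byte right away
  }
  while (!spi.rxc) {}                   // wait for the last byte
  *buffer = spi.readData();
}
```

The speedup itself comes from running this loop directly on the registers instead of returning through transfer(uint8_t) for every byte. The mock only confirms that the loop stays correct.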

2) AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13\cores\arduino\SERCOM.cpp Add:

void SERCOM::sendDataSPI(const void *buf, uint32_t count)
{
  if (count == 0) return; // Nothing to send; also keeps --count from wrapping.
  const uint8_t *buffer = reinterpret_cast<const uint8_t *>(buf);

  sercom->SPI.DATA.bit.DATA = *buffer++; // Initiate byte transfer.

  while(--count > 0) {
     while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for the previous byte to be received.
     (void)sercom->SPI.DATA.reg; // Discard it; reading DATA clears RXC.
     sercom->SPI.DATA.bit.DATA = *buffer++; // Initiate byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for the last byte to be received.
  (void)sercom->SPI.DATA.reg; // Discard it so RXC is clear for the next transfer.
}

void SERCOM::receiveDataSPI(void *buf, uint32_t count)
{
  if (count == 0) return; // Nothing to receive; also keeps --count from wrapping.
  uint8_t *buffer = reinterpret_cast<uint8_t *>(buf);

  sercom->SPI.DATA.bit.DATA = 0xFF; // Initiate byte transfer.

  while(--count > 0) {
    while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
    *buffer++ = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte, then increment pointer into buffer.
    sercom->SPI.DATA.bit.DATA = 0xFF; // Initiate byte transfer.
  }

  while(sercom->SPI.INTFLAG.bit.RXC == 0); // Wait for data to be available in the receive buffer.
  *buffer = sercom->SPI.DATA.bit.DATA & 0xFF; // Read received byte.
}

AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13\cores\arduino\SERCOM.h Add:

void sendDataSPI(const void *buf, uint32_t count) ;
void receiveDataSPI(void *buf, uint32_t count) ;

AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13\libraries\SPI\SPI.cpp Add:

void SPIClassSAMD::transfer(const void *txBuf, void *rxBuf, size_t count) {
  if (txBuf && rxBuf) {
    // Full duplex: copy tx into rx, then exchange rx in place.
    // memmove tolerates txBuf == rxBuf or overlapping buffers.
    memmove(rxBuf, txBuf, count);
    _p_sercom->transferDataSPI(rxBuf, count);
  } else if (txBuf) {
    _p_sercom->sendDataSPI(txBuf, count);
  } else if (rxBuf) {
    _p_sercom->receiveDataSPI(rxBuf, count);
  }
}

Use USE_SPI_ARRAY_TRANSFER 2

Results:

FreeStack: 27744 Type is FAT32 Card size: 31.91 GB (GB = 1E9 bytes)

Manufacturer ID: 0X3 OEM ID: SD Product: SD32G Revision: 8.5 Serial number: 0X5DDAB1C1 Manufacturing date: 11/2022

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
824.54,2226,613,618
820.34,18714,613,621

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
752.67,682,671,677
752.67,682,671,677

--> ~65% speedup

It'd be nice to see more optimised functions make their way into the official Arduino SPI driver for SAMD. AVR has an optimised block transfer function. The optimised block transfer function (optimisation 1) makes a decent difference, and the independent send and receive functions (optimisation 2) even more so. Neither should break compatibility: the first optimises existing functionality and the second provides a new option for faster send-only or receive-only transfers.

greiman commented 1 year ago

I also tried similar optimizations. Next I looked at the datasheet for SAMD21 SPI. It has lots of buffering that is not used here.

The SPI uses the SERCOM transmitter and receiver configured as shown in 27.3 Block Diagram. Each side, master and slave, depicts a separate SPI containing a shift register, a transmit buffer and a two-level receive buffer.

That's why interrupt-driven and DMA drivers are so fast. Transmit is single buffered and receive is double buffered.

DMA and ISR schemes are triggered by two flags. We only use RXC so the buffering is not really used.

RXC - Receive Complete: This flag is cleared by reading the Data (DATA) register or by disabling the receiver. This flag is set when there are unread data in the receive buffer.

DRE - Data Register Empty: This flag is cleared by writing new data to DATA. This flag is set when DATA is empty and ready for new data to transmit.

I can't make the DRE flag work. I end up overwriting the Tx DATA register because the flag seems to come on too soon.

greiman commented 1 year ago

I deleted all the failed or poor attempts to optimize transfer(const void* txBuf, void* rxBuf, size_t count).

Here is what I hope is the final attempt.

Add these lines to the following files.

Users\Bill\AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13/libraries/SPI/SPI.h Add this line to the SPIClassSAMD class.

void transfer(const void *txBuf, void *rxBuf, size_t count);

Users\Bill\AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13/libraries/SPI/SPI.cpp Add this member function.

void SPIClassSAMD::transfer(const void *txBuf, void *rxBuf, size_t count) {
  _p_sercom->transferDataSPI(txBuf, rxBuf, count);
}

Users\Bill\AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13/cores/arduino/api/HardwareSPI.h Add this line to the HardwareSPI class.

virtual void transfer(const void *tx, void *rx, size_t count) = 0;

Users\Bill\AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13/cores/arduino/SERCOM.h Add this line to the include section

#include <stddef.h>

Add this line to the SERCOM class.

void transferDataSPI(const void* txBuf, void* rxBuf, size_t count);

Users\Bill\AppData\Local\Arduino15\packages\arduino\hardware\samd\1.8.13/cores/arduino/SERCOM.cpp Add this member function.

void SERCOM::transferDataSPI(const void *txBuf, void *rxBuf, size_t count) {
  const uint8_t *tx = reinterpret_cast<const uint8_t *>(txBuf);
  uint8_t *rx = reinterpret_cast<uint8_t *>(rxBuf);
  size_t ir = 0;
  size_t it = 0;
  if (rx) {
    while (it < 2 && it < count) {
      if (sercom->SPI.INTFLAG.bit.DRE) {
        sercom->SPI.DATA.reg = tx ? tx[it] : 0XFF;
        it++;
      }
    }
    while (it < count) {
      if (sercom->SPI.INTFLAG.bit.RXC) {
        rx[ir++] = sercom->SPI.DATA.reg;
        sercom->SPI.DATA.reg = tx ? tx[it] : 0XFF;
        it++;
      }
    }
    while (ir < count) {
      if (sercom->SPI.INTFLAG.bit.RXC) {
        rx[ir++] = sercom->SPI.DATA.reg;
      }
    }
  } else if (tx && count) {  // might hang if count == 0
    // Writing '0' to this bit will disable the SPI receiver immediately.
    // The receive buffer will be flushed, data from ongoing receptions
    // will be lost and STATUS.BUFOVF will be cleared.
    sercom->SPI.CTRLB.bit.RXEN = 0;
    while (it < count) {
      if (sercom->SPI.INTFLAG.bit.DRE) {
        sercom->SPI.DATA.reg = tx[it++];
      }
    }
    // wait till all data sent
    while (sercom->SPI.INTFLAG.bit.TXC == 0) {
    }
    // Writing '1' to CTRLB.RXEN when the SPI is enabled will set
    // SYNCBUSY.CTRLB, which will remain set until the receiver is
    // enabled, and CTRLB.RXEN will read back as '1'.
    sercom->SPI.CTRLB.bit.RXEN = 1;
    while (sercom->SPI.CTRLB.bit.RXEN == 0) {
    }
  } 
}
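The idea behind this version is to keep one byte shifting and a second one queued, so the bus never idles while the CPU polls. It can be illustrated with a toy model of the buffering quoted from the datasheet. `ToySercomSpi`, its tick-based timing, and `pipelinedExchange` are assumptions for illustration. The loop mirrors the `rx` branch above but runs against the model, not the real registers.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Toy model of the datasheet buffering: a one-byte transmit buffer, a shift
// register, and a two-level receive buffer, with MISO looped back to MOSI.
// One tick() stands for one CPU polling step (an assumption).
struct ToySercomSpi {
  int ticksPerByte;
  long ticks = 0;
  bool txFull = false, shifting = false, overflow = false;
  uint8_t txBuf = 0, shiftReg = 0;
  int remaining = 0;
  std::deque<uint8_t> rxFifo;

  explicit ToySercomSpi(int t) : ticksPerByte(t) {}
  bool dre() const { return !txFull; }           // Data Register Empty
  bool rxc() const { return !rxFifo.empty(); }   // Receive Complete
  void write(uint8_t b) { txBuf = b; txFull = true; }
  uint8_t read() {
    uint8_t b = rxFifo.front();
    rxFifo.pop_front();
    return b;
  }
  void tick() {
    ticks++;
    if (shifting && --remaining == 0) {          // byte finished shifting
      shifting = false;
      if (rxFifo.size() == 2) overflow = true;   // like BUFOVF: byte lost
      else rxFifo.push_back(shiftReg);
    }
    if (!shifting && txFull) {                   // start the queued byte
      shiftReg = txBuf;
      txFull = false;
      shifting = true;
      remaining = ticksPerByte;
    }
  }
};

// Prime two bytes, send one new byte per byte received, then drain.
std::vector<uint8_t> pipelinedExchange(ToySercomSpi& spi,
                                       const std::vector<uint8_t>& tx) {
  size_t count = tx.size(), it = 0, ir = 0;
  std::vector<uint8_t> rx(count);
  while (it < 2 && it < count) {
    spi.tick();
    if (spi.dre()) spi.write(tx[it++]);
  }
  while (it < count) {
    spi.tick();
    if (spi.rxc()) {
      rx[ir++] = spi.read();
      spi.write(tx[it++]);
    }
  }
  while (ir < count) {
    spi.tick();
    if (spi.rxc()) rx[ir++] = spi.read();
  }
  return rx;
}
```

In this model, 100 bytes at 8 ticks each complete in 802 ticks: two ticks to prime, then back-to-back bytes with no idle time. The two-level receive buffer never overflows because at most two bytes are ever in flight. The transmit-only branch is not modelled; it disables the receiver instead and waits for TXC.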

Here is bench for: #define USE_SPI_ARRAY_TRANSFER 2

FreeStack: 28008 Type is FAT32 Card size: 32.01 GB (GB = 1E9 bytes)

Manufacturer ID: 0X1B OEM ID: SM Product: 00000 Revision: 1.0 Serial number: 0XE30F5501 Manufacturing date: 10/2015

FILE_SIZE_MB = 5 BUF_SIZE = 512 bytes Starting write test, please wait.

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1359.80,17535,368,374
1366.49,393,368,372

Starting read test, please wait.

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
1374.38,376,366,369
1374.38,376,366,369

greiman commented 1 year ago

I posted a Discussion on the Arduino site here.

I suspect I won't get any results. I looked at mods to the Arduino SAMD SPI driver and no recent pull requests have even been commented on.

I can't imagine Arduino changing the SPI API.

besmarsh commented 1 year ago

Wow, nice work. Great to see how much faster it can be when implemented properly. I hope Arduino take notice and incorporate your work.

I wonder if it's worth opening a PR, but I wouldn't ask you to spend more time on this if you don't think it will achieve anything.

greiman commented 1 year ago

I have not had anything accepted at Arduino or Adafruit about problems with SPI drivers for two years. There are lots of outstanding issues both places. So I have given up with just one post.

Sometimes people follow up with the same issue and still nothing happens.

I did get STMicroelectronics to fix an STM32 SPI driver that was unusable. It took two years, and the fixed driver was very slow, but you could access an SD card.

cdd0042 commented 2 months ago

Quoting greiman's earlier comment: "One last point about SAMD21 boards. Arduino could improve the SPI driver by implementing DMA. It's sad since many people use SAMD21 boards. Adafruit has a DMA driver for SAMD21. It has some problems but is about 2.5 times faster." The quoted results were 1117.07 KB/sec write and 1111.61 KB/sec read on a Feather M0 Express at 12 MHz, and 2791.74 KB/sec write and 2813.73 KB/sec read with the same driver on a Feather Express M4 (SAMD51).

I know this is a little different from the main topic of this thread, but I was wondering how the drivers were implemented to get this benchmark test? I have a very similar setup (Feather M4) connected to a peripheral SD card and am trying to achieve similar write speeds to those achieved in this test.