greiman / SdFat

Arduino FAT16/FAT32 exFAT Library
MIT License

Question: SdFat and Power Failure #392

Open RadekPudelko opened 2 years ago

RadekPudelko commented 2 years ago

Is there a recommended way of using SdFat when power may be disconnected at any point during operation? I don't expect to fully recover from a power failure, given that files are being corrupted, but is there a way to reduce the probability of corruption or to detect when it occurs? For instance, using FAT32 formatting on Particle.io devices over SPI, I found that after a power failure I either could not open a corrupted file or could not delete it. I currently plan on using this as my check so I know not to interact with the corrupted file anymore, since that usually does more harm than good.

Also, when I tried the exFAT format, file corruption behaved differently: the contents of a file were replaced with the contents of another file plus garbage. So I was wondering whether changing the file system format may be advantageous in my scenario?

greiman commented 2 years ago

The best way to reduce the risk of corruption is to call file.sync() at key points in the program. This is the same as file.close() but the file is not marked closed.

FAT16/FAT32 and exFAT are all about equally susceptible to corruption if files are not closed.

So if you call file.sync() after a sequence of writes, corruption can only happen if power fails during that time.

noisymime commented 2 years ago

The best way to reduce the risk of corruption is to call file.sync() at key points in the program.

Are calls to file.sync() blocking or slow such that these calls need to be handled carefully? All the examples only do the sync() call once logging is completed, but I'm trying to log in a case where power might be cut and graceful completion of the log isn't always possible.

I can, for example, add a file.sync() call every time the ring buffer is written, but if that is likely to cause blocking or performance issues, then I can call it on something like a 1s interval instead.

greiman commented 2 years ago

sync() does a lot of I/O. It flushes the data cache, reads the file's directory entry, updates it, and writes it back to the SD. sync() also causes a large amount of internal I/O in the SD card: the physical flash pages are huge compared to the emulated 512-byte sectors, so the card's internal buffers must be flushed as well.

You should measure the time in your app using micros():

  uint32_t m = micros();
  file.sync();
  m = micros() - m;  // time for sync()
  Serial.println(m); // elapsed time in microseconds
RadekPudelko commented 2 years ago

I've done some additional experiments and found that if I set up a static file and only change the data in the file without ever changing its size, it does not corrupt in the event of a power failure, although there may be partially written data, which I can easily recover from. I think this strategy could work for me, but I have one concern: wear leveling. Am I correct to assume that writing data into a static file does not trigger the wear-leveling algorithm? I think if it did, I would have seen corruption, but I could be wrong, because I am not sure exactly how and when wear leveling runs. I am not concerned about wearing out the sectors themselves, because the file would be used like a circular buffer, so wear would be distributed.

greiman commented 2 years ago

Don't worry about wear leveling. SD cards keep a database of write counts and map logical to physical addresses. Data is copied to new areas to even out wear. That's why a write occasionally takes a long time.

RadekPudelko commented 2 years ago

I am not too worried about the actual wear on the SD card, but more about what happens when the card decides to move data elsewhere. For instance, if I preallocate a file and never expand it, only writing within its existing size, I have so far been safe when powering off. I believe this is because I am writing to the same sectors over and over again (though I could be wrong; I don't really know what's going on under the hood). If, however, the SD card decides that I have used those sectors too much and tries to copy my data elsewhere, and I lose power at that moment, I am not sure what happens. I am worried that the file itself gets corrupted in that case, although the stars would really have to align for it to occur. I guess what I am asking is: what is going on under the hood when I write within a file over and over again? Am I keeping the same clusters until the SD card decides to move the data somewhere less worn?

greiman commented 2 years ago

You can't control where your data is stored in the physical flash pages of the SD card. Modern consumer cards use TLC NAND to store three bits per cell as eight charge levels. Each cell has a life of at most about 3,000 erase/write cycles, even less in low-cost consumer cards.

https://www.kingston.com/en/blog/pc-performance/difference-between-slc-mlc-tlc-3d-nand

These cards frequently remap logical addresses to new physical flash with a data copy. This is power-fail tolerant because the map itself is updated in a fault-tolerant way.

So you can't control or influence what happens inside your card. As a rule of thumb, you can write about 1,000 times the size of the card over its life, and it doesn't matter where you write.

RadekPudelko commented 2 years ago

Thank you, I think I can build a robust system with this info.