raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Two MicroSDs have become read-only / died after being rsync'd the booted installation/root #4190

Open tomty89 opened 3 years ago

tomty89 commented 3 years ago

I know it probably sounds like the cards just happen to have reached EOL, but something doesn't smell right.

Both cards are from SanDisk, and they can hardly be counterfeit (sold by a trusted local retailer, believe it or not). One of them is a 128GB Extreme (the recent "mobile gaming" variant), which was purchased very recently and had only been written to once or twice, with some big files that covered about 2/3 of the card. The other one is an Ultra (the 120MB/s variant, manufactured last year) purchased today.

Both of them became read-only while I was trying to rsync the booted installation/root to them (from another card, which was inserted in a USB3 microSD reader plugged into a USB3 port, if that matters). I created an f2fs filesystem on an LV on each of them for that. They died one after the other a couple of hours ago as I tried to repeat the process.

The installation is ~5G in total. The rsync and umount seemingly finished without a problem. I only found out that the cards had died when I tried to write to another partition (fat32) / LV (f2fs) and it started to give I/O errors. (After a reboot or so there were no more errors; the writes are just silently ignored.)

It seems to have something to do with large/prolonged batch writes, as the "source" card didn't die (well, at least doesn't seem to have died yet) while being used in the host/slot for daily purposes (some web browsing, package upgrades...).

I tried to fix them with https://github.com/BertoldVdb/sdtool, but the CSD writing is silently ignored as well, and the program does not report them as being permanently locked or so either (assuming it can). (But the program worked when I accidentally locked the "source" card on the same Pi 4, and then I successfully unlocked it on a Pi 3B+.)

I know you probably can't help investigate the problem with what I've provided. Nor am I sure whether it is caused by bad kernel code, bad EEPROM updates, or my Pi's slot having simply gone bad (or whether I just happen to be really unlucky today), but I feel like this should be kept open for a while, so that if others notice something as well, we might be able to confirm sooner that something is indeed wrong.

It's a Pi 4B 2GB (some older revision I think), Arch Linux ARM with kernel 5.10.17-3 at the moment. EEPROM was updated to Fri Dec 11 11:15:17 AM UTC 2020 (1607685317).

pelwell commented 3 years ago

If you do have another of those SD cards, it's worth running a capacity checker on it: something like H2testw (Windows), Capacity Tester (Linux), etc.

HinTak commented 3 years ago

The 'died' description is a bit vague. Before you go lower-level into anything SD card related, what do the file system repair tools (fsck.vfat etc.) report if you hook the cards up to a functional Linux system as external/additional disks?

A few of the Linux file systems (fat32 in particular) are known to fail under extreme stress, but that is not SD card related, just a general thing; it is about coping with Linux's generic aggressive disk caching. File system writes are not actually written to the disk until an fsync is issued, and that can cause failures, especially on systems with a lot of RAM. Are you able to replicate the same problem if either source or destination is not fat32?
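
To illustrate the caching point, here is a minimal Python sketch of forcing buffered writes out to the device with `fsync`. The helper name `write_durably` is hypothetical and not part of any tool mentioned in this thread. Note that the kernel also writes dirty pages back on its own schedule; `fsync` only makes the caller wait until the data for this file has been handed to the device.

```python
import os

def write_durably(path, data):
    """Write `data` to `path` and block until the kernel reports it flushed.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may still sit in the page cache
        # when the function returns.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

A copy tool that skips the final flush can return long before the card has received the data, which is why a `sync` or `umount` before pulling the card matters.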

tomty89 commented 3 years ago

No, it's not. I know from enough testing with dd / hexdump / blkdiscard. It was not a filesystem-level problem. As I said, the card became read-only in the sense that (block-level) writing and erasing more or less became a NOP.
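
For anyone wanting to repeat that kind of check without dd / hexdump, a rough Python sketch of a block-level write-and-read-back test follows. `write_readback_ok` is a hypothetical helper name, and the path is whatever device or file is under test. It overwrites data at the given offset, so only point it at something disposable. The `posix_fadvise` call is only a hint to drop cached pages, so this is an approximation of a real device read, not a guarantee.

```python
import os

def write_readback_ok(path, offset=0, size=4096):
    """Write a random pattern at `offset`, flush it, and check it reads back.

    Hypothetical helper for illustration; DESTRUCTIVE for the target range.
    """
    pattern = os.urandom(size)
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, pattern, offset)
        os.fsync(fd)  # push the data out of the page cache to the device
        # Advisory hint so the following read is less likely to be served
        # from cache (not available on every platform).
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, offset, size, os.POSIX_FADV_DONTNEED)
        return os.pread(fd, size, offset) == pattern
    finally:
        os.close(fd)
```

Usage would look like `write_readback_ok("/dev/sdX", offset=0)`, where `/dev/sdX` is a placeholder. On a card where writes have become a NOP, the expectation is that this returns `False` without raising any I/O error, assuming the read really reaches the device.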

I couldn't reproduce the problem with another new card from Samsung, but before I started with it I upgraded the EEPROM, the kernel and so on. I wonder if it has something to do with the voltage regression mentioned in the git log of the EEPROM repo.

timg236 commented 3 years ago

The EEPROM change was just about when the 1.8V to 3.3V voltage switch occurs so it won't make any difference here.