raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.

rpi-update to 4.14.48 partially killed SD card #1006

Open Elijahg opened 6 years ago

Elijahg commented 6 years ago

After updating to 4.14.48 and rebooting, my Pi was unable to boot. Multiple attempts to image the SD card with dd on a Mac resulted in I/O errors at similar byte counts. I low-level formatted the card with the utility from sdcard.org, which seemed to fix it. A second attempt to bump to 4.14.48 killed the card again. It's a SanDisk Extreme 32 GB card. I've attempted to image the card with the built-in SD card reader on two different Macs; both get I/O errors. Strangely, an ancient USB 2.0 reader doesn't seem to error.

Steps to reproduce:
1) Update to 4.14.48 with rpi-update
2) Reboot - will result in no boot
3) Attempt to copy data from card with dd - I/O errors experienced.
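
For step 3, one way to see how much of the card is still readable is to let dd continue past read errors instead of stopping at the first one. This is a minimal sketch, not the command used in the report: the function name `image_card` is made up for illustration, and the device path in the usage comment is an assumption that differs per machine.

```shell
#!/bin/sh
# Hypothetical helper: copy a card (or any file) to an image,
# continuing after read errors.
#   $1: source device or file
#   $2: destination image file
# conv=noerror keeps dd going after a read error; conv=sync pads every
# short or failed input block with NUL bytes up to the block size, so
# later data stays at the same offset in the image.
image_card() {
    dd if="$1" of="$2" bs=65536 conv=noerror,sync
}

# Usage (the device name is an assumption; check which disk is the card first):
#   image_card /dev/rdisk2 card.img
```

dd reports each failed read on stderr, so the log shows whether the errors cluster at similar offsets.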

MichaIng commented 6 years ago

I was experiencing the same. The error occurs at an early boot stage, where the kernel is unable to mount the root device.

An outstanding task is perhaps to verify the issue on a proven flawless SD card and a fresh Raspbian (Lite) image.

Elijahg commented 6 years ago

Not sure if the kernel has loaded a HDMI driver by the time it fails, but I get no output on a TV. The green SD activity light doesn’t illuminate at all either.

pelwell commented 6 years ago

Which models of Pi do you both have?

Elijahg commented 6 years ago

Mine’s a model 3B.

pelwell commented 6 years ago

Do you (both) know what was the most recent firmware that worked for you?

pelwell commented 6 years ago

[ I'm not asking you (yet) to go back and try them all, I just wondered if you happened to know. ]

Elijahg commented 6 years ago

Good question, the one in the release version of DietPi was fine. Pretty sure that uses the latest stable version of Raspbian, so that'd be commit 5db8e4e1c63178e200d6fbea23ed4a9bf4656658, presumably with kernel version 4.9.80 (see below). I'll check as soon as I've reinstalled again.

MichaIng commented 6 years ago

@pelwell I am using an RPi 2. I just checked the backup of my last working system, which has kernel 4.14.39 on it. I'm not 100% sure it was the latest, but it seems to be.

Elijahg commented 6 years ago

Looks like it was likely on 4.14.34 before the update.

pelwell commented 6 years ago

Thanks - that narrows it down a bit.

Fourdee commented 6 years ago

REF: I was unable to replicate any issues with rpi-update and RPi 3 B+ on DietPi:

MichaIng commented 6 years ago

Indeed, as SD cards break fast, it is definitely possible that my SD card and the OP's are the issue, not the firmware update. In my case it was not the first SD failure, although the last bad block test run did not find any. Just for completeness, my SD: Kingston SDCA3/64GB SDXC UHS-I U3.

@pelwell Do you have one left to test it again? Sadly my Pi and SD are in production again; at the next failure/opportunity I will replace it and retest.

pelwell commented 6 years ago

I found my old 16GB Kingston card that used to show problems with small ERASE commands, and installing the 2018-04-08 Raspbian and rpi-updating to 4.14.50 works fine for me (although the card does seem quite sluggish).

You can tell if your card has been detected as one of the bad ones because dmesg | grep 'mmcblk0: mmc' will show something like:

[    0.994628] mmcblk0: mmc0:003 SD16G 14.4 GiB (quirks 0x80000000)

(note the quirks).
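
The check above can be scripted. A minimal sketch, assuming the dmesg line has the same shape as the one shown; the function names `extract_quirks` and `has_erase_quirk` are made up for illustration and are not part of any Raspberry Pi tooling.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print the hex quirks value from a line such as
#   mmcblk0: mmc0:003 SD16G 14.4 GiB (quirks 0x80000000)
# Prints nothing if the line carries no quirks field.
extract_quirks() {
    printf '%s\n' "$1" | sed -n 's/.*(quirks \(0x[0-9a-fA-F]*\)).*/\1/p'
}

# Succeed if bit 0x80000000 (the value shown above) is set in the
# given quirks value; fail on an empty argument.
has_erase_quirk() {
    [ -n "$1" ] && [ $(( $1 & 0x80000000 )) -ne 0 ]
}

# Usage on a running Pi:
#   line=$(dmesg | grep 'mmcblk0: mmc' | head -n 1)
#   has_erase_quirk "$(extract_quirks "$line")" && echo "workaround active"
```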

If you want to enable the same workaround for any random card, add the following to cmdline.txt:

mmcblk.card_quirks=0x80000000
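
Since cmdline.txt has to keep all parameters on a single line, the option needs to be appended to that line rather than added as a new one. A minimal sketch of doing that safely; `add_cmdline_param` is a hypothetical name, and the path in the usage comment assumes the Raspbian layout of the time (/boot/cmdline.txt).

```shell
#!/bin/sh
# Hypothetical helper: add a kernel parameter to a single-line
# cmdline.txt unless it is already present.
#   $1: path to cmdline.txt
#   $2: parameter to add
add_cmdline_param() {
    file=$1
    param=$2
    # Read the first line; a missing trailing newline is tolerated.
    IFS= read -r current < "$file" || true
    case " $current " in
        *" $param "*) return 0 ;;   # already present, nothing to do
    esac
    cp "$file" "$file.bak"          # keep a backup before editing
    # Write the old parameters followed by the new one, on one line.
    printf '%s\n' "${current:+$current }$param" > "$file"
}

# Usage (run as root; the path is an assumption):
#   add_cmdline_param /boot/cmdline.txt mmcblk.card_quirks=0x80000000
```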

MichaIng commented 6 years ago

@pelwell

2018-06-20 23:12:39 root@micha:/var/log# dmesg | grep 'mmcblk0: mmc'
[    0.873869] mmcblk0: mmc0:0007 SD64G 59.8 GiB (quirks 0x80000000)

Can you tell us what the appearance of this means, or rather what you mean by "detected as one of the bad ones"? Does mmcblk.card_quirks=0x80000000 then solve the issues, or at least reduce the risk that the card becomes corrupted?

Found https://github.com/raspberrypi/firmware/issues/601 but could not really find out about the actual issue or how to solve it if actually possible 😉.

pelwell commented 6 years ago

Unless you have included mmcblk.card_quirks in cmdline.txt then the presence of the quirks number means the SD subsystem thinks your card matches one of the list of blocks that doesn't implement ERASE correctly for small areas. The quirk (one I added as a best guess) disables ERASEs for that card in order to work around the bug.

If your card is already reporting the quirk without you manually setting it then setting it manually will have no effect, except perhaps to remove other quirks that may already have been set.
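
If a manual value does replace the detected one, one way to avoid dropping bits that were already set would be to OR the reported value with the new one before writing the parameter. This is only a sketch of the arithmetic: whether the kernel treats the parameter as a replacement is not confirmed here, and `merge_quirks` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: combine two quirk bitmasks with a bitwise OR
# and print the result as an 8-digit hex value.
merge_quirks() {
    printf '0x%08x\n' $(( $1 | $2 ))
}

# Usage: merge the value reported by dmesg with the ERASE workaround bit.
#   merge_quirks 0x00000001 0x80000000    # prints 0x80000001
```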

If your card is being detected as a bad one and yet the problem persists then either there is a new problem or the workaround isn't working as I expected.

MichaIng commented 6 years ago

@pelwell Okay, so at least in my case the ERASE quirk is not the issue. But as I said, unless many others are having SD card issues just after the current rpi-update, and as long as nothing obvious changed in how SD card reads/writes are handled, I would not invest too much time in deep investigation. I already had SD card issues in the past and yes, they simply break fast 😉.

Elijahg commented 6 years ago

I was in the process of imaging my SD card on my Mac to try rpi-update again, but when doing so I started getting read errors. So it does look like, in my case at least, the card is dying. It's a 32GB SanDisk Extreme, about a year old. I contacted SanDisk today, and they are replacing it with a newer A1-rated card.

Interestingly, erasing it with the SD Association's formatter seems to fix it for a while, only for it to break again a week or so later. Again, I'm not sure if this is a Pi thing or an SD thing, since I repeatedly filled the card with data totalling several hundred GB and didn't have a read or write fail once. But after being in the fairly idle Pi for a few days, it dies.

JamesH65 commented 5 years ago

@Elijahg Did the new card work OK? Can this issue be closed?

Elijahg commented 5 years ago

I've not had a problem with the A1 card so far, so it may well have been the card - or at least the model of the card I had originally. I'm also on 4.14.77 now, which has been perfectly stable, so I think it may well have been the card.