raspberrypi / firmware

This repository contains pre-compiled binaries of the current Raspberry Pi kernel and modules, userspace libraries, and bootloader/GPU firmware.

Rpi2 stops booting from swissbit #838

Open AndreasTBT opened 7 years ago

AndreasTBT commented 7 years ago

As posted on the boards here: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=188003

We are using industrial swissbit S-45µ EXT 4GB cards (because consumer grade cards will get corrupt on power loss eventually).

After some wear, both the "original" and the "new" RPi2 stop booting from it until it is inserted into some other device (PC, Android, ...), even though the data on it is perfectly fine (no corruption).

Not booting = no video, and the ACT LED does not change for ~30 sec. (At least sometimes) the ACT LED starts flashing a 4-flash pattern after those 30 sec.

Sometimes we have to write any file to "fix" it, but most of the time, not even mounting any partition is required: reading the partition table is enough to "reset" the card. I have one card here that always seems to fail this way whenever I do "apt-get clean; apt-get update; shutdown -h now".
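
To show how little "reading the partition table" involves, here is a minimal sketch that reads only the first 512-byte sector of a device and decodes the four primary MBR entries. The helper names and the device path (something like /dev/sdX) are assumptions for illustration. Nothing in this thread confirms that this read alone is what "resets" the card.

```python
import struct


def parse_mbr(sector: bytes):
    """Return the non-empty primary partition entries of a 512-byte MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector (missing 0x55AA signature)")
    partitions = []
    for index in range(4):
        # The partition table starts at byte 446; each entry is 16 bytes.
        entry = sector[446 + 16 * index: 446 + 16 * (index + 1)]
        # Layout: status(1) CHS start(3) type(1) CHS end(3) LBA start(4) sector count(4)
        status, ptype, lba_start, sectors = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append({
                "index": index + 1,
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": sectors,
            })
    return partitions


def read_partition_table(device_path: str):
    """Read sector 0 of a block device (hypothetical path) and decode it."""
    with open(device_path, "rb") as device:
        return parse_mbr(device.read(512))
```

Calling `read_partition_table("/dev/sdX")` normally needs root privileges, and the path is a placeholder for whatever the card reader appears as.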

My guess is that the bootloader does not like some commands' results or timing, if the wear-level logic is in an "unclean" state or something like that. But since I found no other card-reader that rejects the card in that state, I'd guess whatever the card does differently after some wear is according to specs.

ghollingworth commented 7 years ago

Hi,

Interesting problem. We have seen similar instances of this type of problem in the past but have never been able to reproduce them (of course, the first thing people try is plugging the card into a PC, and the problem goes away!)

What would be useful is for you to reproduce the problem on a card, confirm that it happens on any Pi you plug it into and, if it does, send it to me. Email me at gordon@raspberrypi.org and I'll send you contact details.

Since the problem seems to be in bootcode.bin (the bootrom has to have read bootcode.bin from the SDCard for it to flash the LED) I should be able to debug the bootcode.bin execution to identify the problem...

My guess would be something like mismatching FAT copies or something similar, i.e. something that Windows automatically fixes when you plug it in...

Another useful and interesting thing would be to reproduce the problem and then plug the card into an x86 Linux machine, then remove it and plug it into the Pi again... Does Linux fix the problem as well?
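
The "mismatching FAT copies" guess can be checked offline on an image of the boot partition. The sketch below is a rough illustration only: it reads the standard BIOS Parameter Block fields and compares the FAT copies byte for byte. The function name is made up, the input is assumed to be the raw bytes of the FAT volume (not the whole card), and FAT32 volumes with mirroring disabled would legitimately differ.

```python
import struct


def fat_copies_match(volume: bytes) -> bool:
    """Compare all FAT copies of a FAT12/16/32 volume image byte for byte."""
    (bytes_per_sector,) = struct.unpack_from("<H", volume, 11)
    (reserved_sectors,) = struct.unpack_from("<H", volume, 14)
    num_fats = volume[16]
    (fat_size_16,) = struct.unpack_from("<H", volume, 22)
    (fat_size_32,) = struct.unpack_from("<I", volume, 36)
    # FAT32 sets the 16-bit size field to zero and uses the 32-bit field instead.
    fat_sectors = fat_size_16 if fat_size_16 != 0 else fat_size_32
    if bytes_per_sector == 0 or num_fats == 0 or fat_sectors == 0:
        raise ValueError("boot sector does not look like a FAT volume")

    fat_bytes = fat_sectors * bytes_per_sector
    first_fat = reserved_sectors * bytes_per_sector  # FATs follow the reserved area
    copies = [
        volume[first_fat + i * fat_bytes: first_fat + (i + 1) * fat_bytes]
        for i in range(num_fats)
    ]
    if any(len(copy) != fat_bytes for copy in copies):
        raise ValueError("image is shorter than its FAT area")
    return all(copy == copies[0] for copy in copies[1:])
```

A `False` result on a card in the non-booting state would support this guess; a `True` result would point elsewhere.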

AndreasTBT commented 7 years ago

@ghollingworth hi, Speaking of the devil... I tested whether the problem goes away with the latest bootcode.bin, and now I can't reproduce it anymore with any bootcode.bin (for now, still trying). Interestingly, last time I changed the bootcode.bin, I couldn't reproduce it for a longer period of time either (I hoped it was fixed back then).

The "mount-fix" also worked on linux without mount. Nothing special in dmesg (see forums.) Since FAT was also my first idea, 2 years ago, we started to mount /boot read-only.

"Since the problem seems to be in bootcode.bin ..." I had no flashes with the last bootcode.bin I tried (commit f37d96e4ac87143e5d6355fad23636153b8cb2e7) but that may be because that bootcode.bin differs from the one we used earlier... 30sec sounds like a timeout that may not be present in the old bootcode.bin. Is there any other way to tell if bootcode.bin is loaded?

I should get another "broken" SD card in a couple of days, where rewriting/replacing of files in /boot only holds up for one boot... but what should we do NOW? Do you know any industrial (fail-proof) µSD card that has definitely no such problems that we can try? Can you think of any way to fix it without removing the SD card? There is no way to usb-repair-boot on the "new" Pi2, is there?

Does the Pi use "common" code (like FatFS) for SD/FAT so we can rule out bugs there? Is the code open, so one could review it?

Thanks, Andreas

pelwell commented 7 years ago

How are you rebooting your Pis? I've only ever seen corruption:

  1. When removing the power supply or hard resetting without going through the proper shutdown procedure.
  2. Using ext4 on a card that doesn't support block erase properly.
  3. Using cards that aren't the advertised capacity (i.e. broken, fake cards).
  4. When I've been writing or hacking around with the SD card driver...

AndreasTBT commented 7 years ago

@pelwell it does not matter how. The SD card's data and filesystem are not corrupted. (Normally the Pis are shut down using a Power GPIO-key -> shutdown, but the swissbit cards handle power loss very well: we never had a single corruption with journaling file systems, no matter what.)

pelwell commented 7 years ago

I was referring to this statement:

We are using industrial swissbit S-45µ EXT 4GB cards (because consumer grade cards will get corrupt on power loss eventually).

AndreasTBT commented 7 years ago

Ah, sorry. Having the SD card corrupt on power outages or kernel panics, which do both happen from time to time, would be even worse than the current situation.

ghollingworth commented 7 years ago

One thing I've seen is a change in behaviour after rebooting when testing USB mass storage boot. Basically the card gets into a state where it will not respond to any commands from the host for quite a long time (seconds). After this everything is fine again...

I believe with these high reliability cards they have a mechanism where they write updates to a special temporary area which has a higher reliability (something like SLC NAND) and high speed. It will then post these changes to the lower speed MLC NAND later. But it will only do the background work while it knows the host is busy waiting for the card to do something (like read a sector).

When you plug it into the Pi while it is in this state, I can well believe the Pi will just fail the sector read (it times out after around two seconds).

Basically there is nothing I can do until someone gives me an SD card exhibiting the problem...

In terms of increasing the reliability of running from SD card I would suggest moving temporary storage to a ram filesystem and making sure noatime is specified for all ext4 partitions.
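
As a rough way to check that suggestion on a running system, the sketch below parses text in the /proc/mounts format and reports which mount points are tmpfs and which ext4 mounts lack the noatime option. The function name is an assumption for illustration. Note that relatime mounts are reported too, because only noatime is named above.

```python
def audit_mounts(mounts_text: str):
    """Return (tmpfs mount points, ext4 mount points lacking noatime)."""
    tmpfs_points = []
    ext4_without_noatime = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type == "tmpfs":
            tmpfs_points.append(mount_point)
        elif fs_type == "ext4" and "noatime" not in options.split(","):
            ext4_without_noatime.append(mount_point)
    return tmpfs_points, ext4_without_noatime
```

On the Pi itself this could be fed with the contents of /proc/mounts, for example `audit_mounts(open("/proc/mounts").read())`.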

AndreasTBT commented 7 years ago

Hi,

So that's a "no" on code available for review? And a "no" on high-reliability/industrial cards that are known to work?

Should the "chained usb boot" work in our case with latest bootcode.bin? (To make a pen drive labelled "repair sd-card" that just sits there).

As for the "secure" writing mechanism, I believe it writes the new block to an empty one and upon completion updates the internal mapping (wear leveling table) which is somehow secured (or journaled).

Anyway, given your theory, how is it that bootrom, which does a sector read, should work, but not bootcode after that? I'll report back when I get the other card, but I'd love to get an answer for the 3 questions on top of this post.

Regarding system tweaks: I believe /var/log is tmpfs, /tmp is tmpfs, swap is disabled, / is noatime (or relatime), /boot is read-only, /home = userdata+logs is on a separate partition, also noatime (or relatime). As another guy/issue also had SD issues with a one-week-reproducer using sqlite, I think in our typical use-case, the culprit may be sqlite, which does a lot of small writes (because of transaction-safety (which we require)).
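
One common way to reduce the number of small writes SQLite issues is to group several inserts into a single transaction, so the database commits once per batch instead of once per row. The sketch below illustrates that idea only. The table and function names are hypothetical, and batching trades durability for fewer writes: rows in a batch that has not yet been committed are lost on power failure, which may conflict with the transaction-safety requirement mentioned above.

```python
import sqlite3


def open_log_db(path):
    """Open (or create) a database with a hypothetical 'log' table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS log (ts REAL, message TEXT)")
    conn.commit()
    return conn


def insert_batched(conn, rows):
    """Insert all rows inside one transaction, committing a single time."""
    # The connection context manager commits on success and
    # rolls back if an exception is raised inside the block.
    with conn:
        conn.executemany("INSERT INTO log (ts, message) VALUES (?, ?)", rows)
```

Whether a batch size larger than one is acceptable depends on how many rows the application can afford to lose on a power cut.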

ghollingworth commented 7 years ago

Correct, code is not open source. We don't have any experience with high-reliability cards to help out.

If you have the USB host boot enabled it will fall back to booting from the USB mass storage if SD card fails...

I am only making a hypothesis for the problem not a definitive answer, I can't actually answer that until we find it is actually a problem...

Interesting that you're using SQLite, one of the problems we've had in the past is where SQLite always corrupts the ext4 partition (even if you shut down normally), it's something to do with the locking that it holds on the filesystem. Not sure if that's still a thing (it was about three years ago).

Gordon

AndreasTBT commented 7 years ago

@ghollingworth, haven't seen a corrupt ext4 since CMD+DMA issues were resolved. Besides power outage with consumer grade cards, that is. We try not to write to the rootfs to make sure we don't "brick" the system. User-data (=sqlite) is on another partition for that reason.

I also sent you an email.

AndreasTBT commented 7 years ago

@ghollingworth any updates?

AndreasTBT commented 7 years ago

A short update, if anyone else is following this: Booting a Raspberry Pi (2b v1.2) with root=USB allows swapping of the SD card with Linux running. Linux has no problems accessing a non-booting card on the Raspberry Pi, so the problem has got to be somewhere in the bootloader (or power-up?)...

Also, FatFS on LPC1778 as well as multiple Android phones and card readers with Linux and Windows were tested. There is absolutely no dmesg output about any errors. It just works.

ghollingworth commented 7 years ago

Final update:

It seems that the card has got into a state where it requires some special action to hard reset the SD card whilst booting. Since the bootcode doesn't tickle the card in just the right way, it will never boot.

We're relatively happy that one of the old bugs in the old SD mmc driver is the cause of this problem and that it is not possible to reproduce it with the bcm2835-sdhost driver which is now the default on the Raspberry Pi.

AndreasTBT commented 7 years ago

"it is not possible to reproduce it with the bcm2835-sdhost driver"

You draw that conclusion based on what, exactly?

2017-07-14:

The firmware we (normally) ship since around 1 year ago is from June 2016, but the faulty state also happened with the latest raspbian on that card.

2017-07-19:

A clean dmesg output with root=/dev/sda2 (USB drive) and raspbian lite 2017-04-10. [...] The SD-card was either "redamaged" or not "fixed" (it did not boot after shutdown+power cycle).

bazzaxiv commented 6 years ago

I too am having this problem, but in my case the fault is hard and only occurs in Pi 2 and Pi 3 devices. The card boots happily in Pi B and Pi Zero devices, see [https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=194848&p=1219773#p1219773]. Would @ghollingworth like to see it?

bazzaxiv commented 6 years ago

@AndreasTBT as well as the SwissBit device, I also tried one from this range (http://www.smartm.com/salesLiterature/removable/microSD_pSLC_MLC_overview.pdf), which does boot, but I found the R/W speeds for small packets to be very slow, so I did not do any long-term testing.

AndreasTBT commented 6 years ago

@bazzaxiv what we tried so far is:

Our requirements for sd cards are, in that order:

  1. fail proof firmware (most are not!)
  2. high life span
  3. fast on small writes (sqlite inserts)

ad. 1: "industrial", "reliable", "endurance", etc. do NOT reflect what happens during power-failure.

ArroyaveInformatik commented 5 years ago

We have the same problem every day with our thin clients. Between 1 and 3 clients can't boot in the morning, but after inserting the SD card into a Windows or Linux PC, the card works and boots again. We observe this issue with SD cards from different manufacturers, but never with Samsung EVO.

ghollingworth commented 5 years ago

Have you updated your bootcode.bin file to the latest?

We implemented a 'fix' / 'workaround' for this issue a few months ago. The problem was that between reading the bootcode.bin from the SDcard and then running that code and looking for start.elf the SDCard GPIOs were disabled with no pulls. This floating of the CLK and CMD lines caused the SD card to get into a state which was not possible to get out of without powering off the SD card.

The 'fix' was to re-enable the pull ups as the first thing bootcode.bin does when it is started. Obviously this isn't really a fix, because the floating state will still last for some time, but it did seem to fix the problem for us and for SwissBit.

ArroyaveInformatik commented 5 years ago

Yes, we use the current bootcode.bin. This morning we had boot problems with 3 thin clients (all clients with ADATA UHS-I class 10 / 64GB SD cards).

All other clients, with Samsung EVO, function without issues.

ghollingworth commented 5 years ago

So is the result the same with the ADATA cards, plugging into a PC will make them come alive again?

Gordon

ArroyaveInformatik commented 5 years ago

Exactly. After inserting the SD card into a PC running Windows or Linux, it functions again.

ghollingworth commented 5 years ago

Sounds like the same or a similar problem; they probably share the same FTL silicon, which has the same effect.

Can you do the following on the device:

md5sum /boot/bootcode.bin

Current latest is: 8caa083dfa5897d32eb9b8b06dc127a9
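
Where md5sum is not at hand, the same comparison can be sketched in Python with the standard hashlib module. The function names are assumptions for illustration, and the hash quoted above was only the latest at the time of that comment, so a mismatch today does not by itself indicate a problem.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at EOF
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's MD5 against an expected hex string, ignoring case and whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("/boot/bootcode.bin", "8caa083dfa5897d32eb9b8b06dc127a9")` checks the installed file against the hash given in this comment.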

Gordon

JamesH65 commented 5 years ago

Anyone have anything further to add to this issue?

This issue will be closed within 30 days unless further interactions are posted. If you wish this issue to remain open, please add a comment. A closed issue may be reopened if requested.

ArroyaveInformatik commented 5 years ago

The solution: Samsung EVO Plus UHS-I cards function perfectly, without issues.

crocket commented 5 years ago

Does Raspberry Pi 3 B+ boot from swissbit s-45u?

ghollingworth commented 5 years ago

Both should work fine with the latest bootcode.bin

crocket commented 5 years ago

"should" worries me. Did you test it? Or, does the latest bootcode.bin explicitly advertise support for swissbit s-45u?

AndreasTBT commented 5 years ago

@crocket I did not test the swissbit any further, since we did not get any notification on any progress about a fix being developed or released here or per mail.

Not sure if pi2 and pi3 work the same for boot stages, but on pi2:

  1. The SDIO signal is still not 100% according to https://github.com/raspberrypi/firmware/issues/838#issuecomment-432618701
  2. ArroyaveInformatik's post https://github.com/raspberrypi/firmware/issues/838#issuecomment-432625240 is the latest real-world status and looks like it is not fixed (for some cards at least).
  3. I have no idea how or if the fix was tested.
  4. Reproduction is very hard (i.e. random) once you rewrite /boot or take a fresh card.

crocket commented 5 years ago

I think that you are better off with a decent externally powered USB to SATA converter and a 2.5 inch 3D TLC SSD than with swissbit s-45u because swissbit s-45u 32GB is significantly more expensive and lasts for a shorter amount of time than a 128GB 3D TLC SSD with a USB to SATA converter.

Many USB to SATA converters don't support TRIM through UASP, but TRIM doesn't make a significant difference in SSD lifespan unless the SSD is nearly full.

A USB to SATA converter must be externally powered. The 0.5A from the Pi's USB 2.0 port cannot adequately power an SSD. If an SSD is not supplied with enough power, it could die prematurely.

A microSD card should not be relied on for longevity.

ineeve commented 3 years ago

"Yes, we use the current bootcode.bin. This morning we had boot problems with 3 thin clients (all clients with ADATA UHS-I class 10 / 64GB SD cards).

All other clients, with Samsung EVO, function without issues."

We have experienced boot problems with both Samsung EVO and SanDisk microSD cards running on a Raspberry Pi 3B rev1.2, where inserting the SD card into a Linux or Windows machine fixes the card and makes it bootable again.