raphael / linux-samus

Linux 4.16 on Chromebook Pixel 2015
GNU General Public License v2.0

SSD Firmware #124

Closed ehegnes closed 8 years ago

ehegnes commented 8 years ago

Prompted by @sprc's suggestion in issue #122, discussion of the SSD firmware shall be confined to this thread.

The following update was suggested by me:

@remexre, you are brave. There be dragons ahead/use at your own risk, and all that. If I were you (when I'm you in a few days), I'd dd my drive as a backup and run this command from a LiveCD with the device unmounted.

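Firmware Blob: firmware-ssd-02.3.bin (https://gist.github.com/ehegnes/92ed8fe0078294b71ec6/raw/f565b2ad326747665b92ba1325b558de75399735/firmware-ssd-02.3.bin)

MD5SUM: 5cbc5c2da4bed35ea499d18dbde6d295

Command: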

sudo hdparm --fwdownload-mode7 "/path/to/firmware.bin" --yes-i-know-what-i-am-doing --please-destroy-my-drive "/dev/sda"

This should all, presumably, work. I'm really not sure. :)

This didn't work, resulting in:

/dev/sda: fwdownload: xfer_mode=7 min=1 max=65535 size=461312 FAILED: Input/output error

But this also did not harm the drive.
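Before any flash attempt, it may be worth confirming that the downloaded blob matches the MD5SUM quoted above. This is a minimal sketch, assuming md5sum from GNU coreutils is available on the live system; verify_md5 is a made-up helper name, not part of hdparm or this repository.

```shell
#!/bin/sh
# Sketch: compare a file's MD5 digest against an expected value.
# verify_md5 is a hypothetical helper, not an hdparm feature.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file matches $expected"
    else
        echo "MISMATCH: $file has '$actual', expected '$expected'" >&2
        return 1
    fi
}

# Usage (the path is a placeholder, as in the command above):
# verify_md5 /path/to/firmware.bin 5cbc5c2da4bed35ea499d18dbde6d295
```

A non-zero exit status means the file should not be passed to hdparm. A matching checksum only shows the download is intact, not that the drive will accept the blob.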

Mentions: @remexre, @recri, @aeroevan

stefanwiegmann commented 8 years ago

Does anybody know which versions of the firmware have this problem? I am on 1.8 and never had any issue running Arch on it (including LUKS encryption, no swap).

recri commented 8 years ago

The point I was trying to make is that rewriting the firmware on your disk drive is not normal usage. If you download the new firmware file onto your disk drive, just as you have always done before, and then run the software to update the firmware, it's entirely possible that the firmware rewrite will disable your disk drive before the program has finished reading the firmware off of it. Programs are not usually written with the expectation that the file system will disappear while the program is running. The disappearance of the file system will probably yield an Input/output error, the program will crash, no firmware will be rewritten, the system will crash, and all will be well after a cold start, modulo a few dangling files. And, no, I haven't done that exactly, but, yes, I've done several very similar things.

-- rec --

remexre commented 8 years ago

@recri I got a new flash drive, put arch linux 2016.03.01 on it, booted it, then rsync'd the firmware over from a separate machine, then ran the command. /dev/sda was not mounted until after I got the error, and there was no crash, the machine didn't shut down until I ran halt -p.

ehegnes commented 8 years ago

@recri, I did mention that it should probably be done from a LiveCD. Google solves this issue by providing the firmware in a depthcharge payload (I think).

raphael commented 8 years ago

@remexre just curious: is that an actual SSD or a USB flash drive?

remexre commented 8 years ago

The live disk was from a flash drive from some conference, /dev/sda was the SSD.

ghost commented 8 years ago

@stefanwiegmann I can confirm that 1.8 is affected: my SSD died about 5 minutes ago and I was running firmware version 1.8 (Ubuntu, not Arch, but it probably doesn't matter). Luckily I dd'd my drive on Wednesday, so if I can get a replacement I'll be able to re-image it.

@raphael did google give you a hard time about being in developer mode?

stefanwiegmann commented 8 years ago

thanks for the info, pruddiman. guess I'll dd and sit down on the weekend to do whatever I have to do.

ehegnes commented 8 years ago

Uh oh. Looks like this is a bigger issue than I originally thought. I suppose I too will be dding and spending the rest of the night trying to flash the new firmware.

In the worst case (most kludgy) scenario, would it be possible to restore ChromeOS with the official recovery images, boot with their depthcharge payload so the update installs, and reinstall our *nix distros?

EDIT: I'll be on IRC if anybody has some more insights. I'm not exactly an expert in the world of firmware.

raphael commented 8 years ago

@pruddiman it took some convincing, had to send videos. If you have fsck logs these help.

@ehegnes I keep 5gb to dual boot chromeos for updates.

ehegnes commented 8 years ago

@raphael, that sounds like a wise idea for whenever I reinstall next.

stefanwiegmann commented 8 years ago

as I said, I'll dd and flash the firmware, no problem. But I am still curious: is the firmware really the reason for the ssd failures? Or do we have to be careful with something else as well.

I remember the arch thread https://bbs.archlinux.org/viewtopic.php?pid=1587933#p1587933 had comments about degrading performance for some. I never had these issues. Could it be something like passing all the discards through fstab, lvm and luks or swap space/swappiness?

pierater commented 8 years ago

Hey I have been following this repo for a little while and going through the threads. I just got my pixel, but I don't know how to check my SSD firmware. I am running Arch with kernel version 4.4.2-6. Also, @ehegnes what channel are you guys on?

ghost commented 8 years ago

@pierater you can check your firmware version with hdparm -i /dev/sda (or whatever device your SSD is) from a terminal; you'll see FwRev=<version> on the first line of the output.
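To pull out just the version string, the output can be filtered. A minimal sketch, assuming the identification line has the usual comma-separated "Model=..., FwRev=..., SerialNo=..." layout; fwrev_from_hdparm is a made-up helper name.

```shell
#!/bin/sh
# Sketch: extract the FwRev field from `hdparm -i` output read on stdin.
# fwrev_from_hdparm is a hypothetical helper, not an hdparm option.
fwrev_from_hdparm() {
    # Print whatever follows "FwRev=" up to the next comma.
    sed -n 's/.*FwRev=\([^,]*\),.*/\1/p'
}

# Usage on the real device (hdparm needs root):
# sudo hdparm -i /dev/sda | fwrev_from_hdparm
```

If nothing is printed, the line layout on that system differs from the assumption and the raw hdparm output should be read directly.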

raphael commented 8 years ago

@stefanwiegmann good question. I had my swap mounted on a USB drive, so we know that's not the problem. I wonder if my usage pattern of 10 Linux kernel compilations a day contributed to the problem, although I'm not sure it's the right order of magnitude (in terms of how many writes one can make before the SSD starts failing). It did degrade progressively, as I had to run fsck a few times before the final failure. I wish I knew the real root cause.

stefanwiegmann commented 8 years ago

@raphael so you constantly had errors piling up in fsck runs? I don't think your compiling is enough to cause these issues already, if the drive is supposed to last years under "normal" usage, with "normal" covering average users writing, reading, and caching movies and music all the time.

raphael commented 8 years ago

Agreed. I had to run fsck a few times over the course of a few weeks before it wouldn't be able to fix all the problems.

nelsonni commented 8 years ago

I've got Arch running on SSD firmware revision S9FM01.8 without any performance issues. I've got it formatted with BTRFS though, so read/write patterns are probably going to be slightly different.

pierater commented 8 years ago

I'm running arch with 1.8, although it's only been just over a week and I haven't noticed anything. How long is it taking for the SSDs to die?

ethanmad commented 8 years ago

Same situation for me as @nelsonni: Arch, btrfs, firmware 1.8. Good partition scheme with nothing fancy on top. I've had the Pixel set up this way since August.

But I'm scared. I can't have my computer stop working while I'm at school.

stefanwiegmann commented 8 years ago

It's good to read that some report no issues on 1.8, but having updated firmware wouldn't be bad either. Did anybody flash the firmware outside of ChromeOS (from another LiveCD) via hdparm?

ehegnes commented 8 years ago

I'm using F2FS on 1.8, and fsck never fixes the errors on boot. But at least the errors aren't increasing in number.

@stefanwiegmann I just tried hdparm via the Gentoo LiveCD, and I had a mite more luck with --fwdownload-mode3, but I still get the inevitable FAILED: Input/output error. Apparently, some drives just aren't compatible with hdparm due to the way they structure their firmware files, so maybe that's the case for us.

If it's any consolation, it seems that 1.7 is the earliest fw, not 1.8.

I'm going to ditch this custom update method for now and try to dual boot as per @raphael's suggestion, updating via the official payload.

ehegnes commented 8 years ago

Although it will be a longer solution, I can do a write-up if it works.

stefanwiegmann commented 8 years ago

@ehegnes did you try this: https://wiki.archlinux.org/index.php/SSD_memory_cell_clearing? I wonder if that would fix it. I did that before I did my install, just as a precaution. It will wipe everything, so you either dd first or take your time to install everything all over again.

ehegnes commented 8 years ago

@stefanwiegmann neat resource! I'll try it before I recover ChromeOS.

ehegnes commented 8 years ago

@stefanwiegmann, full disk erasure did not solve the hdparm issues.

I'm currently in the process of recovering and setting up dual boot.

stefanwiegmann commented 8 years ago

@ehegnes "hdparm issues" meaning being able to flash the firmware?

I was more hoping/expecting the fsck errors would go away :-) Good luck!

ehegnes commented 8 years ago

@stefanwiegmann oops! I meant being able to flash the firmware. I suppose I misread your suggestion and just missed my chance to check the errors.

I am, however, on 2.3 now. I used this script, provided by Google, to create a recovery drive. After recovery (hold ESC + F3 and tap Power), on its first boot it warns of a "critical update" and reboots twice, presumably installing firmware update(s), before launching into ChromeOS.

At this point, I could either dd my backed up Gentoo image to the drive, or try to dual boot.

stefanwiegmann commented 8 years ago

@ehegnes :-) at least you have newest firmware now.

So, do we know now, if you don't get any errors anymore, what it was? firmware or ssd-reset? Guess it wouldn't matter much to you ;-)

ehegnes commented 8 years ago

I actually don't know if I'm error-free. I don't have a LiveCD with recent enough fsck tools to be able to check. I'll check and report as soon as I restore.

stefanwiegmann commented 8 years ago

Okay, I am curious. Thanks for updating!

ehegnes commented 8 years ago

Finally got a dual boot working. Ran a full check on all partitions and everything seems fine with the disk. I suppose we never found a proper solution to updating the firmware, but I'd be glad to do a write-up if there is interest? Some of the intricacies of dual booting with ChromeOS are not trivial.

colemickens commented 8 years ago

How did you do it? The script in the Arch wiki was painless and nearly fully automated if I recall correctly (it was many months ago at this point. I had decided to dual boot for this exact reason - firmware updates - though I was more concerned about the typec->dp adapter).

ehegnes commented 8 years ago

@colemickens, after restoring ChromeOS, I used cgpt, Google's GPT partitioning tool, to resize the stateful partition that ChromeOS uses and the partitions /dev/sda6 (labeled KERN-C) and /dev/sda7 (labeled ROOT-C) to fill the remaining space. I recommend using a script like the one ChrUbuntu uses to do this, as it is much easier than learning cgpt. After a reboot to fix the stateful partition, boot into a LiveCD, restore your backed-up root partition to /dev/sda7 and your boot partition to /dev/sda6, and proceed with the usual steps (grub2-mkconfig -o /boot/grub/grub.cfg and grub2-install /dev/sda --force) to set up the bootloader. Then press Ctrl+L as usual at the boot screen and wait for it to time out to your bootloader. Make sure to change /etc/fstab to reflect the new partition scheme.

That was a mite rushed. I can include resource links and specific commands if it would help.

ehegnes commented 8 years ago

The parts that weren't trivial for me were recognizing that you can only resize partitions, not delete or create them (otherwise, ChromeOS complains that you need to restore the OS again), and recognizing that you need to use /dev/sda6 (or KERN-C) to house your kernels (that last part may not be true, but it's the only way I could get it to boot).

colemickens commented 8 years ago

I just used this: https://wiki.archlinux.org/index.php/Chrome_OS_devices#Alternative_installation.2C_Install_Arch_Linux_in_addition_to_Chrome_OS

Worked out of the box. Didn't have to do anything special with kernel placement or anything. A completely normal install of Arch worked and I could dual boot afterward.

ehegnes commented 8 years ago

@colemickens Right, that's the kind of script that I would recommend for partitioning. Is your boot partition separate from your root partition, or is everything on one partition?

colemickens commented 8 years ago

I was lazy, it's just one partition.

ehegnes commented 8 years ago

Then I stand corrected and suppose you don't need to use KERN-C. :) Thanks for the added info.

stefanwiegmann commented 8 years ago

Glad you are back in business! I guess for now I am too lazy to change anything. It sounds like everybody had visible errors leading up to this, and you two just proved it can be fixed at that point by yourself. As long as I don't get fsck issues, I'll keep what I have. Thanks!

iain-logan commented 8 years ago

So what's the state of play now? I've been running Slackware current since a little after the launch of the pixel, and as a result haven't received any of Google's firmware updates. Is it critically recommended to get these updates? Seeing this has kinda scared me, I can't have my SSD die during uni.

ehegnes commented 8 years ago

@iain-logan, I would dd the entire disk (/dev/sda, so it includes your partition table and all) to some external storage, restore ChromeOS for the firmware update(s), and flash your backup to the disk, again with dd.
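The backup and restore described above could look roughly like this. It is a sketch under assumptions: image_copy is a made-up helper, the image path is a placeholder for external storage, and it is meant to be run as root from a live system with /dev/sda unmounted. Swapping the two arguments by mistake overwrites the wrong side, so check them twice.

```shell
#!/bin/sh
# Sketch: copy a block device (or file) to an image and verify the copy.
# image_copy is a hypothetical helper name.
image_copy() {
    src=$1
    dst=$2
    # conv=fsync makes dd flush the output to disk before it exits.
    dd if="$src" of="$dst" bs=4M conv=fsync || return 1
    # cmp exits non-zero if source and destination differ anywhere.
    cmp "$src" "$dst"
}

# Backup, as root (destination path is a placeholder):
# image_copy /dev/sda /mnt/external/pixel-sda.img
# Restore after the firmware update, as root:
# image_copy /mnt/external/pixel-sda.img /dev/sda
```

The cmp pass reads both sides in full, so it roughly doubles the run time. It can be dropped if time matters more than the extra check.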

I opted to dual boot, if only because it makes playing DRM content much less painful.

Updating the firmware is probably a good idea, and if you are getting fsck errors that can't be resolved, a full disk erasure before restoring the backup, as @stefanwiegmann suggested, might help.

Would you like a detailed guide? :)

EDIT: Also, Slackware is awesome.

iain-logan commented 8 years ago

Thanks for the prompt response!

Cool, I've got dd running making a backup of the disk currently.

Dual boot does sound like an attractive option for getting future updates, but I think I'll need to leave that for when I have more time.

In regards to fsck errors, I don't have a live image to hand, so I can't unmount my main partition to run fsck on it. Again, I think I'll leave that until I have a little more time.

I'm ashamed to admit that I'm no expert when it comes to this kind of thing, so a bit of a guide would be really appreciated. Perhaps this kind of information would be worth adding to the README here?

stefanwiegmann commented 8 years ago

if I remember correctly, the situation is this; @ehegnes, @raphael, please correct me, if I'm wrong:

There are Pixels with SSD firmware 1.8 or lower that suffer from degrading performance and visible fsck errors. There are also many Pixels on 1.8 that don't have issues. fsck on boot should be on by default; you would know if you had turned it off. If you have errors, you should see them during boot. See fsck on the ArchWiki: https://wiki.archlinux.org/index.php/Fsck

@raphael didn't do anything about it and it died. He was able to get it replaced, but it didn't sound like the standard no-questions-asked procedure. @ehegnes had the errors and did two things: SSD memory cell clearing, and then installed ChromeOS again, which took care of updating the firmware. At this point we know this will solve the issues, but we don't know if only one of them would have been sufficient. My idea earlier was to back up with dd, only reset the SSD without updating the firmware, and then restore via dd. But, hey, there is a Turkish proverb: the shortest way is the way you know.

Once you get fsck errors or have degrading performance, you still have time to do what @ehegnes did. I don't have problems (I am on 1.8) and will wait until I get them or until I want to redo everything anyway.
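For anyone unsure whether the boot-time check is actually enabled, one place to look is the sixth field of each /etc/fstab entry (fs_passno), where 0 means the filesystem is skipped. A minimal sketch, assuming a conventional whitespace-separated fstab; fstab_unchecked is a made-up helper name, and other mechanisms (such as kernel command-line options) can also affect whether fsck runs.

```shell
#!/bin/sh
# Sketch: list mount points whose fstab entry has fs_passno 0 or no sixth
# field at all, i.e. entries skipped by the boot-time fsck.
# Reads fstab text on stdin. fstab_unchecked is a hypothetical helper name.
fstab_unchecked() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }   # skip comments and blank lines
        NF < 6 || $6 == 0 { print $2 }   # second field is the mount point
    '
}

# Usage:
# fstab_unchecked < /etc/fstab
```

Entries such as swap or tmpfs are expected to show up here. The interesting case is a root or home filesystem appearing in the list.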

SimionKreimer commented 8 years ago

Dual boot seems like a good way to go for future updates. It would be really nice to have some step by step directions on how to do all of that.

ehegnes commented 8 years ago

@stefanwiegmann, that all seems correct, except I didn't do the disk erasure in the way that the Arch wiki describes. I just restored ChromeOS, which is (probably?) effectively the same.

@SimionKreimer, I'll do a full write-up tomorrow morning with directions for dual-booting Arch, Ubuntu, and other distros, along with directions for backing up, installing the updates through ChromeOS, and restoring without dual-booting.

@raphael, the README is getting kinda long. Might it make sense to add my write-up to a wiki page instead and add a link to it on the README?

recri commented 8 years ago

Well, I have a dirt simple dual boot which may not be for everyone.

I simply told Ubuntu 15.10 to install on /dev/sda1 as the root and only partition, without reformatting and without removing anything, so the entire Ubuntu installation is on a drive which ChromeOS uses in mysterious ways but has not had a conflict that I have noticed. I also told the Ubuntu 15.10 installer to make a bootstrap for the partition, which uses a deprecated block list bootstrap, but also works as far as I've noticed. Specifying this boot method the first time, manually, was a pain, but the installer did it quite simply. This setup, in one of two versions, has been running for months. I only reinstalled because I had a replacement Pixel.

It's extremely unhygienic, but there it is.

-- rec --

ehegnes commented 8 years ago

Actually, I didn't realize that there wasn't an install script for Arch, so that won't be in the write-up.

@recri that may work, although I wouldn't know how. I was thinking of summarizing the instructions here and providing a script so that people can understand how their system is being partitioned while also easily making room for another distribution.

Then they can simply use an official installer or install normally, taking care to select a certain partition as their root partition.

vadixidav commented 8 years ago

Is there any way to update the firmware after the SSD stops working? Right now using the recovery media doesn't help either; it responds with "an unexpected error has occurred." SeaBIOS stopped working a while ago as well, which meant I couldn't boot from external media either. Now I don't see how it could possibly boot. I might try to see if they will replace my Chromebook, but I imagine they won't.

Edit: Also looks like I've gone over the 1 year warranty as well.

cowlicks commented 8 years ago

Hey y'all, just to confirm. The SSD issues were definitely due to firmware?

I was having SSD problems and just assumed it was a hardware failure, so I reinstalled ChromeOS so that I could get this thing replaced under warranty. But upon installing ChromeOS, the failures stopped, and I'm not sure how to detect the errors from ChromeOS. The "crosh" storage_test_1/2 tests don't find anything. Also, I'm not even sure how to check my SSD firmware version from ChromeOS (I guess no one has written a browser extension for that, hehe).