Closed pvanhauw closed 4 years ago
Hi,
some devices don't work reliably with ALPM. Try
SATA_LINKPWR_ON_BAT=max_performance
or
SATA_LINKPWR_ON_BAT=medium_power
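To check which policy is actually in effect after changing the setting, the kernel exposes a link_power_management_policy attribute per SCSI host in sysfs. A minimal Python sketch, assuming the usual /sys/class/scsi_host layout; the function name read_lpm_policies is made up for illustration and is not part of TLP:

```python
from pathlib import Path


def read_lpm_policies(sysfs_root="/sys/class/scsi_host"):
    """Map each SCSI host name to its current link power management policy."""
    policies = {}
    # Hosts that do not support LPM have no such attribute and are skipped.
    for policy_file in sorted(Path(sysfs_root).glob("host*/link_power_management_policy")):
        policies[policy_file.parent.name] = policy_file.read_text().strip()
    return policies
```

Calling read_lpm_policies() on battery and again on AC should show the values configured via SATA_LINKPWR_ON_BAT and SATA_LINKPWR_ON_AC, provided TLP applied them.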
Can confirm this issue on an Acer Aspire V5-573G with Linux Mint 17 Qiana 64-bit (Cinnamon desktop) and a Crucial MX100 512GB. On battery I get the I/O error.
Changing "SATA_LINKPWR_ON_BAT" solved this problem for me. "medium_power" was good enough.
Thanks
@pvanhauw: did my suggested workaround help?
Had the same issue with ThinkPad R400 and MX100 512GB, SATA_LINKPWR_ON_BAT=max_performance solved the problem
@linrunner I can tentatively confirm that your workaround helps. I had the same issues (Asus UX32LN + MX100 512GB + Linux Mint 17) and have been trying to reproduce the error for the past 3 days. So far no crashes. For me the crashes were seemingly random and somewhat far apart, I will report back in another couple of days.
@sondree: which of the two suggested values?
@linrunner SATA_LINKPWR_ON_BAT=medium_power
FYI for others running across this issue:
An upstream report can be found at: https://bugzilla.kernel.org/show_bug.cgi?id=72191
Please note Comment #23; specifically, the "medium_power" workaround for laptop models such as the Lenovo T440S appears to be SSD model sensitive/specific. So be sure to carefully track your journal/logs if using an ALPM setting other than "max_performance".
Hi guys,
could you post some more dmesg snippets so I can design a regexp for this?
I'm considering adding a warning to the tlp-stat output.
Just searched my journal and no longer have any of the past errors available for posting (and would rather not force the issue by inducing them). However, they were similar/identical to those in the initial post here.
I have implemented a check for the above errors in tlp-stat – sample output:
+++ Warnings
* Kernel log shows ata errors (2) possibly caused by the configuration: SATA_LINKPWR_ON_AC/BAT=min/medium_power
--> Consider using medium_power or max_performance instead!
--> Check yourself with:
dmesg | egrep -A 5 "ata[0-9]+: SError: { .*CommWake }"
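For anyone who wants to run the same check from a script, here is a rough Python equivalent of the egrep expression above. It is only a sketch and not how tlp-stat implements the check; the names COMMWAKE_RE and find_commwake_errors are illustrative, and the sample lines are copied from the dmesg excerpts in this thread:

```python
import re

# Same expression as the egrep command above; the braces are escaped for Python.
COMMWAKE_RE = re.compile(r"ata[0-9]+: SError: \{ .*CommWake \}")


def find_commwake_errors(kernel_log):
    """Return the kernel log lines that report an SError containing CommWake."""
    return [line for line in kernel_log.splitlines() if COMMWAKE_RE.search(line)]


# Sample lines taken from the dmesg excerpts posted in this thread.
SAMPLE_LOG = """\
[ 1982.874590] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x40000 action 0x6 frozen
[ 1982.874595] ata5: SError: { CommWake }
[ 1982.874598] ata5.00: failed command: FLUSH CACHE EXT
[  +0,000002] ata1: SError: { PHYRdyChg CommWake }
"""

for line in find_commwake_errors(SAMPLE_LOG):
    print(line)
```

Unlike egrep -A 5, this returns only the matching lines, without the five lines of context that follow each match.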
Yep, I confirm
SATA_LINKPWR_ON_BAT=max_performance
worked. I have not done extensive testing with the medium option yet.
Pierre
@linrunner @pvanhauw I have been using medium_power on my setup for the past 2 weeks. No errors related to this so far, seems all good :)
Check released with 0.6.
I leave this open. More reports are welcome.
Thinkpad L420, Fedora 22, same problem with a Crucial MX100; set it to max_performance and it works.
Thinkpad Yoga, Fedora 22, same problem with a Crucial MX100; set it to max_performance and it works.
I also note a number of comments on the Crucial support forums mentioning system stability or slowdowns under Windows until SATA link power management is disabled.
I plan on commenting on this kernel bug: https://bugzilla.kernel.org/show_bug.cgi?id=89261 and suggesting a blacklist entry that disables SATA link power management for the MX100.
On a Thinkpad T450 with a Crucial MX100 512GB SSD and SATA_LINKPWR_ON_BAT=min_power, if I put the laptop to sleep before unplugging the power source, it works normally after that.
I'm running Debian Jessie on a Thinkpad T440s with an ordinary HDD (i.e. no SSD). When switching to SATA_LINKPWR_ON_BAT=medium_power I get the following in dmesg --human:
[Nov 8 11:47] ata1.00: exception Emask 0x10 SAct 0x6000000 SErr 0x50000 action 0xe frozen
[ +0,000005] ata1.00: irq_stat 0x00400000, PHY RDY changed
[ +0,000002] ata1: SError: { PHYRdyChg CommWake }
[ +0,000003] ata1.00: failed command: WRITE FPDMA QUEUED
[ +0,000004] ata1.00: cmd 61/08:c8:38:09:44/00:00:15:00:00/40 tag 25 ncq 4096 out
res 50/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
[ +0,000002] ata1.00: status: { DRDY }
[ +0,000002] ata1.00: failed command: WRITE FPDMA QUEUED
[ +0,000004] ata1.00: cmd 61/08:d0:28:0b:91/00:00:15:00:00/40 tag 26 ncq 4096 out
res 50/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x10 (ATA bus error)
[ +0,000001] ata1.00: status: { DRDY }
When using SATA_LINKPWR_ON_BAT=max_performance the error messages disappear. Currently tlp-stat does not issue a warning (I'm using TLP from http://repo.linrunner.de/debian, I guess it is version 0.8). So I'm providing my dmesg output as requested by @linrunner :-).
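The SErr value in the exception line and the flags in the SError line carry the same information. A small sketch of that relationship, listing only the two flags that show up in this thread; the bit positions are an assumption that is consistent with both log excerpts posted here (SErr 0x50000 and SErr 0x40000), and decode_serr is a made-up helper name:

```python
# Only the two SError flags that appear in the logs of this thread are listed.
# Assumed bit positions, consistent with:
#   SErr 0x40000 -> { CommWake }
#   SErr 0x50000 -> { PHYRdyChg CommWake }
SERR_FLAGS = {
    16: "PHYRdyChg",
    18: "CommWake",
}


def decode_serr(serr):
    """Return the names of the known flags that are set in an SErr value."""
    return [name for bit, name in sorted(SERR_FLAGS.items()) if serr & (1 << bit)]


print(decode_serr(0x50000))  # value from the ata1 log above
print(decode_serr(0x40000))  # value from the ata5 log in the initial post
```

Any other bits set in a real SErr value are silently ignored by this sketch.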
I just found out you have to activate warnings (tlp-stat -w) in order to see the message. So the check also works on my machine, please ignore my previous comment.
Hi Urs, nevertheless thanks for your report.
Since 0.8 [1], tlp-stat without parameters and tlp-stat -d should both produce the warnings section at the end of the output. I'd appreciate it if you could retest this.
[1] https://github.com/linrunner/TLP/blob/master/tlp-stat.in#L1103
@linrunner I've just retested it: both tlp-stat without parameters and tlp-stat -d issue the warning as expected.
It could be that when I checked this morning I had already switched back to SATA_LINKPWR_ON_BAT=max_performance.
@UrsMetz: thanks for your reassuring feedback.
Just got bitten by this on vacation this past weekend: System76 Kudu Professional, with an aftermarket Crucial MX100 512GB installed by myself. Wish I had found this before spending the past 6.5 hours doing fsck -vccfk /dev/sdb1! Didn't even make the connection that I only got hit by this on battery... It's late my time but I'll try this out tomorrow, seems like a possible fix.
EDIT: Yup, the medium_power setting works for me! Thanks! :+1:
Hi All,
I've been working on a kernel patch adding a new SATA LPM policy, med_power_with_dipm, which matches the power-management defaults of the Intel RST Windows drivers. It would be interesting if people who were having issues with min_power but are fine with medium_power could test this new policy. It saves almost as much power as min_power and, since it mimics Windows, should hopefully not hit any SSD firmware bugs like min_power sometimes does.
For more info see: https://hansdegoede.livejournal.com/18412.html
Regards,
Hans
@jwrdegoede Is there an easy way to test your patch on Debian (jessie, in fact, as I haven't updated yet)? I just had a quick look at your blog post and you only mention Fedora.
Hi,
Nope, sorry you will need to build a kernel with the patch yourself.
Perhaps someone with Debian experience is reading along and can do a pre-patched kernel pkg like I've done for Fedora ?
Regards,
Hans
I've uploaded patched Ubuntu kernel packages here: http://download.linrunner.de/packages/
They won't work with Debian however, because kernel infrastructure is a bit different there.
ps. and yes, you may specify
SATA_LINKPWR_ON_AC=med_power_with_dipm
SATA_LINKPWR_ON_BAT=med_power_with_dipm
with any version of TLP :-).
Hi,
On 18-10-17 19:42, linrunner wrote:
> I've uploaded patched Ubuntu kernel packages here: http://download.linrunner.de/packages/
> They won't work with Debian however, because kernel infrastructure is a bit different there.
Cool, thank you!
It would be great if someone with one of the affected SSDs could test this kernel with the new med_power_with_dipm setting. This saves almost as much power as min_power, and if it turns out to be safe for more disks / SSDs it might make a better default going forward.
Regards,
Hans
@jwrdegoede I just reread part of this issue and am asking myself whether the patch is only relevant (and thus the power saving only happens) for SSDs and not for good ol' HDDs? I'm still using an HDD. In case this is also relevant for HDDs I might give your patch a try, but I can't promise I'll find the time to create a patched kernel and test it.
The patch is relevant for HDD too.
Hi,
The power saving should be about the same on HDDs, and testing with HDDs is also good to have. However, the main thing I'm interested in from this specific GitHub issue is testing with the Crucial SSDs which were triggering the issue as originally described.
Regards,
Hans
All,
Thought it might be helpful to comment on this issue given my posts on it from Sept. 2014 above.
Currently have four Lenovo Thinkpads (models T440s (referenced in the Sept 2014 posts), X200s, X220 and X250), all running Arch Linux and TLP. All of them have Crucial MX300 SSDs of either 275GB or 525GB capacity on the most current firmware (M0CR060) and are running SATA_LINKPWR_ON_BAT=min_power without any issues whatsoever.
The machines differ in age, BIOS vs. UEFI, etc., yet they all use the same make and model of SSD, which suggests that (at least) the Crucial MX300 is not plagued by this issue. Perhaps SSD technology has evolved in the last 3+ years to a point where this setting will no longer be a problem for SSDs (or at least some number of them)?
Cheers, Halocaridina
According to a bug report which I just received, this still seems to be happening on Crucial MX100 SSDs with my new med_power_with_dipm policy. So I'm going to add a LPM blacklist entry for this SSD to the kernel.
I notice that all reporters in this and other bugs about the MX100 have the 512GB model, or are not specifying their SSD's size. If you've seen this problem with a Crucial MX100 which is not 512GB, please let me know ASAP, as for now I plan to limit the blacklist entry to the 512GB model.
I would also appreciate the output of: "dmesg | grep Crucial" from machines where people have seen this problem.
Hans, is the specific model CT500BX100SSD1?
Here's another bug report [1] about that particular model. I am also waiting for another user's feedback.
If you don't mind, could you let me send the patch and help review it? Thanks.
[1] https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1726930
Hi,
No it is a Crucial_CT512MX100, so a MX100 not a BX100, which AFAIK are quite different models. The patch I'm preparing for it looks like this:
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4530,6 +4530,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
     { "PIONEER DVD-RW DVR-212D", NULL, ATA_HORKAGE_NOSETXFER },
     { "PIONEER DVD-RW DVR-216D", NULL, ATA_HORKAGE_NOSETXFER },
 
+    /* The 512GB version of the MX100 has both queued TRIM and LPM issues */
+    { "Crucial_CT512MX100*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
+                                   ATA_HORKAGE_ZERO_AFTER_TRIM |
+                                   ATA_HORKAGE_NOLPM, },
+
     /* devices that don't properly handle queued TRIM commands */
     { "Micron_M500_*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
                              ATA_HORKAGE_ZERO_AFTER_TRIM, },
Note it would be good to get people to test with 4.14+ and med_power_with_dipm, at least for an MX300 I've reports of that fixing issues which are seen when using min_power.
Regards,
Hans
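To illustrate why the Crucial_CT512MX100* pattern covers the 512GB MX100 but not the BX100 asked about above, here is a small sketch. It uses Python's fnmatch as a stand-in for the kernel's own glob matching (an assumption for illustration only), and two of the model strings are hypothetical examples rather than values reported in this thread:

```python
from fnmatch import fnmatchcase

# Pattern taken from the proposed blacklist entry.
MX100_512GB_PATTERN = "Crucial_CT512MX100*"


def is_blacklisted(model):
    """Check a drive's model string against the proposed MX100 512GB pattern."""
    return fnmatchcase(model, MX100_512GB_PATTERN)


# "Crucial_CT512MX100" and "CT500BX100SSD1" are quoted in this thread; the
# other two are hypothetical model strings used only for illustration.
for model in ("Crucial_CT512MX100", "Crucial_CT512MX100SSD1",
              "CT500BX100SSD1", "Crucial_CT256MX100SSD1"):
    print(model, is_blacklisted(model))
```

Only the two strings starting with Crucial_CT512MX100 match, which fits the plan to limit the entry to the 512GB model for now.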
On 14 Feb 2018, at 7:24 PM, Hans de Goede notifications@github.com wrote:
> No it is a Crucial_CT512MX100, so a MX100 not a BX100, which AFAIK are quite different models. The patch I'm preparing for it looks like this:
Thanks. I’ll send a separate patch for the model in Launchpad bug.
> Note it would be good to get people to test with 4.14+ and med_power_with_dipm, at least for an MX300 I've reports of that fixing issues which are seen when using min_power.
Hmm, maybe let distro kernels use med_power_with_dipm as default through CONFIG_SATA_MOBILE_LPM_POLICY?
Hi,
On 14-02-18 20:32, Kai-Heng Feng wrote:
> Thanks. I’ll send a separate patch for the model in Launchpad bug.
Before you do so, please double check it is broken even with med_power_with_dipm…
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -4530,6 +4530,11 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
>      { "PIONEER DVD-RW DVR-212D", NULL, ATA_HORKAGE_NOSETXFER },
>      { "PIONEER DVD-RW DVR-216D", NULL, ATA_HORKAGE_NOSETXFER },
>  
> +    /* The 512GB version of the MX100 has both queued TRIM and LPM issues */
> +    { "Crucial_CT512MX100*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
> +                                   ATA_HORKAGE_ZERO_AFTER_TRIM |
> +                                   ATA_HORKAGE_NOLPM, },
> +
>      /* devices that don't properly handle queued TRIM commands */
>      { "Micron_M500_*", NULL, ATA_HORKAGE_NO_NCQ_TRIM |
>                               ATA_HORKAGE_ZERO_AFTER_TRIM, },
> Note it would be good to get people to test with 4.14+ and med_power_with_dipm, at least for an MX300 I've reports of that fixing issues which are seen when using min_power.
Erm, that 4.14+ above should be 4.15+, sorry.
> Hmm, maybe let distro kernels use med_power_with_dipm as default through CONFIG_SATA_MOBILE_LPM_POLICY?
Yes that is the whole purpose of CONFIG_SATA_MOBILE_LPM_POLICY, note that I authored the patch adding that new Kconfig option :)
Regards,
Hans
On 15 Feb 2018, at 5:05 AM, Hans de Goede notifications@github.com wrote:
> Before you do so, please double check it is broken even with med_power_with_dipm…
Yes, I can confirm it. I wrote a new quirk that will fall back to med_power_with_dipm when min_power gets selected.
The user confirmed that med_power_with_dipm was in use [1], but the same issue still happened.
> Yes that is the whole purpose of CONFIG_SATA_MOBILE_LPM_POLICY, note that I authored the patch adding that new Kconfig option :)
That’s good to know. Are you going to use default 3 (med_power_with_dipm) on Fedora?
[1] https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1726930/comments/30
Kai-Heng
Hi,
On 15-02-18 14:20, Kai-Heng Feng wrote:
> That’s good to know. Are you going to use default 3 (med_power_with_dipm) on Fedora?
Yes that is the plan.
Regards,
Hans
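For readers wondering what "default 3" refers to: CONFIG_SATA_MOBILE_LPM_POLICY takes a small integer. The sketch below shows the mapping as it is believed to stand in the 4.15 Kconfig help text. Only the value 3 is confirmed in this thread; the other entries are an assumption, later kernels may number the policies differently, and describe_lpm_default is a made-up helper name:

```python
# Assumed meaning of CONFIG_SATA_MOBILE_LPM_POLICY values around kernel 4.15.
# Only 3 -> med_power_with_dipm is confirmed by the discussion above.
LPM_POLICY_BY_VALUE = {
    0: "keep firmware settings",
    1: "max_performance",
    2: "medium_power",
    3: "med_power_with_dipm",
    4: "min_power",
}


def describe_lpm_default(value):
    """Return the policy selected by a CONFIG_SATA_MOBILE_LPM_POLICY value."""
    if value not in LPM_POLICY_BY_VALUE:
        raise ValueError(f"unknown LPM policy value: {value}")
    return LPM_POLICY_BY_VALUE[value]


print(describe_lpm_default(3))
```

A distro kernel built with the value 3 would then start mobile chipsets with med_power_with_dipm unless userspace (for example TLP) overrides it.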
All kernels should be patched by now.
I use Linux Mint 17 Qiana 64-bit with Cinnamon. I installed a new SSD, the Crucial MX100 512GB, in a Samsung Ativ Book 8 (NP870).
After some time, and ALWAYS when the laptop stays idle with the screen turned off, the file system is remounted read-only because of an error.
You can find all the information here: http://forums.linuxmint.com/viewtopic.php?f=49&t=174315
The most important part is the dmesg:
[ 1982.874590] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x40000 action 0x6 frozen
[ 1982.874595] ata5: SError: { CommWake }
[ 1982.874598] ata5.00: failed command: FLUSH CACHE EXT
[ 1982.874602] ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 1982.874602] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1982.874604] ata5.00: status: { DRDY }
[ 1982.874607] ata5: hard resetting link
[ 1988.238907] ata5: link is slow to respond, please be patient (ready=0)
[ 1992.890664] ata5: COMRESET failed (errno=-16)
[ 1992.890670] ata5: hard resetting link
[ 1998.254987] ata5: link is slow to respond, please be patient (ready=0)
[ 2002.906743] ata5: COMRESET failed (errno=-16)
[ 2002.906750] ata5: hard resetting link
[ 2008.271052] ata5: link is slow to respond, please be patient (ready=0)
[ 2037.975036] ata5: COMRESET failed (errno=-16)
[ 2037.975042] ata5: limiting SATA link speed to 3.0 Gbps
[ 2037.975044] ata5: hard resetting link
[ 2043.003094] ata5: COMRESET failed (errno=-16)
[ 2043.003101] ata5: reset failed, giving up
[ 2043.003103] ata5.00: disabled
[ 2043.003105] ata5.00: device reported invalid CHS sector 0
[ 2043.003114] ata5: EH complete
[ 2043.003151] sd 4:0:0:0: [sda] Unhandled error code
[ 2043.003153] sd 4:0:0:0: [sda]
[ 2043.003154] sd 4:0:0:0: [sda] Unhandled error code
[ 2043.003156] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 2043.003158] sd 4:0:0:0: [sda] CDB:
[ 2043.003163] sd 4:0:0:0: [sda]
[ 2043.003163] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 2043.003165] sd 4:0:0:0: [sda] CDB:
[ 2043.003159] Write(10): 2a 00
[ 2043.003166] Write(10): 2a 00 0e a9 70 c8 00 00 08 00
[ 2043.003176] end_request: I/O error, dev sda, sector 245985480
[ 2043.003179] EXT4-fs warning (device sda8): ext4_end_bio:317: I/O error writing to inode 916782 (offset 0 size 4096 starting block 30748186)
[ 2043.003183] Buffer I/O error on device sda8, logical block 3761689
[ 2043.003180] 0f 22 6f e0 00 00 08 00