pop-os / system76-dkms

System76 DKMS driver
GNU General Public License v2.0

Fan control may cause EC hangs #11

Closed jackpot51 closed 5 years ago

jackpot51 commented 5 years ago

Several oryp4 customers have reported unresponsive power buttons, likely due to the EC hanging. This may involve the use of fancontrol, as was the case in one customer's unresponsive system.

We should investigate this issue and potentially fix or disable the hwmon interface for fans.
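For anyone who wants to check whether their machine exposes a fan interface through hwmon at all, here is a minimal sketch that lists hwmon devices carrying `fan*` or `pwm*` attributes. It is not taken from the driver. It assumes the standard Linux sysfs layout under `/sys/class/hwmon`, and the function name `find_fan_interfaces` is made up for illustration.

```python
from pathlib import Path


def find_fan_interfaces(hwmon_root="/sys/class/hwmon"):
    """Return a list of (hwmon directory, device name, fan/pwm attribute names).

    Assumes the standard sysfs layout: one hwmonN directory per device, each
    with an optional "name" file plus attributes such as fan1_input or pwm1.
    """
    found = []
    root = Path(hwmon_root)
    if not root.is_dir():
        return found  # not Linux, or hwmon is not available
    for dev in sorted(root.iterdir()):
        if not dev.is_dir():
            continue
        name_file = dev / "name"
        name = name_file.read_text().strip() if name_file.is_file() else ""
        # Keep only fan speed and PWM attributes; ignore temperatures etc.
        attrs = sorted(
            p.name for p in dev.iterdir() if p.name.startswith(("fan", "pwm"))
        )
        if attrs:
            found.append((dev.name, name, attrs))
    return found
```

Calling `find_fan_interfaces()` and printing the result shows which devices a tool like fancontrol could be talking to. The sketch only reads directory listings and the `name` file; it never writes to a `pwm*` attribute.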

ebobby commented 5 years ago

That customer was me. Let me know if I can do more to help out.

jackpot51 commented 5 years ago

Thanks @ebobby, I am sorry that this happened to you. I once hung the EC by flashing firmware from a different model and had to pull the battery - not a fun thing to do when the system is running!

I would like to duplicate your setup to get to the bottom of this as soon as possible. Were you on Arch Linux when this happened?

nicholi commented 5 years ago

Just to add my info to the pile, I started the thread on the system76 subreddit.

I'm running a completely stock system76 installed Ubuntu 18.04. Have not even begun to tweak much of anything in terms of fan control/kernels/etc. I've had my Oryx Pro for about a month with no other issues. When the button did stop I was NOT doing anything resource intensive. I had a few terminals open which was what the screen locked on when it happened. At the exact moment it froze I was holding down the left arrow key. Also I do believe at the time I was blocking both the rear fans and the ones on the bottom, not really sure to what extent or for how long.

I was curious if maybe my firmware was behind but already on the latest.

BIOS: 7.006S76 EC: 1.07.05

I'll consult this issue if it happens again. Been ok so far.

jackpot51 commented 5 years ago

Thanks @Nicholi

drNoob13 commented 5 years ago

Could you try to reproduce the issue on Ubuntu 16.04?

nicholi commented 5 years ago

I'm not gonna cover any of my vents on purpose again, so no.

I do terminal work all the time, I'll report back if it happens again.

ebobby commented 5 years ago

@jackpot51 Yes. Using Arch. I can give you a list of packages I have installed and my config files (dotfiles and system wide) are public in my github.

ludvigsen commented 5 years ago

I also experienced this issue on my Oryx Pro, also running Arch. Are there any packages that might be responsible?

codyacarey commented 5 years ago

Just started experiencing the same issues as others. Had a number of applications open, but was actively working in Firefox. Keyboard and trackpad suddenly became unresponsive, but Bluetooth mouse continued to function. Restarted to see if that cleared the issue. Laptop is presently stuck with a black screen: power button light is on, one fan on the side with the display ports is on at low speed, and the two left front indicator LEDs (power adapter and battery) are active. Power adapter LED stays on even with the cable unplugged. Average temperature of the bottom of the machine, taken with an infrared thermometer, is between 95 and 102 degrees Fahrenheit. I'm allowing the machine to sit and power off on its own rather than unplugging the battery. I'm using a bunch of USB devices (phones, fans, etc.) to drain it more quickly. Operating system was Pop OS/Ubuntu 18.04, updated daily. When the machine is back on I'll post firmware versions. Also opening a support ticket online.

colinfaulkingham commented 5 years ago

I just had this happen today on my Oryx Pro. I was running multiple apps when it suddenly became unresponsive and really hot in the top left corner. I was able to plug in a USB keyboard and reboot, but it froze on the Ubuntu splash screen. I then had to let the battery drain and let it cool before I could start it back up again. I called support and left a message; when I restarted my machine I had a thermal update from System76. I hope they're pushing a fix. If not, I will probably have to send it back for a refund.

benjarmstrong commented 5 years ago

Does this mean I can swap in my malfunctioning machine for a different model?

> I'm running a completely stock system76 installed Ubuntu 18.04. Have not even begun to tweak much of anything in terms of fan control/kernels/etc. I've had my Oryx Pro for about a month with no other issues. When the button did stop I was NOT doing anything resource intensive. I had a few terminals open which was what the screen locked on when it happened. At the exact moment it froze I was holding down the left arrow key. Also I do believe at the time I was blocking both the rear fans and the ones on the bottom, not really sure to what extent or for how long.

It's interesting that you should mention you had the left key held when this happened. When my Oryx Pro did this for the first time it was behaving like the left key was held down as well, as if that were the moment it stopped receiving new input.

DropsOfSerenity commented 5 years ago

I'm having this too, contacting support..

kJamesy commented 5 years ago

MeToo! Recently upgraded from 16.04 and since then, it's behaving like a 2GB RAM Fujitsu Siemens Computer. Difference being it set me back some £2000+

My power button works and I can keep rebooting; a minute or two after it boots, everything hangs. Meanwhile, fans are working quite hard and only firefox is open. C'mon, lads!!!!

tonylambiris commented 5 years ago

This took me literally over a month to eventually figure out with the short supply of spare time I have to begin with: https://twitter.com/thelambeers/status/1048193419867316226

ebobby commented 5 years ago

@tonylambiris the hangs described in that tweet are not the same ones referred to here. bbswitch is not really the problem in your case. If you are relying solely on bbswitch to turn your NVIDIA card off, it won't be able to, and then other software that makes certain ACPI calls will lock up. In my case it used to be lspci and neofetch.

The only reliable way to turn the card on and off is using system76-power.

markjfisher commented 5 years ago

Another #MeToo - I'll be contacting support tomorrow.

I was in the middle of writing my own sensors logger to record historical information on the temp/fans, as I have another issue open that the fans are very loud and continuous when using NVidia chipset.
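A rough sketch of what such a logger could look like, assuming the standard hwmon sysfs layout where `tempN_input` is in millidegrees Celsius and `fanN_input` is in RPM. The function names `read_sensors` and `append_sample` and the CSV layout are made up for illustration; this is not the actual logger mentioned above.

```python
import csv
import time
from pathlib import Path


def read_sensors(hwmon_root="/sys/class/hwmon"):
    """Read every tempN_input and fanN_input value found under hwmon_root."""
    readings = {}
    root = Path(hwmon_root)
    if not root.is_dir():
        return readings
    for dev in sorted(root.iterdir()):
        if not dev.is_dir():
            continue
        for path in sorted(dev.iterdir()):
            name = path.name
            if not (name.startswith(("temp", "fan")) and name.endswith("_input")):
                continue
            try:
                readings[f"{dev.name}/{name}"] = int(path.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor unreadable right now; skip this sample
    return readings


def append_sample(log_path, readings, timestamp=None):
    """Append one CSV row per sensor: unix timestamp, sensor, raw value."""
    timestamp = time.time() if timestamp is None else timestamp
    with open(log_path, "a", newline="") as fh:
        writer = csv.writer(fh)
        for sensor, value in sorted(readings.items()):
            writer.writerow([f"{timestamp:.0f}", sensor, value])
```

Calling `append_sample("sensors.csv", read_sensors())` in a loop with a `time.sleep(5)` between iterations builds up a history that can be inspected after a shutdown or freeze. The sketch only reads from sysfs and never writes to fan or PWM attributes.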

In the last 2 days, this laptop has totally shutdown on me on a whim (small message about temperature causing one CPU to go slow in system logs, nothing else), and now it has frozen and is unresponsive to power switch.

tonylambiris commented 5 years ago

@ebobby all I can tell you is what I unequivocally was able to determine, and blacklisting that module caused no more crashes/freezes and actually allowed me to use bumblebeed.service as advertised.

fbraza commented 5 years ago

Just to add my case here also

I'm running a completely stock system76 installed Pop_Os 18.04. I got the machine last September.

I was not doing anything "resource demanding", as I was just browsing the web and writing some very basic code in the terminal. The computer was charging when this happened.

Right now the battery has drained and I am going to see if I can reboot it.

Will update soon

EDIT: The computer restarts. Let's see if something similar will happen again.

tonylambiris commented 5 years ago

@markjfisher I'll save you some time, here's the reply I got back before deciding to handle it myself:

Tony,

For lockups I usually start with a memtest or a SMART test for the drive:
https://support.system76.com/articles/hardware-failure/

Spoiler-alert: memory and SSD nvme reported no issues.

davidrhoderick commented 5 years ago

I've experienced this issue twice; definitely seems like the system is working (I could see the Chrome dev console logs pinging and the CPU monitor updating) but trackpad and keyboard (including power button) become unresponsive.

I couldn't move the mouse using a USB mouse but I could log out with a wireless USB keyboard. Then the USB wired mouse became responsive.

I attempted a shut down from the account selector screen and it's hung trying to shut down and completely unresponsive. I was working so this is a real inconvenience and I'm glad I held onto my MacBook Pro as a backup.

Is there no software solution? I live in Serbia so I'm afraid it'll take half a year to send the machine back and get a replacement. A replacement F10 key took several weeks to get here...

aravindbargurhiriyannaiah commented 5 years ago

It just happened to me.

  1. System76 Oryx Pro (1 month old) (running Ubuntu 18.04) froze and the screen went blank (no display)
  2. Power button became useless. Pressed it for upwards of a minute but the system's fan is still on and so are the lights.
  3. Did not want to yank the battery out during power on/fan running as it is still under warranty.

Awaiting the battery to drain - the laptop is useless in its current state. Hoping somebody at System76 can help rectify the problem.

nicholi commented 5 years ago

Welp, it just happened again. Took almost 3 months but I'm at the same spot again. The only difference is I've disabled the Nvidia card, which is what I was hoping was somehow causing the problem. No other crazy kernel updates or custom mods.

I was doing the incredibly complex task of alt tabbing and then, no more input. On the bonus side of new information: I plugged in a USB mouse and lo and behold it worked fine. While I was closing out all my work (in an attempt to reboot) I found the down arrow key was apparently being pressed over and over. Told Ubuntu to restart (not shutdown); it fully logged out but now I'm sitting at a black screen. Still no ability to press power key or anything.

My guess is that some software glitch is causing an insane spam of keys, which is apparently locking all other input including the power button. My other guess is that a restart didn't fully power down the board, thus I'm still locked until the battery runs out. If I had just gone the shutdown route, that MIGHT have been the only recourse to get out of this shit storm. With no OS actually running I think I'm in for a long wait.

My suggestion: no one should purchase any Oryx Pros or any other system76 machines which have a similar setup with no access to the battery, at least until you see a full write-up of how this has been resolved. Let's be real, it's a serious flaw in something's design. Unless you are absolutely fine with being locked out of your laptop.

Meloncon commented 5 years ago

Stock Oryx Pro 4 15 inch 4K display 16GB RAM OS: pop!_os

Wanted to report I am having the same problem. I had Chrome, Skype, a text editor, and a couple of terminals open. Nothing too taxing except maybe Chrome. I was editing a file in Vim, autocomplete popped up, I hit the down arrow and BAM!, keyboard and trackpad became unresponsive, and the autocomplete drop down in Vim continued scrolling down and looping over the options repeatedly. Could not get it to respond, though the computer seemed to keep doing its thing just fine; I just couldn't interact with it at all. Wasn't overly hot to the touch. Plugged in an external keyboard and mouse and regained control long enough to close everything and issue a restart. The restart showed a lit black screen for a while, then the screen seemed to turn off and I was left with a computer that won't respond to the power button, with fans going. It's completely unresponsive. I'm waiting for the battery to drain now. I attempted to take the back panel off to disconnect the battery, but I was having to use a lot of force to pry it up and stopped because I was scared it was going to snap.

So here I am waiting for my laptop to run out of juice. It almost seemed like the input drivers/firmware crashed and left it with no way to respond.

fbraza commented 5 years ago

Hello guys,

Just to update on my situation: I recently updated the firmware and have not had any more issues. Fingers crossed. I just saw that a new update is available. When I contacted the support service (one month ago) they told me that the engineer had found the problem, that it was firmware related, and that we should get an update soon.

You might want to do the update using the s76-firmware app in Pop!_OS.

Cheers

Faouzi

tonylambiris commented 5 years ago

Can anyone here running Pop!_OS post the contents of their /etc/fstab? I'm curious what the default file systems and mount options are (you can obfuscate any sensitive data you need to).

Thanks!
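For anyone who would rather not paste disk identifiers, a small sketch along these lines could strip them before posting. It assumes the only sensitive fields are `UUID=` and `PARTUUID=` device specifiers; the helper name `redact_fstab` is made up for illustration, and labels or network paths would need their own handling.

```python
import re

# Matches UUID=... and PARTUUID=... device specifiers.
_ID_RE = re.compile(r"\b((?:PART)?UUID=)\S+")


def redact_fstab(text):
    """Replace UUID/PARTUUID values with a placeholder, keeping all other fields."""
    return _ID_RE.sub(r"\1<redacted>", text)
```

Something like `print(redact_fstab(open("/etc/fstab").read()))` keeps the file system types and mount options intact, which is the part being asked about.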

rlabrecque commented 5 years ago

When you guys end up in this state, does Fn+1 still work to change the fan speed?

rlabrecque commented 5 years ago

EC 1.07.08a is out now and seems to have fixed my shutdown/sleep related issues. 👍

jackpot51 commented 5 years ago

Yes, EC 1.07.08a should fix these issues.
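To check a reported EC version (such as the 1.07.05 posted earlier in this thread) against 1.07.08a, a comparison could be sketched as below. The ordering rule is an assumption, not documented System76 behavior: each dot-separated component is compared numerically, and a trailing letter sorts after the same number without a letter. The names `parse_ec_version` and `ec_has_fix` are made up for illustration.

```python
import re


def parse_ec_version(version):
    """Split a string like '1.07.08a' into [(1, ''), (7, ''), (8, 'a')]."""
    parts = []
    for piece in version.strip().split("."):
        match = re.fullmatch(r"(\d+)([A-Za-z]*)", piece)
        if match is None:
            raise ValueError(f"unrecognized version component: {piece!r}")
        parts.append((int(match.group(1)), match.group(2).lower()))
    return parts


def ec_has_fix(installed, fixed="1.07.08a"):
    """True if `installed` sorts at or after `fixed` under the assumed ordering."""
    return parse_ec_version(installed) >= parse_ec_version(fixed)
```

Under these assumptions, `ec_has_fix("1.07.05")` is false and `ec_has_fix("1.07.08a")` is true.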