raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Under-voltage detected! (0x00050005) spams dmesg on new kernel 4.14.30-v7+ #2512

Closed E3V3A closed 6 years ago

E3V3A commented 6 years ago

After upgrading the kernel to 4.14, dmesg is now spammed by Under-voltage detected! (0x00050005) messages where no problem was shown previously. I've run this device non-stop for months without any problem until the update, so the under-voltage level settings or some other config must have changed. Spamming the dmesg or journalctl --system buffer certainly is not helping anyone.

kern  :crit  : [ 1701.464833 <    2.116656>] Under-voltage detected! (0x00050005)
kern  :info  : [ 1707.668180 <    6.203347>] Voltage normalised (0x00000000)

Also related to #2367
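
For readers wondering what 0x00050005 encodes, here is a minimal decoding sketch. It assumes the hex value in the kernel message uses the same bit layout that is documented for `vcgencmd get_throttled`; that equivalence is an assumption, and the table and function names are made up for illustration.

```python
# Bit layout as documented for `vcgencmd get_throttled`. Assumption: the hex
# value printed in the kernel's under-voltage message uses the same layout.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttle_flags(value):
    """Return the names of all known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(THROTTLE_FLAGS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    for raw in (0x00050005, 0x00000000):
        names = decode_throttle_flags(raw) or ["no flags set"]
        print("0x%08x: %s" % (raw, ", ".join(names)))
```

Under that assumption, 0x00050005 has bits 0, 2, 16 and 18 set: under-voltage and throttling right now, plus the "has occurred" markers for both.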

JamesH65 commented 6 years ago

The PR doesn't use vanilla rate logging - it uses its own interval (5 minutes) and burst (3), so it is unaffected by the kernel rate logging settings (although it does use the same rate logging code).
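
To make the interval/burst idea concrete, here is a rough user-space sketch of that style of rate limiting. It is not the kernel's code and not the PR's code; the class name and the injectable clock are assumptions added for illustration. Only the numbers (5 minutes, burst of 3) come from the comment above.

```python
import time


class IntervalBurstLimiter:
    """Allow at most `burst` events per `interval` seconds and drop the rest."""

    def __init__(self, interval=300.0, burst=3, clock=time.monotonic):
        self.interval = interval      # window length in seconds (5 minutes)
        self.burst = burst            # messages allowed per window
        self.clock = clock            # injectable so the logic can be tested
        self.window_start = None
        self.emitted = 0
        self.suppressed = 0

    def allow(self):
        """Return True if a message may be logged now, False if it is dropped."""
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.interval:
            # The previous window has expired (or this is the first call):
            # start a new window and reset both counters.
            self.window_start = now
            self.emitted = 0
            self.suppressed = 0
        if self.emitted < self.burst:
            self.emitted += 1
            return True
        self.suppressed += 1
        return False
```

With these settings, a supply that flaps every few seconds would produce three log lines and then stay silent until the five-minute window rolls over.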

It's been merged, and I doubt I'll have time to make any more changes; there are many more important things to do. We will consider PRs from elsewhere, however.

E3V3A commented 6 years ago

I'm trying to better understand the PR code raspberrypi.c and how it interacts with the kernel, sysfs and logs, but I can't find where it goes. I can't see any trace of this code in the kernel.img nor in any of the modules, or libraries, even if:

# cat /lib/modules/4.14.34-v7+/modules.builtin |grep rasp
kernel/drivers/firmware/raspberrypi.ko

However, this file or its code is nowhere to be found. So where is it hiding?

pelwell commented 6 years ago

The overlay/parameter is trivial to add. Extending the driver to support it is slightly more work, but not much. The reason it hasn't been done is that we aren't convinced it is a sensible change - I think it sends the wrong message, that the problem is managing the warnings, when in fact the problem is an inadequate power delivery system.

notro commented 6 years ago

Feature tweaking like this is often done using a module parameter. Device Tree is primarily used to describe the hardware.

Maybe something like this:

static bool rpi_firmware_uvlog = true;
module_param_named(uvlog, rpi_firmware_uvlog, bool, 0600);
MODULE_PARM_DESC(uvlog, "Enable logging of Under-voltage [default=true]");

My take on whether it should be easy or not to disable safety guards is: let people walk a tightrope across Niagara Falls if they want to. We should do our best to inform of the dangers involved though. Maybe a fat warning in probe():

    if (!rpi_firmware_uvlog)
        pr_warn("Under-voltage logging has been disabled. This is not recommended etc. etc.\n");

If we block actions like this we may find ourselves standing in the way of the hacker/maker spirit and I think that would be sad.

Edit: ovlog -> uvlog

JamesH65 commented 6 years ago

I think that disabling safety guards should ALWAYS be difficult. Otherwise people will do it, and that will cause us more problems than we want to deal with. For example, you need all sorts of licences to be able to tightrope across Niagara Falls. People can still do it, it's just a PITA to arrange. Safety software is in the same boat. People can do it - all the source is available to change to your heart's content - but I do not believe we should make it easy.

JamesH65 commented 6 years ago

Here's how it would go.

  1. Random user gets a voltage warning.
  2. Googles. Sees that he needs a better power supply.
  3. Can't get one straight away, so finds out how to ignore the message.
  4. Sets an easy-to-change module parameter to ignore the warning.
  5. 2 weeks later, the SD card corrupts, some random USB error happens, or some other power-related issue occurs.
  6. Posts on forum. Doesn't mention he disabled warnings.
  7. Much time spent by people trying to find a problem that should have been obvious from the outset.

notro commented 6 years ago

@JamesH65: My Niagara falls tightrope analogy turned out to be best used for why it should be hard to circumvent the safety checks :-)

As for how difficult it should be to circumvent, I have failed to factor in how easy it is to compile a kernel these days. Howto information is readily available and it's quite fast to do it on the Pi itself, compared to the overnight job it once was.

As for your troubleshooting-on-the-forum argument, it didn't cross my mind that it could be a problem, but I only scan a few forums and don't engage in helping people with SD card corruption issues, so I really wouldn't know.

I see that there's a new Raspbian out which has this under-voltage logging, which probably means that the Debian kernel package has been updated too. It will be interesting to see how this pans out over the next weeks.

It wasn't that long ago that I learned that the power cable gauge plays a factor in this, not only the power supply rating. A note about this in the Power supply section in the Documentation would be good I think.

E3V3A commented 6 years ago

It's always very enlightening to see how we are so different in this DIY hack-your-own-device philosophy.

AFAICR RPi was based on the ideology of making cool HW easily accessible to the general public, including kids. So just as a kid with a hammer, knife or fire will learn early on how easy it is to destroy something or get burnt, that should not prevent us from allowing kids to use those most basic tools of life, or make it harder for them to use and learn about them. So IMO, and in this particular case, I simply don't see how a cmdline option obtained from proper documentation (with all the above and beyond warnings) could possibly make this worse than people trying to force feed extra voltage to their RPis using, for example, the far more dangerous USB back-powering method, or double feeding from different sources. Not to mention how easy it is to abuse the GPIOs. Thus I find the above arguments for "making it more difficult" to implement exceptionally lame.

As a side note, for whoever happens to come across this thread: I just added the boot config.txt option avoid_warnings=2 and my god, finally all that kernel/dmesg garbage is gone! In addition, it seems that the device is running smoother. Yes, it is throttled to 600 MHz, which I guess is by the firmware, but it is already running better. I still have to do some proper performance tests, but I really do think there is a performance hit when those messages are enabled. The IO reaction just seems more jumpy and laggy while the kernel logs are spammed. (NB. I am still on 4.14.30 and not yet on 4.14.34, where there were some sysfs and log fixes.) What is mysterious, though, is why vcgencmd get_throttled is returning 0x0 when clearly the device is throttled. -- [EDIT] That option turns off the throttling too, so the normal ondemand kernel (?) CPU governor is working as it should.
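
For anyone who wants to check the throttle state from a script: the firmware query is normally spelled `vcgencmd get_throttled`, and it prints a single line such as `throttled=0x50005`. The sketch below parses that line and splits it into the "right now" half and the "has occurred" half. The helper names are hypothetical, and `read_throttled()` assumes `vcgencmd` is on the PATH of a Raspberry Pi; it is defined here but not called.

```python
import re
import subprocess


def parse_throttled(output):
    """Parse a line like 'throttled=0x50005' and return the value as an int."""
    match = re.fullmatch(r"throttled=(0x[0-9a-fA-F]+)", output.strip())
    if match is None:
        raise ValueError("unexpected get_throttled output: %r" % output)
    return int(match.group(1), 16)


def split_throttled(value):
    """Return (current, occurred): the low 16 bits describe the present state,
    the high 16 bits record conditions that have occurred earlier."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


def read_throttled():
    """Run vcgencmd and parse its output (assumes a Pi with vcgencmd installed)."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"], capture_output=True, text=True, check=True
    )
    return parse_throttled(result.stdout)
```

A reading of 0x0 means neither half has any bit set, which fits the [EDIT] above: with throttling turned off there is nothing for the firmware to report.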

And then of course we have the highly entertaining car analogy. Today all cars are using the CAN bus and most (even very old ones) have OBD2 access that can be used for all sorts of diagnostics, including disabling various warning lights. You can use your own $12 OBD2 BT dongle and disable any warning with your own phone. And anyone who has had an Audi, VW or BMW also knows that some of those engine warning lights come on for absolutely no other reason than annoyance, in order to ask the owner to take the car to their own service centers for a checkup after some X miles and force you to pump in extra $$$ for the vendors. (A strategy very similar to having to buy the RPi foundation's magic 5.4V/2.5A power supply.)

ThomasKaiser commented 6 years ago

I really do think there is a performance hit when those messages are enabled. The IO reaction just seems more jumpy and laggy while the kernel logs are spammed.

So not only do you try to power your Pi the worst way possible, you also run off an SD card from hell? :)

The ext4 standard commit interval is 5 seconds. So when you really see your system lagging because of some laughable disk activity every 5 seconds, you should seriously consider replacing your SD card. Random IO performance is important if you suffer from such issues. The vast majority of SD cards pretty much suck here, which is why it's important to only buy SD cards that are A1 or A2 compliant any more (last post of this thread contains numbers for SanDisk A1 cards). These perform orders of magnitude better than average SD cards. Random IO with small block sizes (writing some log contents) can be 100 to 500 times faster.

But given how you try to not improve your underpowering situation, most probably you're only interested in masking this other problem too? Adding commit=600 to the root filesystem's mount options in /etc/fstab will do the job.

If you're interested in diagnosing the problem:

sudo apt install sysstat
sudo iostat 10

(watch the %iowait percentage, since this tells you how much of the time your whole system is stuck waiting on IO)
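
If installing sysstat is not an option, roughly the same number can be derived from two snapshots of /proc/stat. The sketch below is an approximation of what iostat reports, not its implementation; the function names are made up, and the 10-second sampling interval in the usage note is simply borrowed from the `iostat 10` command above.

```python
def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a list of ints.

    Field order: user, nice, system, idle, iowait, irq, softirq, steal,
    guest, guest_nice.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two counter snapshots."""
    deltas = [b - a for a, b in zip(before, after)]
    # guest and guest_nice are already accounted for in user and nice,
    # so only the first eight fields are summed for the total.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total
```

To use it, read /proc/stat, sleep for 10 seconds, read it again, and pass both parsed snapshots to `iowait_percent`.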

E3V3A commented 6 years ago

FYI: This is a copy/paste excerpt from the USB 2.0 specifications:

The power source and sink requirements of different device classes can be simplified with the introduction of the concept of a unit load. A unit load is defined to be 100 mA. The number of unit loads a device can draw is an absolute maximum, not an average over time. A device may be either low-power at one unit load or high-power, consuming up to five unit loads. All devices default to low-power. The transition to high-power is under software control. It is the responsibility of software to ensure adequate power is available before allowing devices to consume high-power.


My Measurements:

# Voltage across GPIO pins 4 & 6
Under no load:      4.86 V
Under CPU load:     4.46 V

# Voltage @ PSU:    
Under no load:      5.30 V @ ~300 mA
Under CPU load:     5.40 V @ ~950 mA  <-- I have a good PSU!

# Voltage with no load:
@ PP1/2 : 4.92 V
@ PP35  : 4.89 V
@ PP7   : 4.86 V

# Voltage with CPU load:
@ PP1/2 : 4.64 V
@ PP35  : 4.60 V
@ PP7   : 4.58 V

NOTE:
All tests were run with the following USB peripherals connected:

Bus 001 Device 005: ID 0d8c:000c C-Media Electronics, Inc. Audio Adapter    # USB Sound Card
Bus 001 Device 004: ID 05af:0906 Jing-Mold Enterprise Co., Ltd              # Wireless Keyboard

CPU stress load was performed with:
for ((i=0; i<$(nproc --all); i++)); do nice yes >/dev/null & done

Now this indicates that either I have a really shitty cable/connection (at the RPi end) or that there is something else wrong internally. It also explains the under-voltage warning ping-pong effect, because it is so close to the 4.63 V threshold.
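
A quick Ohm's-law check on the figures above gives a feel for how much resistance sits between the PSU and the GPIO header. This is only a sketch: it lumps the cable, connectors and any on-board protection into one value, it treats the approximate currents (~300 mA, ~950 mA) as exact, and the function name is hypothetical.

```python
def supply_path_resistance(v_source, v_load, current_a):
    """Resistance (ohms) that would explain the voltage drop between two
    measuring points at the given current, by Ohm's law: R = (V1 - V2) / I."""
    if current_a <= 0:
        raise ValueError("current must be positive")
    return (v_source - v_load) / current_a


THRESHOLD_V = 4.63  # under-voltage threshold quoted in the comment above

if __name__ == "__main__":
    # Values taken from the measurements above: PSU voltage vs. GPIO pins 4 & 6.
    idle = supply_path_resistance(5.30, 4.86, 0.300)
    load = supply_path_resistance(5.40, 4.46, 0.950)
    print("no load : %.2f ohm" % idle)
    print("CPU load: %.2f ohm" % load)
    print("margin under load: %+.2f V" % (4.46 - THRESHOLD_V))
```

Both results land around one ohm or more, which is a lot for a 5 V supply path: at roughly 1 A that alone costs about a volt.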


And my "SD card from hell" is doing just fine reading 20MB/s and writing ~8 MB/s... without any SD card reader performance hacks.

ThomasKaiser commented 6 years ago

And my "SD card from hell" is doing just fine reading 20MB/s and writing ~8 MB/s... without any SD card reader performance hacks.

You never click on URLs and don't follow suggestions, right? :)

You are talking about sequential performance, which is 99% irrelevant on SBCs (it matters with digital cameras, video recorders and similar 'streaming' use cases). What's really important on SBCs is random IO, and here SD cards that show laughable ~8MB/s sequential writes are usually slow as hell. We've seen such cards being as slow as 2 IOPS (IO operations per second) with 16K access patterns, while good A1 rated cards are 250 to 500 times faster! It's all about IOPS; MB/s are somewhat irrelevant.

https://forum.armbian.com/topic/954-sd-card-performance/?page=3&tab=comments#comment-49811

And also again: Use iostat 10 in parallel and watch the %iowait percentage and the amount of data written. If this is constantly high your SD card needs a replacement.

It makes me really sad to see you behaving that ignorant and even actively promoting such weird ideas as setting avoid_warnings=2. I hope RPi folks will remove this ability with next firmware update since as can be clearly seen it's a horrible idea increasing support efforts for no reasons...

jakemagee commented 6 years ago

So... it sounds like you found your issue @E3V3A

Now this indicates that either I have a really shitty cable/connection (at the RPi end) or that there is something else wrong internally. It also explains the under-voltage warning ping-pong effect, because it is so close to the 4.63 V threshold.

Do you have different types of cables to test with? Do you have other RPi boards to test with?

JamesH65 commented 6 years ago

It makes me really sad to see you behaving that ignorant and even actively promoting such weird ideas as setting avoid_warnings=2. I hope RPi folks will remove this ability with next firmware update since as can be clearly seen it's a horrible idea increasing support efforts for no reasons...

Had a quick look at the code that deals with "avoid_warnings", it's all a bit odd and difficult to follow, but I suspect you are right, that should not stop logging of low voltage issues. What it should do is stop display of the warnings (lightning bolt), but still do the logging. We'll discuss in house.

ThomasKaiser commented 6 years ago

Now this indicates that either I have a really shitty cable

You simply described the average Micro USB cable out there. They were never intended to carry more than 500 mA which pretty much describes why powering through Micro USB is such a mess if users do not spend the extra money on an extra quality PSU with fixed cable showing low resistance (PSUs with fixed cable have to provide the advertised voltage at the connector side so cable resistance is already taken into account. This makes a huge difference compared to the situation with USB PSU and separate Micro USB cable)

I know you don't visit links, so here it is as an embedded table: [image: usb-cable-voltage-drop]

The following link provides in a hopefully understandable way the voltage drop situation/challenge with average Micro USB cables (usually having power lines with 26 or even 28 AWG rating): https://www.cnx-software.com/2017/04/27/selecting-a-micro-usb-cable-to-power-development-boards-or-charge-phones/
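
To put rough numbers on the 26/28 AWG remark, the sketch below estimates the drop over a cable from the standard AWG diameter formula and the resistivity of copper. It ignores connector contact resistance and temperature, so real cables will do somewhat worse; the function names and the 1 m / 1 A example are assumptions chosen for illustration.

```python
import math

COPPER_RESISTIVITY_OHM_M = 1.724e-8  # annealed copper at about 20 degrees C


def awg_diameter_mm(gauge):
    """Conductor diameter in mm from the standard AWG definition."""
    return 0.127 * 92 ** ((36 - gauge) / 39)


def awg_resistance_ohm_per_m(gauge):
    """Resistance per metre of a single solid copper conductor."""
    radius_m = awg_diameter_mm(gauge) / 2 / 1000
    area_m2 = math.pi * radius_m ** 2
    return COPPER_RESISTIVITY_OHM_M / area_m2


def cable_voltage_drop(gauge, length_m, current_a):
    """Voltage lost in the cable. Current flows out on VBUS and back on GND,
    so the effective conductor length is twice the cable length."""
    return 2 * length_m * awg_resistance_ohm_per_m(gauge) * current_a


if __name__ == "__main__":
    for gauge in (28, 26, 24, 20):
        drop = cable_voltage_drop(gauge, length_m=1.0, current_a=1.0)
        print("%d AWG, 1 m, 1 A: %.2f V drop" % (gauge, drop))
```

By this estimate a 1 m cable with 28 AWG power wires loses roughly 0.4 V at 1 A before any connector losses, which is already more than the gap between a 5 V supply and the 4.63 V threshold mentioned earlier in the thread.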

E3V3A commented 6 years ago

@ThomasKaiser

First I would like to compliment you on your large effort in trying to help and to convince us of all of the above. Although I am often annoyed by your suggestions, I really appreciate them! Perhaps just because you are able to argue with decent proof, even if you clearly disagree with most of my own ideology. :) Also, thanks for that cable table.

You never click on URLs and don't follow suggestions, right?

Not blindly clicking on URLs doesn't mean I do not follow or take well-founded advice and suggestions. I'm very well aware of the poor (and fake) SD cards out there, and have been for a long time. However, my everyday use very rarely needs the high performance you suggest. The everyday way I use my RPi simply does not require that type of high IO R/W speed. So until I need better performance or start seeing serious errors, I'll keep using what I have. If you wanna send me a 100 EUR SD card, please do so.

...setting avoid_warnings=2. I hope RPi folks will remove this ability with next firmware update.

This must be the most idiotic suggestion I have seen from you so far. It goes against all unwritten rules of -- and the fundamental nature of -- DIY projects. If people wanna run their HW (or cars) until they break, they should be able to do so without someone like you pointing fingers at them for being "foolish". If you guys had spent even a fraction of the time actually fixing bugs and issues, instead of arguing against them and the people reporting them, we would be far better off, and not with serial bombardments of broken updates.

As I said above, thousands of people use their Pis for all sorts of small projects, and keep them running that way until they get this great idea to update, just to find that all hell breaks loose. Close to every time! People I talk to back in the MagicMirror community all agree on one thing:
If your MM is working, never, ever update the firmware or kernel!

The bottom line. For someone who's running the PiHole or some other display app it doesn't matter very much. But when someone has integrated a multitude of peripherals such as face recognition, voice recognition, Cloud API, external sensors like PIR, Ultrasonics, light detection, IR remote, external GPIO controls, and various USB devices, like SDRs, all on the same device, then your broken updates are not so fun anymore. You do it once or twice, then you say "I will never update this again.".

@JamesH65

I suspect you are right, that should not stop logging of low voltage issues. What it should do is stop display of the warnings (lightning bolt), but still do the logging.

Your reasoning is very scary, especially considering you are part of the RPi foundation! Will you sleep better at night if you know that my SD card is getting filled up by repeated logs, all by my own choice? I think you have become very biased and I just don't see or hear any valid reasoning for doing what TK and you are suggesting. What exactly do you hope to gain by doing this?

Here is a free money making suggestion from me:
In your next iteration of any future Raspberry Pi N (4?) you should make sure to:

  1. Use 3A USB-C connectors
  2. Conform to the USB standards
  3. Include a low loss connector cable
  4. Include the magic power supply if the device doesn't comply with (2).
  5. Include the best suitable SD card
  6. Include a working built in soundcard with MIC/AUX

Each one of those alone, will save you tons of time and money since you will be able to put all your current efforts into sales, marketing, development and production, instead of arguing.

Just let us all know when this will happen, because that will be the day I will stop updating and buying your HW, permanently.


@jakemagee

Do you have different types of cables to test with?

Yes, in fact, I just tried another cable today; the improvement was small but clearly noticeable.

# Voltage with no load:
@ PP1/2 : 5.00 V
@ PP35  : 4.98 V
@ PP7   : 4.93 V

# Voltage with CPU load:
@ PP1/2 : 4.68 V
@ PP35  : 4.64 V
@ PP7   : 4.58 V

ThomasKaiser commented 6 years ago

If people wanna run their HW (or cars) until they break, they should be able to do so without someone like you pointing fingers at them

If those people did not open support issues, believing they have to blame software for their hardware issues, everything would be fine. But they do, and they report the various unnecessary results of their underpowering adventures as 'bugs', wasting their own and others' time.

I'm contributing to an open source project dealing with all sorts of SBC (except Raspberries). The vast majority of 'software issues' people have are in reality

Being able to differentiate between those hardware issues and real issues is essential if you want to spend time on software issues and not just deal with ignorance (such as yours -- I really can't believe you're still refusing to fix your underpowering situation). So now that undervoltage logging is in place, it would be fatal if users could mask this, since as we can see from you they're even encouraged to do so...

Since you seem to love using inappropriate hardware (be it powering or storage), I already recommended adding commit=600 to /etc/fstab to mitigate the crappy random IO performance of your SD card (seriously: if a few log lines every few seconds result in a laggy system, or you fear logging in general, this is a great idea that also drastically reduces wear on flash media -- 'my' distro implements log2ram for this purpose, writing log contents back to 'disk' just every hour by default).

jacobq commented 6 years ago

I don't want to get mixed up in this very long-winded discussion, but FWIW I stumbled across it while looking for a way to suppress kernel messages on the console (in my experience they have made bad problems worse: while I'm trying to triage things and shut down, messages get printed right over files in my editor, etc.). There are some ways to do this, such as dmesg -n 1 (see https://superuser.com/questions/351387/how-to-stop-kernel-messages-from-flooding-my-console#answer-351402 ). A previous comment suggested that this does not work, but it seemed to work fine for my purposes (i.e. on an RPi 3 B+ it stopped kernel messages from getting printed to my console, though they still appear in the output of dmesg).
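
Why `dmesg -n 1` hides the message on the console but not in `dmesg` itself can be sketched as follows: the kernel prints a message to the console only if its level is numerically lower than the console log level, while the ring buffer keeps everything. The helper names below are hypothetical; the level numbers and the four-field layout of /proc/sys/kernel/printk are the standard ones.

```python
# Standard kernel log levels: lower number means higher severity.
KERN_LEVELS = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}

PRINTK_FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk(text):
    """Parse the four integers found in /proc/sys/kernel/printk."""
    values = [int(x) for x in text.split()]
    if len(values) != len(PRINTK_FIELDS):
        raise ValueError("expected four integers, got %r" % text)
    return dict(zip(PRINTK_FIELDS, values))


def reaches_console(level_name, console_loglevel):
    """A message is shown on the console only if its level is strictly
    below the current console log level."""
    return KERN_LEVELS[level_name] < console_loglevel
```

The under-voltage line is logged at crit (level 2), so it reaches the console at a console log level of 4 but not after `dmesg -n 1` lowers that level to 1.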

advcron commented 6 years ago

I had the same problem. I used a 5V 2A charger and dmesg flooded with:

Under-voltage detected! (0x00050005)
Voltage normalised (0x00000000)

So I bought a DC-DC converter module (hy196_0815) to step the voltage down from 12V to 5V. The errors didn't disappear. Next I changed the micro USB cable (to one from a Huawei P9 lite) and I think that was it. The RPi 3 B+ has now been running for 12 hours and no errors have appeared in dmesg. So a good cable is very important.

JamesH65 commented 6 years ago

Your reasoning is very scary, especially considering you are part of the RPi foundation! Will you sleep better at night if you know that my SD card is getting filled up by repeated logs, all by my own choice? I think you have become very biased and I just don't see or hear any valid reasoning for doing what TK and you are suggesting. What exactly do you hope to gain by doing this?

Here is a free money making suggestion from me:
In your next iteration of any future Raspberry Pi N (4?) you should make sure to:

  1. Use 3A USB-C connectors
  2. Conform to the USB standards
  3. Include a low loss connector cable
  4. Include the magic power supply if the device doesn't comply with (2).
  5. Include the best suitable SD card
  6. Include a working built in soundcard with MIC/AUX

Each one of those alone, will save you tons of time and money since you will be able to put all your current efforts into sales, marketing, development and production, instead of arguing.

Just let us all know when this will happen, because that will be the day I will stop updating and buying your HW, permanently.

Thanks for the advice to a Foundation that's sold 19M devices, completely changed the SBC market, and provided millions of pounds to education from the profits. I'm sure your advice will completely change our business. I've given pointers below on where your advice would not actually make more money.

The problem here, is that you have an opinion and are unwilling to accept that opinion is wrong. Which in my opinion, it is. Not scary, just an opinion that is different to yours. You are also unwilling to fix your perennial under voltage problem, which would make all the messages go away, for reasons which are still unclear.

We have now added rate logging to the messages. This limits to three messages every 5 minutes. If your log is STILL filling up with messages FIX THE DAMN POWER SUPPLY! IT IS INADEQUATE! I cannot understand what is so difficult to understand about that.

Points in reply to some of the above, without actually giving away what the Pi4 will actually have on it.

USB standards. The SoC has an inbuilt USB device, which sadly is a bit crap and we cannot do anything about that, but with the ARM FIQ we have made it work pretty well. The hub chip, which also provides the ethernet, is USB compliant.

There is no point in providing an SD card, power supply, cable etc., because then every purchaser of a Pi would also get a full set every time, and not everyone wants that. It would also make the headline price too expensive. Those are all terrible ideas. You can buy kits with all those in though.....

The SoC also contains a decent sound system that outputs via HDMI, which for the vast majority of people is fine. There is, again, no point in making the Pi more expensive for everyone with a feature that only a few people use. You've fallen into the same trap as with your thinking on the logging - thinking your use case is the important one, when in fact you are just one of many millions of users. The needs of the many outweigh the needs of the few.

As for saving money - the point of logging in dmesg for low power is there for exactly that - to reduce support issues!

As for money for marketing, development, production - we have plenty.

E3V3A commented 6 years ago

So I'm driving down the road in my 2 year old BMW. It is far from their top-of-the-line performance models, but it is not the low-end one either. It's a great car and it's been taking me back and forth to work, to weekend outings, and on occasional long tours, for 2 years. Since I do not intend to take it for top-speed daily rushes across Europe on the autobahn, it has served me perfectly and done a great job so far.

However, one day the service light comes on, indicating that it is time to take the car in for the 1000 mile service checkup. Since the winter is just about to roll around, I decide to take it in for a pre-autumn service. I do so and the mechanic says all is in order, but that he has updated the system and reset the service indicator. All good.

Since the roads up here can get icy, I decide to put back the winter tires I used last year, and after I do that, suddenly the critical Engine emergency warning light comes on! I'm confused, since I just got the car back from service! I call up the mechanic and tell him about my problem. He asks if I made any changes since visiting him last week. I say, not really, I just put on the winter tires I used last year. He says "Oh! What kind are they?", so I answer, "Well, I found these standard Bridgestone winter tires, and decided I did not need to spend the extra money on those custom BMW High-performance winter tires you sell." He answers: "Oh, that explains it! You are using shitty tires, and we just implemented a detection and warning system in the car that detects if you are not using our custom tires." So I say, "Ah, ok. Thank you for explaining the situation, but I'll keep the tires I already have for another season, since they worked just fine last year."

The next day, I'm driving down the highway, trying to overtake a slower truck. I speed up to 1200 Hm/h [1200 hectometers = 120 km], and just as I'm next to the truck, my engine suddenly stops accelerating and the car slows me to a grinding 600 Hm/h. At the same exact instant the inside compartment light starts blinking every few seconds, almost blinding me in the autumn dusk, while the critical engine light is doing the same. I almost have a head-on collision because of this episode, and decide to call my old colleague Mr. Ferrari and explain it to him.

The next day I go to his garage, where he has an OBD2 analyzer. He explains that it is not the car itself, nor the tires, that are the problem, but the new update to the car's software, and that he can disable it. So I decide to disable these dangerous (and now useless) warnings about my non-OEM tires. I go riding off happily into the sunset... or so I thought!

Then I drive into a cloud of mosquitoes, and my windshield is gored up. I turn on the wipers, only to find that one of the wiper blades is a bit worn out. I make a mental note to myself to fix it next time I have a chance. But before I even get to the end of that thought, suddenly out of the back-seat mid compartment a jack-in-the-box pops out like a spring loaded bullet and screams "YOU HAVE SHIT TIRES!! REPLACE THEM AND LISTEN TO ME!" in my right ear. I veer off the road into a dirty field and decide that putting my money into such a company is not worth the tires it stands on. I hitchhike back to the reality of the super competitive embedded universe of IoT and find myself a Tesla, which can use any available tires and power sources and can run at any current or on any road at any speed you desire.

Peace at last!


The above excerpt was based on a true story, and will be written in the forthcoming book, "How big little companies become greedy and cocky, and then get replaced."

JamesH65 commented 6 years ago

Cocky? Or just correct? I'm out.