raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

Under-voltage detected! (0x00050005) spams dmesg on new kernel 4.14.30-v7+ #2512

Closed E3V3A closed 6 years ago

E3V3A commented 6 years ago

After upgrading the kernel to 4.14, dmesg is now spammed by Under-voltage detected! (0x00050005) messages where no problem was shown previously. I've run this device non-stop for months without any problem until after the update, so the under-voltage level settings or other config must have changed. Spamming the dmesg or journalctl --system buffer certainly is not helping anyone.

kern  :crit  : [ 1701.464833 <    2.116656>] Under-voltage detected! (0x00050005)
kern  :info  : [ 1707.668180 <    6.203347>] Voltage normalised (0x00000000)

Also related to #2367
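
For readers unfamiliar with the hex value in these messages, here is a minimal Python sketch of decoding it. It assumes the value uses the same bit layout that is documented for vcgencmd get_throttled (low bits for the current state, bits 16 and up for "has occurred since boot"); the names THROTTLED_FLAGS and decode_throttled are made up for illustration.

```python
# Assumed bit layout, as documented for `vcgencmd get_throttled`.
THROTTLED_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(value):
    """Return the names of all flags set in a throttled bitmask, lowest bit first."""
    return [name for bit, name in sorted(THROTTLED_FLAGS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # The value from the log lines above.
    for flag in decode_throttled(0x00050005):
        print(flag)
```

Under that assumption, 0x00050005 reads as "under-voltage and throttling right now, and both have also been seen since boot", while 0x00000000 means no flag is set.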

pelwell commented 6 years ago

The kernel under-voltage notification is new, but the threshold and detection mechanism are unchanged. You are now being made aware of the fact that your Pi is insufficiently powered for the load placed upon it. This is bad for performance and potentially harmful to system stability.

rodizio1 commented 6 years ago

I can tell from lots of testing and measuring with a multimeter, that 99% of the "typical" USB power supplies (phone chargers, gadget chargers, power banks, etc.) either don't supply the advertised amps, or they drop voltage too much. Or both. Even if the power supply itself delivers stable 5V at 2-2.5amps, often the voltage at the end of the cable has dropped too much because of too thin wires.
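
The cable effect described here can be estimated with Ohm's law. Below is a rough Python sketch; the cable resistances and the 2 A load are illustrative assumptions rather than measurements from this thread, and the 4.63 V figure is the nominal threshold quoted further down.

```python
UNDER_VOLTAGE_THRESHOLD_V = 4.63  # nominal threshold quoted later in this thread


def voltage_at_board(supply_v, current_a, cable_resistance_ohm):
    """Voltage left at the Pi after the I*R drop across the cable (both conductors)."""
    return supply_v - current_a * cable_resistance_ohm


if __name__ == "__main__":
    # Hypothetical round-trip cable resistances in ohms, with an assumed 2 A load.
    for resistance in (0.1, 0.2, 0.4):
        v = voltage_at_board(5.0, 2.0, resistance)
        status = "below" if v < UNDER_VOLTAGE_THRESHOLD_V else "above"
        print(f"{resistance:.1f} ohm -> {v:.2f} V ({status} threshold)")
```

With these assumed numbers, a perfectly stable 5.0 V supply already lands below the threshold once the cable contributes about 0.2 ohm.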

E3V3A commented 6 years ago

I'm well aware of the working of the RPi power supply. But the fact of the matter is that:

P33M commented 6 years ago

It didn't happen with the earlier kernel, so what make you think this is an improvement? (The flash icon was good and annoying enough!)

It was happening, you just didn't notice that it was. Headless users previously had no automatic notification that the undervoltage had occurred, relying on manually querying the firmware in order to find out that this was the case. Also, KERN_CRIT is "A critical condition occurred like a serious hardware/software failure" and so is the appropriate log level for a condition that is likely to cause system instability.

JamesH65 commented 6 years ago

Note that the Pi is not guaranteed to work correctly when the power supply at the board is less than 4.63V (within ±10%), which is the point at which the icon is displayed and the new reporting adds a message to the log. Note that this doesn't mean that YOUR particular Pi will stop working, just that the voltage is low enough that there could be issues. If you are getting messages in the log, then the voltage IS dropping below 4.63V, and there is a risk of system instability. Just because you haven't actually seen any system instability does not mean the risk is not there.

I am not sure that there is an option to disable the warning, but tbh, that would be like putting tape over the engine warning light of your car. OK, you cannot see the nasty bright warning light, but you risk the engine blowing up.

E3V3A commented 6 years ago

Come on guys! I'm sure you know what I meant.

It was happening, you just didn't notice that it was.

I noticed every time the bright yellow flash was blinking on my screen, but at the time (before this issue) usually only when processor was under load. Now there is nothing running on it, and the kernel log is flooded for something that I no longer have any control over. That is the problem. And you still have not addressed the issue why it is not possible to adjust that property with the standard sysctl tool that is meant for exactly that.

I have been running this particular configuration for almost 2 years non-stop and without any problems I could not deal with, until this last update. Only, to be told, "dude it's your power supply, get a new one." That surely cannot be your new marketing strategy!? I happen to know a lot about hardware, especially embedded hardware, so I wonder how the rest of the community will feel or respond to this, once they find out.

So clearly this is not remotely anything critical, and should warrant at most one notification in the kernel log (or whichever you see fit). The point again, is that in its current state, it is spamming any other problem out of existence. So no matter how proud you are having made this improvement, it is simply a poor decision, at least from an ethical standpoint.

One solution could be to set a time limit on that error.

asavah commented 6 years ago

@E3V3A

So no matter how proud you are having made this improvement, it is simply a poor decision, at least from an ethical standpoint.

IMHO this is a great decision from any standpoint. Imagine a headless pi on which you'd never see the damn lightning: you'd never know you have power problems until it was too late, or unless you ran vcgencmd get_throttled, which you wouldn't run unless you already suspected power/thermal problems.

Thanks to this I found on my headless pi2 that its PSU (cheap chinese crap) lost some juice and was no longer able to provide 5v under heavy load (it was OK for ~ 1 year), and I was able to replace it before something bad happened.

IMO this is a great improvement as it enhances kernel<->vc4 communications and provides an alert to the user.

All your babbling about "ethics" makes no sense here.

So clearly this is not remotely anything critical,

AFAIK (at least in server world) power problems are considered critical. There are servers (eg. DELL) which will refuse to boot if power budget is below what BIOS/UEFI calculates is needed. If software alerts about a hardware problem one should fix the damn hw problem ASAP. Period.

The people here are the engineers who design and support the pis, I bet they know what power requirements are ...

If you are not happy with the log spam - learn how to filter specific syslog messages based on message content, yes, you can do that and much more with rsyslog.

ThomasKaiser commented 6 years ago

Now there is nothing running on it, and the kernel log is flooded for something that I no longer have any control over

@E3V3A Your powering sucks and you have to fix this (and your way of thinking). You're affected as so many other RPi users by a phenomenon called voltage drop. Replace the cable between your board and the PSU or get the official RPi PSU and you're done.

You even suffer from instabilities and still don't get it that you have a hardware problem? Just like this guy here: https://github.com/bamarni/pi64/issues/66

PSUs show aging effects too and a voltage drop under load is one of the many symptoms.

E3V3A commented 6 years ago

@asavah

All your babbling about "ethics" makes no sense here.

Yes, you're right. It doesn't belong here at all. It only reflects my frustration of useless answers from what I can only assume are your colleagues.

@ThomasKaiser

It's interesting that you refer to that exact issue, because the guy specifically says:

I use a pi3 with the “official“ 2.5A power supply. No issues with other builds / SD cards.

and then goes on saying that:

While trying various ways to decrease the "stress" on the pi one of my solutions was creating a swap file. This fixed the problem,

So it merely shows how you guys love to brush off any issues with a general answer:
"Your powering sucks and you have to fix this (and your way of thinking)."

So, yeah, then it makes sense to blink that under-voltage flash every few seconds, because if you do, no matter what issues people have, you can always refer back to that and repeat the sentence above and close the issue. I'd close this issue with the would-be-tag "We know it better and we know more about your PSU and the cable you're using, than you do."

So for future RPi sales, everyone would be much better off if you would just build your magic power supply directly into the device, that way there will never be any more issues and complaints and you could save 1000's of man-hours of work because of all these PSU related issues.

Then Nostradamus predicted back in the 1500's that a few months from now, there will be a storm of new issues regarding SD card failures due to excessive wear and failed SD writes... not to mention the performance overhead of spamming /var/log/.

In conclusion, the only serious solution for me (and you) seem to be to revert to kernel 4.9 and everyone will be happy again.

ThomasKaiser commented 6 years ago

@E3V3A Just to be sure: What is printed on your PSU? 4.63V or something with a 5? If there's a 5 printed do you get that there's something wrong when the device to be powered by this setup reports less than 4.63V already without any load at all? Can you imagine how low voltage will drop with some load applied or some USB peripherals that need also some juice?

Do you think devices that have a power requirement of 5V work properly when you provide only 4V?

ThomasKaiser commented 6 years ago

In conclusion, the only serious solution for me (and you) seem to be to revert to kernel 4.9 and everyone will be happy again.

Simply create /etc/rsyslog.d/ignore-underpowering.conf containing the line :msg, contains, "oltage" ~ and you can enjoy an unstable system even with kernel 4.14 :)

BTW: Just found it. There are SBCs that allow for constant input voltage monitoring. What you can see here is a PSU that provided 5.25V in the beginning; after approximately 1.5 years of constant operation, DC-IN dropped as low as 4.2V with some light load: https://forum.armbian.com/topic/5699-how-to-provide-and-interpret-debug-output/?do=findComment&comment=44210 (this board also has a good PMIC and a large battery, and its power circuitry uses boost converters to provide stable voltages to all subsystems, USB and SATA included)

E3V3A commented 6 years ago

@ThomasKaiser I edited the rsyslog.d config files as you mentioned in the default /etc/rsyslog.conf with and without tabs, like this:

:msg, contains, "oltage" ~

Indeed this removes the voltage related logs from the /var/log/*.log files. :+1: But apparently dmesg, which uses /dev/kmsg and /proc/kmsg, seems independent of syslogd and rsyslogd settings, and thus still shows all under-voltage entries as before with dmesg -e -x. But I guess I can live with that.

Regarding the input voltage, I am surprised that the detector is able to measure the voltage to the second decimal 4.63, but that there is no way to read it from /sys. What is that all about? How and what does the device actually measure when the voltage is lower than that threshold?

Either way I'll report back, once I have the values. In the process of all this investigation I've unfortunately found a wide range of other unpleasant surprises coming from this update. All sorts of things, like overwriting ALSA configurations, starting services that were never run before, automatically running apt upgrade, etc. :(

pelwell commented 6 years ago

Regarding the input voltage, I am surprised that the detector is able to measure the voltage to the second decimal 4.63, but that there is no way to read it from /sys. What is that all about? How and what does the device actually measure when the voltage is lower than that threshold?

It's a hard-wired threshold, implemented by the new PMIC on the 3B+ and using discrete components on older boards - we only know which side of the threshold the voltage is.

JamesH65 commented 6 years ago

With regard to your other comments on the 4.14 update, it's quite a big move from 4.9, so I would expect some fairly obvious changes. Also note that the huge majority of changes are from the upstream kernel, not Raspberry Pi. However, automatically running apt update makes no sense. That should never happen by default, and I've certainly not seen it in any testing (we've had 4.14 in test for quite a few months or so).

E3V3A commented 6 years ago

However, automatically running apt update makes no sense.

Nope.

# cat /etc/cron.daily/apt-compat
...
exec /usr/lib/apt/apt.systemd.daily

# Then in:
# cat /usr/lib/apt/apt.systemd.daily

#!/bin/sh
#set -e
#
# This file understands the following apt configuration variables:
# Values here are the default.
# Create /etc/apt/apt.conf.d/10periodic file to set your preference.
#
...
#
#  APT::Periodic::Enable "1";
#  - Enable the update/upgrade script (0=disable)
...
#  APT::Periodic::Download-Upgradeable-Packages-Debdelta "1";
#  - Use debdelta-upgrade to download updates if available (0=disable)
...

You can see it here:

# Check for APT services:
# systemctl --all |grep apt-

apt-daily-upgrade.service   loaded    inactive dead      Daily apt upgrade and clean activities
apt-daily.service           loaded    inactive dead      Daily apt download activities
apt-daily-upgrade.timer     loaded    active   waiting   Daily apt upgrade and clean activities                              
apt-daily.timer             loaded    active   waiting   Daily apt download activities

So it's possible it doesn't do anything, but it is still running every day. I found this by looking in /var/log/daemon.log:

systemd[1]: Starting Daily apt upgrade and clean activities...
systemd[1]: Started Daily apt upgrade and clean activities.
systemd[1]: apt-daily-upgrade.timer: Adding 28min 11.764106s random time.
systemd[1]: apt-daily-upgrade.timer: Adding 19min 6.283733s random time.
systemd[1]: Stopped Daily apt upgrade and clean activities.
systemd[1]: Stopped Daily apt download activities.

I have not investigated further...

ThomasKaiser commented 6 years ago

Indeed this removes the voltage related logs from the /var/log/*.log files.

I can't believe that you really did this instead of fixing the problem. Are you aware that you turned your Pi into a 600 MHz device by ignoring your under-voltage issues? You're running frequency capped all the time and based on your description your PSU will most probably die soon anyway (since what's the reason for under-voltage now occurring even with no load at all?)

E3V3A commented 6 years ago

@ThomasKaiser

I can't believe that you really did this instead of fixing the problem.

There was no problem until I updated with this kernel!

So yeah, perhaps my power supply is not ideal and crappy, but the fact of the matter is that it was running on full speed, on medium load and everything else was working more or less fine before your kernel push. I still can't believe you pushed out that crappy Kernel update before proper testing or getting more community feedback. (Now I already have another kernel update waiting.) I've already spent days trying to repair and fix all the bloat and issues that resulted from this, and still seem to have a long way to go. In fact, at this point I would just like to downgrade! Unfortunately I don't see an easy way to do this, at this point. So thanks a lot.


And what makes you think that this setup is so much more reliable?
Last time I checked, capacitors are both unreliable and not very precise, unless you put military grade (Radio Shack ;) caps in there.

[Image: schematic 1]

So if it is true that you are using the APX803-46, then V_th ranges over 4.56 / 4.63 / 4.70 V (min / typ / max). This is apparently a well known issue and well documented here. There they propose that you should have used the APX803-44 instead, with a range of 4.31 / 4.38 / 4.45 V, and nobody would have had any problems! One of the main problems with your design is stated like this:

The power input circuit design is outside of the bounds of what we can control. This design forces businesses to create and customers to purchase power supplies that are out of compliance with industry standards. The reason some other power adapters do not experience this issue is because they provide dangerously high voltages that are not standards compliant. In our tests of this issue, we found power supplies delivering up to 5.7Vopen and 5.5V with an 0.5A load. These may fry sensitive USB electronics that do not have any protection built-in.

So, now please spare us all the PSU excuses, and revert the kernel & firmware to be a little more accepting.

One way you could do this, is by using a broader time constant for the under voltage. I.e. average the voltage for a minute or something.
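
To make the threshold tolerance quoted above concrete, here is a small Python sketch. It assumes the three numbers per part are the minimum, typical and maximum V_th, and that a detector trips once the rail falls below its own V_th; the function name detector_verdict is made up for illustration.

```python
# V_th figures as quoted in this comment, read as (min, typ, max) in volts.
APX803_46 = (4.56, 4.63, 4.70)
APX803_44 = (4.31, 4.38, 4.45)


def detector_verdict(voltage, thresholds):
    """Say whether a rail voltage trips every part, no part, or only some parts."""
    v_min, _typ, v_max = thresholds
    if voltage < v_min:
        return "trips on every part"
    if voltage > v_max:
        return "trips on no part"
    return "depends on the individual part"


if __name__ == "__main__":
    for name, part in (("APX803-46", APX803_46), ("APX803-44", APX803_44)):
        print(name, "at 4.60 V:", detector_verdict(4.60, part))
```

Under these assumptions, a board sitting at 4.60 V may or may not warn with the -46 part, and would never warn with the -44 part.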

E3V3A commented 6 years ago

And yes, I have mentioned it before, elsewhere. Please provide a proper CHANGELOG to your kernel releases, so people don't have to fall into this trap. Being able to use apt-get changelog raspberrypi-kernel would have been great, but I was told as an excuse that it is not maintained by you. But then you could always document it elsewhere... GitHub has Wiki pages you know!

ThomasKaiser commented 6 years ago

So yeah, perhaps my power supply is not ideal and crappy, but the fact of the matter is that it was running on full speed

Nope. It seems you're relying on 'Linux standards' like

/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq

which you can't on the Raspberry Pi, since they contain bogus values. When you're running undervolted the firmware caps the frequency to 600 MHz while cpuinfo_cur_freq tells you only an irrelevant number (900 MHz on RPi 2, 1200 MHz on RPi 3 and 1400 MHz on RPi 3+). The only way to check for the problem currently is

vcgencmd measure_clock arm | awk -F"=" '{printf ("%0.0f",$2/1000000); }'

Of course you're not alone. Especially headless users don't get it that they run at 600 MHz max all the time :) See for example https://github.com/bamarni/pi64/issues/4#issuecomment-291425512
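
The vcgencmd check above can also be scripted. This is a minimal Python sketch that does the same thing as the awk one-liner (split on "=" and convert Hz to MHz); the sample string in the comment is an assumed example of the output shape, and both function names are made up for illustration.

```python
import subprocess


def parse_measure_clock(output):
    """Return the clock in MHz from a line shaped like 'frequency(45)=600000000'."""
    _label, _sep, hz = output.strip().partition("=")
    return round(int(hz) / 1_000_000)


def read_arm_clock_mhz():
    """Run `vcgencmd measure_clock arm` (only works on a Pi) and return MHz."""
    result = subprocess.run(
        ["vcgencmd", "measure_clock", "arm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_measure_clock(result.stdout)
```

On a Pi, calling read_arm_clock_mhz() while the board is under load gives the clock the firmware actually applies, which can then be compared with the cpufreq value.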

E3V3A commented 6 years ago

cpuinfo_cur_freq is the HW clock (not from the kernel) and seems to give the same result as vcgencmd measure_clock.

ThomasKaiser commented 6 years ago

Is the result of the following 600 or 1200?

sysbench --test=cpu --cpu-max-prime=5000 --num-threads=4 run && vcgencmd measure_clock arm | awk -F"=" '{printf ("%0.0f",$2/1000000); }'

JamesH65 commented 6 years ago

There was no problem until I updated with your kernel!

Yes there was, your Pi was undervolted. The fact there were no outward signs of corruption etc. doesn't obviate that fact. All the kernel is doing differently is REPORTING a problem. A problem that may always have been there, previously unnoticed.

Just an FYI, TKaiser is not an employee of the RPF, it's not 'his' kernel. He is a well informed community member trying to help you. I'm the only employee currently commenting on this thread.

E3V3A commented 6 years ago

@JamesH65

Yes there was, your Pi was undervolted.

I'm not arguing that there was not an under-voltage problem before, but I am arguing that I had no other problems at all before this update from 4.9.80, except the very occasional (once every few days) issue #2510, that I posted and which still hasn't been addressed even if it seems to be from several years back.

There was an under-voltage problem right before, but it was under load and showing every few minutes at most. Now it is showing every few seconds under no other load than that provided by the update itself, and with USB sound-card disconnected. I am also arguing that when I last checked CPU frequency, probably >6 months ago it was running at 1200. So it's totally irrelevant for this issue, to repeatedly ask me to post the current speed, since the OP is already stating that I am being throttled.

So, yes, I'm also trying to help, by reporting my findings here. But I am now quite fed up with these discussions about PSUs. It's a horribly expensive scapegoat, no matter how you look at it. You made a design error, and we have to live with it. We still love our RPi3s, and when it works, it works. Mine was working, until this update.

My biggest mistake was not to first check all the issues here, before blindly updating. (Because it went well before.)

pelwell commented 6 years ago

Denial, as they say, isn't just a river in Egypt.

You seem like an intelligent, knowledgeable person, yet you can't get over the fact that everybody else on this post is actually trying to help you - consider it an intervention. By under-powering your Pi you are limiting its performance and risking corruption on a daily basis. @ThomasKaiser is attempting to tell you that the frequency throttling is managed by the firmware without the knowledge of the CPU governor, so unless you use vcgencmd measure_clock arm you aren't getting an accurate picture of the ARMs' clock speed.

You have raised a valid point that the kernel messages are too frequent, and we are preparing a patch to limit the rate - an initial message immediately when it first happens (and after a long gap), then periodic digests with a count would be nice - but other than that we have no plan to change this mechanism because we consider it an important service to our users.
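
The rate-limiting idea described here (one message immediately, then digests with a count) can be sketched as below. This is not the actual kernel patch, only a minimal Python illustration of the policy; the class name, the message wording and the two intervals are assumptions, and a digest is emitted only when a new event arrives rather than from a timer.

```python
class UnderVoltageReporter:
    """Report the first event at once, then fold later events into digests."""

    def __init__(self, digest_interval=60.0, quiet_gap=600.0):
        self.digest_interval = digest_interval  # assumed seconds between digests
        self.quiet_gap = quiet_gap  # assumed gap after which an event counts as "first" again
        self.last_event = None
        self.last_report = None
        self.pending = 0  # events seen since the last emitted message

    def event(self, now):
        """Record an under-voltage event at time `now`; return a message or None."""
        first = self.last_event is None or now - self.last_event >= self.quiet_gap
        self.last_event = now
        if first:
            # First event, or first after a long quiet period: report immediately.
            suppressed, self.pending = self.pending, 0
            self.last_report = now
            if suppressed:
                return f"Under-voltage detected! (unreported earlier events: {suppressed})"
            return "Under-voltage detected!"
        self.pending += 1
        if now - self.last_report >= self.digest_interval:
            # Enough time has passed: emit one digest covering the pending events.
            count, self.pending = self.pending, 0
            self.last_report = now
            return f"Under-voltage detected! ({count} events since last report)"
        return None
```

With the default intervals, events at t = 0, 10, 30 and 70 seconds would produce one message at t = 0 and one digest at t = 70 covering three events, instead of four separate log lines.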

ThomasKaiser commented 6 years ago

with USB sound-card disconnected

So this 'sound-card' is powered by the Pi or why do you mention this? Do you know how low voltage is allowed to drop for USB peripherals according to specs for this type of device? 4.4V or 4.75V?

ThomasKaiser commented 6 years ago

unless you use vcgencmd measure_clock arm you aren't getting an accurate picture of the ARMs' clock speed.

@pelwell any chance to fix this too and to get the kernel reporting the real clockspeeds in any case?

JamesH65 commented 6 years ago

As far as we are concerned there is no design error in the power circuitry. You are quoting a third party supplier of power supplies who are claiming there is. Of course, they are selling a competitor power supply so are clearly biased!

Using a Raspberry Pi power supply shows NO issue with power supply in any of the circumstances these people have claimed are issues. However, we will be buying their power supply to test, because we do like to be thorough. Might take a while as they do not ship to the UK.

E3V3A commented 6 years ago

@pelwell

the frequency throttling is managed by the firmware without the knowledge of the CPU governor,

I could not have imagined this being the case. Thanks for clarifying. BTW. Why was it decided to be done this way?

@ThomasKaiser

why do you mention this?

Just FYI, connected or not, the approximate interval between under-voltage errors is now ~2 minutes, independent of USB sound-card presence. This is a slight improvement from yesterday's ~30 sec. before removing the already mentioned service bloat.


I would like to make an accurate measurement on the board for my voltages. Can you point me where on the PCB to do the measurements? From the schematics above, there seem to be several test points.

JamesH65 commented 6 years ago

Why was it decided to be done this way?

The architecture of the SoC means the VC4 is basically in charge of that sort of thing. Been like that since Pi1. Google the Pi boot process for details.

pelwell commented 6 years ago

the frequency throttling is managed by the firmware without the knowledge of the CPU governor,

I could not have imagined this being the case. Thanks for clarifying. BTW. Why was it decided to be done this way?

The idea is that the ARMs are likely to crash from overheating or undervolting before the VPU does, so the VPU has autonomous control of the clocks. The ARMs can make requests, but the VPU has a right of veto - that is non-negotiable for reasons of safety.

That still leaves the question of why the cpufreq driver isn't informed of the throttling, which comes down to a choice in the design of the mailbox interface to return the previously requested frequency rather than the actual post-throttling frequency - otherwise it can get confusing if it appears that requests are being ignored. There is a new mailbox call that will actually measure the clock, but the cpufreq driver doesn't use it. It's not clear (without reading more code) what (if any) effect returning actual rather than expected clocks would have on the cpufreq framework.

ThomasKaiser commented 6 years ago

Can you point me where on the PCB to do the measurements?

I would use pins 4 and 6 on the GPIO header. And in case your USB sound-card is powered by the Pi (you have now refused to answer this question for the second time) you could also use a USB power meter on one of the USB ports to get an idea of how low the voltage drops there (4.75V being the lowest tolerable number for most USB peripherals). But those things are usually not that precise anyway.

lategoodbye commented 6 years ago

@pelwell Could you please point me to the mailbox property which returns the actually measured clock?

pelwell commented 6 years ago

See https://github.com/raspberrypi/firmware/issues/956#issuecomment-374323390.

E3V3A commented 6 years ago

@ThomasKaiser

in case your USB sound-card is powered by the Pi

I thought that was obvious. Also looking at several other issues, and also from my own experience, it seems that RPis are not very happy using USB hubs, since there is one already built in.

Also, AFAICR, the lower voltage for USB2 is -0.6 which means 4.40 V.

JamesH65 commented 6 years ago

I believe it is very dependent on the type of hub being used.

evthree commented 6 years ago

We are a somewhat large-ish user of Raspberry Pi in our robotics company, and we have been dealing with the fallout from this issue for the last several days. Here is the summary of events followed by the conclusion:

New employee joined, setup his own raspberry-pi with latest image from March, and while working with a connected Teensy 3.2, kept finding USB serial communication dropouts. This did not happen when connecting the same teensy to an older PI with an older OS version. Triaging discovered the "Under-voltage detected" events in the message log, along with "Failed to set DTR/RTS" corresponding to the dropped connections.

Several days of troubleshooting followed. This is what we found:

  1. Another PI (different from the one above) which had been working fine for a year with no peripherals connected and red-light never blinked, when upgraded to latest OS started showing "Under-voltage errors" and red-light flashes whenever disk activity was performed. In fact even invoking "dmesg" would cause the red light to blink, causing an event to get written to the log.

  2. I went through this thread and others, and tested with an iron-clad power supply situation: A USB power bank, with a 1 F capacitor across the supply lines, and an inline USB voltage and current monitor (yes 1F, not 1uF). The voltage reading was 5.02V, and never dropped below 4.95V. Even with this, the red light would blink whenever any disk activity was performed, like issuing an "ls" command. No peripherals were connected, CPU utilization was close to zero. There is NO chance the input voltage EVER went below 4.65V for even a micro-second, yet still the red light flashed.

  3. We created a brand new SD card with the image from December, and used it to boot up the exact same PI, using the same power supply setup. Lo and behold, the red light NEVER blinks, even after connecting 4 Teensys to the USB ports, and doing all kinds of heavy disk activity.

There is only one conclusion: The changes from December to March include not only the additional logging for under-voltage, but the way the voltage is read and tested is different. Either there is a bug, or some other voltage point on the board is being used for the test, not the input source voltage.

JamesH65 commented 6 years ago

Well, the circuitry on the Pi3B clearly doesn't change between kernel versions, and that's the bit that actually does the detection. AFAIK, the voltage at which it detects is hardwired (cannot be set in SW); all the SW does is read whether the circuitry has fired (and light the LED? not sure about that). The 3B+ does have a new power chip which now does the detection, but you have stated it's been going a year, so that is presumably a Pi3B.

You really need to test the voltage on the Pi itself - who knows what sort of drop is happening in the USB voltage device and down the cable between it and the Pi. I've certainly seen large losses simply using short USB extensions, or switches in the cable.

I've been doing testing today on a 3B+, using a desktop power supply. I had to drop the supply to 4.72V before the power icon appeared; taking into account the cable losses, that seems about right. I didn't see any message otherwise. Device was idle on the desktop.

pelwell commented 6 years ago

@evthree Please confirm which model of Pi is exhibiting the problem - it's not something I've come across before.

evthree commented 6 years ago

The top of the board says "Raspberry PI 3 Model B V1.2", below that "(c) Raspberry PI 2015."

In our test bed we made sure that everything else except the OS version was the same between the two tests, including this same PI, the same power supply, no software running on it, yet the behavior of the red light has changed. If only the under-voltage logging was added, then why would the red light change its behavior? And why would USB connections drop out simultaneously with the logging events?

Something else is definitely happening besides just the additional logging. I am convinced there is some hard to detect bug that has been introduced, I sincerely hope someone is looking for it.

pelwell commented 6 years ago

I am now. One more thing - please post the output from vcgencmd otp_dump, just to be sure that all is as it should be.

HiassofT commented 6 years ago

I'd recommend hooking up a scope to GPIO pins 4 and 6 (+5V/GND) and setting a falling edge trigger at about 4.8V. From my experience that's the most accurate way to determine if an undervoltage condition occurred.

Multimeters are way too slow to catch short voltage dips and if you want to check if the undervoltage detection on the RPi is working correctly using a scope is the only reliable way.

evthree commented 6 years ago

March OS, the one exhibiting the problem:

pi@raspi4:~ $ cat /etc/debian_version
9.4
pi@raspi4:~ $ vcgencmd otp_dump
08:00000000
09:00000000
10:00000000
11:00000000
12:00000000
13:00000000
14:00000000
15:00000000
16:00280000
17:1020000a
18:1020000a
19:ffffffff
20:ffffffff
21:ffffffff
22:ffffffff
23:ffffffff
24:ffffffff
25:ffffffff
26:ffffffff
27:00002727
28:3786c562
29:c8793a9d
30:00a02082
31:00000000
32:00000000
33:00000000
34:00000000
35:00000000
36:00000000
37:00000000
38:00000000
39:00000000
40:00000000
41:00000000
42:00000000
43:00000000
44:00000000
45:00000000
46:00000000
47:00000000
48:00000000
49:00000000
50:00000000
51:00000000
52:00000000
53:00000000
54:00000000
55:00000000
56:00000000
57:00000000
58:00000000
59:00000000
60:00000000
61:00000000
62:00000000
63:00000000
64:00000000
65:00000000
66:00000000
pi@raspi4:~

December OS, not exhibiting the problem:

pi@raspberrypi:~ $ cat /etc/debian_version
9.1
pi@raspberrypi:~ $ vcgencmd otp_dump
08:00000000
09:00000000
10:00000000
11:00000000
12:00000000
13:00000000
14:00000000
15:00000000
16:00280000
17:1020000a
18:1020000a
19:ffffffff
20:ffffffff
21:ffffffff
22:ffffffff
23:ffffffff
24:ffffffff
25:ffffffff
26:ffffffff
27:00002727
28:3786c562
29:c8793a9d
30:00a02082
31:00000000
32:00000000
33:00000000
34:00000000
35:00000000
36:00000000
37:00000000
38:00000000
39:00000000
40:00000000
41:00000000
42:00000000
43:00000000
44:00000000
45:00000000
46:00000000
47:00000000
48:00000000
49:00000000
50:00000000
51:00000000
52:00000000
53:00000000
54:00000000
55:00000000
56:00000000
57:00000000
58:00000000
59:00000000
60:00000000
61:00000000
62:00000000
63:00000000
64:00000000
65:00000000
66:00000000

ThomasKaiser commented 6 years ago

AFAICR, the lower voltage for USB2 is -0.6 which means 4.40 V

Only for 'low-power' devices that require less than 100mA. Since you now confirmed that you're using a host powered USB device (I was asking about powering all the time and not whether there is another hub in between -- some of the few Audio devices I know have their own PSU) the lower limit is 4.75V.

Anyway: based on @evthree's report there seems to be an issue also related to the closed source ThreadX (AKA firmware) affecting this whole issue. Time to stop wasting time ;)

pelwell commented 6 years ago

@evthree I'd like to isolate the issue to the kernel or the firmware.

  1. On a working image from December (you may want to clone the card to save time later), upgrade the firmware, leaving the kernel unchanged:

    pi@raspberrypi:~$ sudo SKIP_KERNEL=1 rpi-update

    See if that is broken.

  2. On a different card, install/clone the latest image and downgrade the firmware to the same as that shipped in the December image:

    pi@raspberrypi:~$ sudo SKIP_KERNEL=1 rpi-update a6b3e85

    Check if that works.

E3V3A commented 6 years ago

With firmware Donald Duck programming like this, no wonder people are pissed!

If already spammed kernel log with msg1, then spam with msg2, else spam with msg1 again!

    if (new_uv != old_uv) {
        if (new_uv)
            pr_crit("Under-voltage detected! (0x%08x)\n", *value);
        else
            pr_info("Voltage normalised (0x%08x)\n", *value);
    }

So since the rpi devs here are not willing to listen to the community, we have to solve it ourselves.

To disable throttling & spam

To completely stop the dmesg or kernel log from getting spammed, you can put this in a shell script or run it directly from the command line, in the background. This will also race the throttling mechanism, but will get you back to running mostly at 1200 MHz. So far I have seen no notable side effects, except a very slight increase in CPU usage. However, it will not remove the flash icon, as that seems to be controlled independently by the VideoCore (VC4). You should be able to remove it with avoid_warnings=2, which disables the warning overlays and allows continued turbo mode.

    while true; do vcgencmd get_throttled 0xff >/dev/null; done &

At first I tried to be nice by putting a sleep in the loop, but since closed source firmware is never nice, it did not take proper effect until the loop ran without any delay. Using C code would make it easier to control the execution.


The next options to work around this are to:

pelwell commented 6 years ago

You've misunderstood the firmware driver code - old_uv and new_uv are effectively booleans, and the comparison acts as an edge detector - one message is for the rising edge, the other for the falling edge.
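
To illustrate the edge-detector reading of that snippet, here is a minimal user-space sketch in C, not the driver itself. It assumes that bit 16 of the firmware's throttled word is the under-voltage flag being compared; the names `uv_edge_detect`, `uv_report` and `enum uv_edge` are hypothetical and exist only for this example.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: bit 16 of the throttled word is the under-voltage flag
 * that the comparison looks at. */
#define UNDERVOLTAGE_STICKY_BIT (UINT32_C(1) << 16)

enum uv_edge { UV_NO_CHANGE, UV_RISING, UV_FALLING };

/* Reduce both readings to booleans and report only a change of state. */
enum uv_edge uv_edge_detect(uint32_t prev, uint32_t curr)
{
    int old_uv = (prev & UNDERVOLTAGE_STICKY_BIT) != 0;
    int new_uv = (curr & UNDERVOLTAGE_STICKY_BIT) != 0;

    if (new_uv == old_uv)
        return UV_NO_CHANGE;            /* same state: stay silent */
    return new_uv ? UV_RISING : UV_FALLING;
}

/* Write at most one message per transition; returns how many were written. */
int uv_report(uint32_t prev, uint32_t curr, FILE *out)
{
    switch (uv_edge_detect(prev, curr)) {
    case UV_RISING:
        fprintf(out, "Under-voltage detected! (0x%08" PRIx32 ")\n", curr);
        return 1;
    case UV_FALLING:
        fprintf(out, "Voltage normalised (0x%08" PRIx32 ")\n", curr);
        return 1;
    default:
        return 0;
    }
}
```

Feeding the readings 0x00000000, 0x00050005, 0x00050005, 0x00000000 through `uv_report` in sequence yields exactly two messages: one on the rising edge and one on the falling edge. A supply that keeps crossing the threshold therefore produces one pair of messages per crossing, which is what shows up as repeated log lines.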

JamesH65 commented 6 years ago

We didn't listen to the community so much that at no point yesterday did I modify the driver to add rate limiting, spend some time testing, and then create a PR today.

Oh, hold on, here it is. Looks like we did listen after all.

https://github.com/raspberrypi/linux/pull/2520

And before accusing people of DD programming, probably best to understand what the code is doing before spouting off and making yourself look foolish. Remember, the people you seem so keen on pissing off are the people YOU need to fix things.

lategoodbye commented 6 years ago

From my point of view there are 2 use cases:

1) Adjustable power supply
2) Non-adjustable power supply

Case 1: The current logging behavior is helpful to find the correct settings at runtime.

Case 2: Since the user doesn't have the chance to change the PSU during runtime, this ping-pong behavior between "under-voltage detected" and "normalised" isn't helpful. It's sufficient to print the issue only once, because the provided power won't get better.

Here is my suggestion: add a DT or kernel parameter to switch between the following modes:

a - current kernel log behavior
b - store the sticky bits and only add a new kernel message if a new sticky bit has been added

Just my two cents
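
Mode b from the suggestion above could look roughly like the following C sketch. It is a user-space illustration only, not a proposed patch. It assumes that the upper 16 bits of the throttled word are the sticky "has occurred" flags; `struct sticky_log`, `sticky_log_update` and `STICKY_MASK` are hypothetical names.

```c
#include <stdint.h>

/* Assumption: the upper 16 bits of the throttled word hold the sticky
 * "has occurred" flags, the lower 16 bits the live flags. */
#define STICKY_MASK UINT32_C(0xffff0000)

struct sticky_log {
    uint32_t seen;  /* sticky bits that have already been reported */
};

/* Returns the sticky bits in `value` that were not reported before and
 * remembers them. A return value of 0 means nothing new needs logging. */
uint32_t sticky_log_update(struct sticky_log *log, uint32_t value)
{
    uint32_t fresh = (value & STICKY_MASK) & ~log->seen;

    log->seen |= fresh;
    return fresh;
}
```

With a zero-initialised `struct sticky_log`, the first reading of 0x00050005 returns 0x00050000 and would be logged once. Later readings of 0x00000000 or 0x00050005 return 0 and stay silent, while a reading that sets an additional sticky bit returns just that bit.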

P33M commented 6 years ago

@lategoodbye I disagree with Case 2: if your power supply is "known good" yet you have a misbehaving peripheral connected either via GPIO header or USB port, having a timestamped log (albeit ratelimited) allows you to figure out which peripheral/usage condition is causing the undervoltage.

Components can fail in service, so it's a useful diagnostic aid.

E3V3A commented 6 years ago

@lategoodbye Your suggestion is great! Probably the only one that can satisfy everyone. I would also be very happy to see this added as a boot/config.txt or cmdline parameter. But I have clearly used up all my good Karma points here, so perhaps some other people could chime in as well?

@P33M To disagree with one of the cases is not helpful, if the other case is also available.
(This thread has already become an epitome of all sorts of disagreements, on all sorts of levels.)
So how do you suggest we move ahead with this?

@JamesH65 PR 2520 is perhaps vanilla helpful, but seems redundant, since we should already have the sysctl items for that. The following should accomplish the exact same thing:

    sudo sysctl -w kernel.printk_devkmsg=ratelimit
    sudo sysctl -w kernel.printk_ratelimit=300
    sudo sysctl -w kernel.printk_ratelimit_burst=3

I use the word "should" here because, as I mentioned in a previous post, it seems that these are ignored, so if that PR enables them again, that is great.

The other problem with that PR is that it would probably also throttle all other kernel log messages. That is also why I would vote for Stefan's suggestion.
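
For readers unfamiliar with interval/burst rate limiting, the following C sketch models the idea in user space using the same numbers as the sysctl lines above (an interval of 300 seconds and a burst of 3). It is only loosely modelled on how such limiters commonly work and makes no claim about what PR 2520 actually implements; `struct rl_state` and `rl_allow` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of an interval/burst limiter. */
struct rl_state {
    uint64_t interval;  /* window length in seconds */
    unsigned burst;     /* messages allowed per window */
    uint64_t begin;     /* start time of the current window */
    unsigned printed;   /* messages already emitted in this window */
};

/* Returns true if a message may be emitted at time `now` (in seconds). */
bool rl_allow(struct rl_state *rs, uint64_t now)
{
    if (now - rs->begin >= rs->interval) {
        rs->begin = now;    /* window expired: open a new one */
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    return false;           /* over budget: suppress this message */
}
```

In this model each `struct rl_state` keeps its own budget, so a limiter dedicated to one message does not suppress callers that use a different state. Whether other kernel messages would be affected therefore depends on whether the limiter state is shared, which can only be settled by reading the change itself.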