kmsgd Blocks Indefinitely

shaseley commented 8 years ago

Issue

While testing the kmsgd SELinux changes, I discovered that kmsgd stops producing output, even though there are more messages being logged to the kernel system log. This is due to another reader (logd) reading from /proc/kmsg.

As of 5.1.1_r3, logd has a new feature that reads messages from /proc/kmsg and adds them to a new kernel (virtual) log buffer available through logcat. While we disabled most logd logging by setting TARGET_USES_LOGD := false, the logd process still runs because it is started from init.rc. It appears that /proc/kmsg cannot handle multiple readers. As a test, I disabled logd, and kmsgd continued to log output as expected. As soon as I ran adb shell cat /proc/kmsg, it stopped.

Potential Fixes

Disable `logd` Kernel Logging

They've added a build property -- logd.klogd -- that we can set to turn this off in logd. This doesn't stop logd from running, but it should stop it from interfering with kmsgd. However, we would still have the issue if any other processes read from /proc/kmsg.

Disable `logd`

Removing logd from init.rc would also stop it from interfering with kmsgd, but we would need to muck with the build and/or init.rc.

Read From `/dev/kmsg` Instead

The documentation for /dev/kmsg says that it supports multiple readers. The change is a little more involved since the data is formatted differently, but it's not hard. This is probably the best solution since reading from /proc/kmsg is a bit brittle because of the multiple reader issue.
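To illustrate the read semantics this option relies on, here is a minimal Python sketch, not the kmsgd implementation. It assumes the behavior described in the kernel's /dev/kmsg documentation: every open file descriptor has its own read position, each `read()` returns exactly one record, and `EPIPE` means records were overwritten before this reader got to them. The function name `read_kmsg_records` and the buffer size are hypothetical, and reading the device normally needs elevated privileges (and, on Android, a matching SELinux rule).

```python
import errno
import os


def read_kmsg_records(path="/dev/kmsg", bufsize=8192):
    """Yield the records currently buffered, one string per read().

    Hypothetical helper for illustration; kmsgd itself is not shown here.
    """
    # Non-blocking so the loop ends when the buffer is drained
    # instead of waiting for the next printk.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        while True:
            try:
                data = os.read(fd, bufsize)
            except BlockingIOError:
                # EAGAIN: nothing more to read right now.
                break
            except OSError as exc:
                if exc.errno == errno.EPIPE:
                    # Records were overwritten before we read them;
                    # the next read continues with the oldest one left.
                    continue
                raise
            if not data:
                # Only a regular file (e.g. a test fixture) reaches EOF.
                break
            yield data.decode("utf-8", errors="replace")
    finally:
        os.close(fd)
```

Because the read position belongs to the file descriptor, a second process running the same loop should see the same records rather than taking them away from the first reader, which is the property that makes this option attractive here.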

Switch to `logd`

This may be where we're headed in the medium term, but for now it's probably best to push an easy short-term fix so we get the logs.

jhshi commented 8 years ago

I'm OK with either disabling logd completely (both in kernel and init.rc), or changing kmsgd to read from /dev/kmsg. The latter seems to be the right way, if it's easy to implement.

jhshi commented 8 years ago

In fact, if logd does what's being claimed, then we do not need kmsgd. Just set to be all in LogcatTask's manifest. @shaseley can we verify 1) if logd works properly on our platform and 2) if so, does a logcat w/o -b option give us kernel logs?

shaseley commented 8 years ago

I'm testing it now using /dev/kmsg. It works great as far as the multiple readers are concerned. The format is different - here's a snippet:

    ... KernelPrintk: 4,2221,629354036,-;Setting rand mac oui to FW - 92:68:c3
    ... KernelPrintk: 4,2222,629961208,-;dhd_pno_initiate_gscan_request enter - run 0 flush 0
    ... KernelPrintk: 4,2223,629988499,-;dhd_set_suspend: force extra Suspend setting
    ... KernelPrintk: 4,2224,629993410,-;dhd_get_suspend_bcn_li_dtim beacon=100 bcn_li_dtim=2 DTIM=2 Listen=10

It's probably easiest and safest to just send the data like it is and parse it as needed. I'd rather parse it in Python (or whatever) than C. We do get a printk line number (second field), which is kind of nice.
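For the "parse it in Python" side, a rough sketch of what that could look like. It assumes the header layout from the kernel's /dev/kmsg documentation: a syslog priority number (facility in the high bits, level in the low three), a sequence number, a timestamp in microseconds, and a flags field, followed by `;` and the message. The names `KmsgRecord`, `parse_kmsg_record`, `parse_logged_line`, and `missing_between` are made up for this example, and the `KernelPrintk: ` tag is taken from the snippet above.

```python
from dataclasses import dataclass

TAG = "KernelPrintk: "  # logcat tag as it appears in the snippet


@dataclass
class KmsgRecord:
    level: int         # syslog severity (low 3 bits of the first field)
    facility: int      # syslog facility (remaining bits)
    seq: int           # printk sequence number
    timestamp_us: int  # timestamp in microseconds
    flags: str         # '-' when no flag is set
    message: str


def parse_kmsg_record(record: str) -> KmsgRecord:
    """Parse one /dev/kmsg record: 'prio,seq,usec,flags[,...];message'."""
    # Keep the first line only; any continuation lines are ignored here.
    first_line = record.split("\n", 1)[0]
    header, sep, message = first_line.partition(";")
    fields = header.split(",")
    if not sep or len(fields) < 4:
        raise ValueError(f"not a /dev/kmsg record: {record!r}")
    prio = int(fields[0])
    return KmsgRecord(
        level=prio & 7,
        facility=prio >> 3,
        seq=int(fields[1]),
        timestamp_us=int(fields[2]),
        flags=fields[3],
        message=message,
    )


def parse_logged_line(line: str) -> KmsgRecord:
    """Drop everything up to the KernelPrintk tag, then parse the payload."""
    _, sep, payload = line.partition(TAG)
    return parse_kmsg_record(payload if sep else line)


def missing_between(prev: KmsgRecord, cur: KmsgRecord) -> int:
    """Number of sequence numbers skipped between two consecutive records."""
    return cur.seq - prev.seq - 1
```

Since the second field increases by one per record, a non-zero result from `missing_between` on adjacent lines would suggest that lines were dropped somewhere between the kernel and the log consumer.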

shaseley commented 8 years ago

@jhshi The problem is we've disabled all of the other logd functions and switched to the kernel log approach (/dev/log/*). At least in 5.1.1_r3, there was a very large overhead with logd logging that caused us to drop a lot of log lines when high volume logging was enabled. When setting TARGET_USES_LOGD := false, we've disabled the ability to read from logd's buffers, including the new kernel buffer.

Eventually (soonish), I want to investigate improving logd's efficiency and get rid of kmsgd and traced.

jhshi commented 8 years ago

Reading from /dev/kmsg is fine with me. Are the changes ready to be pushed?

shaseley commented 8 years ago

Yes, pushing now and will close the issue when done. Note, this requires a new SELinux permission which will be pushed too.

shaseley commented 8 years ago

All set.

jhshi commented 8 years ago

I'll close this after pushing out the OTA, so I can keep track of what's in each OTA :-)

shaseley commented 8 years ago

Got it!

jhshi commented 8 years ago

Fixed in 4.1.4
