Open networkimprov opened 10 years ago
Is this something you want Tony to investigate, or me?
Sure, go for it. @tmlind might have some ideas...
There at least used to be a bug in systemd where HandlePowerKey=poweroff did not work. The solution was to install acpid also. In the long run we don't want acpid just to handle the power key.
So I managed to figure out what the issue is. Systemd has a udev rule to tag the power switch and our power switch does not match the udev tag. I wrote a udev rule to match against our attributes and systemd shutdown the system when I pressed it. I will post more details when my internet comes back on my main machine.
Hm, one wonders about systemd's QA for ARM platforms... http://virtuallyhyper.com/2014/06/arch-linux-systemd-on-arm-architecture-issue/
We need to prevent such problems. Should we host systemd in our custom repo?
So here is the fix:
cat /usr/lib/udev/rules.d/70-power-switch-anvl.rules
ACTION=="remove", GOTO="power_switch_end"
SUBSYSTEMS=="input", ATTRS{name}=="twl4030_pwrbutton", TAG+="power-switch"
LABEL="power_switch_end"
Here is how I found the correct info to use:
udevadm info -a -p $(udevadm info -q path -n /dev/input/event0)
Udevadm info starts with the device specified by the devpath and then
walks up the chain of parent devices. It prints for every device
found, all possible attributes in the udev rules key format.
A rule to match, can be composed by the attributes of the device
and the attributes from one single parent device.
looking at device '/devices/68000000.ocp/48070000.i2c/i2c-0/0-0048/pwrbutton.33/input/input0/event0':
KERNEL=="event0"
SUBSYSTEM=="input"
DRIVER==""
looking at parent device '/devices/68000000.ocp/48070000.i2c/i2c-0/0-0048/pwrbutton.33/input/input0':
KERNELS=="input0"
SUBSYSTEMS=="input"
DRIVERS==""
ATTRS{name}=="twl4030_pwrbutton"
ATTRS{phys}=="twl4030_pwrbutton/input0"
ATTRS{uniq}==""
ATTRS{properties}=="0"
looking at parent device '/devices/68000000.ocp/48070000.i2c/i2c-0/0-0048/pwrbutton.33':
KERNELS=="pwrbutton.33"
SUBSYSTEMS=="platform"
DRIVERS=="twl4030_pwrbutton"
looking at parent device '/devices/68000000.ocp/48070000.i2c/i2c-0/0-0048':
KERNELS=="0-0048"
SUBSYSTEMS=="i2c"
DRIVERS=="twl"
ATTRS{name}=="twl4030"
looking at parent device '/devices/68000000.ocp/48070000.i2c/i2c-0':
KERNELS=="i2c-0"
SUBSYSTEMS=="i2c"
DRIVERS==""
ATTRS{name}=="OMAP I2C adapter"
looking at parent device '/devices/68000000.ocp/48070000.i2c':
KERNELS=="48070000.i2c"
SUBSYSTEMS=="platform"
DRIVERS=="omap_i2c"
looking at parent device '/devices/68000000.ocp':
KERNELS=="68000000.ocp"
SUBSYSTEMS=="platform"
DRIVERS==""
Notice that the problem is that the criteria in the original power-switch rule was:
SUBSYSTEMS=="acpi"
Which does not match any of the listed attributes. Instead our attributes list input, platform, and i2c as the subsystems.
Do you know which package I should include the udev rule in? Maybe anvl-usb?
Since it's replacing a config file, maybe do that in create-rootfs.sh? Also can we submit a patch upstream to archlinuxarm or systemd?
Any thoughts on my previous message?
Alright, good to hear you figured out the root cause :)
I submitted a bug for systemd at: https://bugs.freedesktop.org/show_bug.cgi?id=84201
With regards to hosting a separate systemd package, I would like to avoid that if we could since then we might introduce more issues that way due to incompatibilities between our systemd and the one provided by archlinuxarm, we would have to persistently maintain a separate package and make sure that our old systemd package doesn't break any other existing packages.
Maybe we could coordinate with archlinuxarm/archlinux systemd maintainers when systemd and any other interesting packages enter testing, we could get some kind of notification to run our tests against our anvl. This also isn't perfect since now this requires us testing this manually every time systemd gets an update.
Ideally we could set up an automated system to reboot every time an interesting package gets updated and after a successful boot, send a heartbeat. If an external system doesn't get the heartbeat within some time, it would send out a notification to one of us that something failed. Something like continuous integration for hardware. Though this seems like a pipe dream :)
@tmlind Thoughts?
Not a pipe dream. Added https://github.com/networkimprov/arch-packages/issues/22
Now we need to catch the switch event and make it disable/re-enable the radio, LED, etc (More details to follow in issue text.)
Since it's replacing a config file, maybe do that in create-rootfs.sh
I made it a seperate config file for now so that we do not have to replace a file managed by pacman :) I think I will put it in anvl-usb for now just to have it live somewhere in a package.
And upstream fixed it with the fix I described .. that was quick http://cgit.freedesktop.org/systemd/systemd/commit/?id=58d4aabedd
I guess I can keep this ticket open until systemd releases a new update so that we can remove my workaround, for the time being.
This ticket has evolved! See new item in issue description...
I seem to be missing all the changes to previous comments and descriptions today...
So for running those actions on the power button, we could tell systemd to go to sleep/hibernate when the power button gets pressed by configuring it in /etc/systemd/logind.conf and then providing a sleep/hibernate hook to run additional commands.
Here is a link to how we could achieve the hooks https://wiki.archlinux.org/index.php/Power_management#Sleep_hooks
@tmlind, is hibernating (write ram to emmc) workable? And if so, can we wake on both switch and vbus?
If not, we only need our own script.
Suspend works, but you currently need to ifconfig mlan0 down first, then ifconfig maln0 up after waking up.
Do we get any benefit from standard suspend, since the omap has off_idle? I was wondering about hibernate, where you write out the ram and shut off.
Pretty much the only advantage is that timers won't run any longer saving a little bit more power during idle. The disadvantage is that standard apps like cron won't work in suspend, and since it's a server, we should keep the system running constantly. I guess the power switch action should be user selectable and configurable, the default action should probably be poweroff. A handy option would be to just ifconfig mlan0 down to shut down the radio.
It's only a server with a live battery! ;-) Cron jobs should run when cable-powered. Right, suspend has the adv of freezing running apps, so let's do that. I don't want to power-off, as that forces a longish boot sequence. I gather hibernate doesn't work, since you haven't said a word about iit :-)
Having enabled the power switch and disabled power-off, now we need it to trigger suspend, and bring wifi down before suspend and up after wake...
When you mention suspend, you mean the normal suspend supported by systemd? systemctl suspend
? If so, I think we can run custom scripts before/after suspend
So when I tried to run systemd suspend
I got the following:
# journalctl -x
https://gist.githubusercontent.com/gostrc/3f76210402bd5f5426f7/raw/a948760b5a5f29fbe7bacd6340be2d9ac54c2d78/gistfile1.txt
and I made sure to run the following right before:
# ip link set dev mlan0 down
so it looks like mwifiex_sdio is preventing the suspend from actually happening even though it is down.
Yes that's a known issue.. For now you need to do:
And then on wake-up:
Also, the log indicates that systemd repeats the sequence, for "mem sleep" and then "freeze sleep". Seems unnecessary/misconfigured?
echo mem > /sys/power/state
Isn't this hibernate rather than suspend? I thought we wanted to suspend, and also, shouldn't this be handled by systemd's targets? We just need to run pre/post commands to handle the mwifiex_sdio issue.
log indicates that systemd repeats the sequence
I might try to ask someone about this... it seems like after the first failure, it shouldn't have to retry anything.
I made the following file under /usr/lib/systemd/system-sleep/mwifiex_sdio-workaround.sh
#!/bin/sh
# based off of:
# https://wiki.archlinux.org/index.php/Power_management#Hooks_in_.2Fusr.2Flib.2Fsystemd.2Fsystem-sleep
case $1/$2 in
pre/*)
echo "Going to $2..."
ip link set dev mlan0 down
modprobe -r mwifiex_sdio
;;
post/*)
echo "Waking up from $2..."
modprobe mwifiex_sdio
ip link set dev mlan0 up
;;
esac
Also I ran: # chmod +x /usr/lib/systemd/system-sleep/mwifiex_sdio-workaround.sh
I hit the power button (linked to the suspend target).
It seems like it went to sleep successfully.
Then when I hit the power button again to wake it up I checked journalctl -x
https://gist.githubusercontent.com/gostrc/861957dfc0bc0b2850e7/raw/89dc12c88337493846c6cab9c4ad69296ff5fede/gistfile1.txt
Note that there is a line that corresponds to the modprobe mwifiex_sdio
line which reads:
Oct 16 05:32:41 anvl kernel: mwifiex_sdio mmc2:0001:1: WLAN FW already running! Skip FW dnld
Oct 16 05:32:41 anvl kernel: mwifiex_sdio mmc2:0001:1: WLAN FW is active
Oct 16 05:32:41 anvl systemd-sleep[1861]: /usr/lib/systemd/system-sleep/mwifiex_sdio-workaround.sh failed with error code 1.
and it seems like the mlan0 interface does not exist:
[self@anvl ~]$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: sit0: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
link/sit 0.0.0.0 brd 0.0.0.0
4: usb0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 42:54:7b:c4:ea:1d brd ff:ff:ff:ff:ff:ff
After I woke up after the suspend, I tried to modprobe -r mwifiex_sdio
and got:
Oct 16 05:46:57 anvl sudo[1903]: self : TTY=pts/0 ; PWD=/home/self ; USER=root ; COMMAND=/usr/bin/modprobe -r mwifiex_sdio
Oct 16 05:46:57 anvl sudo[1903]: pam_unix(sudo:session): session opened for user root by self(uid=0)
Oct 16 05:47:07 anvl kernel: mwifiex_sdio mmc2:0001:1: cmd_wait_q terminated: -512
Oct 16 05:47:07 anvl kernel: mwifiex_sdio mmc2:0001:1: cmd_wait_q terminated: -512
Oct 16 05:47:07 anvl kernel: WARNING: driver mwifiex_sdio did not remove its interrupt handler!
trying to resuspend after the suspend I got:
Oct 16 05:48:02 anvl kernel: Freezing user space processes ...
Oct 16 05:48:02 anvl kernel: Freezing of tasks failed after 20.007 seconds (1 tasks refusing to freeze, wq_busy=0):
Oct 16 05:48:02 anvl kernel: modprobe D c0676c18 0 1904 1903 0x00000001
Oct 16 05:48:02 anvl kernel: [<c0676c18>] (__schedule) from [<c04b4ef4>] (__mmc_claim_host+0x94/0x1c0)
Oct 16 05:48:02 anvl kernel: [<c04b4ef4>] (__mmc_claim_host) from [<c04bf334>] (sdio_bus_remove+0xe4/0xfc)
Oct 16 05:48:02 anvl kernel: [<c04bf334>] (sdio_bus_remove) from [<c039c274>] (__device_release_driver+0x70/0xc4)
Oct 16 05:48:02 anvl kernel: [<c039c274>] (__device_release_driver) from [<c039c980>] (driver_detach+0xb4/0xb8)
Oct 16 05:48:02 anvl kernel: [<c039c980>] (driver_detach) from [<c039bfd8>] (bus_remove_driver+0x4c/0xa0)
Oct 16 05:48:02 anvl kernel: [<c039bfd8>] (bus_remove_driver) from [<c00b6ce8>] (SyS_delete_module+0x110/0x1a0)
Oct 16 05:48:02 anvl kernel: [<c00b6ce8>] (SyS_delete_module) from [<c000e620>] (ret_fast_syscall+0x0/0x48)
Oct 16 05:48:02 anvl kernel:
Oct 16 05:48:02 anvl kernel: Restarting tasks ... done.
and the second suspend failed as a result.
This from the docs:
1) relative_sleep_states = 1
"mem", "standby", "freeze" represent non-hibernation sleep
states from the deepest ("mem", always present) to the
shallowest ("freeze"). "standby" and "freeze" may or may
not be present depending on the capabilities of the
platform. "freeze" can only be present if "standby" is
present.
2) relative_sleep_states = 0 (default)
"mem" - "suspend-to-RAM", present if supported.
"standby" - "power-on suspend", present if supported.
"freeze" - "suspend-to-idle", always present.
Also, "disk" is hibernate.
Thanks. For some reason i wasnt able to google for the doc but your snippet led me to https://www.kernel.org/doc/Documentation/power/states.txt which ill have a look at.
The firmware we want loaded on the WLAN as that's the only way to keep it from sucking lots of power. But sounds like we also need to be able to do a full reset of the WLAN hardare for sure.
Is this related to the WLAN reset problem due to GPIO on OMAP vs PMIC? Or could it be a bug or misconfiguration in the mwifiex driver? Next steps?
Seems to be a combination of issues. The mwifiex driver and firmware should be able to come back up from suspend even without reinitializing firmware. And we should be able to reset WLAN if we need to. Also the GPIO off-idle glitch may be affecting this issue as that can reset the WLAN. Next steps are to add a way to reset WLAN when needed so we can debug it further. And then maybe file a bug report with the Marvell guys once we've verified the issue.
@gostrc, you have some uncommitted stuff for this? Maybe should commit those files as is to anvl-util, but not update the pkgbuild or create-rootfs until we can test it on the next anvl batch...
@networkimprov I only have the file that I posted in the comments. I didn't commit it because there seems to be a bug with sleeping the anvl and the mwifiex driver. I could probably commit to a separate branch though.
Isn't there some systemd config as well?
to enable that custom script? nope, that's standard systemd :)
the udev rule is in anvl-util, though systemd 218 which was released recently has the modified rule, so we should be able to get rid of it once archlinux-arm updates to the latest release of systemd.
I meant for suspend on power-switch, discussed starting https://github.com/networkimprov/arch-packages/issues/11#issuecomment-56473770
Ah right, I will include that in the installer directory.
specifically: https://github.com/networkimprov/omap3-usb-boot-install/tree/master/rootfs
There are some unmerged commits in your installer branch, btw: https://github.com/networkimprov/omap3-usb-boot-install/commits/rootfs-enhancements-p2
Thanks for reminding me, I will add to anvl-util, update the package, open a pull request, and try to create a new image.
You were also going to add the systemd config for suspend on power switch to rootfs, I think.
Oops, I thought I set it to suspend already but it was on ignore. Fixed in: https://github.com/networkimprov/omap3-usb-boot-install/commit/bc829f3dacfe5ec9cab91128c1a60e55c35766cf
@tmlind, I think this item is still in play, since we want switch to disable radio, led, etc, vs actually suspending.
o.011. enable power switch Our switch generates a keyboard event. Tony suspects a bug in systemd. Switch event should either: a. put system in deep-sleep (radio, LED off, others?) b. wake from sleep (VBUS will also)