redhat-performance / tuned

Tuning Profile Delivery Mechanism for Linux
GNU General Public License v2.0
829 stars 179 forks source link

tuned crashes when the dfuzzer command is executed to switch the profile #300

Open hanxinke opened 4 years ago

hanxinke commented 4 years ago

Run the following command:./dfuzzer -v -n com.redhat.tuned -o /Tuned -i com.redhat.tuned.control -t switch_profile Output: [SYSTEM BUS] [PROCESS: /usr/bin/python3 -Es /usr/sbin/tuned -l -P -D [PACKAGE: [CONNECTED TO PID: 69716 Object: /Tuned Interface: com.redhat.tuned.control FAIL switch_profile - process exited Exit status: 2

It is suspected that the self._udev_monitor.filter_by(subsystem) in tuned/hardware/inventory.py is executed during the function of subscribe but the self._udev_monitor.remove_filter() is not executed during the function of unsubscribe. As a result, the tuned process crashes.

yarda commented 4 years ago

Thanks for the report. Which versions? I am unable to reproduce with: python3-pyudev-0.21.0-11.fc31 dfuzzer-1.4-10.fc31.x86_64

$ dfuzzer -v -n com.redhat.tuned -o /Tuned -i com.redhat.tuned.control -t switch_profile
'./dfuzzer.conf' file not found.
'/home/user/.dfuzzer.conf' file not found.
Suppressions from '/etc/dfuzzer.conf'
[SESSION BUS]
Error: The connection does not exist.
[SYSTEM  BUS]
[PROCESS: /usr/bin/python3 -Es ./tuned.py 
[PACKAGE: 
[CONNECTED TO PID: 6858
Object: /Tuned
 Interface: com.redhat.tuned.control
  PASS switch_profile
Exit status: 0
yarda commented 4 years ago

Checking against the latest git/master.

yarda commented 4 years ago

I tried with the 'balanced' profile, but maybe my machine is not fast enough to trigger the race.

yarda commented 4 years ago

I don't think it's required to call corresponding remove_filter() to every filter_by() call. From the pyudev documentation it seems the 'remove_filter' was added in the 0.15 version and filter_by call was already supported. Also I am unable to find anything related in the udev docs, could you provide more information?

yarda commented 4 years ago

I checked the code, for the filter_by the pyudev calls udev_monitor_filter_add_match_subsystem_devtype and udev_monitor_filter_update with some minor glue logic.

The systemd implements the udev_monitor_filter_update by the setsockopt call with the SO_ATTACH_FILTER, from the setsockopt manual page:

These options may be set multiple times for a given socket, each time replacing the previous filter program. 
The classic and extended versions may be called on the same socket, but the previous filter will always
be replaced such that a socket never has more than one filter defined.

So if there is a race, it's probably caused by something else.

hanxinke commented 4 years ago

Thank you very much for your advice. The version is python3-pyudev-0.21.0 dfuzzer-1.4 This problem also occurs in the latest master branch code.

The part of tuned.log is _2020-10-13 05:29:00,126 INFO tuned.daemon.application: dynamic tuning is globally disabled 2020-10-13 05:29:00,133 DEBUG tuned.daemon.daemon: initializing daemon 2020-10-13 05:29:00,133 INFO tuned.daemon.daemon: using sleep interval of 1 second(s) 2020-10-13 05:29:00,134 INFO tuned.daemon.daemon: Running in automatic mode, checking what profile is recommended for your configuration. 2020-10-13 05:29:00,136 DEBUG tuned.utils.commands: Executing virt-what. 2020-10-13 05:29:00,210 DEBUG tuned.utils.commands: Read data from file: '<_io.TextIOWrapper name='/etc/system-release-cpe' mode='r' encoding='UTF-8'>' > 'cpe:/o:openEuler:openEuler:1.0:ga:server ' 2020-10-13 05:29:00,211 DEBUG tuned.utils.commands: Executing virt-what. 2020-10-13 05:29:00,300 DEBUG tuned.utils.commands: Read data from file: '<_io.TextIOWrapper name='/etc/system-release-cpe' mode='r' encoding='UTF-8'>' > 'cpe:/o:openEuler:openEuler:1.0:ga:server ' 2020-10-13 05:29:00,301 DEBUG tuned.utils.commands: Executing virt-what. 2020-10-13 05:29:00,367 DEBUG tuned.utils.commands: Read data from file: '<_io.TextIOWrapper name='/etc/system-release-cpe' mode='r' encoding='UTF-8'>' > 'cpe:/o:openEuler:openEuler:1.0:ga:server ' 2020-10-13 05:29:00,367 DEBUG tuned.utils.commands: Executing virt-what. 2020-10-13 05:29:00,468 INFO tuned.daemon.daemon: Using 'virtual-guest' profile 2020-10-13 05:29:00,471 INFO tuned.profiles.loader: loading profile: virtual-guest 2020-10-13 05:29:00,478 INFO tuned.daemon.controller: starting controller 2020-10-13 05:29:00,479 INFO tuned.daemon.daemon: starting tuning 2020-10-13 05:29:00,480 DEBUG tuned.plugins.repository: creating plugin cpu 2020-10-13 05:29:00,480 DEBUG tuned.utils.plugin_loader: loading module tuned.plugins.plugin_cpu 2020-10-13 05:29:00,486 DEBUG tuned.plugins.repository: creating plugin disk 2020-10-13 05:29:00,486 DEBUG tuned.utils.plugin_loader: loading module tuned.plugins.plugin_disk 2020-10-13 05:29:00,493 DEBUG tuned.hardware.inventory: adding handler: block (<tuned.plugins.plugin_disk.DiskPlugin object at 0xfffe5b5c13d0>) 2020-10-13 05:29:00,494 DEBUG tuned.hardware.inventory: starting monitor observer 2020-10-13 05:29:00,495 DEBUG tuned.plugins.repository: creating plugin sysctl 2020-10-13 05:29:00,495 DEBUG tuned.utils.plugin_loader: loading module tuned.plugins.plugin_sysctl 2020-10-13 05:29:00,496 DEBUG tuned.units.manager: creating 'cpu' (cpu) 2020-10-13 05:29:00,497 DEBUG tuned.plugins.base: assigning devices to instance cpu 2020-10-13 05:29:00,498 INFO tuned.plugins.base: instance cpu: assigning devices cpu5, cpu0, cpu4, cpu3, cpu7, cpu2, cpu1, cpu6 2020-10-13 05:29:00,498 DEBUG tuned.plugins.base: initializing instance cpu (cpu) 2020-10-13 05:29:00,498 DEBUG tuned.monitors.repository: creating monitor load 2020-10-13 05:29:00,498 DEBUG tuned.utils.plugin_loader: loading module tuned.monitors.monitor_load 2020-10-13 05:29:00,499 DEBUG tuned.monitors.base: available devices: system 2020-10-13 05:29:00,500 DEBUG tuned.utils.commands: Executing ['x86_energy_perf_policy', '-r']. 2020-10-13 05:29:00,503 WARNING tuned.plugins.plugin_cpu: unable to run x86_energy_perf_policy tool, ignoring CPU energy performance bias, is the tool installed? 2020-10-13 05:29:00,504 DEBUG tuned.units.manager: creating 'disk' (disk) 2020-10-13 05:29:00,504 DEBUG tuned.plugins.base: assigning devices to instance disk 2020-10-13 05:29:00,504 WARNING tuned.plugins.base: instance disk: no matching devices available 2020-10-13 05:29:00,504 DEBUG tuned.plugins.base: initializing instance disk (disk) 2020-10-13 05:29:00,504 DEBUG tuned.monitors.repository: creating monitor disk 2020-10-13 05:29:00,505 DEBUG tuned.utils.plugin_loader: loading module tuned.monitors.monitor_disk 2020-10-13 05:29:00,506 DEBUG tuned.monitors.base: available devices: 2020-10-13 05:29:00,506 DEBUG tuned.units.manager: creating 'sysctl' (sysctl) 2020-10-13 05:29:00,507 DEBUG tuned.plugins.base: initializing instance sysctl (sysctl) …… 2020-10-13 05:30:16,818 INFO tuned.daemon.controller: Applying previously applied profile. 2020-10-13 05:30:16,818 INFO tuned.daemon.daemon: starting tuning 2020-10-13 05:30:16,819 DEBUG tuned.plugins.repository: creating plugin cpu 2020-10-13 05:30:16,819 DEBUG tuned.utils.plugin_loader: loading module tuned.plugins.plugin_cpu 2020-10-13 05:30:16,821 DEBUG tuned.exports.dbus_exporter: checking authorization for for action 'com.redhat.tuned.switch_profile' requested by caller ':1.58' 2020-10-13 05:30:16,822 DEBUG tuned.plugins.repository: creating plugin disk 2020-10-13 05:30:16,822 DEBUG tuned.utils.plugin_loader: loading module tuned.plugins.plugin_disk 2020-10-13 05:30:16,824 DEBUG tuned.exports.dbus_exporter: action 'com.redhat.tuned.switch_profile' requested by caller ':1.58' was successfully authorized by polkit 2020-10-13 05:30:16,825 INFO tuned.daemon.daemon: stopping tuning 2020-10-13 05:30:16,827 DEBUG tuned.hardware.inventory: adding handler: block (<tuned.plugins.plugindisk.DiskPlugin object at 0xfffe59a397d0>) stack smashing detected : terminated Aborted (core dumped)

Please add logs to ensure that self._udev_monitor.filter_by(subsystem) runs. It appears after running 170 times in my environment. When I add self._udev_monitor.remove_filter(), the problem is solved.

yarda commented 4 years ago

Please could you try replacing self._udev_monitor.remove_filter() with just small delay? To be sure that the problem is not caused by timing.

hanxinke commented 4 years ago

ok, I try to replace self._udev_monitor.remove_filter() to time.sleep(10), but the problem still exists.