intel / thermal_daemon

Thermal daemon for IA
GNU General Public License v2.0

Alleged kernel memory leak in LK 6.0.0-rc3 through rc7 #377

Status: Closed (mtodorov3-69 closed this issue 1 month ago)

mtodorov3-69 commented 1 year ago

Hi all,

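This is actually a kernel memory leak, but the /sys/kernel/debug/kmemleak facility attributes it to thermald on a Lenovo IdeaPad3 running Ubuntu 22.04 "jammy". The leaked object was allocated while thermald was writing to the int3400_thermal driver's current_uuid sysfs attribute.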

Here is the output:

sudo cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8881095f3ee0 (size 80):
  comm "thermald", pid 837, jiffies 4294896698 (age 9867.428s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 0d 01 2d 00 00 00 00 00  ..........-.....
    af 07 01 00 00 c9 ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000b50b9dd6>] kmem_cache_alloc+0x184/0x380
    [<00000000fa8428c0>] acpi_os_acquire_object+0x2c/0x32
    [<000000002cc0099f>] acpi_ps_alloc_op+0x65/0xe6
    [<00000000335faf1b>] acpi_ps_get_next_arg+0x842/0x9ed
    [<000000007afa2dee>] acpi_ps_parse_loop+0x718/0xee1
    [<0000000010ce490e>] acpi_ps_parse_aml+0x261/0x7b2
    [<00000000278d4c5f>] acpi_ps_execute_method+0x360/0x459
    [<00000000ff7ad4ba>] acpi_ns_evaluate+0x595/0x810
    [<0000000037ce3488>] acpi_evaluate_object+0x28b/0x5b2
    [<000000001a800bbf>] acpi_run_osc+0x209/0x3d0
    [<00000000776fbd43>] int3400_thermal_run_osc+0xed/0x180 [int3400_thermal]
    [<00000000d6ec2302>] current_uuid_store+0x17c/0x1d0 [int3400_thermal]
    [<00000000486cf3e6>] dev_attr_store+0x3e/0x60
    [<00000000bf193027>] sysfs_kf_write+0x88/0xa0
    [<00000000820b5cce>] kernfs_fop_write_iter+0x1c9/0x270
    [<0000000062f8d35e>] vfs_write+0x5a5/0x750
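When testing many kernels (for example, while marking bisect steps good or bad), it helps to check whether a report still contains this particular leak rather than reading it by eye. The following is a minimal sketch, not part of thermald or the kernel. It assumes the record layout shown above: an `unreferenced object` header, a `comm` line, a hex dump, and indented `backtrace:` frames. It keeps only the function name from each frame. `LeakRecord`, `parse_kmemleak` and `leaks_through` are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass, field

# Header of each record, e.g. "unreferenced object 0xffff8881095f3ee0 (size 80):"
HEADER_RE = re.compile(r"^unreferenced object (0x[0-9a-f]+) \(size (\d+)\):")
# Task line, e.g. '  comm "thermald", pid 837, jiffies ...'
COMM_RE = re.compile(r'^\s+comm "([^"]*)", pid (\d+)')
# Backtrace frame, e.g. "    [<000000001a800bbf>] acpi_run_osc+0x209/0x3d0"
FRAME_RE = re.compile(r"^\s+\[<[0-9a-f]+>\] ([A-Za-z0-9_.]+)\+")


@dataclass
class LeakRecord:
    address: str
    size: int
    comm: str = ""
    pid: int = -1
    frames: list = field(default_factory=list)  # function names, innermost first


def parse_kmemleak(report: str) -> list:
    """Split a kmemleak report into one LeakRecord per unreferenced object."""
    records = []
    for line in report.splitlines():
        if m := HEADER_RE.match(line):
            records.append(LeakRecord(address=m.group(1), size=int(m.group(2))))
        elif records and (m := COMM_RE.match(line)):
            records[-1].comm, records[-1].pid = m.group(1), int(m.group(2))
        elif records and (m := FRAME_RE.match(line)):
            records[-1].frames.append(m.group(1))
        # Hex dump lines and "backtrace:" markers match none of the patterns.
    return records


def leaks_through(records, function: str) -> list:
    """Keep only the records whose backtrace passes through `function`."""
    return [r for r in records if function in r.frames]
```

For a given test kernel, one could run `echo scan > /sys/kernel/debug/kmemleak` as root to force an immediate scan, then pass the file's contents to `parse_kmemleak` and check `leaks_through(records, "int3400_thermal_run_osc")`. A non-empty result would suggest that this leak is still present.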

Hope this helps. I will provide additional info on request.

Kind regards.

spandruvada commented 1 year ago

Hi, there has been no change in the thermal code. I am checking whether something changed in ACPI.

spandruvada commented 1 year ago

Can you help to bisect between these two versions?

mtodorov3-69 commented 1 year ago

The version of thermald is 2.4.9-1ubuntu0.1 with both kernel versions.

mtodorov3-69 commented 1 year ago

I'm currently bisecting another bug. Could you please tell me roughly which versions I should start with?

mtodorov3-69 commented 1 year ago

Hello,

I need confirmation that the bad commit from my previous bisect has been fixed before I delete those bisect builds and start on this task. I am somewhat limited in disk space, and the builds take up ~37 GiB. After that I will proceed, Lord willing, in my off hours. This is not an excuse, just an explanation: my day job isn't Linux kernel debugging but something much more mundane. ;) It would help if you have an idea of the range of kernels that needs bisecting (given what you said about the thermald source being stable). I am interested in bisecting this problem, but please note that, as you already know, setting CONFIG_DEBUG_KMEMLEAK=y makes most developers disregard the kernel as "non-vanilla" or even "tainted". The config option appeared very early, in the 2.x.y kernels, so there will be a considerable number of bisect steps. Thank you for your attention, and please excuse my scribomanic style.
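Each good/bad verdict in a bisect roughly halves the set of candidate commits. The number of test builds therefore grows only logarithmically with the size of the range: about ceil(log2(N)) for N candidates. Below is a minimal sketch of that arithmetic, assuming a linear history (merges can make git's actual count differ somewhat). `bisect_steps` is a hypothetical helper name.

```python
def bisect_steps(candidates: int) -> int:
    """Worst-case number of test builds for a binary search over
    `candidates` possible first-bad commits (linear history assumed)."""
    if candidates < 1:
        raise ValueError("need at least one candidate commit")
    # ceil(log2(n)) computed exactly with integer arithmetic:
    # (n - 1).bit_length() is the number of bits needed to write n - 1.
    return (candidates - 1).bit_length()


for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} candidate commits -> about {bisect_steps(n)} test builds")
```

Even a range of 100,000 commits needs only about 17 builds. For comparison, the bisect log later in this thread records 16 tested revisions between Linux 5.15 and 6.0.3.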

mtodorov3-69 commented 1 year ago

> Can you help to bisect between these two versions?

Both 6.0-rc3 and 6.0-rc7 showed the memory leak associated with thermald.

mtodorov3-69 commented 1 year ago

> Can you help to bisect between these two versions?

Hello Mr. Pandruvada,

I have finally found time for a bisect on this bug.

mtodorov@domac:~/linux/kernel/linux_stable$ git bisect log
git bisect start
# good: [b6abb62daa5511c4a3eaa30cbdb02544d1f10fa2] Linux 5.15.1
git bisect good b6abb62daa5511c4a3eaa30cbdb02544d1f10fa2
# bad: [e6f4ff3f91251f67b130c29f38673eb5702f88b9] Linux 6.0.3
git bisect bad e6f4ff3f91251f67b130c29f38673eb5702f88b9
# good: [8bb7eca972ad531c9b149c0a51ab43a417385813] Linux 5.15
git bisect good 8bb7eca972ad531c9b149c0a51ab43a417385813
# bad: [1464677662943738741500a6f16b85d36bbde2be] Merge tag 'platform-drivers-x86-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
git bisect bad 1464677662943738741500a6f16b85d36bbde2be
# good: [8efd0d9c316af470377894a6a0f9ff63ce18c177] Merge tag '5.17-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect good 8efd0d9c316af470377894a6a0f9ff63ce18c177
# good: [aaa25a2fa7964d94690f6de5edd7164ca7d76555] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good aaa25a2fa7964d94690f6de5edd7164ca7d76555
# bad: [b4bc93bd76d4da32600795cd323c971f00a2e788] Merge tag 'arm-drivers-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad b4bc93bd76d4da32600795cd323c971f00a2e788
# bad: [ef510682af3dbe2f9cdae7126a1461c94e010967] Merge tag 'f2fs-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
git bisect bad ef510682af3dbe2f9cdae7126a1461c94e010967
# good: [a04b1bf574e1f4875ea91f5c62ca051666443200] Merge tag 'for-5.18/parisc-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
git bisect good a04b1bf574e1f4875ea91f5c62ca051666443200
# bad: [b080cee72ef355669cbc52ff55dc513d37433600] Merge tag 'for-5.18/io_uring-statx-2022-03-18' of git://git.kernel.dk/linux-block
git bisect bad b080cee72ef355669cbc52ff55dc513d37433600
# good: [02b82b02c34321dde10d003aafcd831a769b2a8a] Merge tag 'pm-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 02b82b02c34321dde10d003aafcd831a769b2a8a
# good: [0e03b8fd29363f2df44e2a7a176d486de550757a] crypto: xilinx - Turn SHA into a tristate and allow COMPILE_TEST
git bisect good 0e03b8fd29363f2df44e2a7a176d486de550757a
# good: [3e504d2026eb6c8762cd6040ae57db166516824a] random: check for signal and try earlier when generating entropy
git bisect good 3e504d2026eb6c8762cd6040ae57db166516824a
# good: [5e929367468c8f97cd1ffb0417316cecfebef94b] io_uring: terminate manual loop iterator loop correctly for non-vecs
git bisect good 5e929367468c8f97cd1ffb0417316cecfebef94b
# bad: [2d6fc1455f3f383499e013ebc4b19ff49c53c15e] Merge branches 'thermal-powerclamp', 'thermal-int340x' and 'thermal-docs'
git bisect bad 2d6fc1455f3f383499e013ebc4b19ff49c53c15e
# good: [1d6aab36a26ba44b114d7f8a857c430c9e0c32c9] thermal/drivers/ti-soc-thermal: Remove unused function ti_thermal_get_temp()
git bisect good 1d6aab36a26ba44b114d7f8a857c430c9e0c32c9
# bad: [c7ff29763989bd09c433f73fae3c1e1c15d9cda4] thermal: int340x: Update OS policy capability handshake
git bisect bad c7ff29763989bd09c433f73fae3c1e1c15d9cda4
# good: [098c874e20be2a4cee3021aa9b3485ed5e1f4d5b] thermal: Replace acpi_bus_get_device()
git bisect good 098c874e20be2a4cee3021aa9b3485ed5e1f4d5b
# good: [668f69a5f863b877bc3ae129efe9a80b6f055141] thermal: int340x: Increase bitmap size
git bisect good 668f69a5f863b877bc3ae129efe9a80b6f055141
# first bad commit: [c7ff29763989bd09c433f73fae3c1e1c15d9cda4] thermal: int340x: Update OS policy capability handshake
mtodorov@domac:~/linux/kernel/linux_stable$

I hope this helps. God bless.

spandruvada commented 1 month ago

If you can reproduce this with the latest kernel, please file it again.