osandov / drgn

Programmable debugger
Other
1.72k stars 157 forks source link

Test failure on 6.1.98-vmtest30.0tiny #415

Closed brenns10 closed 1 month ago

brenns10 commented 1 month ago

I'm getting this on the main branch for 6.1.98-vmtest30.0tiny:

======================================================================
ERROR: test_identify_slab_cache (tests.linux_kernel.helpers.test_common.TestIdentifyAddress) (size='small')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stepbren/repos/drgn/tests/linux_kernel/helpers/test_common.py", line 45, in test_identify_slab_cache
    self.assertIsNone(identify_address(obj))
  File "/home/stepbren/repos/drgn/drgn/helpers/common/prog.py", line 113, in wrapper
    return f(args[0].prog_, *args, **kwds)
  File "/home/stepbren/repos/drgn/drgn/helpers/common/memory.py", line 198, in identify_address
    return _identify_kernel_address(prog, addr, cache)
  File "/home/stepbren/repos/drgn/drgn/helpers/common/memory.py", line 132, in _identify_kernel_address
    slab_info = _get_slab_cache_helper(slab.slab_cache).object_info(
AttributeError: 'struct slab' has no member 'slab_cache'

======================================================================
ERROR: test_print_annotated_memory (tests.linux_kernel.helpers.test_common.TestPrintAnnotatedMemory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stepbren/repos/drgn/tests/linux_kernel/helpers/test_common.py", line 90, in test_print_annotated_memory
    print_annotated_memory(
  File "/home/stepbren/repos/drgn/drgn/helpers/common/prog.py", line 111, in wrapper
    return f(*args, **kwds)
  File "/home/stepbren/repos/drgn/drgn/helpers/common/memory.py", line 255, in print_annotated_memory
    identified = identify_address(prog, value, cache=cache)
  File "/home/stepbren/repos/drgn/drgn/helpers/common/prog.py", line 111, in wrapper
    return f(*args, **kwds)
  File "/home/stepbren/repos/drgn/drgn/helpers/common/memory.py", line 198, in identify_address
    return _identify_kernel_address(prog, addr, cache)
  File "/home/stepbren/repos/drgn/drgn/helpers/common/memory.py", line 132, in _identify_kernel_address
    slab_info = _get_slab_cache_helper(slab.slab_cache).object_info(
AttributeError: 'struct slab' has no member 'slab_cache'

======================================================================
ERROR: test_print_annotated_stack (tests.linux_kernel.helpers.test_common.TestPrintAnnotatedStack)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stepbren/repos/drgn/tests/linux_kernel/helpers/test_common.py", line 107, in test_print_annotated_stack
    print_annotated_stack(trace)
  File "/home/stepbren/repos/drgn/drgn/helpers/common/stack.py", line 132, in print_annotated_stack
    identified = identify_address(prog, word_val, cache=cache)
  File "/home/stepbren/repos/drgn/drgn/helpers/common/prog.py", line 111, in wrapper
    return f(*args, **kwds)
  File "/home/stepbren/repos/drgn/drgn/helpers/common/memory.py", line 198, in identify_address
    return _identify_kernel_address(prog, addr, cache)
  File "/home/stepbren/repos/drgn/drgn/helpers/common/memory.py", line 132, in _identify_kernel_address
    slab_info = _get_slab_cache_helper(slab.slab_cache).object_info(
AttributeError: 'struct slab' has no member 'slab_cache'

----------------------------------------------------------------------
Ran 290 tests in 1.108s

FAILED (errors=3, skipped=20)

It happened on my fork's CI somehow and I've verified it locally. Looking into it now but figured I'd post an issue for tracking.

brenns10 commented 1 month ago

So this isn't surprising and I don't know why it took me a while to realize that the "tiny" configurations use SLOB, which doesn't have the slab_cache pointer. By design.

But I guess I'm confused about why the test fails now. Did we just never notice it?

brenns10 commented 1 month ago

vmtest build 205 PASSED on drgn 8ab4901d8336faf74bdd5f0b6bf8127d87d818f9, but it didn't run the 6.1.y tiny configuration, it ran the 6.1.y default configuration. Here is what it ran from one of the logs: it did not run against any tiny configurations.

Passed: local, 6.10.0-rc7-vmtest30.1default, 6.9.8-vmtest30.1default, 6.8.12-vmtest30.1default, 6.7.12-vmtest30.1default, 6.6.37-vmtest30.1default, 6.5.13-vmtest30.1default, 6.4.16-vmtest30.1default, 6.3.13-vmtest30.1default, 6.2.16-vmtest30.1default, 6.1.97-vmtest30.1default, 6.0.19-vmtest30.1default, 5.19.17-vmtest30.1default, 5.18.19-vmtest30.1default, 5.17.15-vmtest30.1default, 5.16.20-vmtest30.1default, 5.15.162-vmtest30.1default, 5.14.21-vmtest30.1default, 5.13.19-vmtest30.1default, 5.12.19-vmtest30.1default, 5.11.22-vmtest30.1default, 5.10.221-vmtest30.1default, 5.4.279-vmtest30.1default, 4.19.317-vmtest30.1default, 4.14.336-vmtest30.1default, 4.9.337-vmtest30.1default

vmtest build 206 FAILED on the current main, de8ba223defa8dc70c44385e230cf2b13623f7cd, presumably because it did run the 6.1.y tiny configuration.

Failed: 6.1.98-vmtest30.0tiny
Passed: local, 6.10.0-vmtest30.1default, 6.9.9-vmtest30.0tiny, 6.8.12-vmtest30.1default, 6.7.12-vmtest30.1default, 6.6.39-vmtest30.0tiny, 6.5.13-vmtest30.1default, 6.4.16-vmtest30.1default, 6.3.13-vmtest30.1default, 6.2.16-vmtest30.1default, 6.0.19-vmtest30.1default, 5.19.17-vmtest30.1default, 5.18.19-vmtest30.1default, 5.17.15-vmtest30.1default, 5.16.20-vmtest30.1default, 5.15.162-vmtest30.1default, 5.14.21-vmtest30.1default, 5.13.19-vmtest30.1default, 5.12.19-vmtest30.1default, 5.11.22-vmtest30.1default, 5.10.221-vmtest30.1default, 5.4.279-vmtest30.1default, 4.19.317-vmtest30.1default, 4.14.336-vmtest30.1default, 4.9.337-vmtest30.1default

The vmtest is running with -K which is not supposed to run the tiny configurations... That's odd.

I now notice that the 6.1.98 kernel has not even been built for the default configuration, only the tiny. It appears that the test runner will use the latest 6.1 kernel version (6.1.98) and since there's no default build, it falls back to the tiny! I guess when the build job ran, the 6.1.98 kernel hadn't been tagged yet, but by the time the tiny build ran, it had?

This explains why the test ran suddenly. But also, the test shouldn't have failed, right?

osandov commented 1 month ago

Your guess sounds right to me. I'll kick off a manual vmtest build to unblock this and fix setup.py to explicitly choose the default flavor.

I'm not sure why the test is failing, though. I'm fairly certain I did a manual run on all flavors before cutting 0.0.27, but maybe I forgot to do that.

brenns10 commented 1 month ago

Ok, git bisect blames c7717280ad2f867cefc62e9e841dc0787b08d39b, and indeed, the newly added print_annotated_memory() test fails on the 6.1.98-vmtest30.0tiny. However, the test failure is different:

======================================================================
FAIL: test_print_annotated_memory (tests.linux_kernel.helpers.test_common.TestPrintAnnotatedMemory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stepbren/repos/drgn/tests/linux_kernel/helpers/test_common.py", line 77, in test_print_annotated_memory
    self.assertIn("slab object: drgn_test_small+0x0", f.getvalue())
AssertionError: 'slab object: drgn_test_small+0x0' not found in 'ADDRESS           VALUE\nffffffffa021b7e0: ffff8880035d8710\nffffffffa021b7e8: ffff8880035d8740\nffffffffa021b7f0: ffff8880035d8770\nffffffffa021b7f8: ffff8880035d87a0\nffffffffa021b800: ffff8880035d87d0\n'

----------------------------------------------------------------------

That makes sense, the test doesn't check whether slob is active, so there will be no slab output in there.

When I run tests on 6.1.98-vmtest30.0tiny against that commit's parent, 1232a404c65c29944de05f8cd7d1c7b6908cda99, the tests do pass.

Now, looking at 48e0c51b0a0e85337595fe6bd598f080850e2f44, I see that on that commit there are the same three failures we saw in the original message. But on its parent, it's just the one. So it's this commit which actually introduces the three failures for fetching the slab_cache field on SLOB.

So there are two issues to fix:

  1. Fix the print_annotated_memory() function to check for SLOB and not assert that slab objects will appear in that case.
  2. Fix the refactored _identify_address() to avoid failing on SLOB.
osandov commented 1 month ago

Oof, clearly I did not actually run the tests on the tiny flavor, then... Another followup will be to test all of the kernel flavors in CI at least some of the time. Maybe as part of the weekly vmtest build and on-demand with a test-all-kernel-flavors tag like we have for Python versions.

brenns10 commented 1 month ago

Oof, clearly I did not actually run the tests on the tiny flavor, then... Another followup will be to test all of the kernel flavors in CI at least some of the time. Maybe as part of the weekly vmtest build and on-demand with a test-all-kernel-flavors tag like we have for Python versions.

Yeah, that would be a great option to have.

Regarding 48e0c51b0a0e85337595fe6bd598f080850e2f44, the commit message reads:

    helper.common.memory: recognize vmap addresses in identify_address()

    This requires rearranging a couple of things: symbols from modules are
    also vmap addresses, so we need to check for symbols first, and we need
    to open-code slab_object_info to avoid some redundant operations

I guess the open-coding of slab_object_info() was the bit that resulted in the regression. I think the main thing that open-coding does is avoid re-testing for in_direct_map. I think we can avoid that by shifting the check out into find_containing_slab() from _find_containing_slab(), and then we can drop the open-coding. In fact, a helper named in_direct_map could be helpful anyway, and then we can simplify everything and make the tests pass too :)

osandov commented 1 month ago

Yeah, the direct map check is the only potential redundancy I see looking at it again, so that plan sounds good to me. Were you planning on working on this? No pressure, your investigation was already plenty helpful, I appreciate it!

brenns10 commented 1 month ago

Yeah, I am partway through it, so I can send a PR :)