lkrg-org / lkrg

Linux Kernel Runtime Guard
https://lkrg.org
Other
410 stars 72 forks source link

Add support for RHEL 9.2 kernel #238

Closed wladmis closed 1 year ago

wladmis commented 1 year ago

Reported-by: Alexey Gladkov legion@altlinux.ru Signed-off-by: Vladimir D. Seleznev vseleznv@altlinux.org

solardiz commented 1 year ago

Thanks. Can you show how the problem manifested itself and with what exact kernel revision?

Somehow we didn't see any build failures with CentOS Stream 9, which is one of the GitHub Actions tests, so I guess either the problem isn't a build-time error (but is a runtime error) or the check shouldn't be for RHEL 9.0 (but for some later 9.x).

wladmis commented 1 year ago

Thanks. Can you show how the problem manifested itself and with what exact kernel revision?

https://git.altlinux.org/tasks/309050/logs/events.2.1.log

As I can see, it's 5.14.0.181.

Somehow we didn't see any build failures with CentOS Stream 9, which is one of the GitHub Actions tests, so I guess either the problem isn't a build-time error (but is a runtime error) or the check shouldn't be for RHEL 9.0 (but for some later 9.x).

I don't know how, but it is a build-time error (may be due the difference of the compilers flags setting).

solardiz commented 1 year ago

I'm reluctant to merge this without having better understanding first. What we know is:

  1. This fixes build of LKRG for ALT's build of CentOS Stream 9 kernel revision 5.14.0.181-alt1.el9.
  2. This doesn't break build of LKRG for actual CentOS Stream 9 currently (but probably unnecessarily disables this optional functionality).
  3. The changes are the same as what @Adam-pi3 made in 1b39b4cf9902cb239be08ce30b13c0e312770ed1 for 5.16+.
  4. Just skipping these notifiers should be safe (disabling of optional functionality), so no real concern of runtime issues.

What we don't know is:

  1. Why build didn't break for actual CentOS Stream 9 yet if the same kind of change is in fact required. Does ALT possibly use different kernel configuration settings?
  2. Whether RHEL 9.0 is the right version to check. Should possibly be a higher 9.x, or we could check for PROFILE_TASK_EXIT and/or PROFILE_MUNMAP instead of kernel versions.
  3. When ALT started making such builds. If the builds were being made before and just started failing recently, then we'd want to know what caused that.
solardiz commented 1 year ago

we could check for PROFILE_TASK_EXIT and/or PROFILE_MUNMAP instead of kernel versions.

No luck with that, at least not with a mere #if, because they were not macros:

enum profile_type {
        PROFILE_TASK_EXIT,
        PROFILE_MUNMAP
};
solardiz commented 1 year ago

Whether RHEL 9.0 is the right version to check. Should possibly be a higher 9.x

According to https://access.redhat.com/articles/3078 RHEL 9.0 had 5.14.0-70.13.1. My guess is 5.14.0-181 is a version closer to RHEL 9.1 - possibly it already sets RHEL_RELEASE_CODE to 9.1 and we should check for 9.1+?

Our build for actual CentOS Stream 9 uses 5.14.0-177, so maybe something changed between 177 and 181. Relevantly, that build produces these warnings (not errors):

  CC [M]  /__w/lkrg/lkrg/src/modules/notifiers/p_notifiers.o
/__w/lkrg/lkrg/src/modules/notifiers/p_notifiers.c:93:30: warning: 'p_profile_event_munmap_notifier_nb' defined but not used [-Wunused-variable]
   93 | static struct notifier_block p_profile_event_munmap_notifier_nb = {
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/__w/lkrg/lkrg/src/modules/notifiers/p_notifiers.c:89:30: warning: 'p_profile_event_exit_notifier_nb' defined but not used [-Wunused-variable]
   89 | static struct notifier_block p_profile_event_exit_notifier_nb = {
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/__w/lkrg/lkrg/src/modules/notifiers/p_notifiers.c:85:30: warning: 'p_taskfree_notifier_nb' defined but not used [-Wunused-variable]
   85 | static struct notifier_block p_taskfree_notifier_nb = {
      |                              ^~~~~~~~~~~~~~~~~~~~~~

I am puzzled by this since we define those variables within the same kind of #if conditions where we'd use them:

#if LINUX_VERSION_CODE < KERNEL_VERSION(5,16,0)
static struct notifier_block p_taskfree_notifier_nb = {
   .notifier_call = p_taskfree_notifier,
};

static struct notifier_block p_profile_event_exit_notifier_nb = {
   .notifier_call = p_profile_event_exit_notifier,
};

static struct notifier_block p_profile_event_munmap_notifier_nb = {
   .notifier_call = p_profile_event_munmap_notifier,
};
#endif
#if LINUX_VERSION_CODE < KERNEL_VERSION(5,16,0)
   task_handoff_register(&p_taskfree_notifier_nb);
   profile_event_register(PROFILE_TASK_EXIT, &p_profile_event_exit_notifier_nb);
   profile_event_register(PROFILE_MUNMAP, &p_profile_event_munmap_notifier_nb);
#endif

and there's no nested #if between these blocks. So I'd expect that either both are compiled in or both are excluded, but the warnings suggest that only the second block somehow got skipped.

OK, here's a guess: maybe task_handoff_register and profile_event_register become dummy macros (that don't use their arguments) with whatever kernel configuration settings CentOS Stream 9 uses (but ALT's configuration is different).

solardiz commented 1 year ago

maybe task_handoff_register and profile_event_register become dummy macros (that don't use their arguments) with whatever kernel configuration settings CentOS Stream 9 uses (but ALT's configuration is different).

Actually, they don't have to be macros to produce those warnings. Without CONFIG_PROFILING, they are:

static inline int task_handoff_register(struct notifier_block * n)
{
        return -ENOSYS;
}

static inline int task_handoff_unregister(struct notifier_block * n)
{
        return -ENOSYS;
}

static inline int profile_event_register(enum profile_type t, struct notifier_block * n)
{
        return -ENOSYS;
}

static inline int profile_event_unregister(enum profile_type t, struct notifier_block * n)
{
        return -ENOSYS;
}

We'll probably want to add defined(CONFIG_PROFILING) && to those checks so that we don't have unused variables.

solardiz commented 1 year ago

@Adam-pi3 I think it'd be best to drop those pieces of code altogether. They depend on old kernels, CONFIG_PROFILING (are otherwise no-op), and provide just some more "random events". I think they're not worth the effort to maintain going forward. In particular, not worth the effort to complicate the compile-time checks now. OK to proceed to drop them?

Adam-pi3 commented 1 year ago

Hm... It looks like if we add additional check for CONFIG_PROFILING we should be fine so maybe we should keep it and add additional verification? We could drop it as well, but regardless which path we go I guess we have bigger question for adding more random and unpredictable events. @wladmis maybe you would like to research this area?

wladmis commented 1 year ago

I'm reluctant to merge this without having better understanding first. What we know is:

1. This fixes build of LKRG for ALT's build of CentOS Stream 9 kernel revision 5.14.0.181-alt1.el9.

2. This doesn't break build of LKRG for actual CentOS Stream 9 currently (but probably unnecessarily disables this optional functionality).

3. The changes are the same as what @Adam-pi3 made in [1b39b4c](https://github.com/lkrg-org/lkrg/commit/1b39b4cf9902cb239be08ce30b13c0e312770ed1) for 5.16+.

4. Just skipping these notifiers should be safe (disabling of optional functionality), so no real concern of runtime issues.

What we don't know is:

1. Why build didn't break for actual CentOS Stream 9 yet if the same kind of change is in fact required. Does ALT possibly use different kernel configuration settings?

AFAIK Alexey strives to keep the ALT centos-kernel configuration as close as possible to CentOS Stream's.

2. Whether RHEL 9.0 is the right version to check. Should possibly be a higher 9.x, or we could check for `PROFILE_TASK_EXIT` and/or `PROFILE

3. When ALT started making such builds. If the builds were being made before and just started failing recently, then we'd want to know what caused that.

As I can see, the previous build of the module was v0.9.3-41-gcbd4198 against 5.14.0.177.

Alexey just found commit, where profile_handoff_task is removed, but it is supposed for rhel9.2. I'll redo the PR to check against 9.2 then.

Hm... It looks like if we add additional check for CONFIG_PROFILING we should be fine so maybe we should keep it and add additional verification? We could drop it as well, but regardless which path we go I guess we have bigger question for adding more random and unpredictable events. @wladmis maybe you would like to research this area?

I'm not sure I can make it done quickly, but I can try.

solardiz commented 1 year ago

@wladmis Thank you for amending this commit. Does it still make build succeed for you (on 5.14.0.181-alt1.el9)?

wladmis commented 1 year ago

On Mon, Oct 31, 2022 at 08:11:37AM -0700, Solar Designer wrote:

@wladmis Thank you for amending this commit. Does it still make build succeed for you (on 5.14.0.181-alt1.el9)?

Yes, it does. There are still warnings but runtime tests passed.

-- WBR, Vladimir D. Seleznev

solardiz commented 1 year ago

There are still warnings but runtime tests passed.

I assume we need to keep the check on line 84 consistent with those changed by this PR, and to add defined(CONFIG_PROFILING) && to all of them.

solardiz commented 1 year ago

@wladmis Your latest changes look good to me, thanks! However, somehow the warnings in our CentOS Stream 9 test build are not gone. We need to figure this out.

Also, somehow this failed the mkosi-mainline test now, but I guess that's unrelated to the changes here. https://github.com/lkrg-org/lkrg/actions/runs/3362987970/jobs/5575472489

solardiz commented 1 year ago

Thank you for your latest changes, @wladmis. No more warnings. I've also verified that the code still does compile in on some systems - tested on Alma Linux 8.5, also without warnings. I'll merge.

solardiz commented 1 year ago

BTW, I'm a bit puzzled it's RHEL 9.2 kernel already, when 9.1 is apparently still not fully released (is in beta), but maybe that's part of the lifecycle changes in CentOS Stream 9 and RHEL 9, as compared to previous major versions of RHEL (where I don't recall such early use of a +2 minor release number).

wladmis commented 1 year ago

On Mon, Oct 31, 2022 at 01:30:55PM -0700, Solar Designer wrote:

BTW, I'm a bit puzzled it's RHEL 9.2 kernel already, when 9.1 is apparently still not fully released (is in beta), but maybe that's part of the lifecycle changes in CentOS Stream 9 and RHEL 9, as compared to previous major versions of RHEL (where I don't recall such early use of a +2 minor release number).

https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/blob/main/Makefile.rhelver#L1

Yes, this really seems a bit complicated.