zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.03k stars 6.17k forks source link

APIC timer does not support SMP #34788

Closed jenmwms closed 3 years ago

jenmwms commented 3 years ago

Describe the bug There is a protective build assert on the APIC timer driver, which is likely no longer needed. BUILD_ASSERT(!IS_ENABLED(CONFIG_SMP), "APIC timer doesn't support SMP");

To Reproduce Steps to reproduce the behavior:

  1. Configure for SMP, such as tests/kernel/interrupt:
    diff --git a/tests/kernel/interrupt/prj.conf b/tests/kernel/interrupt/prj.conf
    index e4ba089e3b..a14d4c8fbf 100644
    --- a/tests/kernel/interrupt/prj.conf
    +++ b/tests/kernel/interrupt/prj.conf
    @@ -2,3 +2,4 @@ CONFIG_ZTEST=y
    CONFIG_DYNAMIC_INTERRUPTS=y
    CONFIG_MP_NUM_CPUS=1
    CONFIG_THREAD_STACK_INFO=y
    +CONFIG_SMP=y
  2. west build -p -b acrn_ehl_crb zephyr/tests/kernel/interrupt
  3. See error

Expected behavior Build successfully and use CONFIG_MP_NUM_CPUs >1 without issue, platform-wide tests. This is just example.

Impact cannot use SMP with APIC timer. showstopper.

**Logs and console output**
In file included from /home/jennife2/zephyrprojectGH/zephyr/include/toolchain.h:43,
                 from /home/jennife2/zephyrprojectGH/zephyr/include/init.h:10,
                 from /home/jennife2/zephyrprojectGH/zephyr/include/device.h:29,
                 from /home/jennife2/zephyrprojectGH/zephyr/include/drivers/timer/system_timer.h:19,
                 from /home/jennife2/zephyrprojectGH/zephyr/drivers/timer/apic_timer.c:6:
/home/jennife2/zephyrprojectGH/zephyr/include/toolchain/gcc.h:62:36: error: static assertion failed: "APIC timer doesn\'t support SMP"
   62 | #define BUILD_ASSERT(EXPR, MSG...) _Static_assert(EXPR, "" MSG)
      |                                    ^~~~~~~~~~~~~~
/home/jennife2/zephyrprojectGH/zephyr/drivers/timer/apic_timer.c:11:1: note: in expansion of macro 'BUILD_ASSERT'
   11 | BUILD_ASSERT(!IS_ENABLED(CONFIG_SMP), "APIC timer doesn't support SMP");
      | ^~~~~~~~~~~~
[51/118] Building C object zephyr/arch/a...akeFiles/arch__x86__core.dir/fatal.c.ob
ninja: build stopped: subcommand failed.
FATAL ERROR: command exited with status 1: /usr/bin/cmake --build /home/jennife2/zephyrprojectGH/build

Environment (please complete the following information):

Additional context This was my preliminary attempt to enable, but got stuck at a strange build error where it said undefined reference to smp_timer_init with collect 2: error: ld returned 1 exit status. I confirmed CONFIG_SMP is set y, so the ifdef should be fine in the sm.c. So close!

diff --git a/boards/x86/acrn/acrn_ehl_crb_defconfig b/boards/x86/acrn/acrn_ehl_crb_defconfig
index c19908455e..83e7dbd075 100644
--- a/boards/x86/acrn/acrn_ehl_crb_defconfig
+++ b/boards/x86/acrn/acrn_ehl_crb_defconfig
@@ -16,3 +16,4 @@ CONFIG_SHELL_BACKEND_SERIAL_INTERRUPT_DRIVEN=n
 CONFIG_KERNEL_VM_SIZE=0x1000000
 CONFIG_BUILD_NO_GAP_FILL=y
 CONFIG_APIC_TIMER_IRQ=48
+CONFIG_MP_NUM_CPUS=2
diff --git a/drivers/timer/apic_timer.c b/drivers/timer/apic_timer.c
index 9a803f2d42..bbb46a3359 100644
--- a/drivers/timer/apic_timer.c
+++ b/drivers/timer/apic_timer.c
@@ -8,8 +8,6 @@
 #include <spinlock.h>
 #include <drivers/interrupt_controller/loapic.h>

-BUILD_ASSERT(!IS_ENABLED(CONFIG_SMP), "APIC timer doesn't support SMP");
-
 /*
  * Overview:
  *
diff --git a/soc/x86/ia32/Kconfig.defconfig b/soc/x86/ia32/Kconfig.defconfig
index 87600183a2..c52d937249 100644
--- a/soc/x86/ia32/Kconfig.defconfig
+++ b/soc/x86/ia32/Kconfig.defconfig
@@ -11,4 +11,7 @@ config SOC
 config SYS_CLOCK_HW_CYCLES_PER_SEC
        default 25000000  if HPET_TIMER

+config SCHED_IPI_SUPPORTED
+       default y
+
 endif

(CPU =2 for testing)

andyross commented 3 years ago

So.. that error indeed looks wrong. I don't see anything SMP-unsafe about that driver.

smp_timer_init() is just a hook so that platforms can do per-CPU timer driver initialization. For APIC is should just be a noop. This patch below is enough to get me a working build on acrn_ehl_crb.

Then the question becomes "does it work?". I don't have a setup for running ACRN on EHL (and I don't see any docs checked in that would detail that, is there somewhere else I should look?), but it should certainly be possible to run CONFIG_APIC_TIMER=y on qemu_x86_64. And indeed I get a hang there in tests/kernel/common. It's probably not something terrible, I'll take a look.

diff --git a/boards/x86/acrn/acrn_ehl_crb_defconfig b/boards/x86/acrn/acrn_ehl_crb_defconfig
index c19908455e..4b1c299941 100644
--- a/boards/x86/acrn/acrn_ehl_crb_defconfig
+++ b/boards/x86/acrn/acrn_ehl_crb_defconfig
@@ -16,3 +16,5 @@ CONFIG_SHELL_BACKEND_SERIAL_INTERRUPT_DRIVEN=n
 CONFIG_KERNEL_VM_SIZE=0x1000000
 CONFIG_BUILD_NO_GAP_FILL=y
 CONFIG_APIC_TIMER_IRQ=48
+CONFIG_SMP=y
+CONFIG_MP_NUM_CPUS=2
\ No newline at end of file
diff --git a/drivers/timer/apic_timer.c b/drivers/timer/apic_timer.c
index 9a803f2d42..30da19c78f 100644
--- a/drivers/timer/apic_timer.c
+++ b/drivers/timer/apic_timer.c
@@ -8,8 +8,6 @@
 #include <spinlock.h>
 #include <drivers/interrupt_controller/loapic.h>

-BUILD_ASSERT(!IS_ENABLED(CONFIG_SMP), "APIC timer doesn't support SMP");
-
 /*
  * Overview:
  *
@@ -240,3 +238,8 @@ int sys_clock_driver_init(const struct device *dev)

        return 0;
 }
+
+void smp_timer_init(void)
+{
+       /* empty */
+}
jenmwms commented 3 years ago

Hi Andy! Thanks for taking a look!

don't have a setup for running ACRN on EHL (and I don't see any docs checked in that would detail that, is there somewhere else I should look?)

I can hep test on ACRN on EHL. The docs need to be improved (or added in general), its on the To Do list. Please work with me offline to learn more until we get it published.

but it should certainly be possible to run CONFIG_APIC_TIMER=y on qemu_x86_64. And indeed I get a hang there in tests/kernel/common. It's probably not something terrible, I'll take a look.

Great catch. Please let me know how else I can help!

andyross commented 3 years ago

OK, I got it. Or I got something anyway[1]. The APIC CCR counter registers aren't synchronized between CPUs like the TSC is, even though they're actually architecturally fixed to the TSC counter via a divider. The language in the SDM that I (only vaguely) remembered says that they are reset on CPU initialization, which sounds like it should be fine for us since we never bother to set it. Except that SMP startup on x86 is implemented as a CPU reset that zeroes CCR! So the CPUs will always be off from CPU0 by however long it took to reach their initialization code.

This is fixable: we need to compute a per-CPU offset based on the TSC and reference that value and not the bare CCR number when running in SMP. And for correctness we need to make sure that this is only done on systems with an invariant TSC (though we can probably do that with docs vs. asserting at runtime vs. a CPUID leaf). And then make sure this degrades cleanly to a noop when CONFIG_SMP=n. Probably not a big mess, but I need to spend a few hours staring at the driver for sure.

[1] Actually two things. My language about about making smp_timer_init() a noop was wildly wrong. The APIC is a per-CPU device, so you totally have to repeat the setup on the second CPU. But that part was an easy fix.

andyross commented 3 years ago

Alternatively, given that this is only needed for multicore x86_64 hardware, it might actually be easiest to write a second (er, third!) APIC driver that exploits the TSC deadline mode of the APIC, which is a much cleaner interface that avoids exactly this problem[1]. But this is a "only in the last decade" kind of feature, and weird hardware doesn't support it. I know Quark did not, and it's not unlikely that its Lakemont descendants don't either. But neither do any of those platforms need APIC on SMP...

[1] And also makes the "minimum deadline settable" logic go away, as the value written is an absolute time with a proper comparator, so the CPU can understand setting a time "in the past" and do the right thing).

jenmwms commented 3 years ago

In case it helps, but a note on acrn_ehl_crb APIC timer TSC M and N values, which is not yet configured....I think I found it: https://github.com/zephyrproject-rtos/zephyr/issues/32603#issuecomment-831482651. I'll submit a PR so you'll be working with it. CONFIG_APIC_TIMER_TSC_M=3 CONFIG_APIC_TIMER_TSC_N=249

andyross commented 3 years ago

I'm not even going to think about touching the hypervisor on the dev-board-in-remote-lab until this works first in Qemu and then on the up2 board on my desk. Always make software work on the simple platforms before looking at the weird ones.

galak commented 3 years ago

@jenmwms - I don't see why this is a bug / high vs being an enhancement?

andyross commented 3 years ago

Uh... well, I guess the feature request to use this particular driver on this particular platform is an enhancement, but the actual behavior of this code vs. a second CPU's APIC is totally a bug that should be fixed. How's that?

jenmwms commented 3 years ago

@nashif can you please answer @galak on why this is a bug not enhancement?