Open jharris-intel opened 3 years ago
@andyross FYI
For clarity: the Zephyr core ID is really nothing but an index within the `_kernel.cpus[]` array. There is no expectation that it correspond to any particular hardware feature. Things like x86 APIC IDs have an even more complicated structure, and IIRC the CPU hardware ID values for the two CPUs on esp32 are essentially random-looking 32 bit numbers (though the intel_adsp parts, based on largely the same Xtensa IP, use sequential indexes). The whole field is a mess, which is why we have our own IDs.
If this is needed in a low-latency way, I'd suggest adding an arch-specific substruct to the CPU record, just like we do for `struct thread_base` (we don't have one now simply because we haven't needed one), and storing the ID in that. And if there's a need for some kind of reverse lookup to map hardware ID to Zephyr index, that can best be done at the architecture level (IIRC x86_64 has something like this already).
But even then, note that the reason you'd need to track the more complicated hardware IDs is probably because you want to do something with the resulting cluster/affinity/whatever graph. And note that Zephyr has almost no support for any of that. We treat all SMP processors 100% symmetrically. We don't understand things like hyperthreading, for example. We have only the most minimal idea of interrupt affinity, our IPIs are expected to go to all CPUs at once, etc...
Oh, and upleveling a bit, I see where some of the disconnect is here. The low-level code in the PR is using the (oddly sparse) `MPIDR_EL1` register as an index within a CPU array, which is the bug you're looking at. But the Zephyr API it's implementing, `arch_curr_cpu()`, doesn't care about any of that. All it's supposed to do is return a per-CPU pointer to the CPU record in question, set up in a reasonably fast way however the arch layer wants.
So you can see that on x86, it's implemented as a record pointed to by the %GS segment descriptor. On Xtensa, it's stored in one of several different special registers (either MISC or one of the EXCSAVEn, depending on what the particular instantiation has available), etc...
So what you're really looking for is not a way to map that ID to a smaller index, but a way to stash a pointer away such that routine kernel code won't mess it up. Some quick googling (note that I am not the ARM expert on the project!) turns up a family of TPIDR_* registers ("thread pointer" I guess?) which seem to be intended for more or less exactly this purpose. Maybe those are what we want to be looking at?
> But even then, note that the reason you'd need to track the more complicated hardware IDs is probably because you want to do something with the resulting cluster/affinity/whatever graph.
Unfortunately, not really. It's a combination of two things, among them that `gic_raise_sgi` takes the target's affinity values rather than a linear index. As a result, I do need the reverse map available. (I could add MPIDR values within the `cpu_t` struct itself, I suppose.)

> Some quick googling (note that I am not the ARM expert on the project!) turns up a family of TPIDR_* registers ("thread pointer" I guess?) which seem to be intended for more or less exactly this purpose.
I considered that as an alternative above ("You could use `TPIDR_EL0` / `TPIDRRO_EL0`, but other things in the system potentially use them. (And they are generally considered to be thread ID, not core ID.)"). I'm loath to use them because e.g. some thread-local-storage implementations use them for, well, a pointer to thread-local storage.
Also, this doesn't solve the problem, just moves it. `TPIDR_EL0` / `TPIDRRO_EL0` are really just per-core registers (one of which is always writable, and one of which is writable only from higher privilege levels) that we'd have to set up. And if I could set up `TPIDR_EL0`, I could use the same logic directly to get the linear core ID.
I guess the question here becomes: on the architectures where "CPU hardware ID values for the two CPUs on esp32 are essentially random-looking 32 bit numbers", how do you set up the initial linear IDs? On x86, how do you set up the segment descriptor such that every CPU gets a unique linear ID? On Xtensa, how do you set up the special registers such that every CPU gets a unique linear ID?
I suppose I could use `TPIDRRO_EL0` (I'd say `TPIDR_EL1`, but `arch_is_in_isr` directly calls `arch_curr_cpu`, which could be a problem in userspace mode).
...although I dislike the combination here of:
> arch_curr_cpu() doesn't care about any of that.
Sure, if there's a different method of getting a pointer to a unique cpu_t for each CPU I'm all ears.
I still think you're overthinking this. On x86, for example, there's a compiled-in table (probably should be in DT, but it's just a C array) of APIC IDs that gets searched linearly in `arch_mp_start_cpu()` (or maybe on the CPU that just got launched, I forget) to identify the CPU being started. Then it just stuffs the pointer to that CPU record into %GS on the initialization path and never touches it again. Same deal on esp32 really, though there, because there are only two CPUs, the hardware ID isn't involved and so the auxiliary CPU knows it should put `&_kernel.cpus[1]` in the MISC register.
As far as register choice: yes, we need one register for TLS, but just one. The read-only TPIDR looks to me like a perfectly acceptable home for the CPU pointer (we can't use the R/W one because you couldn't trust it in a syscall and would need to recover the CPU record some other way for security reasons, defeating the purpose).
Hi @carlocaione,
This issue, marked as an Enhancement, was opened a while ago and did not get any traction. It was just assigned to you based on the labels. If you don't consider yourself the right person to address this issue, please re-assign it to the right person.
Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.
@jharris-intel you are also encouraged to help move this issue forward by providing additional information and confirming this request/issue is still relevant to you.
Thanks!
**Is your enhancement proposal related to a problem? Please describe.**
Zephyr assumes that core IDs are within a restricted range, 0..`CONFIG_MP_NUM_CPUS`-1. A number of data structures in Zephyr scale linearly in size w.r.t. `CONFIG_MP_NUM_CPUS`.

ARM's core ID is in the form of 4 affinity values, each a byte, with a given core identified by the unique tuple (`aff3`, `aff2`, `aff1`, `aff0`). Usually `aff0` gives the core ID within a cluster and `aff1` gives the cluster ID, although this is not guaranteed. (E.g. if you had a (hypothetical) large system with two sockets, each with multiple clusters of ARM cores with hyperthreading, you might have `aff0` be the hyperthread, `aff1` be the core ID, `aff2` be the cluster ID within the socket, and `aff3` be the socket ID... although I don't know of any ARM systems that do this as of yet.) The only thing "guaranteed" by spec is that this gives some indication as to scheduling.

For instance, a CPU with two clusters of two cores each might have the following `mpidr_el1` aff2/aff1/aff0 values for the various cores: 0.0.0, 0.0.1, 0.1.0, and 0.1.1.

As a result, the current AArch64 code does not support multiple CPU clusters, or rather does not support non-contiguous `mpidr_el1` values in general. (At the very least without setting `CONFIG_MP_NUM_CPUS` to a very large value and wasting a lot of memory as a result... and that has other issues.)

**Describe the solution you'd like**
Two components, one for the "common" case of symmetric affinities, one override for the non-common case.
For the common case of symmetric affinities (note: I am terrible at naming things):
1. `CONFIG_MP_NUM_CPUS` on AArch64 maps to `CONFIG_NUM_AFF0_VALUES`, with `CONFIG_NUM_AFF1_VALUES` / `CONFIG_NUM_AFF2_VALUES` / `CONFIG_NUM_AFF3_VALUES` all set to 1.
2. Change the code that reads `mpidr_el1` to get the core ID so that it, roughly, folds the affinity values into a single linear index.

(Pardon the "cute" but rather non-obvious code in the above. It is designed to handle someone setting CONFIG_NUM_AFF_VALUES to zero or one, while still producing decent code - and I know that the above doesn't properly handle undefined CONFIG_NUM_AFF_VALUES.)
This covers the common case, and is reasonably efficient in translating from mpidr_el1 values to linear core IDs.
In the example, for instance, you would set CONFIG_NUM_AFF0_VALUES = 2, CONFIG_NUM_AFF1_VALUES = 2, and the rest to 0 (or 1), which (after const-prop and assuming asserts are disabled) results in code that is pretty close to optimal. (Actual optimal would be if you could replace the two byte-masks with one, but meh.)
For the uncommon case (e.g. a cluster of 4 and a cluster of 2, or more likely e.g. two clusters of 4 where 2 cores are used for other things), I suggest adding a CONFIG override indicating that the platform supplies a function to get the linear core ID.
(Honestly, it may not be worth it to add the complexity for the former so long as the latter is supported.)
Both of these would also need a change for SGIs (and the GIC in general) to map back from linear core ID to affinity ID, although this is relatively straightforward: because the linear core ID is, well, linear, worst comes to worst you can just store an array of affinity IDs set up during early boot.
**Describe alternatives you've considered**

- Use some per-CPU register to store the current linear core ID, and during startup initialize it somehow. However, I couldn't find any suitable registers. You could use `VMPIDR_EL2`, except that you're not guaranteed that it's even implemented. You could use `TPIDR_EL0` / `TPIDRRO_EL0`, but other things in the system potentially use them. (And they are generally considered to be thread ID, not core ID.)
- Do whatever AArch32 does? I haven't really looked too much at this.
**Additional context**
This is not required in the short term, and is largely dependent on #30676, but would be helpful to have upstream support for (even if it's best-effort).
I am fully willing to do the initial work for this - but I'd prefer to have someone poke at any obvious holes in the high-level approach before I spend the time to implement it :-)