intel / intel-cmt-cat

User space software for Intel(R) Resource Director Technology
http://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html

Significant overhead on non-throttled hyper-thread core #219

Closed Msiavashi closed 2 years ago

Msiavashi commented 2 years ago

I have a two-socket Xeon Gold 6142 on which core id 1 and 33 are two sibling hyper threads sharing the same physical core.

When I apply the maximum throttling (10% bandwidth available) on core 1 and run a benchmark on both hyper-threads, the execution time of the benchmark on core 33 (the sibling hyper-thread) increases drastically (~9x). The same benchmark on core 33 runs ~9x faster when core 1 is throttled but idle.

Why is there such a significant performance degradation?

Please note that I'm also setting CAT but it doesn't affect the performance as much as throttling does.

Here is the output of pqos -s:

L3CA/MBA COS definitions for Socket 0:
    L3CA COS0 => MASK 0x7ff
    L3CA COS1 => MASK 0x1
    L3CA COS2 => MASK 0x7fe
    L3CA COS3 => MASK 0x7ff
    L3CA COS4 => MASK 0x7ff
    L3CA COS5 => MASK 0x7ff
    L3CA COS6 => MASK 0x7ff
    L3CA COS7 => MASK 0x7ff
    MBA COS0 => 100% available
    MBA COS1 => 10% available
    MBA COS2 => 100% available
    MBA COS3 => 100% available
    MBA COS4 => 100% available
    MBA COS5 => 100% available
    MBA COS6 => 100% available
    MBA COS7 => 100% available
L3CA/MBA COS definitions for Socket 1:
    L3CA COS0 => MASK 0x7ff
    L3CA COS1 => MASK 0x1
    L3CA COS2 => MASK 0x7fe
    L3CA COS3 => MASK 0x7ff
    L3CA COS4 => MASK 0x7ff
    L3CA COS5 => MASK 0x7ff
    L3CA COS6 => MASK 0x7ff
    L3CA COS7 => MASK 0x7ff
    MBA COS0 => 100% available
    MBA COS1 => 10% available
    MBA COS2 => 100% available
    MBA COS3 => 100% available
    MBA COS4 => 100% available
    MBA COS5 => 100% available
    MBA COS6 => 100% available
    MBA COS7 => 100% available
Core information for socket 0:
    Core 0, L2ID 0, L3ID 0 => COS0
    Core 2, L2ID 7, L3ID 0 => COS0
    Core 4, L2ID 1, L3ID 0 => COS0
    Core 6, L2ID 6, L3ID 0 => COS0
    Core 8, L2ID 2, L3ID 0 => COS0
    Core 10, L2ID 5, L3ID 0 => COS0
    Core 12, L2ID 3, L3ID 0 => COS0
    Core 14, L2ID 4, L3ID 0 => COS0
    Core 16, L2ID 8, L3ID 0 => COS0
    Core 18, L2ID 15, L3ID 0 => COS0
    Core 20, L2ID 9, L3ID 0 => COS0
    Core 22, L2ID 14, L3ID 0 => COS0
    Core 24, L2ID 10, L3ID 0 => COS0
    Core 26, L2ID 13, L3ID 0 => COS0
    Core 28, L2ID 11, L3ID 0 => COS0
    Core 30, L2ID 12, L3ID 0 => COS0
    Core 32, L2ID 0, L3ID 0 => COS0
    Core 34, L2ID 7, L3ID 0 => COS0
    Core 36, L2ID 1, L3ID 0 => COS0
    Core 38, L2ID 6, L3ID 0 => COS0
    Core 40, L2ID 2, L3ID 0 => COS0
    Core 42, L2ID 5, L3ID 0 => COS0
    Core 44, L2ID 3, L3ID 0 => COS0
    Core 46, L2ID 4, L3ID 0 => COS0
    Core 48, L2ID 8, L3ID 0 => COS0
    Core 50, L2ID 15, L3ID 0 => COS0
    Core 52, L2ID 9, L3ID 0 => COS0
    Core 54, L2ID 14, L3ID 0 => COS0
    Core 56, L2ID 10, L3ID 0 => COS0
    Core 58, L2ID 13, L3ID 0 => COS0
    Core 60, L2ID 11, L3ID 0 => COS0
    Core 62, L2ID 12, L3ID 0 => COS0
Core information for socket 1:
    Core 1, L2ID 16, L3ID 1 => COS1
    Core 3, L2ID 23, L3ID 1 => COS0
    Core 5, L2ID 17, L3ID 1 => COS0
    Core 7, L2ID 22, L3ID 1 => COS0
    Core 9, L2ID 18, L3ID 1 => COS0
    Core 11, L2ID 21, L3ID 1 => COS0
    Core 13, L2ID 19, L3ID 1 => COS0
    Core 15, L2ID 20, L3ID 1 => COS0
    Core 17, L2ID 24, L3ID 1 => COS0
    Core 19, L2ID 31, L3ID 1 => COS0
    Core 21, L2ID 25, L3ID 1 => COS0
    Core 23, L2ID 30, L3ID 1 => COS0
    Core 25, L2ID 26, L3ID 1 => COS0
    Core 27, L2ID 29, L3ID 1 => COS0
    Core 29, L2ID 27, L3ID 1 => COS0
    Core 31, L2ID 28, L3ID 1 => COS0
    Core 33, L2ID 16, L3ID 1 => COS2
    Core 35, L2ID 23, L3ID 1 => COS0
    Core 37, L2ID 17, L3ID 1 => COS0
    Core 39, L2ID 22, L3ID 1 => COS0
    Core 41, L2ID 18, L3ID 1 => COS0
    Core 43, L2ID 21, L3ID 1 => COS0
    Core 45, L2ID 19, L3ID 1 => COS0
    Core 47, L2ID 20, L3ID 1 => COS0
    Core 49, L2ID 24, L3ID 1 => COS0
    Core 51, L2ID 31, L3ID 1 => COS0
    Core 53, L2ID 25, L3ID 1 => COS0
    Core 55, L2ID 30, L3ID 1 => COS0
    Core 57, L2ID 26, L3ID 1 => COS0
    Core 59, L2ID 29, L3ID 1 => COS0
    Core 61, L2ID 27, L3ID 1 => COS0
    Core 63, L2ID 28, L3ID 1 => COS0
PID association information:
    COS1 => (none)
    COS2 => (none)
    COS3 => (none)
    COS4 => (none)
    COS5 => (none)
    COS6 => (none)
    COS7 => (none)
mdcornu commented 2 years ago

Hi,

From chapter 17.19.7.3 "Memory Bandwidth Allocation Usage Considerations" of the Intel Software Developer's Manual (volume 3):

As control is provided per processor core (the max of the delay values of the per-thread CLOS applied to the core) care should be taken in scheduling threads so as to not inadvertently place a high-priority thread (with zero intended MBA throttling) next to a low-priority thread (with MBA throttling intended), which would lead to inadvertent throttling of the high-priority thread.

As stated, both sibling threads will be throttled so this case should be avoided if possible.

Regards, Marcel
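
The "max of the delay values" rule quoted above can be illustrated with a small Python sketch. It is a simplified model, not Intel's implementation: it assumes a linear MBA scale where the delay value is 100 minus the "available" percentage that pqos prints, and the function names `mba_delay` and `effective_available` are made up for illustration.

```python
from typing import Dict


def mba_delay(available_percent: int) -> int:
    """Convert the 'available' percentage shown by pqos into a delay value.

    Assumption: linear scale, delay = 100 - available.
    """
    return 100 - available_percent


def effective_available(sibling_available: Dict[int, int]) -> int:
    """Bandwidth percentage that applies to every hyper-thread of one physical core.

    sibling_available maps a logical core id to the MBA percentage of its COS.
    The physical core uses the largest delay among its threads.
    """
    max_delay = max(mba_delay(p) for p in sibling_available.values())
    return 100 - max_delay


# Values from the pqos -s output in this issue:
# core 1 -> COS1 (10% available), core 33 -> COS2 (100% available).
physical_core = {1: 10, 33: 100}
print(effective_available(physical_core))  # 10
```

Under this model core 33 is limited to 10% as well, even though its own COS is set to 100%, which matches the slowdown reported in the issue.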

Msiavashi commented 2 years ago

Thanks, Marcel.

I was missing that indeed.

So the solution is either to avoid scheduling a high-priority and a low-priority task on sibling hyper-threads or to disable hyper-threading entirely, am I correct?

Is there such a limitation with CAT too?

mdcornu commented 2 years ago

So the solution is either to avoid scheduling a high-priority and a low-priority task on sibling hyper-threads or to disable hyper-threading entirely, am I correct?

Yes, that is correct.

Is there such a limitation with CAT too?

No, CAT works on a per-thread basis, so it can be used with sibling threads on the same core.
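
To spot the problematic placement before running a workload, the core lines of the `pqos -s` output can be checked for siblings whose classes have different MBA settings. The following Python sketch is one possible way to do that. It assumes that logical cores sharing an L2ID are hyper-thread siblings (as cores 1 and 33 are in this issue) and that all parsed cores belong to one socket; the helper name `siblings_with_mixed_mba` and the `mba_by_cos` mapping are hypothetical and not part of pqos.

```python
import re
from collections import defaultdict

# Matches lines such as "Core 1, L2ID 16, L3ID 1 => COS1" from pqos -s.
CORE_LINE = re.compile(r"Core (\d+), L2ID (\d+), L3ID (\d+) => COS(\d+)")


def siblings_with_mixed_mba(pqos_text, mba_by_cos):
    """Return groups of cores that share an L2ID but have different MBA settings.

    pqos_text: core information lines for a single socket.
    mba_by_cos: maps a COS number to its MBA 'available' percentage.
    """
    groups = defaultdict(list)
    for core, l2id, _l3id, cos in CORE_LINE.findall(pqos_text):
        groups[int(l2id)].append((int(core), int(cos)))

    mixed = []
    for l2id in sorted(groups):
        members = groups[l2id]
        # More than one distinct percentage means one sibling drags the other down.
        if len({mba_by_cos[cos] for _, cos in members}) > 1:
            mixed.append(sorted(core for core, _ in members))
    return mixed


# Excerpt of the socket 1 output from this issue.
sample = """
    Core 1, L2ID 16, L3ID 1 => COS1
    Core 3, L2ID 23, L3ID 1 => COS0
    Core 33, L2ID 16, L3ID 1 => COS2
    Core 35, L2ID 23, L3ID 1 => COS0
"""
mba = {0: 100, 1: 10, 2: 100}
print(siblings_with_mixed_mba(sample, mba))  # [[1, 33]]
```

Any group it reports is a pair where the less-throttled thread will inherit the stricter MBA limit. Differences in L3CA masks between siblings are not flagged, since CAT is applied per thread.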