llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[MCA] Multiple Int schedulers #49694

Open LebedevRI opened 3 years ago

LebedevRI commented 3 years ago
Bugzilla Link 50350
Version trunk
OS Linux
Blocks llvm/llvm-project#31672
CC @adibiagio,@RKSimon,@MattPD

Extended Description

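Currently, the Znver3 scheduling model treats the Int unit as if it had a single scheduler encompassing all 9 pipes. This isn't quite right: the hardware has 4 separate schedulers, each feeding its own subset of the pipes.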

I would like to model it as:

diff --git a/llvm/lib/Target/X86/X86ScheduleZnver3.td b/llvm/lib/Target/X86/X86ScheduleZnver3.td
index 84aee73bad63..a7d369f55b74 100644
--- a/llvm/lib/Target/X86/X86ScheduleZnver3.td
+++ b/llvm/lib/Target/X86/X86ScheduleZnver3.td
@@ -163,15 +163,23 @@ def Zn3IntegerPRF : RegisterFile<192, [GR64, CCR], [1, 1], [1, 0],
 // The schedulers can receive up to six macro ops per cycle, with a limit of
 // two per scheduler. Each scheduler can issue one micro op per cycle into
 // each of its associated pipelines
-// FIXME: these are 4 separate schedulers, not a single big one.
-def Zn3Int : ProcResGroup<[Zn3ALU0, Zn3AGU0, Zn3BRU0, // scheduler 0
-                           Zn3ALU1, Zn3AGU1,          // scheduler 1
-                           Zn3ALU2, Zn3AGU2,          // scheduler 2
-                           Zn3ALU3,          Zn3BRU1  // scheduler 3
-                          ]> {
-  let BufferSize = !mul(4, 24);
+def Zn3IntSch0 : ProcResGroup<[Zn3ALU0, Zn3AGU0, Zn3BRU0]> { // scheduler 0
+  let BufferSize = 24;
+}
+def Zn3IntSch1 : ProcResGroup<[Zn3ALU1, Zn3AGU1         ]> { // scheduler 1
+  let BufferSize = 24;
+}
+def Zn3IntSch2 : ProcResGroup<[Zn3ALU2, Zn3AGU2         ]> { // scheduler 2
+  let BufferSize = 24;
+}
+def Zn3IntSch3 : ProcResGroup<[Zn3ALU3,          Zn3BRU1]> { // scheduler 3
+  let BufferSize = 24;
 }

+def Zn3Int : ProcResGroup<[Zn3ALU0, Zn3AGU0, Zn3BRU0,   // scheduler 0
+                           Zn3ALU1, Zn3AGU1,            // scheduler 1
+                           Zn3ALU2, Zn3AGU2,            // scheduler 2
+                           Zn3ALU3,          Zn3BRU1]>; // scheduler 3

 //===----------------------------------------------------------------------===//
 // Floating-Point Unit

The last Zn3Int is needed to silence the following llvm-tblgen error:

FAILED: lib/Target/X86/X86GenSubtargetInfo.inc 
cd /builddirs/llvm-project/build-Clang12 && /builddirs/llvm-project/build-Clang12/bin/llvm-tblgen -gen-subtarget -I /repositories/llvm-project/llvm/lib/Target/X86 -I/builddirs/llvm-project/build-Clang12/include -I/repositories/llvm-project/llvm/include -I /repositories/llvm-project/llvm/lib/Target /repositories/llvm-project/llvm/lib/Target/X86/X86.td --write-if-changed -o lib/Target/X86/X86GenSubtargetInfo.inc -d lib/Target/X86/X86GenSubtargetInfo.inc.d
Included from /repositories/llvm-project/llvm/lib/Target/X86/X86.td:562:
/repositories/llvm-project/llvm/lib/Target/X86/X86ScheduleZnver3.td:136:1: error: proc resource group overlaps with Zn3IntSch0 but no supergroup contains both.
def Zn3AGU012 : ProcResGroup<[Zn3AGU0, Zn3AGU1, Zn3AGU2]>;
^
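
The error message suggests a rule along these lines: whenever two resource groups share a unit, some group must contain every unit of both. The following is a minimal sketch of such a check, not llvm-tblgen's actual implementation. The bitmask encoding and the names `Group`, `hasSuperGroup`, `findUncoveredOverlaps`, and `zn3IntGroups` are assumptions made for illustration; only the group and unit names come from the diff and the log above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical encoding: one bit per execution unit.
constexpr std::uint32_t ALU0 = 1u << 0, ALU1 = 1u << 1, ALU2 = 1u << 2,
                        ALU3 = 1u << 3, AGU0 = 1u << 4, AGU1 = 1u << 5,
                        AGU2 = 1u << 6, BRU0 = 1u << 7, BRU1 = 1u << 8;

struct Group {
  std::string Name;
  std::uint32_t Units; // bitmask of the units in this group
};

// True if some group contains every unit in Mask.
bool hasSuperGroup(std::uint32_t Mask, const std::vector<Group> &Groups) {
  assert(Mask != 0 && "expected a non-empty set of units");
  for (const Group &G : Groups)
    if ((G.Units & Mask) == Mask)
      return true;
  return false;
}

// Lists every pair of groups that share a unit while no group covers both.
std::vector<std::string>
findUncoveredOverlaps(const std::vector<Group> &Groups) {
  std::vector<std::string> Errors;
  for (std::size_t I = 0; I < Groups.size(); ++I)
    for (std::size_t J = I + 1; J < Groups.size(); ++J) {
      bool Overlap = (Groups[I].Units & Groups[J].Units) != 0;
      if (Overlap && !hasSuperGroup(Groups[I].Units | Groups[J].Units, Groups))
        Errors.push_back(Groups[I].Name + " / " + Groups[J].Name);
    }
  return Errors;
}

// The groups named in this issue, with or without the catch-all Zn3Int.
std::vector<Group> zn3IntGroups(bool WithCatchAll) {
  std::vector<Group> Groups = {
      {"Zn3AGU012", AGU0 | AGU1 | AGU2},
      {"Zn3IntSch0", ALU0 | AGU0 | BRU0},
      {"Zn3IntSch1", ALU1 | AGU1},
      {"Zn3IntSch2", ALU2 | AGU2},
      {"Zn3IntSch3", ALU3 | BRU1},
  };
  if (WithCatchAll)
    Groups.push_back({"Zn3Int", 0x1FFu}); // all nine units
  return Groups;
}
```

Under this rule, `findUncoveredOverlaps(zn3IntGroups(false))` reports Zn3AGU012 against each of Zn3IntSch0, Zn3IntSch1, and Zn3IntSch2, while `zn3IntGroups(true)` passes because Zn3Int covers every pair.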

With that catch-all group in place, however, Zn3IntSch0/Zn3IntSch1/Zn3IntSch2/Zn3IntSch3 appear not to be used at all.

How would I approach this? The same problem exists for the FP unit.

adibiagio commented 3 years ago

This is unfortunately a known design limitation.

The good news is that it is possible to fix it. The bad news is that it won't be a simple fix: it would definitely require a non-trivial redesign of the dispatch stage.

To fix this issue, we need to introduce the concept of a "hierarchy of scheduler resources". That hierarchy could be designed as a DAG, or a dominator-tree-like structure, where nodes are resource groups and each node has an immediate dominator (i.e. an immediate super-group).
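
One way such a hierarchy could be pre-computed is sketched below, purely as an illustration: each group is a bitmask of units, and a group's immediate super-group is taken to be the smallest group that strictly contains it. The encoding, the `NoParent` sentinel, and the function names are assumptions, not existing llvm-mca code.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int NoParent = -1; // hypothetical sentinel: group has no super-group

// Number of units in a group mask.
inline std::size_t numUnits(std::uint32_t Mask) {
  return std::bitset<32>(Mask).count();
}

// For each group, returns the index of the smallest group that strictly
// contains it, or NoParent if there is none.
std::vector<int>
computeImmediateSuperGroups(const std::vector<std::uint32_t> &Groups) {
  std::vector<int> Parent(Groups.size(), NoParent);
  for (std::size_t I = 0; I < Groups.size(); ++I) {
    assert(Groups[I] != 0 && "a resource group must contain a unit");
    for (std::size_t J = 0; J < Groups.size(); ++J) {
      bool StrictSuperset =
          Groups[J] != Groups[I] && (Groups[J] & Groups[I]) == Groups[I];
      if (!StrictSuperset)
        continue;
      // Keep the candidate with the fewest units; ties keep the first one.
      if (Parent[I] == NoParent ||
          numUnits(Groups[J]) < numUnits(Groups[Parent[I]]))
        Parent[I] = static_cast<int>(J);
    }
  }
  return Parent;
}
```

With bits 0-3 for the ALUs, 4-6 for the AGUs, and 7-8 for the BRUs, the groups `{0x1FF, 0x091, 0x022, 0x044, 0x108}` (Zn3Int followed by the four proposed schedulers) yield parents `{-1, 0, 0, 0, 0}`. A group with two incomparable smallest supersets is the case where the structure stops being a tree and becomes a DAG; this sketch simply keeps the first candidate.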

Scheduler resources will have to be fully pre-allocated at the dispatch stage. Technically speaking, this is fine, and it potentially allows for a more accurate simulation of the dispatch logic in hardware. When a scheduler resource S is selected, its sub-groups (if any) are also selected by traversing the DAG. In the presence of multiple sub-groups, a round-robin selector is used to pick the underlying sub-group instance. The hierarchy of group resources is fixed, and can be pre-computed by the dispatch stage at construction time.
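
The selection step described above might look roughly like the following sketch. It is not llvm-mca code: `SchedulerNode`, `reserve`, and `makeZn3Int` are hypothetical names, skipping a full sub-group during the round-robin walk is an assumption, and the sketch ignores which pipes an instruction can actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical node in the scheduler hierarchy.
struct SchedulerNode {
  std::string Name;
  unsigned BufferSize = 0;              // queue entries (leaf nodes only)
  unsigned Used = 0;                    // entries currently reserved
  std::vector<SchedulerNode> SubGroups; // empty for a leaf scheduler
  std::size_t NextSubGroup = 0;         // round-robin cursor
};

// Reserves one buffer entry in Node or in a leaf below it. Returns the leaf
// that accepted the entry, or nullptr if every candidate is full.
const SchedulerNode *reserve(SchedulerNode &Node) {
  if (Node.SubGroups.empty()) {
    assert(Node.Used <= Node.BufferSize && "over-allocated scheduler");
    if (Node.Used == Node.BufferSize)
      return nullptr; // dispatch would stall on this scheduler
    ++Node.Used;
    return &Node;
  }
  const std::size_t N = Node.SubGroups.size();
  for (std::size_t Tried = 0; Tried < N; ++Tried) {
    std::size_t Idx = (Node.NextSubGroup + Tried) % N;
    if (const SchedulerNode *Leaf = reserve(Node.SubGroups[Idx])) {
      Node.NextSubGroup = (Idx + 1) % N; // resume after the chosen sub-group
      return Leaf;
    }
  }
  return nullptr;
}

// Builds the Int unit from the proposed diff: one parent, four schedulers.
SchedulerNode makeZn3Int(unsigned EntriesPerScheduler) {
  SchedulerNode Root;
  Root.Name = "Zn3Int";
  for (const char *Name :
       {"Zn3IntSch0", "Zn3IntSch1", "Zn3IntSch2", "Zn3IntSch3"}) {
    SchedulerNode Leaf;
    Leaf.Name = Name;
    Leaf.BufferSize = EntriesPerScheduler;
    Root.SubGroups.push_back(Leaf);
  }
  return Root;
}
```

Calling `reserve` repeatedly on `makeZn3Int(24)` walks Zn3IntSch0 through Zn3IntSch3 in turn and returns `nullptr` once all 4 × 24 entries are taken. A fuller model would also need a matching release step when an instruction issues.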

In conclusion, it is technically possible to fix this issue. However, it is a big feature, and it requires a significant redesign of the dispatch logic.