Open nico opened 6 years ago
rL327974 merges all the InstRW that belong to the same SchedWriteRes in the SandyBridge model. It also merges some of the VEX/non-VEX instructions into the same instregex, but there are a lot more that could be done.
Hopefully we can get this done for the other models as well.
Anything we can do to reduce the number InstRWs should tend to reduce code size if the set of instructions is grouped on every CPU and use the same SchedRW/itinerary.
We could also merge AVX regular expressions into their SSE counterparts by adding "(V)?" to the beginning of the regular expressions.
That does make sense if we're going to keep the regex, I'm still a little unclear of when the cost of matching a regex to multiple instruction is better than having all the instructions listed in instrs() block, although sometimes regexs catches missed instructions (e.g. the *_Int versions for instance get matched for free currently).
I'm going to start merging some of the instregex and see where that gets us.
Should we be deliberate with the merging by focusing on combining lines with similar instructions rather than just anything that happens to have the same latency/throughput/ports on one CPU. Logical groups like that might guide us to better SchedRWs?
We get a related metric from CodeGenSchedModels::createInstRWClass, when it splits all the entries from a InstRW into entries in its ClassInstrs vector, which maps them back to their original scheduler classes. If we kept track of how those 'original' classes get redistributed by InstRW entries then we might be able to flag classes that need splitting/refactoring. I've already been playing with this to help recognise unnecessary InstRW entries.
Anything we can do to reduce the number InstRWs should tend to reduce code size if the set of instructions is grouped on every CPU and use the same SchedRW/itinerary.
We could also merge AVX regular expressions into their SSE counterparts by adding "(V)?" to the beginning of the regular expressions.
Should we be deliberate with the merging by focusing on combining lines with similar instructions rather than just anything that happens to have the same latency/throughput/ports on one CPU. Logical groups like that might guide us to better SchedRWs?
Possible things to look at next:
1 - throw an warning/error if a regex only has a single match so could be replaced with an instr
2 - merge multiple InstRW lines to the same WriteRes to reduce time in CodeGenSchedModels::collectSchedClasses() (this might reduce code size as well?):
from: def SBWriteResGroup134 : SchedWriteRes<[SBPort0,SBPort23,SBPort05]> { let Latency = 52; let NumMicroOps = 4; let ResourceCycles = [2,1,1]; } def: InstRW<[SBWriteResGroup134], (instregex "VDIVPDYrm")>; def: InstRW<[SBWriteResGroup134], (instregex "VSQRTPDYm")>;
to: def SBWriteResGroup134 : SchedWriteRes<[SBPort0,SBPort23,SBPort05]> { let Latency = 52; let NumMicroOps = 4; let ResourceCycles = [2,1,1]; } def: InstRW<[SBWriteResGroup134], (instregex "VDIVPDYrm", "VSQRTPDYm")>;
Craig - any thoughts? Is there any reason not to start (2) right away?
After r323277 this should be a lot faster. Getting rid of instregex would be nice though.
Updating the Btver2 model will be pretty trivial - I'm not sure whether there are any instregex worth keeping or not - most match 1 or 2 instructions only and should definitely go. We have some that match more but not many, and those should benefit from merging the similar InstRW<> defs.
Anyone want to guess at what point the binary regexp search actually helps?
Craig/Gadi - IIRC most of Intel's scheduler models are auto-generated is that right? Can they be updated to issue instrs and/or instregex ?
We're largely abusing instregex in the scheduler models. Most of them match a single instruction. We should convert them to Instrs or combine the regular expressions to make them useful.
Extended Description
In a highly-parallel build (-j250), full build time is limited by llvm-tblgen taking 1 minute to generate X86GenInstrInfo.inc.tmp and X86GenSubtargetInfo.inc.tmp each.
Command:
cd /Users/thakis/src/llvm-build-goma/lib/Target/X86 && time /Users/thakis/src/llvm-build-goma/bin/llvm-tblgen -gen-instr-info -I /Users/thakis/src/llvm-rw/lib/Target/X86 -I /Users/thakis/src/llvm-rw/include -I /Users/thakis/src/llvm-rw/lib/Target /Users/thakis/src/llvm-rw/lib/Target/X86/X86.td -o /Users/thakis/src/llvm-build-goma/lib/Target/X86/X86GenInstrInfo.inc.tmp
Profile:
43.05 s 100.0% 0 s llvm-tblgen (26142) 43.05 s 100.0% 0 s Main Thread 0x7659e 43.05 s 99.9% 0 s start 43.04 s 99.9% 0 s main 43.04 s 99.9% 0 s llvm::TableGenMain(char, bool ()(llvm::raw_ostream&, llvm::RecordKeeper&)) 40.53 s 94.1% 0 s (anonymous namespace)::LLVMTableGenMain(llvm::raw_ostream&, llvm::RecordKeeper&) 40.53 s 94.1% 0 s llvm::EmitInstrInfo(llvm::RecordKeeper&, llvm::raw_ostream&) 38.05 s 88.3% 0 s llvm::CodeGenTarget::getSchedModels() const 38.05 s 88.3% 0 s llvm::CodeGenSchedModels::CodeGenSchedModels(llvm::RecordKeeper&, llvm::CodeGenTarget const&) 37.93 s 88.1% 3.00 ms llvm::CodeGenSchedModels::collectSchedClasses() 37.88 s 87.9% 12.00 ms llvm::CodeGenSchedModels::createInstRWClass(llvm::Record) 37.85 s 87.9% 38.00 ms llvm::SetTheory::expand(llvm::Record) 37.77 s 87.7% 28.67 s (anonymous namespace)::InstRegexOp::apply(llvm::SetTheory&, llvm::DagInit, llvm::SmallSetVector<llvm::Record, 16u>&, llvm::ArrayRef)
9.02 s 20.9% 1.45 s llvm::Regex::match(llvm::StringRef, llvm::SmallVectorImpl*)
41.00 ms 0.0% 0 s llvm::Regex::Regex(llvm::StringRef, unsigned int)
Looking through http://llvm-cs.pcc.me.uk/utils/TableGen/CodeGenSchedule.cpp#60 and its callers on that stack suggests that reordering could help a lot here.
All the instregex calls are pretty new (r315175, r316492, etc); maybe you could try to speed this up some?