llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.44k stars 12.17k forks source link

Large increase in RISC-V tblgen build times #49312

Open RKSimon opened 3 years ago

RKSimon commented 3 years ago
Bugzilla Link 49968
Version trunk
OS Windows NT
Blocks llvm/llvm-project#28596
CC @asb,@topperc,@frasercrmck

Extended Description

The RISC-V tblgen builds have been getting slower and slower, and now appears to be the slowest of the big three (AMDGPU, X86 and RISC-V), especially if I end up having to use debug builds.....

I realise slower builds is almost inevitable as RISC-V develops and adds features, but it'd be great if there was some low hanging cleanup that could be done to get this under control - whether its improvements to tblgen itself or the RISC-V td files get more efficient.

llvmbot commented 3 years ago

Okay, we can certainly keep it open.

topperc commented 3 years ago

A few ideas

-Call InferInstructionFlags and VerifyInstructionFlags before GenerateVariants and ExpandHwModeBasedTypes and don't call GenerateVariants or ExpandHwModeBasedTypes for GenInstrInfo.inc. This assumes that GenInstrInfo.inc doesn't care about the expanded patterns and only cares about the mayLoad/mayStore/etc flags and that those flags don't depend on the expansion.

-Replace the std::map in InfoByHwMode by a "small" data structure of some sort. It might be possible to just maintain a sorted SmallVector. The number of modes is small so we should be able to quickly do ordered inserts. This may not give us much, but we currently are guaranteed a heap allocation every time this or its subclasses are used. I think I have a prototype of this.

RKSimon commented 3 years ago

I spent a couple of days fooling around with the TableGen lexer trying to speed it up. No joy there. I could knock off 1–2% of the time, which was probably just noise.

I think we can close this ticket.

I don't agree - until we've identified the next things to work on to reduce tblgen runtimes, we shouldn't close this.

llvmbot commented 3 years ago

I spent a couple of days fooling around with the TableGen lexer trying to speed it up. No joy there. I could knock off 1–2% of the time, which was probably just noise.

I think we can close this ticket.

frasercrmck commented 3 years ago

If it helps, I just looked into the fairly nebulous "Analyze DAG patterns" and found that the call to CodeGenDAGPatterns::GenerateVariants takes about 50% of the -gen-instr-info time. I don't know how easy that would be to optimize.

RKSimon commented 3 years ago

Has anyone any ideas on what to do next on this?

This is a very general ticket, so if we have more specific improvements in mind, I'm happy for this to be resolved and those improvements to be handled in their own tickets.

RKSimon commented 3 years ago

Thanks - I can confirm my build times have definitely improved!

llvmbot commented 3 years ago

Hang on, are those two sets of timings comparing apples with apples? If so, nice!

llvmbot commented 3 years ago

I'm going to spend some time in the next few weeks looking at TableGen parsing to see if there is anything that can be done relatively easily to speed it up. I'm not optimistic, but it's worth a good look.

frasercrmck commented 3 years ago

Certainly. As before, just using -time-phases in a release build:

[12/17] Building RISCVGenGlobalISel.inc... ===-------------------------------------------------------------------------=== TableGen Phase Timing ===-------------------------------------------------------------------------=== Total Execution Time: 5.4829 seconds (5.7914 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 4.5548 ( 86.4%) 0.1697 ( 81.0%) 4.7245 ( 86.2%) 4.8007 ( 82.9%) Backend overall 0.7185 ( 13.6%) 0.0398 ( 19.0%) 0.7583 ( 13.8%) 0.9906 ( 17.1%) Parse, build records 0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0002 ( 0.0%) Write output 5.2734 (100.0%) 0.2095 (100.0%) 5.4829 (100.0%) 5.7914 (100.0%) Total

[11/46] Building RISCVGenInstrInfo.inc... ===-------------------------------------------------------------------------=== TableGen Phase Timing ===-------------------------------------------------------------------------=== Total Execution Time: 5.3306 seconds (5.7432 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 4.0939 ( 79.8%) 0.1364 ( 68.3%) 4.2302 ( 79.4%) 4.3308 ( 75.4%) Analyze DAG patterns 0.6909 ( 13.5%) 0.0633 ( 31.7%) 0.7542 ( 14.1%) 1.0660 ( 18.6%) Parse, build records 0.1299 ( 2.5%) 0.0000 ( 0.0%) 0.1299 ( 2.4%) 0.1300 ( 2.3%) Emit helper methods 0.1100 ( 2.1%) 0.0000 ( 0.0%) 0.1100 ( 2.1%) 0.1101 ( 1.9%) Emit map table 0.0437 ( 0.9%) 0.0000 ( 0.0%) 0.0437 ( 0.8%) 0.0437 ( 0.8%) Emit InstrDesc records 0.0258 ( 0.5%) 0.0000 ( 0.0%) 0.0258 ( 0.5%) 0.0258 ( 0.4%) Emit operand info 0.0117 ( 0.2%) 0.0000 ( 0.0%) 0.0117 ( 0.2%) 0.0117 ( 0.2%) Emit operand type mappings 0.0104 ( 0.2%) 0.0000 ( 0.0%) 0.0104 ( 0.2%) 0.0104 ( 0.2%) Emit instruction names 0.0035 ( 0.1%) 0.0000 ( 0.0%) 0.0035 ( 0.1%) 0.0035 ( 0.1%) Emit logical operand size mappings 0.0035 ( 0.1%) 0.0000 ( 0.0%) 0.0035 ( 0.1%) 0.0035 ( 0.1%) Emit uses/defs 0.0034 ( 0.1%) 0.0000 ( 0.0%) 0.0034 ( 0.1%) 0.0034 ( 0.1%) Emit logical operand type mappings 0.0033 ( 0.1%) 0.0000 ( 0.0%) 0.0033 ( 0.1%) 0.0033 ( 0.1%) Emit operand name mappings 0.0009 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.0%) 0.0009 ( 0.0%) Write output 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Emit initialization routine 5.1310 (100.0%) 0.1997 (100.0%) 5.3306 (100.0%) 5.7432 (100.0%) Total

[13/16] Building RISCVGenDAGISel.inc... ===-------------------------------------------------------------------------=== TableGen Phase Timing ===-------------------------------------------------------------------------=== Total Execution Time: 6.6774 seconds (7.0148 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 3.6783 ( 56.4%) 0.1099 ( 71.8%) 3.7882 ( 56.7%) 3.8855 ( 55.4%) Parse patterns 1.2059 ( 18.5%) 0.0000 ( 0.0%) 1.2059 ( 18.1%) 1.2068 ( 17.2%) Convert to matchers 0.7158 ( 11.0%) 0.0365 ( 23.9%) 0.7524 ( 11.3%) 0.9905 ( 14.1%) Parse, build records 0.4000 ( 6.1%) 0.0000 ( 0.0%) 0.4000 ( 6.0%) 0.4006 ( 5.7%) Sort patterns 0.2748 ( 4.2%) 0.0000 ( 0.0%) 0.2748 ( 4.1%) 0.2752 ( 3.9%) Optimize matchers 0.2464 ( 3.8%) 0.0066 ( 4.3%) 0.2530 ( 3.8%) 0.2532 ( 3.6%) Emit matcher table 0.0031 ( 0.0%) 0.0000 ( 0.0%) 0.0031 ( 0.0%) 0.0031 ( 0.0%) Write output 6.5244 (100.0%) 0.1530 (100.0%) 6.6774 (100.0%) 7.0148 (100.0%) Total

One thing that may be worth mentioning now is that the various MC emitters (RISCVGenAsmWriter.inc, RISCVGenMCCodeEmitter.inc, RISCVGenDisassemblerTables.inc, RISCVGenAsmMatcher.inc) take 1.2 - 1.4 seconds each. Now that the other emitters have been reduced well, these are taking up a larger percentage of the time. It's 70-80% "Parse, build records" time for those.

RKSimon commented 3 years ago

Simon, are you able to tell how much things have improved? My debug build on my Linux server is currently about 29 seconds to generate RISCVGenInstrInfo.inc, but I think MSVC time might be higher?

Yeah, MSVC tends to be slower for debug builds.

I'm away from my main dev machine for the holiday weekend so haven't had chance to compare full builds yet.

Fraser - any chance you can repeat those timing reports against trunk latest please?

topperc commented 3 years ago

Simon, are you able to tell how much things have improved? My debug build on my Linux server is currently about 29 seconds to generate RISCVGenInstrInfo.inc, but I think MSVC time might be higher?

Summary of changes made so far -Expand variants before expanding HwMode. Since expand variants is quadratic in the number of patterns, it is better to do it before we double the patterns by expanding hw mode. -Patterns are stored in PatternToMatch now using ListInit * making them cheap to compare in GenerateVariants where we used to have a std::vector compare. -Removed BitVector based predicate filtering in GenerateVariants. This removed a bunch of heap overhead for storing a nxn matrix of bits where n is the number of patterns.

Minor things: -Replaced std::set or DenseSet with SmallSet or MachineValueTypeSet which should cut down on some heap allocation or hashing.

topperc commented 3 years ago

Patches to try to improve things https://reviews.llvm.org/D100691 https://reviews.llvm.org/D100695

topperc commented 3 years ago

I'm going to see if I can switch the predicates in PatternToMatch back to a simple to compare type like ListInit* which is what it was prior to the introduction of HwMode in 2017. That might allow us to remove the large std::vector of BitVector from GenerateVariants.

topperc commented 3 years ago

Or they are less helpful on RISCV than they were on X86.

topperc commented 3 years ago

There were two patches in 2018 that were intended to improve the performance of GenerateVariants https://reviews.llvm.org/D51035 and https://reviews.llvm.org/D51839 on debug builds on MSVC. They might have slowed down Release builds or non-MSVC builds. Need to do some more investigation.

llvmbot commented 3 years ago

GenerateVariants() is a phase of building a CodeGenDAGPatterns instance. Instances a built by DAGIselEmitter, FastIselEmitter, GlobalIselEmitter. InstrDocsEmitter, InstrInfoEmitter.

GenerateVariants() has one double-nested loop and one triple-nested loop.The first loop calls GenerateVariantsOf(), which is pretty hairy.

I don't know enough about variants to get a feeling for how this might be improved. I can spend some time studying it. Do you have a sense of where it could be improved?

topperc commented 3 years ago

I think a lot of time is spent in GenerateVariants. Do we need to do that for InstrInfo?

llvmbot commented 3 years ago

Yep, those are usually the three culprits. I should add more instrumentation to GlobalISel and InstrInfo so we can see which backend phases take the time. Then we can look at the longest phases and see if there is anything we can do.

In DagISel, the pattern parsing might be able to be improved.

frasercrmck commented 3 years ago

Adding in some data I've generated from the RISC-V backend and intrinsics. The other tablegen targets take less than two second a piece.

[8/42] Building RISCVGenGlobalISel.inc... ===-------------------------------------------------------------------------=== TableGen Phase Timing ===-------------------------------------------------------------------------=== Total Execution Time: 15.9415 seconds (17.0387 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 14.8536 ( 95.4%) 0.3255 ( 86.4%) 15.1791 ( 95.2%) 15.4707 ( 90.8%) Backend overall 0.7109 ( 4.6%) 0.0510 ( 13.5%) 0.7619 ( 4.8%) 1.5641 ( 9.2%) Parse, build records 0.0005 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) 0.0040 ( 0.0%) Write output 15.5649 (100.0%) 0.3766 (100.0%) 15.9415 (100.0%) 17.0387 (100.0%) Total

[9/42] Building RISCVGenInstrInfo.inc... ===-------------------------------------------------------------------------=== TableGen Phase Timing ===-------------------------------------------------------------------------=== Total Execution Time: 16.0652 seconds (17.1984 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 14.4727 ( 91.9%) 0.2624 ( 81.2%) 14.7351 ( 91.7%) 14.9221 ( 86.8%) Analyze DAG patterns 0.7123 ( 4.5%) 0.0441 ( 13.7%) 0.7564 ( 4.7%) 1.7012 ( 9.9%) Parse, build records 0.1840 ( 1.2%) 0.0166 ( 5.1%) 0.2006 ( 1.2%) 0.2011 ( 1.2%) Emit helper methods 15.7420 (100.0%) 0.3232 (100.0%) 16.0652 (100.0%) 17.1984 (100.0%) Total

[2/5] Building RISCVGenDAGISel.inc... ===-------------------------------------------------------------------------=== TableGen Phase Timing ===-------------------------------------------------------------------------=== Total Execution Time: 18.6654 seconds (18.8064 wall clock)

---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 13.9062 ( 76.2%) 0.3322 ( 81.5%) 14.2384 ( 76.3%) 14.2747 ( 75.9%) Parse patterns 2.2140 ( 12.1%) 0.0000 ( 0.0%) 2.2141 ( 11.9%) 2.2195 ( 11.8%) Convert to matchers 0.6775 ( 3.7%) 0.0520 ( 12.8%) 0.7295 ( 3.9%) 0.7349 ( 3.9%) Parse, build records 0.5580 ( 3.1%) 0.0000 ( 0.0%) 0.5580 ( 3.0%) 0.5593 ( 3.0%) Sort patterns 0.5381 ( 2.9%) 0.0000 ( 0.0%) 0.5381 ( 2.9%) 0.5394 ( 2.9%) Optimize matchers 0.3553 ( 1.9%) 0.0199 ( 4.9%) 0.3753 ( 2.0%) 0.3762 ( 2.0%) Emit matcher table 0.0085 ( 0.0%) 0.0034 ( 0.8%) 0.0120 ( 0.1%) 0.1024 ( 0.5%) Write output 18.2577 (100.0%) 0.4076 (100.0%) 18.6654 (100.0%) 18.8064 (100.0%) Total

No surprises there, I suppose.

llvmbot commented 3 years ago

Do you have some idea which backends are chewing up the most time?

llvmbot commented 3 years ago

No, I haven't done any profiling and I have no performance enhancements on my to-do list. Don't forget the llvm-tblgen -time-phases option, which can give you a quick idea where the time is being spent.

asb commented 3 years ago

Thanks for the report Simon. Paul: I'm adding you to the CC on the off-chance you've done any profiling recently of tblgen or had any planned performance improvements in the works (just to avoid duplicated effort).