yuanming-hu / taichi_mpm

High-performance moving least squares material point method (MLS-MPM) solver. (ACM Transactions on Graphics, SIGGRAPH 2018)
MIT License

mls-mpm88 compilation time #1

Closed: yuanming-hu closed this issue 6 years ago

yuanming-hu commented 6 years ago

Baseline: 7.36s

yuanming-hu commented 6 years ago

Removed some useless headers: 7.16s
Empty mls-mpm88.cpp: 6.74s
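To put the two timings above side by side, here is a quick Python sketch of the arithmetic. It assumes both numbers are in seconds and that the "empty" mls-mpm88.cpp still includes the library header (a truly empty file would not take seconds to compile). The variable names are only illustrative.

```python
# Timings quoted in the comment above, assumed to be in seconds.
with_solver = 7.16  # mls-mpm88.cpp after removing unused headers
empty_file = 6.74   # "empty" mls-mpm88.cpp

# Assumption: the "empty" file still includes the library header,
# so its compile time is the fixed cost of that include.
solver_cost = with_solver - empty_file
header_share = empty_file / with_solver

print(f"solver code:  {solver_cost:.2f} s")
print(f"header share: {header_share:.0%}")
```

Under those assumptions the solver's own code costs roughly 0.4s, and nearly all of the compile time comes from the included header.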

time g++ mls-mpm88.cpp -std=c++14 -g -lX11 -lpthread -O2 -o mls-mpm -ftime-report

Execution times (seconds)
 phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1324 kB ( 0%) ggc
 phase parsing           :   0.91 (13%) usr   0.83 (43%) sys   1.74 (19%) wall  268739 kB (37%) ggc
 phase lang. deferred    :   0.55 ( 8%) usr   0.20 (10%) sys   0.75 ( 8%) wall  118273 kB (16%) ggc
 phase opt and generate  :   5.42 (77%) usr   0.91 (47%) sys   6.32 (70%) wall  324082 kB (45%) ggc
 phase last asm          :   0.17 ( 2%) usr   0.01 ( 1%) sys   0.19 ( 2%) wall   14501 kB ( 2%) ggc
 |name lookup            :   0.24 ( 3%) usr   0.17 ( 9%) sys   0.45 ( 5%) wall   30053 kB ( 4%) ggc
 |overload resolution    :   0.39 ( 6%) usr   0.19 (10%) sys   0.61 ( 7%) wall  106453 kB (15%) ggc
 garbage collection      :   0.25 ( 4%) usr   0.00 ( 0%) sys   0.26 ( 3%) wall       0 kB ( 0%) ggc
 dump files              :   0.14 ( 2%) usr   0.02 ( 1%) sys   0.21 ( 2%) wall       0 kB ( 0%) ggc
 callgraph construction  :   0.06 ( 1%) usr   0.03 ( 2%) sys   0.15 ( 2%) wall    7980 kB ( 1%) ggc
 callgraph optimization  :   0.11 ( 2%) usr   0.04 ( 2%) sys   0.15 ( 2%) wall    4959 kB ( 1%) ggc
 ipa dead code removal   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 ipa cp                  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1759 kB ( 0%) ggc
 ipa inlining heuristics :   0.06 ( 1%) usr   0.00 ( 0%) sys   0.05 ( 1%) wall    3276 kB ( 0%) ggc
 ipa function splitting  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     980 kB ( 0%) ggc
 ipa profile             :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall      93 kB ( 0%) ggc
 ipa icf                 :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       3 kB ( 0%) ggc
 ipa SRA                 :   0.06 ( 1%) usr   0.02 ( 1%) sys   0.02 ( 0%) wall    8397 kB ( 1%) ggc
 cfg construction        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     769 kB ( 0%) ggc
 cfg cleanup             :   0.05 ( 1%) usr   0.00 ( 0%) sys   0.08 ( 1%) wall    1221 kB ( 0%) ggc
 trivially dead code     :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 1%) wall       1 kB ( 0%) ggc
 df scan insns           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      48 kB ( 0%) ggc
 df multiple defs        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs        :   0.06 ( 1%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 df live regs            :   0.20 ( 3%) usr   0.02 ( 1%) sys   0.24 ( 3%) wall      56 kB ( 0%) ggc
 df live&initialized regs:   0.12 ( 2%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.09 ( 1%) usr   0.00 ( 0%) sys   0.10 ( 1%) wall    2078 kB ( 0%) ggc
 register information    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis          :   0.07 ( 1%) usr   0.02 ( 1%) sys   0.05 ( 1%) wall    5448 kB ( 1%) ggc
 alias stmt walking      :   0.07 ( 1%) usr   0.03 ( 2%) sys   0.08 ( 1%) wall     354 kB ( 0%) ggc
 register scan           :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall      34 kB ( 0%) ggc
 rebuild jump labels     :   0.04 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing           :   0.12 ( 2%) usr   0.19 (10%) sys   0.27 ( 3%) wall    6206 kB ( 1%) ggc
 parser (global)         :   0.19 ( 3%) usr   0.23 (12%) sys   0.45 ( 5%) wall   96278 kB (13%) ggc
 parser struct body      :   0.11 ( 2%) usr   0.04 ( 2%) sys   0.12 ( 1%) wall   18701 kB ( 3%) ggc
 parser function body    :   0.06 ( 1%) usr   0.06 ( 3%) sys   0.19 ( 2%) wall   10185 kB ( 1%) ggc
 parser inl. func. body  :   0.15 ( 2%) usr   0.11 ( 6%) sys   0.18 ( 2%) wall   15275 kB ( 2%) ggc
 parser inl. meth. body  :   0.07 ( 1%) usr   0.03 ( 2%) sys   0.11 ( 1%) wall   16009 kB ( 2%) ggc
 template instantiation  :   0.62 ( 9%) usr   0.32 (16%) sys   0.99 (11%) wall  186637 kB (26%) ggc
 early inlining heuristics:   0.03 ( 0%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall    2381 kB ( 0%) ggc
 inline parameters       :   0.08 ( 1%) usr   0.01 ( 1%) sys   0.04 ( 0%) wall    3096 kB ( 0%) ggc
 integration             :   0.11 ( 2%) usr   0.09 ( 5%) sys   0.29 ( 3%) wall   49127 kB ( 7%) ggc
 tree gimplify           :   0.05 ( 1%) usr   0.03 ( 2%) sys   0.04 ( 0%) wall   12371 kB ( 2%) ggc
 tree eh                 :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    2239 kB ( 0%) ggc
 tree CFG construction   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    6880 kB ( 1%) ggc
 tree CFG cleanup        :   0.06 ( 1%) usr   0.02 ( 1%) sys   0.04 ( 0%) wall     328 kB ( 0%) ggc
 tree tail merge         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       5 kB ( 0%) ggc
 tree VRP                :   0.15 ( 2%) usr   0.02 ( 1%) sys   0.11 ( 1%) wall    7178 kB ( 1%) ggc
 tree copy propagation   :   0.00 ( 0%) usr   0.02 ( 1%) sys   0.04 ( 0%) wall      81 kB ( 0%) ggc
 tree PTA                :   0.06 ( 1%) usr   0.04 ( 2%) sys   0.11 ( 1%) wall    1462 kB ( 0%) ggc
 tree PHI insertion      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    1453 kB ( 0%) ggc
 tree SSA rewrite        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    6448 kB ( 1%) ggc
 tree SSA other          :   0.00 ( 0%) usr   0.02 ( 1%) sys   0.03 ( 0%) wall     483 kB ( 0%) ggc
 tree SSA incremental    :   0.05 ( 1%) usr   0.00 ( 0%) sys   0.07 ( 1%) wall    1744 kB ( 0%) ggc
 tree operand scan       :   0.12 ( 2%) usr   0.04 ( 2%) sys   0.22 ( 2%) wall   16451 kB ( 2%) ggc
 dominator optimization  :   0.09 ( 1%) usr   0.03 ( 2%) sys   0.06 ( 1%) wall    3446 kB ( 0%) ggc
 tree SRA                :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     789 kB ( 0%) ggc
 tree CCP                :   0.04 ( 1%) usr   0.01 ( 1%) sys   0.06 ( 1%) wall     897 kB ( 0%) ggc
 tree split crit edges   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall    2478 kB ( 0%) ggc
 tree reassociation      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall      69 kB ( 0%) ggc
 tree PRE                :   0.13 ( 2%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall    3858 kB ( 1%) ggc
 tree FRE                :   0.10 ( 1%) usr   0.01 ( 1%) sys   0.09 ( 1%) wall    2818 kB ( 0%) ggc
 tree linearize phis     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     572 kB ( 0%) ggc
 tree forward propagate  :   0.02 ( 0%) usr   0.03 ( 2%) sys   0.05 ( 1%) wall    1186 kB ( 0%) ggc
 tree conservative DCE   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      81 kB ( 0%) ggc
 tree aggressive DCE     :   0.04 ( 1%) usr   0.02 ( 1%) sys   0.05 ( 1%) wall    5364 kB ( 1%) ggc
 tree DSE                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      41 kB ( 0%) ggc
 PHI merge               :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     113 kB ( 0%) ggc
 tree loop bounds        :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     482 kB ( 0%) ggc
 tree loop invariant motion:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      39 kB ( 0%) ggc
 tree canonical iv       :   0.02 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall     651 kB ( 0%) ggc
 complete unrolling      :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 1%) wall    2291 kB ( 0%) ggc
 tree iv optimization    :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 1%) wall    5189 kB ( 1%) ggc
 tree copy headers       :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall     547 kB ( 0%) ggc
 tree SSA uncprop        :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 dominance frontiers     :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.11 ( 2%) usr   0.04 ( 2%) sys   0.14 ( 2%) wall       0 kB ( 0%) ggc
 control dependences     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      77 kB ( 0%) ggc
 expand vars             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1553 kB ( 0%) ggc
 expand                  :   0.11 ( 2%) usr   0.01 ( 1%) sys   0.03 ( 0%) wall   25422 kB ( 3%) ggc
 post expand cleanups    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1376 kB ( 0%) ggc
 varconst                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       6 kB ( 0%) ggc
 forward prop            :   0.03 ( 0%) usr   0.01 ( 1%) sys   0.06 ( 1%) wall    1601 kB ( 0%) ggc
 CSE                     :   0.07 ( 1%) usr   0.02 ( 1%) sys   0.09 ( 1%) wall     445 kB ( 0%) ggc
 dead code elimination   :   0.01 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1        :   0.06 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1281 kB ( 0%) ggc
 dead store elim2        :   0.04 ( 1%) usr   0.01 ( 1%) sys   0.05 ( 1%) wall    1827 kB ( 0%) ggc
 loop init               :   0.04 ( 1%) usr   0.01 ( 1%) sys   0.12 ( 1%) wall    6930 kB ( 1%) ggc
 loop invariant motion   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      93 kB ( 0%) ggc
 loop fini               :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 CPROP                   :   0.10 ( 1%) usr   0.00 ( 0%) sys   0.08 ( 1%) wall    2915 kB ( 0%) ggc
 PRE                     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     441 kB ( 0%) ggc
 CSE 2                   :   0.03 ( 0%) usr   0.02 ( 1%) sys   0.07 ( 1%) wall     261 kB ( 0%) ggc
 branch prediction       :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.01 ( 0%) wall    1457 kB ( 0%) ggc
 combiner                :   0.15 ( 2%) usr   0.01 ( 1%) sys   0.19 ( 2%) wall    7176 kB ( 1%) ggc
 integrated RA           :   0.24 ( 3%) usr   0.00 ( 0%) sys   0.28 ( 3%) wall   23088 kB ( 3%) ggc
 LRA non-specific        :   0.09 ( 1%) usr   0.00 ( 0%) sys   0.12 ( 1%) wall    1442 kB ( 0%) ggc
 LRA virtuals elimination:   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1266 kB ( 0%) ggc
 LRA reload inheritance  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     767 kB ( 0%) ggc
 LRA create live ranges  :   0.08 ( 1%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     216 kB ( 0%) ggc
 LRA hard reg assignment :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 LRA rematerialization   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 reload                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 reload CSE regs         :   0.13 ( 2%) usr   0.00 ( 0%) sys   0.07 ( 1%) wall    2386 kB ( 0%) ggc
 ree                     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      81 kB ( 0%) ggc
 thread pro- & epilogue  :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1088 kB ( 0%) ggc
 if-conversion 2         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall      10 kB ( 0%) ggc
 peephole 2              :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     235 kB ( 0%) ggc
 hard reg cprop          :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      37 kB ( 0%) ggc
 scheduling 2            :   0.12 ( 2%) usr   0.00 ( 0%) sys   0.13 ( 1%) wall     827 kB ( 0%) ggc
 reorder blocks          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall    1887 kB ( 0%) ggc
 shorten branches        :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 final                   :   0.07 ( 1%) usr   0.00 ( 0%) sys   0.10 ( 1%) wall   10570 kB ( 1%) ggc
 symout                  :   0.21 ( 3%) usr   0.05 ( 3%) sys   0.28 ( 3%) wall   75285 kB (10%) ggc
 variable tracking       :   0.14 ( 2%) usr   0.00 ( 0%) sys   0.08 ( 1%) wall    7528 kB ( 1%) ggc
 var-tracking dataflow   :   0.18 ( 3%) usr   0.00 ( 0%) sys   0.21 ( 2%) wall     360 kB ( 0%) ggc
 var-tracking emit       :   0.16 ( 2%) usr   0.00 ( 0%) sys   0.18 ( 2%) wall   10155 kB ( 1%) ggc
 straight-line strength reduction:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      77 kB ( 0%) ggc
 early local passes      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation     :   0.17 ( 2%) usr   0.02 ( 1%) sys   0.18 ( 2%) wall    2052 kB ( 0%) ggc
 remove unused locals    :   0.03 ( 0%) usr   0.03 ( 2%) sys   0.07 ( 1%) wall      20 kB ( 0%) ggc
 address taken           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.17 ( 2%) usr   0.08 ( 4%) sys   0.18 ( 2%) wall       0 kB ( 0%) ggc
 rebuild frequencies     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall      20 kB ( 0%) ggc
 TOTAL                 :   7.06             1.95             9.02             727062 kB
g++ mls-mpm88.cpp -std=c++14 -g -lX11 -lpthread -O2 -o mls-mpm -ftime-report  7.45s user 2.00s system 99% cpu 9.447 total
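
The report is long, so a small script can help find the hot spots. Below is a minimal Python sketch, not part of the repo, that parses `-ftime-report` rows in the layout shown above and separates the aggregate `phase` rows from the individual passes. The names `ROW`, `parse_time_report`, and `SAMPLE` are hypothetical, and the regex assumes this exact column layout, which may differ between GCC versions.

```python
import re

# A few rows copied from the report above; the full output parses the same way.
SAMPLE = """\
 phase setup             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall    1324 kB ( 0%) ggc
 phase parsing           :   0.91 (13%) usr   0.83 (43%) sys   1.74 (19%) wall  268739 kB (37%) ggc
 phase lang. deferred    :   0.55 ( 8%) usr   0.20 (10%) sys   0.75 ( 8%) wall  118273 kB (16%) ggc
 phase opt and generate  :   5.42 (77%) usr   0.91 (47%) sys   6.32 (70%) wall  324082 kB (45%) ggc
 phase last asm          :   0.17 ( 2%) usr   0.01 ( 1%) sys   0.19 ( 2%) wall   14501 kB ( 2%) ggc
 template instantiation  :   0.62 ( 9%) usr   0.32 (16%) sys   0.99 (11%) wall  186637 kB (26%) ggc
 symout                  :   0.21 ( 3%) usr   0.05 ( 3%) sys   0.28 ( 3%) wall   75285 kB (10%) ggc
 TOTAL                 :   7.06             1.95             9.02             727062 kB
"""

# One report row: name, then usr/sys/wall seconds (each with a percentage), then kB.
ROW = re.compile(
    r"^\s*\|?\s*(?P<name>.+?)\s*:"
    r"\s+(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s+usr"
    r"\s+(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s+sys"
    r"\s+(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s+wall"
    r"\s+(?P<kb>\d+)\s+kB"
)


def parse_time_report(text):
    """Return one dict per timing row; lines that do not match (e.g. TOTAL) are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue
        rows.append({
            "name": m.group("name"),
            "usr": float(m.group("usr")),
            "sys": float(m.group("sys")),
            "wall": float(m.group("wall")),
            "kb": int(m.group("kb")),
        })
    return rows


rows = parse_time_report(SAMPLE)
# "phase ..." rows are aggregates; everything else is an individual timer.
phases = [r for r in rows if r["name"].startswith("phase ")]
passes = sorted((r for r in rows if not r["name"].startswith("phase ")),
                key=lambda r: r["wall"], reverse=True)

for r in sorted(phases, key=lambda r: r["wall"], reverse=True):
    print(f"{r['name']:<26} {r['wall']:5.2f} s wall")
print(f"sum of phases: {sum(r['wall'] for r in phases):.2f} s wall")
```

In the report above, the five `phase` rows add up to the TOTAL line (7.06 usr, 9.02 wall), so the remaining rows are a finer breakdown rather than additional time. "phase opt and generate" dominates at 70% of wall time, with parsing a distant second.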

yuanming-hu commented 6 years ago

Removed stb* from the amalgamation: 4.06s
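
For reference, the overall effect of the changes in this thread can be worked out as below. This is a rough sketch that assumes the baseline figure is also in seconds and that each number is a single, comparable measurement; `reduction` is an illustrative helper.

```python
# Compile times reported in this thread, assumed to be in seconds.
baseline = 7.36
after_stb = 4.06  # after removing stb*


def reduction(before, after):
    """Fraction of compile time saved going from `before` to `after`."""
    return (before - after) / before


print(f"saved:   {baseline - after_stb:.2f} s ({reduction(baseline, after_stb):.1%})")
print(f"speedup: {baseline / after_stb:.2f}x")
```

With these inputs the script reports about 3.3s saved, a reduction of roughly 45%, or about a 1.8x faster compile.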