taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

[opt] Improve optimizer performance #926

Open yuanming-hu opened 4 years ago

yuanming-hu commented 4 years ago

Concisely describe the problem

Some users have reported that their programs take a very long time to compile. For example, compiling https://github.com/allangelman/taichi_sdf/commit/76052e7da0e63476c23c8bc7f2f3a53314a2d7a8 can sometimes take half an hour on their computer. This is clearly too long, given that the program takes less than a minute to run.

With advanced_optimization=true, the paint kernel takes ~13 minutes to compile:

[T 05/05/20 20:22:06.698] [/home/yuanming/repos/taichi/python/taichi/lang/kernel.py:__call__@414] Compiling kernel paint_c20_0...
[T 05/05/20 20:34:52.119] [/home/yuanming/repos/taichi/python/taichi/lang/kernel.py:__call__@414] Compiling kernel matrix_to_ext_arr_c32_0...

With advanced_optimization=false, the same kernel takes ~4 minutes to compile:

[T 05/05/20 20:22:24.456] [/home/yuanming/repos/taichi/python/taichi/lang/kernel.py:__call__@414] Compiling kernel paint_c20_0...
[T 05/05/20 20:26:31.849] [/home/yuanming/repos/taichi/python/taichi/lang/kernel.py:__call__@414] Compiling kernel matrix_to_ext_arr_c32_0...
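As a quick sanity check on the durations quoted above, the elapsed time can be recomputed from the two timestamps in each log pair. A minimal Python sketch, assuming the logger prints timestamps as month/day/year followed by hour:minute:second.milliseconds; the helper name elapsed_seconds is hypothetical and not part of Taichi:

```python
from datetime import datetime

# Assumed timestamp layout of the log lines above, e.g. "05/05/20 20:22:06.698".
TS_FORMAT = "%m/%d/%y %H:%M:%S.%f"


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Seconds between two log lines of the form '[T 05/05/20 20:22:06.698] ...'."""

    def stamp(line: str) -> datetime:
        # The first bracket holds "<level> <date> <time>".
        inner = line[line.index("[") + 1 : line.index("]")]
        _level, date, time = inner.split()
        return datetime.strptime(f"{date} {time}", TS_FORMAT)

    return (stamp(end_line) - stamp(start_line)).total_seconds()


# Start/end lines abbreviated from the two runs above.
with_adv = elapsed_seconds(
    "[T 05/05/20 20:22:06.698] ... Compiling kernel paint_c20_0...",
    "[T 05/05/20 20:34:52.119] ... Compiling kernel matrix_to_ext_arr_c32_0...",
)
without_adv = elapsed_seconds(
    "[T 05/05/20 20:22:24.456] ... Compiling kernel paint_c20_0...",
    "[T 05/05/20 20:26:31.849] ... Compiling kernel matrix_to_ext_arr_c32_0...",
)
print(f"advanced_optimization=true:  {with_adv / 60:.1f} min")
print(f"advanced_optimization=false: {without_adv / 60:.1f} min")
```

With these lines the sketch gives about 12.8 minutes with advanced_optimization=true and about 4.1 minutes with it disabled, in line with the ~13 and ~4 minute estimates.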

Describe the solution you'd like (if any)

More investigation is needed to find the best way to reduce compilation time. We probably need to work on both the old optimization passes and the recently introduced advanced optimizations. For now, I'll introduce a flag so that users in urgent need can disable advanced_optimization.

xumingkuan commented 4 years ago

The numbers after the colon here are statement counts.

[T 05/07/20 15:06:47.877] [C:\Users\xmk\Desktop\taichi\python\taichi\lang\kernel.py:__call__@414] Compiling kernel paint_c20_0...
[I 05/07/20 15:06:49.493] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Initial IR:23889
[I 05/07/20 15:07:41.279] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Lowered:50623
[I 05/07/20 15:07:41.453] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Typechecked:52054
[I 05/07/20 15:07:41.488] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] SLP:52054
[I 05/07/20 15:07:41.494] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Loop Vectorized:52054
[I 05/07/20 15:07:41.526] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Loop Split:52054
[I 05/07/20 15:08:45.561] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Simplified I:19704
[I 05/07/20 15:08:45.584] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Dense struct-for demoted:19730
[I 05/07/20 15:08:45.626] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Constant extracted:19730
[T 05/07/20 15:08:45.636] [variable_optimization.cpp:taichi::lang::irpass::variable_optimization@624] before: 19730
[T 05/07/20 15:26:23.462] [variable_optimization.cpp:taichi::lang::irpass::variable_optimization@629] middle: 5292
[T 05/07/20 15:26:23.467] [variable_optimization.cpp:taichi::lang::irpass::variable_optimization@637] after: 5292
[I 05/07/20 15:26:23.468] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Store forwarded:5292
[I 05/07/20 15:26:23.489] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Access lowered:5752
[I 05/07/20 15:26:23.870] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] DIE:3163
[I 05/07/20 15:26:24.115] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Simplified II:2829
[I 05/07/20 15:26:24.119] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Access flagged:2829
[I 05/07/20 15:26:24.123] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Constant folded:2829
[I 05/07/20 15:26:24.247] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Offloaded:3645
[I 05/07/20 15:26:24.274] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Constant extracted II:3645
[I 05/07/20 15:26:24.281] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Atomics demoted:3673
[T 05/07/20 15:26:24.285] [variable_optimization.cpp:taichi::lang::irpass::variable_optimization@624] before: 3673
[T 05/07/20 15:26:24.289] [variable_optimization.cpp:taichi::lang::irpass::variable_optimization@629] middle: 3670
[T 05/07/20 15:26:24.292] [variable_optimization.cpp:taichi::lang::irpass::variable_optimization@637] after: 3670
[I 05/07/20 15:26:24.293] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Store forwarded II:3670
[I 05/07/20 15:26:24.395] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_e9fb3a542b024982fca1a22c818a9b90>::operator ()@18] Simplified III:2647
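Because each line records the statement count at the end of a stage, the gap between consecutive timestamps shows roughly how long the stage that produced the later line took. A hypothetical helper that finds the largest gap, sketched in Python under the assumption that the stage label and count always follow the last `] ` on the line (the sample lines are abbreviated copies of the log above):

```python
import re
from datetime import datetime

# Logger prefix such as "[I 05/07/20 15:07:41.526]": level, date, time.
PREFIX = re.compile(r"^\[(\w) (\d\d/\d\d/\d\d) (\d\d:\d\d:\d\d\.\d+)\]")


def parse(lines):
    """Yield (timestamp, stage label, statement count) for each log line."""
    for line in lines:
        m = PREFIX.match(line)
        if not m:
            continue
        ts = datetime.strptime(f"{m.group(2)} {m.group(3)}", "%m/%d/%y %H:%M:%S.%f")
        # Label and count follow the last "] ", e.g. "Lowered:50623" or "before: 19730".
        label, _, count = line.rsplit("] ", 1)[1].rpartition(":")
        yield ts, label.strip(), int(count)


def slowest_stage(lines):
    """Return (label, seconds, count_before, count_after) for the largest gap.

    The gap is attributed to the stage whose log line closes it.
    """
    records = list(parse(lines))
    best = None
    for (t0, _, n0), (t1, label, n1) in zip(records, records[1:]):
        dt = (t1 - t0).total_seconds()
        if best is None or dt > best[1]:
            best = (label, dt, n0, n1)
    return best


log = [
    "[I 05/07/20 15:07:41.526] [compile_to_offloads.cpp] Loop Split:52054",
    "[I 05/07/20 15:08:45.561] [compile_to_offloads.cpp] Simplified I:19704",
    "[I 05/07/20 15:08:45.626] [compile_to_offloads.cpp] Constant extracted:19730",
    "[T 05/07/20 15:08:45.636] [variable_optimization.cpp] before: 19730",
    "[T 05/07/20 15:26:23.462] [variable_optimization.cpp] middle: 5292",
    "[T 05/07/20 15:26:23.467] [variable_optimization.cpp] after: 5292",
    "[I 05/07/20 15:26:23.468] [compile_to_offloads.cpp] Store forwarded:5292",
]
print(slowest_stage(log))
```

On this excerpt the largest gap is the ~1058 s (about 17.6 minutes) between variable_optimization's `before` and `middle` lines, during which the statement count drops from 19730 to 5292.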

Some points:

yuanming-hu commented 4 years ago

[Screenshot from 2020-05-08 13-39-33]

Update: someone has written a Taichi program with 130K statements that takes an entire night to compile...

xumingkuan commented 4 years ago

An example where compilation takes a long time (the number after the colon is the statement count at that stage; e.g., Initial IR:17653 means the initial IR contains 17653 statements):

[I 06/15/20 22:47:43.382] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Initial IR:17653
[I 06/15/20 22:48:05.299] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Lowered:43334
[I 06/15/20 22:48:05.451] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Typechecked:44767
[I 06/15/20 22:48:05.484] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Loop Vectorized:44767
[I 06/15/20 22:48:05.505] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Loop Split:44767
[I 06/15/20 22:48:41.892] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Simplified I:14561
[I 06/15/20 22:48:41.906] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Dense struct-for demoted:14586
[I 06/15/20 22:53:54.376] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Store forwarded:5301
[I 06/15/20 22:53:54.379] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Access flagged I:5301
[T 06/15/20 22:53:54.404] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=10778016991827789070
[T 06/15/20 22:53:54.424] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=10778016991827789065
[T 06/15/20 22:53:54.442] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=10778016989814523156
[T 06/15/20 22:53:54.461] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=10778016991760417025
[T 06/15/20 22:53:54.480] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=10778016991760417043
[T 06/15/20 22:53:54.503] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=10778016991760417029
[I 06/15/20 22:53:55.691] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Simplified II:2236
[I 06/15/20 22:53:55.795] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Offloaded:3139
[I 06/15/20 22:53:55.797] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Access flagged II:3139
[I 06/15/20 22:53:55.805] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Access lowered:3599
[I 06/15/20 22:53:55.810] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] DIE:3484
[I 06/15/20 22:53:55.812] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Access flagged III:3484
[I 06/15/20 22:53:55.819] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Atomics demoted:3512
[I 06/15/20 22:53:55.824] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Store forwarded II:3510
[I 06/15/20 22:53:55.825] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Optimized by CFG:3510
[I 06/15/20 22:53:56.559] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_33335a717662f11dd363d28df7ede0fa>::operator ()@22] Simplified III:2451
......
codegen_accessor_statements: 148.00
codegen_evaluator_statements: 102.00
codegen_kernel_statements: 29170.00
codegen_offloaded_tasks: 53.00
codegen_statements  : 29420.00
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[Profiler thread 18452]
      1.236  s taichi::lang::TaichiLLVMContext::clone_runtime_module [1 x   1.236  s]
          1.215  s 98.24%  taichi::lang::compile_runtime_bitcode [1 x   1.215  s]
          0.015  s  1.25%  taichi::lang::module_from_bitcode_file [1 x  15.402 ms]
          0.006  s  0.50%  clone module          [1 x   6.218 ms]
    306.471 ms taichi::lang::StructCompilerLLVM::run [1 x 306.471 ms]
          0.130 ms  0.04%  taichi::lang::StructCompilerLLVM::generate_types [69 x   1.884 us]
          0.537 ms  0.18%  taichi::lang::StructCompilerLLVM::generate_child_accessors [1 x 537.000 us]
             44.000 us  8.19%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [1 x  44.000 us]
            490.000 us 91.25%  taichi::lang::StructCompilerLLVM::generate_child_accessors [11 x  44.545 us]
                145.000 us 29.59%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [11 x  13.182 us]
                224.000 us 45.71%  taichi::lang::StructCompilerLLVM::generate_child_accessors [52 x   4.308 us]
                     28.000 us 12.50%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [2 x  14.000 us]
                     57.000 us 25.45%  taichi::lang::StructCompilerLLVM::generate_child_accessors [2 x  28.500 us]
                         27.000 us 47.37%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [2 x  13.500 us]
                         23.000 us 40.35%  taichi::lang::StructCompilerLLVM::generate_child_accessors [2 x  11.500 us]
                             13.000 us 56.52%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [1 x  13.000 us]
                              3.000 us 13.04%  taichi::lang::StructCompilerLLVM::generate_child_accessors [1 x   3.000 us]
                              7.000 us 30.43%  [unaccounted]
                          7.000 us 12.28%  [unaccounted]
                    139.000 us 62.05%  [unaccounted]
                121.000 us 24.69%  [unaccounted]
              3.000 us  0.56%  [unaccounted]
          6.423 ms  2.10%  taichi::lang::TaichiLLVMContext::clone_struct_module [1 x   6.423 ms]
          1.098 ms  0.36%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [1 x   1.098 ms]
        211.172 ms 68.90%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [1 x 211.172 ms]
              5.972 ms  2.83%  llvm_function_pass    [1 x   5.972 ms]
            201.120 ms 95.24%  llvm_module_pass      [1 x 201.120 ms]
              4.080 ms  1.93%  [unaccounted]
         87.111 ms 28.42%  [unaccounted]
      6.417  m taichi::lang::Program::compile [27 x  14.260  s]
          6.363  m 99.16%  taichi::lang::irpass::compile_to_offloads [27 x  14.140  s]
              0.376  m  5.91%  taichi::lang::irpass::lower [27 x 835.509 ms]
              0.004  m  0.06%  taichi::lang::irpass::typecheck [108 x   2.018 ms]
              0.002  m  0.04%  taichi::lang::irpass::analysis::verify [432 x 342.380 us]
              0.000  m  0.00%  taichi::lang::irpass::loop_vectorize [27 x   4.000 us]
              0.000  m  0.00%  taichi::lang::irpass::vector_split [27 x   1.778 us]
              0.653  m 10.26%  taichi::lang::irpass::simplify [27 x   1.451  s]
                  0.012  s  0.03%  taichi::lang::irpass::typecheck [1418 x   8.760 us]
                 39.167  s 99.97%  [unaccounted]
              5.221  m 82.04%  taichi::lang::irpass::variable_optimization [54 x   5.801  s]
              0.000  m  0.00%  taichi::lang::irpass::flag_access [81 x  12.778 us]
              0.102  m  1.60%  taichi::lang::irpass::full_simplify [54 x 113.203 ms]
                  0.036  s  0.59%  taichi::lang::irpass::extract_constant [120 x 299.350 us]
                  0.017  s  0.29%  taichi::lang::irpass::binary_op_simplify [120 x 145.758 us]
                  0.270  s  4.42%  taichi::lang::irpass::constant_fold [120 x   2.253 ms]
                    221.781 ms 82.05%  taichi::lang::Program::compile [12 x  18.482 ms]
                          0.817 ms  0.37%  taichi::lang::irpass::compile_to_offloads [12 x  68.083 us]
                             35.000 us  4.28%  taichi::lang::irpass::lower [12 x   2.917 us]
                             70.000 us  8.57%  taichi::lang::irpass::typecheck [12 x   5.833 us]
                             82.000 us 10.04%  taichi::lang::irpass::analysis::verify [24 x   3.417 us]
                            569.000 us 69.65%  taichi::lang::irpass::offload [12 x  47.417 us]
                                 36.000 us  6.33%  taichi::lang::irpass::typecheck [24 x   1.500 us]
                                533.000 us 93.67%  [unaccounted]
                             61.000 us  7.47%  [unaccounted]
                        220.855 ms 99.58%  taichi::lang::KernelCodeGen::compile [12 x  18.405 ms]
                            220.843 ms 99.99%  taichi::lang::CodeGenCPU::codegen [12 x  18.404 ms]
                                 91.452 ms 41.41%  taichi::lang::TaichiLLVMContext::clone_struct_module [12 x   7.621 ms]
                                  0.012 ms  0.01%  taichi::lang::CodeGenLLVMCPU::CodeGenLLVMCPU [12 x 999.998 ns]
                                  0.749 ms  0.34%  taichi::lang::CodeGenLLVM::emit_to_module [12 x  62.417 us]
                                128.437 ms 58.16%  taichi::lang::CodeGenLLVM::compile_module_to_executable [12 x  10.703 ms]
                                     16.125 ms 12.55%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [12 x   1.344 ms]
                                     87.027 ms 67.76%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [12 x   7.252 ms]
                                         16.830 ms 19.34%  llvm_function_pass    [12 x   1.402 ms]
                                         49.402 ms 56.77%  llvm_module_pass      [12 x   4.117 ms]
                                         20.795 ms 23.89%  [unaccounted]
                                     25.285 ms 19.69%  [unaccounted]
                     48.534 ms 17.95%  [unaccounted]
                  0.025  s  0.40%  taichi::lang::irpass::alg_simp [120 x 204.183 us]
                  0.375  s  6.13%  taichi::lang::irpass::die [360 x   1.041 ms]
                  4.841  s 79.19%  taichi::lang::irpass::whole_kernel_cse [120 x  40.338 ms]
                  0.549  s  8.98%  taichi::lang::irpass::simplify [120 x   4.576 ms]
                     51.148 ms  9.31%  taichi::lang::irpass::typecheck [168 x 304.452 us]
                    497.966 ms 90.69%  [unaccounted]
              0.002  m  0.04%  taichi::lang::irpass::offload [27 x   5.042 ms]
                 10.515 ms  7.72%  taichi::lang::irpass::typecheck [54 x 194.722 us]
                125.624 ms 92.28%  [unaccounted]
              0.000  m  0.00%  taichi::lang::irpass::die [27 x 561.593 us]
              0.001  m  0.01%  taichi::lang::irpass::demote_atomics [27 x   1.846 ms]
                  9.067 ms 18.19%  taichi::lang::irpass::typecheck [27 x 335.815 us]
                 40.773 ms 81.81%  [unaccounted]
              0.000  m  0.00%  taichi::lang::irpass::cfg_optimization [27 x  24.481 us]
          0.054  m  0.84%  taichi::lang::KernelCodeGen::compile [27 x 120.303 ms]
              3.248  s 100.00%  taichi::lang::CodeGenCPU::codegen [27 x 120.301 ms]
                  0.198  s  6.09%  taichi::lang::TaichiLLVMContext::clone_struct_module [27 x   7.331 ms]
                  0.000  s  0.00%  taichi::lang::CodeGenLLVMCPU::CodeGenLLVMCPU [27 x 851.858 ns]
                  0.010  s  0.31%  taichi::lang::CodeGenLLVM::emit_to_module [27 x 374.333 us]
                  3.039  s 93.58%  taichi::lang::CodeGenLLVM::compile_module_to_executable [27 x 112.572 ms]
                      0.033  s  1.10%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [27 x   1.236 ms]
                      2.420  s 79.60%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [27 x  89.611 ms]
                          0.100  s  4.15%  llvm_function_pass    [27 x   3.721 ms]
                          2.262  s 93.48%  llvm_module_pass      [27 x  83.767 ms]
                          0.057  s  2.37%  [unaccounted]
                      0.587  s 19.30%  [unaccounted]
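Each profiler row reports a total time, its share of the parent, the call count, and the average time per call. A sketch of a parser for these rows, assuming the layout shown above (with `m` meaning minutes) and skipping `[unaccounted]` rows; `parse_row` and `Entry` are illustrative names, not Taichi APIs:

```python
import re
from typing import NamedTuple, Optional

# Seconds per unit as printed by the profiler ("m" is minutes here).
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0}

# "<total> <unit> [<pct>%] <name> [<calls> x <avg> <unit>]"; [unaccounted] rows do not match.
ROW = re.compile(
    r"^\s*([\d.]+)\s*(ns|us|ms|s|m)\s+(?:([\d.]+)%\s+)?(.+?)\s*"
    r"\[(\d+) x\s+([\d.]+)\s*(ns|us|ms|s|m)\]\s*$"
)


class Entry(NamedTuple):
    name: str
    total_s: float
    percent: Optional[float]  # share of the parent row; None for top-level rows
    calls: int
    avg_s: float


def parse_row(line: str) -> Optional[Entry]:
    m = ROW.match(line)
    if m is None:
        return None
    total, unit, pct, name, calls, avg, avg_unit = m.groups()
    return Entry(
        name=name,
        total_s=float(total) * UNIT_SECONDS[unit],
        percent=float(pct) if pct else None,
        calls=int(calls),
        avg_s=float(avg) * UNIT_SECONDS[avg_unit],
    )


rows = [
    "      6.417  m taichi::lang::Program::compile [27 x  14.260  s]",
    "          6.363  m 99.16%  taichi::lang::irpass::compile_to_offloads [27 x  14.140  s]",
    "              5.221  m 82.04%  taichi::lang::irpass::variable_optimization [54 x   5.801  s]",
    "                  4.841  s 79.19%  taichi::lang::irpass::whole_kernel_cse [120 x  40.338 ms]",
    "                 39.167  s 99.97%  [unaccounted]",
]
entries = [e for e in map(parse_row, rows) if e is not None]
for e in entries:
    # The total should roughly equal calls * average (both are rounded in the printout).
    print(f"{e.name}: {e.total_s:.1f} s over {e.calls} calls "
          f"(calls x avg = {e.calls * e.avg_s:.1f} s)")
```

For these rows the total matches calls × average up to rounding, e.g. 54 calls × 5.801 s ≈ 313 s for variable_optimization, which is 82% of compile_to_offloads.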
xumingkuan commented 4 years ago

Looks like #1248 will be very helpful because it replaces variable_optimization, which accounts for about 82% of the compile_to_offloads time in the profile above.

xumingkuan commented 4 years ago

After #1248 and with variable_optimization removed:

[I 06/16/20 21:20:54.410] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Initial IR:17653
[I 06/16/20 21:21:14.282] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Lowered:43334
[I 06/16/20 21:21:14.432] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Typechecked:44767
[I 06/16/20 21:21:14.452] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Loop Vectorized:44767
[I 06/16/20 21:21:14.471] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Loop Split:44767
[I 06/16/20 21:21:53.136] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Simplified I:14561
[I 06/16/20 21:21:53.150] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Dense struct-for demoted:14586
[I 06/16/20 21:21:53.155] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Access flagged I:14586
[T 06/16/20 21:21:53.220] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=9539593713538630926
[T 06/16/20 21:21:53.242] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=9539593713538630921
[T 06/16/20 21:21:53.262] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=9539593711525365012
[T 06/16/20 21:21:53.284] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=9539593713471258881
[T 06/16/20 21:21:53.305] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=9539593713471258899
[T 06/16/20 21:21:53.332] [constant_fold.cpp:taichi::lang::ConstantFold::get_jit_evaluator_kernel@66] Saving JIT evaluator cache entry id=9539593713471258885
[I 06/16/20 21:21:55.815] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Simplified II:2557
[I 06/16/20 21:21:55.937] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Offloaded:3464
[I 06/16/20 21:21:55.968] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Optimized by CFG:3354
[I 06/16/20 21:21:55.970] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Access flagged II:3354
[I 06/16/20 21:21:55.979] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Access lowered:3814
[I 06/16/20 21:21:55.985] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] DIE:3698
[I 06/16/20 21:21:55.987] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Access flagged III:3698
[I 06/16/20 21:21:55.995] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Atomics demoted:3726
[I 06/16/20 21:21:56.842] [compile_to_offloads.cpp:taichi::lang::irpass::compile_to_offloads::<lambda_7865bac8c148bb46d1cabff90cffe3ba>::operator ()@22] Simplified III:2493
......
codegen_accessor_statements: 148.00
codegen_evaluator_statements: 102.00
codegen_kernel_statements: 29209.00
codegen_offloaded_tasks: 53.00
codegen_statements  : 29459.00
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[Profiler thread 13108]
      1.247  s taichi::lang::TaichiLLVMContext::clone_runtime_module [1 x   1.247  s]
          1.225  s 98.26%  taichi::lang::compile_runtime_bitcode [1 x   1.225  s]
          0.015  s  1.21%  taichi::lang::module_from_bitcode_file [1 x  15.074 ms]
          0.007  s  0.53%  clone module          [1 x   6.557 ms]
    338.915 ms taichi::lang::StructCompilerLLVM::run [1 x 338.915 ms]
          0.128 ms  0.04%  taichi::lang::StructCompilerLLVM::generate_types [69 x   1.855 us]
          0.544 ms  0.16%  taichi::lang::StructCompilerLLVM::generate_child_accessors [1 x 544.000 us]
             58.000 us 10.66%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [1 x  58.000 us]
            481.000 us 88.42%  taichi::lang::StructCompilerLLVM::generate_child_accessors [11 x  43.727 us]
                184.000 us 38.25%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [11 x  16.727 us]
                252.000 us 52.39%  taichi::lang::StructCompilerLLVM::generate_child_accessors [52 x   4.846 us]
                     44.000 us 17.46%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [2 x  22.000 us]
                     60.000 us 23.81%  taichi::lang::StructCompilerLLVM::generate_child_accessors [2 x  30.000 us]
                         29.000 us 48.33%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [2 x  14.500 us]
                         24.000 us 40.00%  taichi::lang::StructCompilerLLVM::generate_child_accessors [2 x  12.000 us]
                             13.000 us 54.17%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [1 x  13.000 us]
                              3.000 us 12.50%  taichi::lang::StructCompilerLLVM::generate_child_accessors [1 x   3.000 us]
                              8.000 us 33.33%  [unaccounted]
                          7.000 us 11.67%  [unaccounted]
                    148.000 us 58.73%  [unaccounted]
                 45.000 us  9.36%  [unaccounted]
              5.000 us  0.92%  [unaccounted]
          7.940 ms  2.34%  taichi::lang::TaichiLLVMContext::clone_struct_module [1 x   7.940 ms]
          1.210 ms  0.36%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [1 x   1.210 ms]
        223.976 ms 66.09%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [1 x 223.976 ms]
              6.245 ms  2.79%  llvm_function_pass    [1 x   6.245 ms]
            213.787 ms 95.45%  llvm_module_pass      [1 x 213.787 ms]
              3.944 ms  1.76%  [unaccounted]
        105.117 ms 31.02%  [unaccounted]
      1.234  m taichi::lang::Program::compile [27 x   2.742  s]
          1.181  m 95.68%  taichi::lang::irpass::compile_to_offloads [27 x   2.624  s]
              0.342  m 28.95%  taichi::lang::irpass::lower [27 x 759.481 ms]
              0.004  m  0.30%  taichi::lang::irpass::typecheck [108 x   1.947 ms]
              0.002  m  0.18%  taichi::lang::irpass::analysis::verify [405 x 313.553 us]
              0.000  m  0.00%  taichi::lang::irpass::loop_vectorize [27 x   3.556 us]
              0.000  m  0.00%  taichi::lang::irpass::vector_split [27 x   2.370 us]
              0.691  m 58.53%  taichi::lang::irpass::simplify [27 x   1.536  s]
                  0.013  s  0.03%  taichi::lang::irpass::typecheck [1418 x   9.052 us]
                 41.448  s 99.97%  [unaccounted]
              0.000  m  0.00%  taichi::lang::irpass::flag_access [81 x  14.111 us]
              0.136  m 11.49%  taichi::lang::irpass::full_simplify [54 x 150.682 ms]
                  0.050  s  0.61%  taichi::lang::irpass::extract_constant [120 x 413.333 us]
                  0.044  s  0.53%  taichi::lang::irpass::binary_op_simplify [120 x 362.692 us]
                  0.357  s  4.39%  taichi::lang::irpass::constant_fold [120 x   2.975 ms]
                    238.157 ms 66.71%  taichi::lang::Program::compile [12 x  19.846 ms]
                          0.755 ms  0.32%  taichi::lang::irpass::compile_to_offloads [12 x  62.917 us]
                             35.000 us  4.64%  taichi::lang::irpass::lower [12 x   2.917 us]
                             53.000 us  7.02%  taichi::lang::irpass::typecheck [12 x   4.417 us]
                             96.000 us 12.72%  taichi::lang::irpass::analysis::verify [24 x   4.000 us]
                            540.000 us 71.52%  taichi::lang::irpass::offload [12 x  45.000 us]
                                 49.000 us  9.07%  taichi::lang::irpass::typecheck [24 x   2.042 us]
                                491.000 us 90.93%  [unaccounted]
                             31.000 us  4.11%  [unaccounted]
                        237.324 ms 99.65%  taichi::lang::KernelCodeGen::compile [12 x  19.777 ms]
                            237.308 ms 99.99%  taichi::lang::CodeGenCPU::codegen [12 x  19.776 ms]
                                101.796 ms 42.90%  taichi::lang::TaichiLLVMContext::clone_struct_module [12 x   8.483 ms]
                                  0.014 ms  0.01%  taichi::lang::CodeGenLLVMCPU::CodeGenLLVMCPU [12 x   1.167 us]
                                  0.950 ms  0.40%  taichi::lang::CodeGenLLVM::emit_to_module [12 x  79.167 us]
                                134.339 ms 56.61%  taichi::lang::CodeGenLLVM::compile_module_to_executable [12 x  11.195 ms]
                                     17.506 ms 13.03%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [12 x   1.459 ms]
                                     91.557 ms 68.15%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [12 x   7.630 ms]
                                         18.357 ms 20.05%  llvm_function_pass    [12 x   1.530 ms]
                                         51.436 ms 56.18%  llvm_module_pass      [12 x   4.286 ms]
                                         21.764 ms 23.77%  [unaccounted]
                                     25.276 ms 18.82%  [unaccounted]
                    118.852 ms 33.29%  [unaccounted]
                  0.051  s  0.63%  taichi::lang::irpass::alg_simp [120 x 425.642 us]
                  1.575  s 19.35%  taichi::lang::irpass::die [360 x   4.374 ms]
                  5.238  s 64.37%  taichi::lang::irpass::whole_kernel_cse [120 x  43.649 ms]
                  0.822  s 10.11%  taichi::lang::irpass::simplify [120 x   6.854 ms]
                    120.181 ms 14.61%  taichi::lang::irpass::typecheck [168 x 715.363 us]
                    702.310 ms 85.39%  [unaccounted]
              0.003  m  0.23%  taichi::lang::irpass::offload [27 x   5.940 ms]
                 16.615 ms 10.36%  taichi::lang::irpass::typecheck [54 x 307.685 us]
                143.764 ms 89.64%  [unaccounted]
              0.001  m  0.09%  taichi::lang::irpass::cfg_optimization [27 x   2.352 ms]
                  0.118 ms  0.19%  taichi::lang::ControlFlowGraph::unreachable_code_elimination [32 x   3.688 us]
                 62.081 ms 97.74%  taichi::lang::ControlFlowGraph::store_to_load_forwarding [32 x   1.940 ms]
                     45.799 ms 73.77%  taichi::lang::ControlFlowGraph::reaching_definition_analysis [32 x   1.431 ms]
                     16.282 ms 26.23%  [unaccounted]
                  1.315 ms  2.07%  [unaccounted]
              0.000  m  0.02%  taichi::lang::irpass::die [27 x 596.222 us]
              0.001  m  0.06%  taichi::lang::irpass::demote_atomics [27 x   1.651 ms]
                 12.181 ms 27.33%  taichi::lang::irpass::typecheck [27 x 451.148 us]
                 32.394 ms 72.67%  [unaccounted]
          0.053  m  4.31%  taichi::lang::KernelCodeGen::compile [27 x 118.297 ms]
              3.194  s 100.00%  taichi::lang::CodeGenCPU::codegen [27 x 118.295 ms]
                  0.221  s  6.91%  taichi::lang::TaichiLLVMContext::clone_struct_module [27 x   8.180 ms]
                  0.000  s  0.00%  taichi::lang::CodeGenLLVMCPU::CodeGenLLVMCPU [27 x 925.926 ns]
                  0.011  s  0.33%  taichi::lang::CodeGenLLVM::emit_to_module [27 x 393.185 us]
                  2.962  s 92.73%  taichi::lang::CodeGenLLVM::compile_module_to_executable [27 x 109.697 ms]
                      0.037  s  1.25%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [27 x   1.376 ms]
                      2.316  s 78.18%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [27 x  85.764 ms]
                          0.097  s  4.20%  llvm_function_pass    [27 x   3.606 ms]
                          2.158  s 93.18%  llvm_module_pass      [27 x  79.912 ms]
                          0.061  s  2.62%  [unaccounted]
                      0.609  s 20.56%  [unaccounted]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
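To put the two profiles side by side, the headline totals can be compared directly. A small Python sketch using numbers copied by hand from the dumps above (the first profile with variable_optimization and the one after #1248); the `summarize` helper is illustrative only:

```python
# Wall-clock totals (minutes) read off the two profiler dumps above.
BEFORE = {"Program::compile": 6.417, "compile_to_offloads": 6.363,
          "variable_optimization": 5.221}
AFTER = {"Program::compile": 1.234, "compile_to_offloads": 1.181}


def summarize(before, after):
    """Return {pass: (speedup, minutes saved)} for passes present in both profiles."""
    return {
        name: (before[name] / after[name], before[name] - after[name])
        for name in before.keys() & after.keys()
    }


for name, (speedup, saved) in sorted(summarize(BEFORE, AFTER).items()):
    print(f"{name}: {speedup:.1f}x faster, {saved:.2f} min saved")

# Compare the total saving with the time the removed pass used to take.
saved_total = BEFORE["Program::compile"] - AFTER["Program::compile"]
print(f"variable_optimization took {BEFORE['variable_optimization']:.3f} min; "
      f"total saved {saved_total:.3f} min")
```

This gives roughly a 5.2x speedup for Program::compile (6.417 m to 1.234 m). The time saved, about 5.18 minutes, is slightly less than variable_optimization's 5.221 minutes, consistent with some remaining passes (for example die and whole_kernel_cse) becoming a bit slower in the second profile.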
xumingkuan commented 4 years ago

Before #1324:

codegen_accessor_statements: 148.00
codegen_evaluator_statements: 102.00
codegen_kernel_statements: 29087.00
codegen_offloaded_tasks: 53.00
codegen_statements  : 29337.00
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[Profiler thread 5820]
      1.252  s taichi::lang::TaichiLLVMContext::clone_runtime_module [1 x   1.252  s]
          1.229  s 98.22%  taichi::lang::compile_runtime_bitcode [1 x   1.229  s]
          0.015  s  1.19%  taichi::lang::module_from_bitcode_file [1 x  14.925 ms]
          0.007  s  0.58%  clone module          [1 x   7.305 ms]
    309.048 ms taichi::lang::StructCompilerLLVM::run [1 x 309.048 ms]
          0.129 ms  0.04%  taichi::lang::StructCompilerLLVM::generate_types [69 x   1.870 us]
          0.574 ms  0.19%  taichi::lang::StructCompilerLLVM::generate_child_accessors [1 x 574.000 us]
             52.000 us  9.06%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [1 x  52.000 us]
            516.000 us 89.90%  taichi::lang::StructCompilerLLVM::generate_child_accessors [11 x  46.909 us]
                173.000 us 33.53%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [11 x  15.727 us]
                292.000 us 56.59%  taichi::lang::StructCompilerLLVM::generate_child_accessors [52 x   5.615 us]
                     31.000 us 10.62%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [2 x  15.500 us]
                     59.000 us 20.21%  taichi::lang::StructCompilerLLVM::generate_child_accessors [2 x  29.500 us]
                         29.000 us 49.15%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [2 x  14.500 us]
                         23.000 us 38.98%  taichi::lang::StructCompilerLLVM::generate_child_accessors [2 x  11.500 us]
                             14.000 us 60.87%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [1 x  14.000 us]
                              3.000 us 13.04%  taichi::lang::StructCompilerLLVM::generate_child_accessors [1 x   3.000 us]
                              6.000 us 26.09%  [unaccounted]
                          7.000 us 11.86%  [unaccounted]
                    202.000 us 69.18%  [unaccounted]
                 51.000 us  9.88%  [unaccounted]
              6.000 us  1.05%  [unaccounted]
          6.635 ms  2.15%  taichi::lang::TaichiLLVMContext::clone_struct_module [1 x   6.635 ms]
          1.174 ms  0.38%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [1 x   1.174 ms]
        211.535 ms 68.45%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [1 x 211.535 ms]
              6.357 ms  3.01%  llvm_function_pass    [1 x   6.357 ms]
            201.030 ms 95.03%  llvm_module_pass      [1 x 201.030 ms]
              4.148 ms  1.96%  [unaccounted]
         89.001 ms 28.80%  [unaccounted]
      6.872  m taichi::lang::Program::compile [27 x  15.272  s]
          6.821  m 99.25%  taichi::lang::irpass::compile_to_offloads [27 x  15.157  s]
              0.322  m  4.72%  taichi::lang::irpass::lower [27 x 716.143 ms]
              0.003  m  0.05%  taichi::lang::irpass::typecheck [108 x   1.797 ms]
              0.002  m  0.03%  taichi::lang::irpass::analysis::verify [486 x 248.510 us]
              0.000  m  0.00%  taichi::lang::irpass::loop_vectorize [27 x   3.519 us]
              0.000  m  0.00%  taichi::lang::irpass::vector_split [27 x   2.000 us]
              0.661  m  9.69%  taichi::lang::irpass::simplify [27 x   1.468  s]
                  0.013  s  0.03%  taichi::lang::irpass::typecheck [1394 x   9.110 us]
                 39.632  s 99.97%  [unaccounted]
              0.004  m  0.06%  taichi::lang::irpass::constant_fold [54 x   4.337 ms]
                158.625 ms 67.74%  taichi::lang::Program::compile [8 x  19.828 ms]
                      0.569 ms  0.36%  taichi::lang::irpass::compile_to_offloads [8 x  71.125 us]
                         38.000 us  6.68%  taichi::lang::irpass::lower [8 x   4.750 us]
                         26.000 us  4.57%  taichi::lang::irpass::typecheck [8 x   3.250 us]
                         82.000 us 14.41%  taichi::lang::irpass::analysis::verify [16 x   5.125 us]
                        385.000 us 67.66%  taichi::lang::irpass::offload [8 x  48.125 us]
                             28.000 us  7.27%  taichi::lang::irpass::typecheck [16 x   1.750 us]
                            357.000 us 92.73%  [unaccounted]
                         38.000 us  6.68%  [unaccounted]
                    157.981 ms 99.59%  taichi::lang::KernelCodeGen::compile [8 x  19.748 ms]
                        157.973 ms 99.99%  taichi::lang::CodeGenCPU::codegen [8 x  19.747 ms]
                             70.329 ms 44.52%  taichi::lang::TaichiLLVMContext::clone_struct_module [8 x   8.791 ms]
                              0.010 ms  0.01%  taichi::lang::CodeGenLLVMCPU::CodeGenLLVMCPU [8 x   1.250 us]
                              0.488 ms  0.31%  taichi::lang::CodeGenLLVM::emit_to_module [8 x  61.000 us]
                             87.031 ms 55.09%  taichi::lang::CodeGenLLVM::compile_module_to_executable [8 x  10.879 ms]
                                 11.816 ms 13.58%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [8 x   1.477 ms]
                                 58.473 ms 67.19%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [8 x   7.309 ms]
                                     12.044 ms 20.60%  llvm_function_pass    [8 x   1.505 ms]
                                     31.889 ms 54.54%  llvm_module_pass      [8 x   3.986 ms]
                                     14.540 ms 24.87%  [unaccounted]
                                 16.742 ms 19.24%  [unaccounted]
                 75.556 ms 32.26%  [unaccounted]
              0.028  m  0.41%  taichi::lang::irpass::cfg_optimization [81 x  20.926 ms]
                  1.649  s 97.27%  taichi::lang::ControlFlowGraph::store_to_load_forwarding [92 x  17.922 ms]
                      1.477  s 89.58%  taichi::lang::ControlFlowGraph::reaching_definition_analysis [92 x  16.055 ms]
                      0.172  s 10.42%  [unaccounted]
                  0.046  s  2.73%  [unaccounted]
              5.709  m 83.70%  taichi::lang::irpass::variable_optimization [54 x   6.344  s]
              0.000  m  0.00%  taichi::lang::irpass::flag_access [81 x   8.938 us]
              0.087  m  1.28%  taichi::lang::irpass::full_simplify [54 x  97.171 ms]
                  0.076  s  1.45%  taichi::lang::irpass::extract_constant [116 x 656.534 us]
                  0.002  s  0.03%  taichi::lang::irpass::unreachable_code_elimination [116 x  15.784 us]
                  0.016  s  0.30%  taichi::lang::irpass::binary_op_simplify [116 x 134.871 us]
                  0.093  s  1.77%  taichi::lang::irpass::constant_fold [116 x 802.888 us]
                     74.966 ms 80.49%  taichi::lang::Program::compile [4 x  18.741 ms]
                          0.227 ms  0.30%  taichi::lang::irpass::compile_to_offloads [4 x  56.750 us]
                             12.000 us  5.29%  taichi::lang::irpass::lower [4 x   3.000 us]
                             15.000 us  6.61%  taichi::lang::irpass::typecheck [4 x   3.750 us]
                             18.000 us  7.93%  taichi::lang::irpass::analysis::verify [8 x   2.250 us]
                            171.000 us 75.33%  taichi::lang::irpass::offload [4 x  42.750 us]
                                 13.000 us  7.60%  taichi::lang::irpass::typecheck [8 x   1.625 us]
                                158.000 us 92.40%  [unaccounted]
                             11.000 us  4.85%  [unaccounted]
                         74.704 ms 99.65%  taichi::lang::KernelCodeGen::compile [4 x  18.676 ms]
                             74.700 ms 99.99%  taichi::lang::CodeGenCPU::codegen [4 x  18.675 ms]
                                 31.162 ms 41.72%  taichi::lang::TaichiLLVMContext::clone_struct_module [4 x   7.790 ms]
                                  0.018 ms  0.02%  taichi::lang::CodeGenLLVMCPU::CodeGenLLVMCPU [4 x   4.500 us]
                                  0.237 ms  0.32%  taichi::lang::CodeGenLLVM::emit_to_module [4 x  59.250 us]
                                 43.236 ms 57.88%  taichi::lang::CodeGenLLVM::compile_module_to_executable [4 x  10.809 ms]
                                      5.949 ms 13.76%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [4 x   1.487 ms]
                                     29.119 ms 67.35%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [4 x   7.280 ms]
                                          6.078 ms 20.87%  llvm_function_pass    [4 x   1.520 ms]
                                         15.027 ms 51.61%  llvm_module_pass      [4 x   3.757 ms]
                                          8.014 ms 27.52%  [unaccounted]
                                      8.168 ms 18.89%  [unaccounted]
                     18.169 ms 19.51%  [unaccounted]
                  0.023  s  0.45%  taichi::lang::irpass::alg_simp [116 x 201.629 us]
                  0.151  s  2.88%  taichi::lang::irpass::die [348 x 434.517 us]
                  4.345  s 82.81%  taichi::lang::irpass::whole_kernel_cse [116 x  37.460 ms]
                  0.540  s 10.29%  taichi::lang::irpass::simplify [116 x   4.656 ms]
                     53.016 ms  9.82%  taichi::lang::irpass::typecheck [168 x 315.571 us]
                    487.132 ms 90.18%  [unaccounted]
              0.003  m  0.04%  taichi::lang::irpass::offload [27 x   5.755 ms]
                 10.989 ms  7.07%  taichi::lang::irpass::typecheck [54 x 203.500 us]
                144.385 ms 92.93%  [unaccounted]
              0.000  m  0.00%  taichi::lang::irpass::make_thread_local [27 x 250.519 us]
                  6.071 ms 89.75%  taichi::lang::irpass::typecheck [27 x 224.852 us]
                  0.693 ms 10.25%  [unaccounted]
              0.000  m  0.00%  taichi::lang::irpass::die [27 x 341.815 us]
              0.001  m  0.01%  taichi::lang::irpass::demote_atomics [27 x   1.125 ms]
                  8.643 ms 28.44%  taichi::lang::irpass::typecheck [27 x 320.111 us]
                 21.744 ms 71.56%  [unaccounted]
          0.052  m  0.75%  taichi::lang::KernelCodeGen::compile [27 x 114.873 ms]
              3.102  s 100.00%  taichi::lang::CodeGenCPU::codegen [27 x 114.872 ms]
                  0.230  s  7.40%  taichi::lang::TaichiLLVMContext::clone_struct_module [27 x   8.504 ms]
                  0.000  s  0.00%  taichi::lang::CodeGenLLVMCPU::CodeGenLLVMCPU [27 x 925.929 ns]
                  0.011  s  0.34%  taichi::lang::CodeGenLLVM::emit_to_module [27 x 395.111 us]
                  2.861  s 92.23%  taichi::lang::CodeGenLLVM::compile_module_to_executable [27 x 105.946 ms]
                      0.037  s  1.29%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [27 x   1.368 ms]
                      2.233  s 78.05%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [27 x  82.695 ms]
                          0.093  s  4.16%  llvm_function_pass    [27 x   3.438 ms]
                          2.081  s 93.20%  llvm_module_pass      [27 x  77.072 ms]
                          0.059  s  2.64%  [unaccounted]
                      0.591  s 20.65%  [unaccounted]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

After #1324:

codegen_accessor_statements: 148.00
codegen_evaluator_statements: 102.00
codegen_kernel_statements: 28573.00
codegen_offloaded_tasks: 53.00
codegen_statements  : 28823.00
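The statement counters printed before and after #1324 can be diffed with a few lines of Python. This is only arithmetic on the numbers above. `diff_counters` is a hypothetical helper, not part of Taichi.

```python
# Counters copied from the "Before #1324" and "After #1324" dumps.
before_stats = {
    "codegen_accessor_statements": 148,
    "codegen_evaluator_statements": 102,
    "codegen_kernel_statements": 29087,
    "codegen_offloaded_tasks": 53,
    "codegen_statements": 29337,
}
after_stats = {
    "codegen_accessor_statements": 148,
    "codegen_evaluator_statements": 102,
    "codegen_kernel_statements": 28573,
    "codegen_offloaded_tasks": 53,
    "codegen_statements": 28823,
}


def diff_counters(before, after):
    """Return {name: (absolute_delta, relative_change)} for each counter in `before`."""
    result = {}
    for name, old in before.items():
        delta = after[name] - old
        result[name] = (delta, delta / old)
    return result


diffs = diff_counters(before_stats, after_stats)
for name, (delta, rel) in diffs.items():
    print(f"{name:30s} {delta:+6d} ({rel:+.2%})")
```

Only the kernel statement count (and therefore the total) changes: 514 fewer statements, roughly 1.8%. The accessor, evaluator and offloaded-task counts are identical.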
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[Profiler thread 12560]
      1.237  s taichi::lang::TaichiLLVMContext::clone_runtime_module [1 x   1.237  s]
          1.215  s 98.20%  taichi::lang::compile_runtime_bitcode [1 x   1.215  s]
          0.016  s  1.28%  taichi::lang::module_from_bitcode_file [1 x  15.874 ms]
          0.006  s  0.51%  clone module          [1 x   6.299 ms]
    310.309 ms taichi::lang::StructCompilerLLVM::run [1 x 310.309 ms]
          0.124 ms  0.04%  taichi::lang::StructCompilerLLVM::generate_types [69 x   1.797 us]
          0.556 ms  0.18%  taichi::lang::StructCompilerLLVM::generate_child_accessors [1 x 556.000 us]
             51.000 us  9.17%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [1 x  51.000 us]
            499.000 us 89.75%  taichi::lang::StructCompilerLLVM::generate_child_accessors [11 x  45.364 us]
                168.000 us 33.67%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [11 x  15.273 us]
                291.000 us 58.32%  taichi::lang::StructCompilerLLVM::generate_child_accessors [52 x   5.596 us]
                     28.000 us  9.62%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [2 x  14.000 us]
                     75.000 us 25.77%  taichi::lang::StructCompilerLLVM::generate_child_accessors [2 x  37.500 us]
                         29.000 us 38.67%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [2 x  14.500 us]
                         39.000 us 52.00%  taichi::lang::StructCompilerLLVM::generate_child_accessors [2 x  19.500 us]
                             14.000 us 35.90%  taichi::lang::StructCompilerLLVM::generate_refine_coordinates [1 x  14.000 us]
                             19.000 us 48.72%  taichi::lang::StructCompilerLLVM::generate_child_accessors [1 x  19.000 us]
                              6.000 us 15.38%  [unaccounted]
                          7.000 us  9.33%  [unaccounted]
                    188.000 us 64.60%  [unaccounted]
                 40.000 us  8.02%  [unaccounted]
              6.000 us  1.08%  [unaccounted]
          7.300 ms  2.35%  taichi::lang::TaichiLLVMContext::clone_struct_module [1 x   7.300 ms]
          1.572 ms  0.51%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [1 x   1.572 ms]
        211.025 ms 68.00%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [1 x 211.025 ms]
              5.987 ms  2.84%  llvm_function_pass    [1 x   5.987 ms]
            201.273 ms 95.38%  llvm_module_pass      [1 x 201.273 ms]
              3.765 ms  1.78%  [unaccounted]
         89.732 ms 28.92%  [unaccounted]
      1.164  m taichi::lang::Program::compile [27 x   2.587  s]
          1.113  m 95.60%  taichi::lang::irpass::compile_to_offloads [27 x   2.473  s]
              0.319  m 28.64%  taichi::lang::irpass::lower [27 x 708.401 ms]
              0.003  m  0.28%  taichi::lang::irpass::typecheck [108 x   1.711 ms]
              0.002  m  0.17%  taichi::lang::irpass::analysis::verify [459 x 251.102 us]
              0.000  m  0.00%  taichi::lang::irpass::loop_vectorize [27 x   3.519 us]
              0.000  m  0.00%  taichi::lang::irpass::vector_split [27 x   1.667 us]
              0.656  m 58.93%  taichi::lang::irpass::simplify [27 x   1.458  s]
                  0.012  s  0.03%  taichi::lang::irpass::typecheck [1394 x   8.957 us]
                 39.342  s 99.97%  [unaccounted]
              0.004  m  0.34%  taichi::lang::irpass::constant_fold [54 x   4.167 ms]
                152.001 ms 67.56%  taichi::lang::Program::compile [8 x  19.000 ms]
                      0.560 ms  0.37%  taichi::lang::irpass::compile_to_offloads [8 x  70.000 us]
                         39.000 us  6.96%  taichi::lang::irpass::lower [8 x   4.875 us]
                         55.000 us  9.82%  taichi::lang::irpass::typecheck [8 x   6.875 us]
                         62.000 us 11.07%  taichi::lang::irpass::analysis::verify [16 x   3.875 us]
                        385.000 us 68.75%  taichi::lang::irpass::offload [8 x  48.125 us]
                             43.000 us 11.17%  taichi::lang::irpass::typecheck [16 x   2.687 us]
                            342.000 us 88.83%  [unaccounted]
                         19.000 us  3.39%  [unaccounted]
                    151.376 ms 99.59%  taichi::lang::KernelCodeGen::compile [8 x  18.922 ms]
                        151.371 ms 100.00%  taichi::lang::CodeGenCPU::codegen [8 x  18.921 ms]
                             61.531 ms 40.65%  taichi::lang::TaichiLLVMContext::clone_struct_module [8 x   7.691 ms]
                              0.006 ms  0.00%  taichi::lang::CodeGenLLVMCPU::CodeGenLLVMCPU [8 x 750.006 ns]
                              0.513 ms  0.34%  taichi::lang::CodeGenLLVM::emit_to_module [8 x  64.125 us]
                             89.205 ms 58.93%  taichi::lang::CodeGenLLVM::compile_module_to_executable [8 x  11.151 ms]
                                 12.008 ms 13.46%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [8 x   1.501 ms]
                                 60.649 ms 67.99%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [8 x   7.581 ms]
                                     12.290 ms 20.26%  llvm_function_pass    [8 x   1.536 ms]
                                     33.121 ms 54.61%  llvm_module_pass      [8 x   4.140 ms]
                                     15.238 ms 25.12%  [unaccounted]
                                 16.548 ms 18.55%  [unaccounted]
                 72.992 ms 32.44%  [unaccounted]
              0.045  m  4.06%  taichi::lang::irpass::cfg_optimization [81 x  33.494 ms]
                  1.698  s 62.57%  taichi::lang::ControlFlowGraph::store_to_load_forwarding [93 x  18.254 ms]
                      1.530  s 90.11%  taichi::lang::ControlFlowGraph::reaching_definition_analysis [93 x  16.449 ms]
                      0.168  s  9.89%  [unaccounted]
                  0.615  s 22.68%  taichi::lang::ControlFlowGraph::dead_store_elimination [93 x   6.615 ms]
                    559.950 ms 91.02%  taichi::lang::ControlFlowGraph::live_variable_analysis [93 x   6.021 ms]
                     55.226 ms  8.98%  [unaccounted]
                  0.349  s 12.86%  taichi::lang::irpass::die [81 x   4.306 ms]
                  0.051  s  1.89%  [unaccounted]
              0.000  m  0.00%  taichi::lang::irpass::flag_access [81 x   9.321 us]
              0.081  m  7.24%  taichi::lang::irpass::full_simplify [54 x  89.528 ms]
                  0.044  s  0.91%  taichi::lang::irpass::extract_constant [116 x 378.776 us]
                  0.002  s  0.03%  taichi::lang::irpass::unreachable_code_elimination [116 x  14.405 us]
                  0.006  s  0.12%  taichi::lang::irpass::binary_op_simplify [116 x  50.776 us]
                  0.087  s  1.79%  taichi::lang::irpass::constant_fold [116 x 747.431 us]
                     78.767 ms 90.85%  taichi::lang::Program::compile [4 x  19.692 ms]
                          0.211 ms  0.27%  taichi::lang::irpass::compile_to_offloads [4 x  52.750 us]
                             12.000 us  5.69%  taichi::lang::irpass::lower [4 x   3.000 us]
                             11.000 us  5.21%  taichi::lang::irpass::typecheck [4 x   2.750 us]
                             33.000 us 15.64%  taichi::lang::irpass::analysis::verify [8 x   4.125 us]
                            148.000 us 70.14%  taichi::lang::irpass::offload [4 x  37.000 us]
                                 17.000 us 11.49%  taichi::lang::irpass::typecheck [8 x   2.125 us]
                                131.000 us 88.51%  [unaccounted]
                              7.000 us  3.32%  [unaccounted]
                         78.513 ms 99.68%  taichi::lang::KernelCodeGen::compile [4 x  19.628 ms]
                             78.510 ms 100.00%  taichi::lang::CodeGenCPU::codegen [4 x  19.628 ms]
                                 37.473 ms 47.73%  taichi::lang::TaichiLLVMContext::clone_struct_module [4 x   9.368 ms]
                                  0.006 ms  0.01%  taichi::lang::CodeGenLLVMCPU::CodeGenLLVMCPU [4 x   1.500 us]
                                  0.231 ms  0.29%  taichi::lang::CodeGenLLVM::emit_to_module [4 x  57.750 us]
                                 40.750 ms 51.90%  taichi::lang::CodeGenLLVM::compile_module_to_executable [4 x  10.188 ms]
                                      4.996 ms 12.26%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [4 x   1.249 ms]
                                     26.845 ms 65.88%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [4 x   6.711 ms]
                                          5.644 ms 21.02%  llvm_function_pass    [4 x   1.411 ms]
                                         14.544 ms 54.18%  llvm_module_pass      [4 x   3.636 ms]
                                          6.657 ms 24.80%  [unaccounted]
                                      8.909 ms 21.86%  [unaccounted]
                      7.935 ms  9.15%  [unaccounted]
                  0.012  s  0.25%  taichi::lang::irpass::alg_simp [116 x 105.690 us]
                  0.047  s  0.97%  taichi::lang::irpass::die [348 x 134.118 us]
                  4.115  s 85.11%  taichi::lang::irpass::whole_kernel_cse [116 x  35.472 ms]
                  0.522  s 10.80%  taichi::lang::irpass::simplify [116 x   4.502 ms]
                     47.326 ms  9.06%  taichi::lang::irpass::typecheck [168 x 281.702 us]
                    474.950 ms 90.94%  [unaccounted]
              0.003  m  0.24%  taichi::lang::irpass::offload [27 x   5.995 ms]
                 11.066 ms  6.84%  taichi::lang::irpass::typecheck [54 x 204.926 us]
                150.792 ms 93.16%  [unaccounted]
              0.000  m  0.01%  taichi::lang::irpass::make_thread_local [27 x 211.111 us]
                  5.056 ms 88.70%  taichi::lang::irpass::typecheck [27 x 187.259 us]
                  0.644 ms 11.30%  [unaccounted]
              0.000  m  0.01%  taichi::lang::irpass::die [27 x 322.444 us]
              0.000  m  0.04%  taichi::lang::irpass::demote_atomics [27 x   1.095 ms]
                  8.451 ms 28.58%  taichi::lang::irpass::typecheck [27 x 313.000 us]
                 21.116 ms 71.42%  [unaccounted]
          0.051  m  4.40%  taichi::lang::KernelCodeGen::compile [27 x 113.734 ms]
              3.071  s 100.00%  taichi::lang::CodeGenCPU::codegen [27 x 113.732 ms]
                  0.239  s  7.80%  taichi::lang::TaichiLLVMContext::clone_struct_module [27 x   8.870 ms]
                  0.000  s  0.00%  taichi::lang::CodeGenLLVMCPU::CodeGenLLVMCPU [27 x 814.808 ns]
                  0.010  s  0.33%  taichi::lang::CodeGenLLVM::emit_to_module [27 x 380.407 us]
                  2.820  s 91.85%  taichi::lang::CodeGenLLVM::compile_module_to_executable [27 x 104.461 ms]
                      0.035  s  1.22%  taichi::lang::TaichiLLVMContext::eliminate_unused_functions [27 x   1.278 ms]
                      2.210  s 78.36%  taichi::lang::JITSessionCPU::global_optimize_module_cpu [27 x  81.860 ms]
                          0.089  s  4.03%  llvm_function_pass    [27 x   3.303 ms]
                          2.063  s 93.34%  llvm_module_pass      [27 x  76.404 ms]
                          0.058  s  2.63%  [unaccounted]
                      0.576  s 20.41%  [unaccounted]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
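To compare the two profiles programmatically, here is a minimal sketch of a hypothetical line parser (`parse_line` is not part of Taichi). It assumes each timed line has the form `<total> <unit> [<percent>%] <scope> [<calls> x <mean per call>]`. It also assumes the unit `m` means minutes, which matches the dumps: 27 calls x 15.272 s is about 412 s, or 6.872 m. Lines such as `[unaccounted]` carry no call count and are skipped.

```python
import re

# Seconds per unit suffix used in the profiler dumps ("m" assumed to be minutes).
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0}

LINE_RE = re.compile(
    r"^\s*(?P<total>[\d.]+)\s+(?P<unit>ns|us|ms|s|m)\s+"       # total time and its unit
    r"(?:(?P<pct>[\d.]+)%\s+)?"                                # optional share of parent scope
    r"(?P<name>.+?)\s+"                                        # scope name
    r"\[(?P<calls>\d+)\s+x\s+[\d.]+\s+(?:ns|us|ms|s|m)\]\s*$"  # call count and mean per call
)


def parse_line(line):
    """Return (scope_name, total_seconds, calls), or None if the line has no call count."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    total = float(m.group("total")) * UNIT_SECONDS[m.group("unit")]
    return m.group("name"), total, int(m.group("calls"))


# Top-level Program::compile lines copied from the two profiles.
before = parse_line("      6.872  m taichi::lang::Program::compile [27 x  15.272  s]")
after = parse_line("      1.164  m taichi::lang::Program::compile [27 x   2.587  s]")
print(f"{before[0]}: {before[1]:.1f} s -> {after[1]:.1f} s "
      f"({before[1] / after[1]:.2f}x faster)")
# prints: taichi::lang::Program::compile: 412.3 s -> 69.8 s (5.90x faster)
```

In the same two dumps, `taichi::lang::irpass::variable_optimization` (5.709 m, 83.70% of `compile_to_offloads` before) no longer appears after #1324. Instead, `cfg_optimization` now also lists `dead_store_elimination` and `die` and takes 0.045 m in total.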