hikettei closed 10 months ago
CL-WAFFE2-REPL> (benchmark-accept-instructions (compile-forward-and-backward (!softmax (randn `(300 300)))) :n-sample 100)

 Time(s)   | Instruction ( * - Beyonds the average execution time)
0.002646   | <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID70924 <= op(TID70924(300 300) TID70840(300 300))>
9.5e-5     | <WfInst[Compiled: VIEWTENSORNODE-T] : TID70937 <= op(TID70937(300 300) TID70935(300 1))>
5.4e-5     | <WfInst[Compiled: VIEWTENSORNODE-T] : TID70883 <= op(TID70883(300 300) TID70881(300 1))>
1.59e-4    | <WfInst[Compiled: SCALARMUL-CPUTENSOR] : TID70843 <= op(TID70843(300 1) TID70845(1))>
4.9e-5     | <WfInst[Compiled: VIEWTENSORNODE-T] : TID70854 <= op(TID70854(300 300) TID70843(300 1))>
0.033278*  | <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID70854 <= op(TID70854(300 300) TID70840(300 300))>
0.014609   | <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID70883 <= op(TID70883(300 300) TID70854(300 300))>
9.2e-5     | <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID70906 <= op(TID70906(1) TID70878(1))>
0.051683*  | <WfInst[Compiled: SCALARDIV-CPUTENSOR] : TID70883 <= op(TID70883(300 300) TID70906(1))>
0.014387   | <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID70937 <= op(TID70937(300 300) TID70883(300 300))>
0.013283   | <WfInst[Compiled: SUBNODE-CPUTENSOR] : TID70924 <= op(TID70924(300 300) TID70937(300 300))>
0.002301   | <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID71028 <= op(TID71028(300 300) TID70924(300 300))>
0.089736*  | <WfInst[Compiled: EXPNODE-LISPTENSOR] : TID71028 <= op(TID70924(300 300) TID71028(300 300))>
0.002164   | <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID71045 <= op(TID71045(300 300) TID71028(300 300))>
8.3e-5     | <WfInst[Compiled: VIEWTENSORNODE-T] : TID71058 <= op(TID71058(300 300) TID71056(300 1))>
1.68e-4    | <WfInst[Compiled: SCALARMUL-CPUTENSOR] : TID70994 <= op(TID70994(300 1) TID70996(1))>
4.3e-5     | <WfInst[Compiled: VIEWTENSORNODE-T] : TID71005 <= op(TID71005(300 300) TID70994(300 1))>
0.002178   | <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID70977 <= op(TID70977(300 300) TID70924(300 300))>
0.089359*  | <WfInst[Compiled: EXPNODE-LISPTENSOR] : TID70977 <= op(TID70924(300 300) TID70977(300 300))>
0.033492*  | <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID71005 <= op(TID71005(300 300) TID70977(300 300))>
0.014615   | <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID71058 <= op(TID71058(300 300) TID71005(300 300))>
0.014399   | <WfInst[Compiled: DIVNODE-CPUTENSOR] : TID71045 <= op(TID71045(300 300) TID71058(300 300))>

22 Instructions | 19 Tensors

 Instruction                                           | Total time (s) (n-sample=100)
<WfInst[Compiled: EXPNODE-LISPTENSOR]                  | 0.179095
<WfInst[Compiled: ADDNODE-CPUTENSOR]                   | 0.06677
<WfInst[Compiled: MOVETENSORNODE-CPUTENSOR]            | 0.052899994
<WfInst[Compiled: SCALARDIV-CPUTENSOR]                 | 0.051683
<WfInst[Compiled: DIVNODE-CPUTENSOR]                   | 0.014399
<WfInst[Compiled: SUBNODE-CPUTENSOR]                   | 0.013283
<WfInst[Compiled: SCALARMUL-CPUTENSOR]                 | 3.27e-4
<WfInst[Compiled: VIEWTENSORNODE-T]                    | 3.24e-4
<WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR]   | 9.2e-5
1. Fixed `AddNode/SubNode` producing a redundant copy.
2. Added `benchmark-accept-instructions`.