hikettei / cl-waffe2

[Experimental] Graph and Tensor Abstraction for Deep Learning all in Common Lisp
https://hikettei.github.io/cl-waffe2/
MIT License
122 stars, 5 forks

[WIP] 100x faster compile times with a brand-new cl-waffe2 IR #72

Closed by hikettei 11 months ago

hikettei commented 11 months ago

Changes

Introducing cl-waffe2 IR

Compiled-Composite now runs on the cl-waffe2 VM, which interprets cl-waffe2 IR.

CL-WAFFE2> (disassemble-waffe2-ir
        (cl-waffe2/nn:!relu (parameter (randn `(10 10)))))

== [disassemble-waffe2-ir: Forward] ======
<WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID7463.state <= apply( TID7463(10 10) <Param>TID7450(10 10) )>
<WfInst[Compiled: MoveTensorNode(SAVE_FOR_BACKWARD)] : TID7479.state <= apply( TID7479(10 10) TID7463(10 10) )>
<WfInst[Compiled: WHERE-OPERATION-NODE-LISPTENSOR] : TID7455.state <= apply( <Param>TID7450(10 10) TID7455(10 10) )>
<WfInst[Compiled: <DELETED>] : TID7471.state <= apply( TID7471(10 10) TID7455(10 10) )>
<WfInst[Compiled: MoveTensorNode(SAVE_FOR_BACKWARD)] : TID7487.state <= apply( TID7487(10 10) TID7471(10 10) )>
<WfInst[Compiled: MULNODE-LISPTENSOR] : TID7479.state <= apply( TID7479(10 10) TID7487(10 10) )>

== [disassemble-waffe2-ir: Backward] ======
<WfInst[Compiled: Block -> MULNODE-LISPTENSOR-BACKWARD {
        <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID7590.state <= apply( TID7590(10 10) TID7587(10 10) )>
        <WfInst[Compiled: MULNODE-LISPTENSOR] : TID7590.state <= apply( TID7590(10 10) TID7471(10 10) )>
        <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID7616.state <= apply( TID7616(10 10) TID7590(10 10) )>
    }
  ] : TID7528.state <= apply( TID7499(10 10) )>
<WfInst[Compiled: Block -> MOVETENSORNODE-CPUTENSOR-BACKWARD {
        <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID7579.state <= apply( TID7579(10 10) TID7576(10 10) )>
    }
  ] : TID7544.state <= apply( TID7528(10 10) )>
<WfInst[Compiled: Block -> MOVETENSORNODE-CPUTENSOR-BACKWARD {
        <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID7568.state <= apply( TID7568(10 10) TID7565(10 10) )>
    }
  ] : TID7552.state <= apply( TID7544(10 10) )>
<WfInst[Compiled: ADDNODE-CPUTENSOR] : TID7452.state <= apply( TID7452(10 10) TID7552(10 10) )>
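Each WfInst line above can be read as one instruction of a linear IR: a destination tensor (TIDxxxx) is updated by applying a compiled kernel to its argument tensors. As a rough illustration of how such a listing is interpreted, here is a minimal register-style VM sketched in Python (the names `Inst` and `run_ir` are hypothetical; this is not cl-waffe2 code, and scalars stand in for tensors):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Inst:
    dest: str                 # destination tensor id, e.g. "TID7463"
    kernel: Callable          # compiled node, e.g. an element-wise op
    args: tuple               # tensor ids read by this instruction

def run_ir(ir, env):
    """Interpret the instruction list top to bottom, updating the environment."""
    for inst in ir:
        env[inst.dest] = inst.kernel(*(env[a] for a in inst.args))
    return env

# Tiny program in the spirit of the listing: y = relu(x) * c
ir = [
    Inst("t1", lambda x: max(x, 0.0), ("x",)),    # WHERE/ReLU-like node
    Inst("y",  lambda a, b: a * b,    ("t1", "c")),  # MULNODE-like node
]
print(run_ir(ir, {"x": -3.0, "c": 2.0})["y"])  # 0.0, since relu(-3) * 2 = 0
print(run_ir(ir, {"x": 5.0, "c": 2.0})["y"])   # 10.0
```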

cl-waffe2 takes the following steps to achieve DAG-specific acceleration.

1. [Constructing DAG networks by defnode/call/forward]

-> when called with build:

2. [Applying a topological sort to the given forward/backward networks, then applying in-place mutation]

3. [Generating cl-waffe2 IR for forward/reverse mode]

4. [Compiling each node, using cache.lisp, for each rank/type/layout of matrices]

5. [If available, applying JITxxTensor devices] (not yet done)
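Step 2 above depends on ordering the DAG so that every node runs after all of its inputs. As a language-neutral illustration (a Python sketch using Kahn's algorithm; the node names are made up and this is not cl-waffe2's implementation):

```python
from collections import deque

def toposort(deps):
    """Order nodes so each appears after its dependencies.
    deps maps node -> list of nodes it depends on."""
    indeg = {n: len(ds) for n, ds in deps.items()}
    users = {n: [] for n in deps}
    for n, ds in deps.items():
        for d in ds:
            users[d].append(n)
    queue = deque(n for n, k in indeg.items() if k == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for u in users[n]:
            indeg[u] -= 1
            if indeg[u] == 0:
                queue.append(u)
    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order

# relu(x)-like graph: a copy and a mask both read x, and mul consumes both.
deps = {"x": [], "move": ["x"], "where": ["x"], "mul": ["move", "where"]}
print(toposort(deps))  # ['x', 'move', 'where', 'mul']
```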

-> when called with proceed:

2. [Applying in-place mutation with minimal overhead]

3. [Evaluating the computation node directly]
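The in-place mutation pass (and the `<DELETED>` MoveTensorNode in the forward listing above) can be approximated by a last-use analysis: if an instruction is the final reader of a tensor, its output may overwrite that tensor's buffer, so the extra copy can be dropped. A minimal Python sketch of that idea (`inplace_plan` is a hypothetical helper, not cl-waffe2 code; a real pass must also check shape, dtype, and layout compatibility):

```python
def inplace_plan(ir):
    """ir is a list of (dest, args) pairs in execution order.
    Returns a mapping dest -> argument whose buffer dest may reuse."""
    last_use = {}
    for i, (_, args) in enumerate(ir):
        for a in args:
            last_use[a] = i  # index of the final instruction reading a
    plan = {}
    for i, (dest, args) in enumerate(ir):
        for a in args:
            if last_use.get(a) == i:   # a is dead after this instruction,
                plan[dest] = a         # so dest may overwrite a's buffer
                break
    return plan

ir = [("t1", ["x"]), ("t2", ["t1", "w"]), ("y", ["t2"])]
print(inplace_plan(ir))  # {'t1': 'x', 't2': 't1', 'y': 't2'}
```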

Things undone

  1. define-composite-node no longer works with the new VM. Should we keep using the old one?
  2. Following the changes to the IR, JITCPUTensor/JITLispTensor no longer work. (With a few adjustments, they should work again.)
  3. It is possible to do FuseOps without JITxxTensor devices.
  4. Multi-threading with call-with-view