hikettei / cl-waffe2

[Experimental] Graph and Tensor Abstraction for Deep Learning all in Common Lisp
https://hikettei.github.io/cl-waffe2/
MIT License

[Refactor] Memory-Pool and Locality Optimization #109

Closed: hikettei closed this 9 months ago

hikettei commented 9 months ago

Changes

Memory-Locality Optimization

Since an InputTensor is assumed to be possibly filled with non-zero values, an optimization was implemented that reconnects computation nodes so that extra cache tensors are reused instead of newly allocated. Composing several !softmax functions now requires only one additional buffer (a sketch for checking deeper compositions follows the listing below):

;; Memory usage is 2x smaller!
CL-WAFFE2-REPL> (disassemble-waffe2-ir
                 (!softmax (!softmax (randn `(10 10)))))

disassemble-waffe2-ir:
 [Forward]: 
<WfInst[op=MOVETENSORNODE-CPUTENSOR] : TID6503 <= op(TID6503{float, (10 10)} <Input>TID6415{float, (10 10)})>
<WfInst[op=VIEWTENSORNODE-T]         : TID6497 <= op(TID6455{float, (10 1)} TID6497{float, (10 1)})>
<WfInst[op=SCALARMUL-CPUTENSOR]      : TID6497 <= op(TID6497{float, (10 1)} <Input>TID6424{float, (1)})>
<WfInst[op=VIEWTENSORNODE-T]         : TID6497 <= op(TID6497{float, (10 10)} TID6497{float, (10 1)})>
<WfInst[op=ADDNODE-CPUTENSOR]        : TID6497 <= op(TID6497{float, (10 10)} <Input>TID6415{float, (10 10)})>
<WfInst[op=VIEWTENSORNODE-T]         : TID6497 <= op(TID6497{float, (10 1)} TID6497{float, (10 10)})>
<WfInst[op=SCALARDIV-CPUTENSOR]      : TID6497 <= op(TID6497{float, (10 1)} <Input>TID6419{float, (1)})>
<WfInst[op=VIEWTENSORNODE-T]         : TID6497 <= op(TID6497{float, (10 10)} TID6497{float, (10 1)})>
<WfInst[op=SUBNODE-CPUTENSOR]        : TID6503 <= op(TID6503{float, (10 10)} TID6497{float, (10 10)})>
<WfInst[op=MOVETENSORNODE-CPUTENSOR] : TID6584 <= op(TID6584{float, (10 10)} TID6503{float, (10 10)})>
<WfInst[op=EXPNODE-CPUTENSOR]        : TID6584 <= op(TID6503{float, (10 10)} TID6584{float, (10 10)})>
<WfInst[op=SCALARMUL-CPUTENSOR]      : TID6497 <= op(TID6497{float, (10 1)} <Input>TID6552{float, (1)})>
<WfInst[op=VIEWTENSORNODE-T]         : TID6497 <= op(TID6497{float, (10 10)} TID6497{float, (10 1)})>
<WfInst[op=EXPNODE-CPUTENSOR]        : TID6503 <= op(TID6503{float, (10 10)} TID6503{float, (10 10)})>
<WfInst[op=ADDNODE-CPUTENSOR]        : TID6497 <= op(TID6497{float, (10 10)} TID6503{float, (10 10)})>
<WfInst[op=DIVNODE-CPUTENSOR]        : TID6584 <= op(TID6584{float, (10 10)} TID6497{float, (10 10)})>
<WfInst[op=VIEWTENSORNODE-T]         : TID6497 <= op(TID6668{float, (10 1)} TID6497{float, (10 1)})>
<WfInst[op=SCALARMUL-CPUTENSOR]      : TID6497 <= op(TID6497{float, (10 1)} <Input>TID6637{float, (1)})>
<WfInst[op=VIEWTENSORNODE-T]         : TID6497 <= op(TID6497{float, (10 10)} TID6497{float, (10 1)})>
<WfInst[op=ADDNODE-CPUTENSOR]        : TID6497 <= op(TID6497{float, (10 10)} TID6584{float, (10 10)})>
<WfInst[op=VIEWTENSORNODE-T]         : TID6497 <= op(TID6497{float, (10 1)} TID6497{float, (10 10)})>
<WfInst[op=SCALARDIV-CPUTENSOR]      : TID6497 <= op(TID6497{float, (10 1)} <Input>TID6632{float, (1)})>
<WfInst[op=VIEWTENSORNODE-T]         : TID6497 <= op(TID6497{float, (10 10)} TID6497{float, (10 1)})>
<WfInst[op=SUBNODE-CPUTENSOR]        : TID6584 <= op(TID6584{float, (10 10)} TID6497{float, (10 10)})>
<WfInst[op=MOVETENSORNODE-CPUTENSOR] : TID6503 <= op(TID6503{float, (10 10)} TID6584{float, (10 10)})>
<WfInst[op=EXPNODE-CPUTENSOR]        : TID6503 <= op(TID6584{float, (10 10)} TID6503{float, (10 10)})>
<WfInst[op=SCALARMUL-CPUTENSOR]      : TID6497 <= op(TID6497{float, (10 1)} <Input>TID6765{float, (1)})>
<WfInst[op=VIEWTENSORNODE-T]         : TID6497 <= op(TID6497{float, (10 10)} TID6497{float, (10 1)})>
<WfInst[op=EXPNODE-CPUTENSOR]        : TID6584 <= op(TID6584{float, (10 10)} TID6584{float, (10 10)})>
<WfInst[op=ADDNODE-CPUTENSOR]        : TID6497 <= op(TID6497{float, (10 10)} TID6584{float, (10 10)})>
<WfInst[op=DIVNODE-CPUTENSOR]        : TID6503 <= op(TID6503{float, (10 10)} TID6497{float, (10 10)})>

31 Instructions | 6 Tensors | 6 Scalars
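
To check that the optimization keeps the buffer count flat as the composition gets deeper, the same disassembly can be run on longer !softmax chains and the trailing "N Instructions | M Tensors | K Scalars" summary compared. A minimal REPL sketch, assuming the same packages as the session above; the softmax-n-times helper is hypothetical and not part of cl-waffe2:

CL-WAFFE2-REPL> (defun softmax-n-times (x n)
                  ;; Hypothetical helper: applies !softmax to X a total of N times.
                  (if (zerop n)
                      x
                      (softmax-n-times (!softmax x) (1- n))))

;; If intermediate buffers are reused, the tensor count in the summary line
;; should stay the same for both calls instead of growing with the depth.
CL-WAFFE2-REPL> (disassemble-waffe2-ir (softmax-n-times (randn `(10 10)) 2))
CL-WAFFE2-REPL> (disassemble-waffe2-ir (softmax-n-times (randn `(10 10)) 4))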

[Update] Static-Allocation

[Update] Thread-Safe defmodel-as
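
As a reference for this change, here is a minimal sketch of turning a defmodel into a statically compiled function via defmodel-as; the model name SoftmaxModel and the function name !softmax-static are placeholders, and the :asif/:named keywords follow the cl-waffe2 documentation. With this patch, calling such a compiled function from several threads is intended to be safe.

;; A sketch only; SoftmaxModel and !softmax-static are placeholder names.
(defmodel (SoftmaxModel (self)
            :where (X[~] -> OUT[~])
            :on-call-> ((self x)
                        (declare (ignore self))
                        (!softmax x))))

;; Compiles the model lazily and exposes it as an ordinary function
;; backed by a cached Compiled-Composite.
(defmodel-as (SoftmaxModel) :asif :function :named !softmax-static)

(!softmax-static (randn `(10 10)))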

Minor changes

CL-WAFFE2-REPL> (build (!sin (randn `(3 3))))

<Compiled-Composite(allocated-p=NIL)
    forward     : forward(model) -> CPUTENSOR{FLOAT}(3 3)
    backward    : backward(model) -> t
    memory-pool : one tensor(s)
                   L {3.6e-5}MB
>
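
A usage note (sketch only, output not shown): the memory-pool reported above is reserved lazily; invoking forward on the Compiled-Composite is what actually performs the allocation, after which allocated-p should no longer read NIL.

CL-WAFFE2-REPL> (let ((model (build (!sin (randn `(3 3))))))
                  ;; The first forward call allocates the static memory-pool
                  ;; summarized in the printed representation above.
                  (forward model)
                  model)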