cl-waffe2 uses the following steps to achieve DAG-specific acceleration:
1. [Constructing DAG networks by defnode/call/forward]
-> when called with build
2. [Applying a topological sort to the given forward/backward networks, then applying in-place mutation]
3. [Generating cl-waffe2 IR for forward/reverse mode]
4. [Compiling each node using cache.lisp, for each rank/type/layout of matrices]
5. [If any, applying JITxxTensor devices] (not yet implemented)
-> when called with proceed
2. [Applying in-place mutation with small overhead]
3. [Evaluating the computation nodes directly]
Parallelisation of call-with-view and FuseOps without JIT devices are future tasks. I am thinking of parallelising with lparallel instead of using OpenMP.
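The topological sort in the build path can be illustrated with a minimal sketch. It is written in Python rather than Common Lisp and does not use any cl-waffe2 API; the toy graph and the helper name `forward_order` are hypothetical and only show the idea of ordering a DAG so that every node runs after its inputs, with the reversed order serving reverse mode.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy network: each key is a node, each value lists the
# nodes whose outputs it consumes. The names are illustrative only.
graph = {
    "x": [],
    "w": [],
    "matmul": ["x", "w"],
    "relu": ["matmul"],
    "loss": ["relu"],
}


def forward_order(deps):
    """Return the nodes so that every node appears after its inputs."""
    return list(TopologicalSorter(deps).static_order())


order = forward_order(graph)      # evaluation order for the forward pass
backward = list(reversed(order))  # reverse mode walks the same order backwards

print(order)
print(backward)
```

Once the order is fixed, later passes such as in-place mutation can walk a flat sequence instead of re-traversing the graph.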
Changes
Added cl-waffe2 IR
A brand-new cl-waffe2 VM makes compilation 100x faster by compiling networks created with defnode into cl-waffe2 IR. Compiling a CNN completes within 0.05 seconds.
cl-waffe2 IR
The cl-waffe2 IR is a simple data structure of the form
A <- f(B C D)
where f is an operation represented by a lambda function.
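To make the shape A <- f(B C D) concrete, here is a minimal sketch of such an instruction and a loop that evaluates a list of them. It is written in Python and is not the cl-waffe2 implementation; the names `Instruction` and `run` and the two sample operations are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Instruction:
    """One IR line of the form  out <- op(args...)."""
    out: str                 # name of the result, e.g. "A"
    op: Callable[..., float]  # the operation f, stored as a lambda
    args: Tuple[str, ...]    # names of the inputs, e.g. ("B", "C", "D")


def run(program: List[Instruction], env: Dict[str, float]) -> Dict[str, float]:
    """Evaluate the instructions in order, storing each result under its name."""
    env = dict(env)  # copy so the caller's inputs are not mutated
    for inst in program:
        env[inst.out] = inst.op(*(env[name] for name in inst.args))
    return env


# Hypothetical two-instruction program: A <- f(B C D), then E <- g(A).
program = [
    Instruction("A", lambda b, c, d: b * c + d, ("B", "C", "D")),
    Instruction("E", lambda a: max(a, 0.0), ("A",)),
]

result = run(program, {"B": 2.0, "C": 3.0, "D": -1.0})
print(result["A"], result["E"])  # 5.0 5.0
```

Because each instruction is only a name, a callable, and a list of input names, a sorted network reduces to a flat list that can be executed by a simple loop.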