[JITCPUTensor] now compiles the entire computation into C code and optimizes loops even when offsets and permutations are too complicated for call-with-view to optimize. This optimizes not only arithmetic/move nodes but also the memory layout, which helps speed up Unfold (used by Conv2D and Pool2D).
My benchmark shows reasonable performance considering that Im2Col is implemented in Lisp: between 0.9x and 1.5x the speed of PyTorch.
Changes
[Refactor] API Changes
[Enhancement] RepeatN
An elegant notation to repeat several models: