Computing the product of two fields, e.g.
is right now very slow on CPU, much slower than a matrix-vector product, Gibbs state computation, etc. This is what is holding tangent diffusion far behind BP in performance.
This is quite strange, as in the 2nd example p * V should be understood as an rvalue. Even if we allocate a tensor pV for this intermediate result in the loop, as in the 1st example, it stays slow, although there shouldn't be any new memory allocation.
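
To make the two variants concrete, here is a minimal sketch of what I mean, written with plain NumPy arrays standing in for the fields. This is an assumption for illustration only: the real field type and its product are not shown here, and the helper names product_fresh and product_into are hypothetical. Only p, V and pV are the names used above.

```python
import numpy as np

# Hypothetical stand-ins: plain 1-D arrays play the role of the two fields.
rng = np.random.default_rng(0)
p = rng.random(1000)
V = rng.random(1000)

def product_fresh(p, V):
    # Temporary result: a new array is allocated on every call.
    return p * V

def product_into(p, V, out):
    # Writes the pointwise product into a preallocated buffer,
    # so no new array is allocated inside the loop.
    np.multiply(p, V, out=out)
    return out

# Buffer allocated once, outside the loop.
pV = np.empty_like(p)

for _ in range(10):
    result = product_into(p, V, pV)
```

With plain arrays both variants give the same values and the second one reuses pV on every iteration, so if the field product is still slow in that form, the cost is probably somewhere other than the allocation itself.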