Field multiplication on CPU

Computing the product of two fields, e.g.

pV = p * V
# or
V -= K.sums(p * V)

is currently very slow on CPU, much slower than the matrix-vector product, Gibbs state computation, etc. This is what keeps tangent diffusion far behind BP in performance.

This is quite strange: in the second example, p * V should be treated as an rvalue. Even if we allocate a tensor pV for this intermediate result in the loop, as in the first example, it stays slow, although there shouldn't be any new memory allocation.
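To narrow this down, here is a minimal sketch that works on raw PyTorch tensors rather than on fields. It assumes that field data is ultimately stored in torch tensors, which is not confirmed here. The helpers `reuses_buffer` and `time_products` are hypothetical names, not part of the library. The sketch checks whether a product actually lands in a preallocated buffer, and times the allocating product against `torch.mul(..., out=...)`.

```python
import timeit

import torch


def reuses_buffer(n=1000):
    """Check which form of the product writes into a preallocated buffer."""
    p, V = torch.rand(n), torch.rand(n)
    buf = torch.empty(n)
    address = buf.data_ptr()

    # This is what `pV = p * V` evaluates: a fresh tensor, which the name
    # is then rebound to. The preallocated buffer is not written to.
    rebound = p * V

    # Passing `out=` writes the result into the existing buffer.
    written = torch.mul(p, V, out=buf)

    same_values = torch.equal(rebound, written)
    return rebound.data_ptr() == address, written.data_ptr() == address, same_values


def time_products(n=1_000_000, repeats=20):
    """Average seconds per call for the allocating and the in-place product."""
    p, V = torch.rand(n), torch.rand(n)
    buf = torch.empty(n)

    t_alloc = timeit.timeit(lambda: p * V, number=repeats)
    t_out = timeit.timeit(lambda: torch.mul(p, V, out=buf), number=repeats)
    return t_alloc / repeats, t_out / repeats


if __name__ == "__main__":
    print("buffer reused (rebind, out=):", reuses_buffer()[:2])
    print("seconds per call (alloc, out=):", time_products())
```

If the raw tensor product turns out to be fast in both forms, the slowdown more likely sits in the field wrapper around the product than in the multiplication itself. That is only a hypothesis until it is measured on the actual fields.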

opeltre / topos

Field multiplication on CPU #20