Open nathanaelbosch opened 10 months ago
A quick benchmark I did:
using Kronecker, LinearMaps, BenchmarkTools, LinearAlgebra
import ProbNumDiffEq as PNDE
D = 100
K1 = Kronecker.kronecker(rand(D, D), rand(D, D));
K2 = kron(LinearMaps.WrappedMap(K1.A), LinearMaps.WrappedMap(K1.B));
K = Matrix(K1);
X = rand(D*D, D*D);
out = similar(X);
@assert mul!(out, K1, X) == mul!(out, K2, X)
@btime mul!($out, $K, $X); 7.464 s (0 allocations: 0 bytes)
@btime mul!($out, $K1, $X); 730.223 ms (20000 allocations: 763.40 MiB)
@btime mul!($out, $K2, $X); 736.059 ms (20000 allocations: 763.40 MiB)
# Now make the left factor a UniformScaling
K1 = Kronecker.kronecker(I(D), rand(D, D));
K2 = kron(LinearMaps.UniformScalingMap(true, D), LinearMaps.WrappedMap(K1.B));
K3 = PNDE.IsometricKroneckerProduct(D, K1.B);
K = Matrix(K1);
X = rand(D*D, D*D);
out = similar(X);
@assert mul!(out, K1, X) == mul!(out, K2, X) == mul!(out, K3, X)
@btime mul!($out, $K, $X); 8.361 s (0 allocations: 0 bytes)
@btime mul!($out, $K1, $X); 473.537 ms (20000 allocations: 763.40 MiB)
@btime mul!($out, $K2, $X); 415.658 ms (0 allocations: 0 bytes)
@btime mul!($out, $K3, $X); 295.632 ms (0 allocations: 0 bytes)
# mine + Octavian.jl: 145.818 ms (5 allocations: 80 bytes)
So basically: LinearMaps.jl prevents allocations and is somewhat faster than Kronecker.jl, but the current custom implementation is much faster, so we shouldn't expect many gains from this in this regard. But implementing $H, A(h), Q(h)$ might still be worthwhile.
Using LinearMaps.jl could have some advantages:
IsometricKroneckerProduct