LazySum of LazyTensor of sparse result in high alloc in integration

AmitRotem commented 1 year ago

Short example of two subsystems; LazySum of LazyTensor of sparse have memory allocations that scale with integration time, whereas LazySum of sparse has a constant number of allocations. Integration takes about 6 times as long for this 225 sized Hilbert space. Profiler showed some GC in _strides, but other than that all boxes are bluish.

using SparseArrays, QuantumOptics, Plots, BenchmarkTools
# setup
nn = 15
bas = NLevelBasis(nn)
ψ0 = randoperator(bas^2, NLevelBasis(2));
# Hamiltonians
Ht_sum_tensor_sparse, Ht_sum_sparse = let op, op1, op2
    # list of sparse operators
    op = [Operator(bas, sprandn(nn,nn,0.5)) for k=1:4]
    op.+= op
    # LazySum of LazyTensor of sparse operators
    op1 = LazySum(LazyTensor(bas^2, 1, op[1]),
        LazyTensor(bas^2, 2, op[2]),
        LazyTensor(bas^2, (1,2), Tuple(op[3:4])))
    function OP1(t,u)
        op1.factors[3] = Complex(sinpi(t)+sinpi(t/10)+sinpi(t/100)+sinpi(t/1000))
        op1
    end
    # LazySum of sparse operators
    op2 = LazySum(sparse.(op1.operators)...)
    function OP2(t,u)
        op2.factors[3] = Complex(sinpi(t)+sinpi(t/10)+sinpi(t/100)+sinpi(t/1000))
        op2
    end
    OP1, OP2
end;

# allocation of solver vs time span of solution
## LazySum of LazyTensor of sparse operators
_, ψT = timeevolution.schroedinger_dynamic((0.0, 1.0), ψ0, Ht_sum_tensor_sparse); # compile
@time al_sts = [@allocated _, ψT = timeevolution.schroedinger_dynamic((0.0, Float64(k)), ψ0, Ht_sum_tensor_sparse) for k=1:5];
## LazySum of sparse operators
_, ψT = timeevolution.schroedinger_dynamic((0.0, 1.0), ψ0, Ht_sum_sparse); # compile
@time al_ss = [@allocated _, ψT = timeevolution.schroedinger_dynamic((0.0, Float64(k)), ψ0, Ht_sum_sparse) for k=1:5];

# plot
plot(al_sts; label="lazy sum of lazy tensor of sparse") # allocations is linear with time span
plot!(al_ss; label="lazy sum of sparse") # allocations is constant

tes1(n) = sum(timeevolution.schroedinger_dynamic((0.0, exp2(k)), randoperator(bas^2, NLevelBasis(2)), Ht_sum_tensor_sparse)[2][end] for k=range(0,10,n))
tes2(n) = sum(timeevolution.schroedinger_dynamic((0.0, exp2(k)), randoperator(bas^2, NLevelBasis(2)), Ht_sum_sparse)[2][end] for k=range(0,10,n))
tes1(2),tes2(2);

@time tes1(10); # about 6 times as long then tes2(10)
@time tes2(10);

@profview tes1(10); # shows some GC at `_strides` when creating a vector of zeros, but doesn't take a lot of time
@profview tes2(10);

amilsted commented 1 year ago

Seems we allocate shape and strides vectors every time we do a lazytensor sparse gemm. This is a holdover from the old lazytensor code. We can probably replace these vectors with tuples?

amilsted commented 11 months ago

Fixed

qojulia / QuantumOptics.jl

LazySum of LazyTensor of sparse result in high alloc in integration #352