qojulia / QuantumOptics.jl

Library for the numerical simulation of closed as well as open quantum systems.
http://qojulia.org

`mul!()` of `LazyProduct` results in high allocations during integration #333

Open · Lightup1 opened 2 years ago

Lightup1 commented 2 years ago
function mul!(result::Ket{B1},a::LazyProduct{B1,B2},b::Ket{B2},alpha,beta) where {B1,B2}
    tmp1 = Ket(a.operators[end].basis_l)  # allocated on every call
    mul!(tmp1,a.operators[end],b,a.factor,0)
    for i=length(a.operators)-1:-1:2
        tmp2 = Ket(a.operators[i].basis_l)  # allocated on every loop iteration
        mul!(tmp2,a.operators[i],tmp1)
        tmp1 = tmp2
    end
    mul!(result,a.operators[1],tmp1,alpha,beta)
    return result
end

tmp1 and tmp2 are allocated on every call to mul!, so when the integration tspan is large (and the ODE solver evaluates the right-hand side many times) the allocations grow with the integration time.

Lightup1 commented 2 years ago

I speculate that adding cached Ket, Bra, and combined-operator fields may eliminate the temporaries, but it may make LazyProduct a little bit fat.

Lightup1 commented 2 years ago

As a test, I added the KTL and BTR type parameters together with cached ket_l and bra_r fields:

mutable struct LazyProduct{BL,BR,F,T,KTL,BTR} <: AbstractOperator{BL,BR}
    basis_l::BL
    basis_r::BR
    factor::F
    operators::T
    ket_l::KTL   # cached intermediate kets, one per operator
    bra_r::BTR   # cached intermediate bras, one per operator
    function LazyProduct{BL,BR,F,T,KTL,BTR}(operators::T, factor::F=1) where {BL,BR,F,T,KTL,BTR}
        for i = 2:length(operators)
            check_multiplicable(operators[i-1], operators[i])
        end
        ket_l=[Ket(operator.basis_l) for operator in operators]
        bra_r=[Bra(operator.basis_r) for operator in operators]
        new(operators[1].basis_l, operators[end].basis_r, factor, operators,ket_l,bra_r)
    end
end
function LazyProduct(operators::T, factor::F=1) where {T,F}
    BL = typeof(operators[1].basis_l)
    BR = typeof(operators[end].basis_r)
    KTL = typeof([Ket(operator.basis_l) for operator in operators])
    BTR = typeof([Bra(operator.basis_r) for operator in operators])
    LazyProduct{BL,BR,F,T,KTL,BTR}(operators, factor)
end
function mul!(result::Ket{B1},a::LazyProduct{B1,B2},b::Ket{B2},alpha,beta) where {B1,B2}
    mul!(a.ket_l[end],a.operators[end],b,a.factor,0)  # reuse the cached kets instead of allocating
    for i=length(a.operators)-1:-1:2
        mul!(a.ket_l[i],a.operators[i],a.ket_l[i+1])
        # a.ket_l[i+1]=Ket(a.operators[i].basis_l)
    end
    mul!(result,a.operators[1],a.ket_l[2],alpha,beta)
    # a.ket_l[2]=Ket(a.operators[2].basis_l)
    return result
end

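For reference, a stand-in setup in the spirit of the benchmark below could look like this (the actual basis, grid size, and operators behind the numbers are not shown in this thread, so the parameters here are illustrative):

using QuantumOptics, BenchmarkTools

b_position = PositionBasis(-30.0, 30.0, 200)
b_momentum = MomentumBasis(b_position)
Txp = transform(b_position, b_momentum)  # maps momentum-space kets to position space
Tpx = transform(b_momentum, b_position)  # maps position-space kets to momentum space
p = momentum(b_momentum)
H_kin = LazyProduct(Txp, p^2/2, Tpx)     # kinetic energy applied lazily, never built densely
Ψ0 = gaussianstate(b_position, 0.0, 1.0, 2.0)
dpsi = copy(Ψ0)
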
@benchmark QuantumOptics.mul!(dpsi,H_kin,Ψ0)

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.740 μs …  4.260 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.780 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.795 μs ± 91.743 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▂ ▆█▆ ▃▂ ▂                                               ▂
  ▇██▁███▁██▁██▅▁▅▅▅▁▃▄▁▃▅▄▁▅▄▃▁▃▅▁▆▇█▁█▇▁▇▆▆▁▆▅▅▁▅▄▁▄▄▅▁▄▄▄ █
  1.74 μs      Histogram: log(frequency) by time     2.16 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

Now the allocation count no longer grows with the integration time:

@benchmark tout, Ψt = timeevolution.schroedinger($T, $Ψ0, $H)

BenchmarkTools.Trial: 272 samples with 1 evaluation.
 Range (min … max):  17.615 ms … 37.316 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.155 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   18.400 ms ±  1.428 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▁▃▅▃▅▇██▅▂ 
  ▆██████████▆▆▆▁▇▇▁▆▆▄▁▁▇█▄▁▄▁▆▁▁▁▁▁▁▁▁▁▆▄▄▁▁▁▁▁▁▁▁▁▄▁▁▁▁▄▄▄ ▆
  17.6 ms      Histogram: log(frequency) by time      21.9 ms <

 Memory estimate: 38.00 KiB, allocs estimate: 70.
david-pl commented 2 years ago

Yes, that mul! can be optimized. Caching the Ket and Bra like you do is a good idea in principle.

One issue here is that the data type of the Ket you multiply with is not conserved. What I mean is that, e.g., Ket(operator.basis_l) in your ket_l does not specify the data type of the Ket's data, so Vector{ComplexF64} is assumed. This means, however, that if you try to work with, e.g., Vector{ComplexF32}, the data type will not be conserved when multiplying such a Ket with a LazyProduct.
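A small example of what I mean (the basis is arbitrary, just to show the element types involved):

b = FockBasis(10)
ψ32 = Ket(b, rand(ComplexF32, length(b)))  # user state stored as ComplexF32
tmp = Ket(b)                               # cache entry: data defaults to Vector{ComplexF64}
eltype(ψ32.data)  # ComplexF32
eltype(tmp.data)  # ComplexF64, so intermediate results get silently promoted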

This issue is actually a bit tricky as it means you need to take into account the type of result from the mul! method for the cache of LazyProduct. So you cannot fully know this when constructing the LazyProduct. You could fill the cache in mul! directly, since you'd have all the info. Then you just need a fast way of checking if the cache has already been filled for the current data types involved in the mul! operation and don't reconstruct it if it's already valid.
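Roughly, such a lazily filled cache could look like this (the ket_cache field and the ensure_cache! helper are made-up names; the field would have to be something mutable and loosely typed, e.g. a Vector{Any}):

function ensure_cache!(a::LazyProduct, ::Type{T}) where T
    # rebuild the cached kets only if they are missing or have the wrong element type
    if isempty(a.ket_cache) || eltype(first(a.ket_cache).data) !== T
        empty!(a.ket_cache)
        for op in a.operators
            push!(a.ket_cache, Ket(op.basis_l, zeros(T, length(op.basis_l))))
        end
    end
    return a.ket_cache
end

function mul!(result::Ket{B1}, a::LazyProduct{B1,B2}, b::Ket{B2}, alpha, beta) where {B1,B2}
    tmps = ensure_cache!(a, eltype(result.data))  # cheap validity check on every call
    mul!(tmps[end], a.operators[end], b, a.factor, 0)
    for i = length(a.operators)-1:-1:2
        mul!(tmps[i], a.operators[i], tmps[i+1])
    end
    mul!(result, a.operators[1], tmps[2], alpha, beta)
    return result
end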

That said, the current version of the code (on master) doesn't respect the data type either. So what you're doing is definitely an improvement, and we could leave the data type issue for another time. Would you like to open a PR with the changes you suggest above?

Lightup1 commented 2 years ago

OK, I’ll open a PR. But I’m new to programming workflows, so it may take me some time to figure out how GitHub PRs work.

Lightup1 commented 2 years ago
> git push -u origin EffientLazyProduct_mul
remote: Permission to qojulia/QuantumOptics.jl.git denied to Lightup1.
fatal: unable to access 'https://github.com/qojulia/QuantumOptics.jl.git/': The requested URL returned error: 403

After setting things up for a whole night, I found out that I have no permission. 😂😂

david-pl commented 2 years ago

@Lightup1 that's not how you submit a PR... you need to fork the repository. See for example this guide: https://code.tutsplus.com/tutorials/how-to-collaborate-on-github--net-34267
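For reference, the fork-based flow is roughly (the fork URL is a placeholder for your own account; the branch name is the one from your log above):

# first fork qojulia/QuantumOptics.jl into your own GitHub account, then:
git remote add fork https://github.com/<your-username>/QuantumOptics.jl.git
git push -u fork EffientLazyProduct_mul
# and finally open the pull request from that branch of your fork on github.com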

Lightup1 commented 2 years ago

Oh, thank you. Got it!😅

amilsted commented 2 years ago

An alternative is to use caching to handle the tmp vectors, like in this PR. We could use the same cache in LazyProduct and LazyTensor (which is also a kind of lazy product operator).

I thought about using the approach described here, but it does seem a bit much to put all possible intermediate types (those for kets, bras, and operators!) in the operator struct.

A third way is to put a cache in the operator struct. That way only needed arrays get created and the memory is freed when the operator is garbage-collected.

Lightup1 commented 2 years ago

Thanks for your suggestions.

> A third way is to put a cache in the operator struct. That way only needed arrays get created and the memory is freed when the operator is garbage-collected.

I wonder how we would define the type of the cache, since it can hold Kets or Operators with different bases.

amilsted commented 2 years ago

Yes, the typing issue is a little tricky, also because element types can vary, as @david-pl points out above. It may not be too terrible to use abstract types, such as an appropriate union, or even Any, as is used in the global cache of the PR I linked. There doesn't seem to be a significant performance hit in cases I have tested.
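Just to make that concrete, a per-element-type lookup into an Any-typed cache could look something like this (the _tmp_cache and get_tmp names are hypothetical, not from the linked PR, and a real version would need to think about eviction):

const _tmp_cache = Dict{Tuple{UInt,DataType},Any}()  # keyed on (objectid(basis), element type)

function get_tmp(b::Basis, ::Type{T}) where T
    get!(_tmp_cache, (objectid(b), T)) do
        Ket(b, zeros(T, length(b)))
    end
end

The returned Ket is only abstractly typed at the lookup site, but passing it straight into mul! acts as a function barrier, so the numerics themselves still run on concretely typed data; the Any container mostly costs one dictionary lookup per call.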