Allocation - Githubissues

mabuni1998 commented 1 year ago

Hi Stefan,

Can you help me understand why QuantumOpticsBase._tp_sum_matmul! allocates?

At some point it calls tmp = _tp_sum_get_tmp(op1..., b_data, :_tp_sum_matmul_tmp1). As I understand it, this should avoid allocations where possible. However, when our code runs, it does not.

It eventually calls:

function _tp_matmul_get_tmp(::Type{T}, shp::NTuple{N,Int}, sym) where {T,N}
    len = prod(shp)
    use_cache = lazytensor_use_cache()
    key = (sym, taskid(), UInt(len), T)
    if use_cache && Vector{T} <: LazyTensorCacheable
        cached = get!(lazytensor_cache, key) do
            Vector{T}(undef, len)
        end
        # Let's make sure the compiler knows we have the right type
        tmp = Vector{T}(cached)
    else
        tmp = Vector{T}(undef, len)
    end
    Base.ReshapedArray(tmp, shp, ())
end

I'm guessing that cached = get!(lazytensor_cache, key) is where it is supposed to reuse the cached tmp vector instead of allocating a new one. However, when I run the function above I still get allocations, and these are the main contributor to the 4 GB of allocations in our solver. See the following:

julia> using BenchmarkTools
julia> using QuantumOptics
julia> T = ComplexF64
julia> shp = (2,122)
julia> @btime QuantumOpticsBase._tp_matmul_get_tmp(T,shp,:_tp_sum_matmul_tmp1)
  715.000 ns (3 allocations: 4.02 KiB)
2×122 reshape(::Vector{ComplexF64}, 2, 122) with eltype ComplexF64:
 4.24399e-314+1.35452e-314im       0.0+1.27683e-316im  5.0e-324+2.122e-314im  …  2.122e-314+2.12457e-314im  4.24399e-314+2.57819e-317im  4.4e-323+2.12557e-314im
 1.35448e-314+1.606e-321im    9.0e-323+6.36599e-314im       0.0+0.0im              5.0e-324+2.12457e-314im           0.0+6.0e-323im      7.4e-323+6.36599e-314im

Why does this happen? If we can make it work, then we should have solved the allocation problem.
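
For reference, the caching pattern in the function above can be sketched in isolation. This is a minimal sketch, not the QuantumOpticsBase implementation: demo_cache and demo_get_tmp are hypothetical names, and the task id part of the key is left out. It illustrates two things: repeated get! lookups hand back the same array, and a type assertion returns that array as-is while the Vector{T}(...) constructor builds a new one. Whether that distinction accounts for the allocation counts reported in this thread is not established here.

```julia
# Hypothetical stand-in cache; names prefixed demo_ are not part of QuantumOpticsBase.
const demo_cache = Dict{Tuple{Symbol,UInt,DataType},Any}()

# Look up (or create once) a scratch vector with element type T and length len.
function demo_get_tmp(::Type{T}, len::Int, sym::Symbol) where {T}
    key = (sym, UInt(len), T)
    cached = get!(demo_cache, key) do
        Vector{T}(undef, len)   # runs only on a cache miss
    end
    return cached::Vector{T}    # type assertion: same array, no copy
end

a = demo_get_tmp(ComplexF64, 244, :demo_tmp)
b = demo_get_tmp(ComplexF64, 244, :demo_tmp)

same_buffer = a === b                   # repeated lookups return the identical array
copied = Vector{ComplexF64}(a) !== a    # the constructor builds a new array
```

With BenchmarkTools, interpolating the arguments (for example @btime demo_get_tmp($ComplexF64, $244, $(:demo_tmp))) keeps the cost of reading non-constant globals out of the measurement.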

mabuni1998 commented 1 year ago

Forgot to ping you. @Krastanov

Krastanov commented 1 year ago

hm, there seems to be something weird. Here is what I get:

using BenchmarkTools
using QuantumOptics
T = ComplexF64
shp = (2,122)
@btime QuantumOpticsBase._tp_matmul_get_tmp(T,shp,:_tp_sum_matmul_tmp1)

resulting in

244.396 ns (2 allocations: 80 bytes)

Which version of Julia are you on? Could you post the output of Pkg.status() and versioninfo()?

mabuni1998 commented 1 year ago

Running Pkg.status() gives the following:

julia> Pkg.status()
Status `C:\Users\mabun\OneDrive\Dokumenter\DTU\Speciale\scripts\open_quantum_optics\Project.toml`
  [6e4b80f9] BenchmarkTools v1.3.2
  [c4555495] CavityWaveguide v0.1.0 `CavityWaveguide`
  [8f4d0f93] Conda v1.7.0
⌃ [31a5f54b] Debugger v0.7.6
⌃ [0c46a032] DifferentialEquations v7.3.0
⌅ [a98d9a8b] Interpolations v0.13.6
  [033835bb] JLD2 v0.4.29
  [e7bfaba1] NumericalIntegration v0.3.3
  [d96e819e] Parameters v0.12.3
  [14b8a8f1] PkgTemplates v0.7.29
  [d330b81b] PyPlot v2.11.0
  [1fd47b50] QuadGK v2.6.0
⌃ [6e0679c1] QuantumOptics v1.0.5
⌃ [295af30f] Revise v3.4.0
  [37e2e46d] LinearAlgebra
  [44cfe95a] Pkg v1.8.0
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`

and running versioninfo() I get:

versioninfo()
Julia Version 1.8.2
Commit 36034abf26 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × AMD Ryzen 5 3600 6-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 1 on 12 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =

mabuni1998 commented 1 year ago

I updated QuantumOptics.jl and now allocation is done correctly.

qojulia / WaveguideQED.jl

Allocation #9