Closed mabuni1998 closed 1 year ago
Forgot to ping you. @Krastanov
Hm, there seems to be something weird. Here is what I get:
using BenchmarkTools
using QuantumOptics
T = ComplexF64
shp = (2,122)
@btime QuantumOpticsBase._tp_matmul_get_tmp(T,shp,:_tp_sum_matmul_tmp1)
resulting in
244.396 ns (2 allocations: 80 bytes)
Which version of Julia are you on? Could you give Pkg.status() and versioninfo()?
I get the following when running Pkg.status():
julia> Pkg.status()
Status `C:\Users\mabun\OneDrive\Dokumenter\DTU\Speciale\scripts\open_quantum_optics\Project.toml`
[6e4b80f9] BenchmarkTools v1.3.2
[c4555495] CavityWaveguide v0.1.0 `CavityWaveguide`
[8f4d0f93] Conda v1.7.0
⌃ [31a5f54b] Debugger v0.7.6
⌃ [0c46a032] DifferentialEquations v7.3.0
⌅ [a98d9a8b] Interpolations v0.13.6
[033835bb] JLD2 v0.4.29
[e7bfaba1] NumericalIntegration v0.3.3
[d96e819e] Parameters v0.12.3
[14b8a8f1] PkgTemplates v0.7.29
[d330b81b] PyPlot v2.11.0
[1fd47b50] QuadGK v2.6.0
⌃ [6e0679c1] QuantumOptics v1.0.5
⌃ [295af30f] Revise v3.4.0
[37e2e46d] LinearAlgebra
[44cfe95a] Pkg v1.8.0
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`
And running versioninfo() I get:
versioninfo()
Julia Version 1.8.2
Commit 36034abf26 (2022-09-29 15:21 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 12 × AMD Ryzen 5 3600 6-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
Threads: 1 on 12 virtual cores
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =
I updated QuantumOptics.jl and now allocation is done correctly.
Hi Stefan,
Can you help me understand why QuantumOpticsBase._tp_sum_matmul! makes allocations?
At some point it makes a call to: tmp = _tp_sum_get_tmp(op1..., b_data, :_tp_sum_matmul_tmp1). As I understand it, this should avoid allocations if possible. However, when our code is run, it does not...
It eventually calls:
where I'm guessing cached = get!(lazytensor_cache, key) is the point at which it is supposed to reuse the cached tmp vector instead of allocating a new one. However, running the above function I get allocations, and these are the main contributor to the 4 GB of allocations during our solver. See the following:
Why does this happen? If we can make it work, then we should have solved the allocation problem.
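For readers following along, the caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual QuantumOpticsBase code: the names TMP_CACHE and get_tmp are hypothetical, and the real cache may differ in key layout and eviction. The point is that get! only runs the allocating closure when the key is missing, so repeated calls with the same symbol, length, and element type reuse one buffer.

```julia
# Hypothetical stand-in for a temporary-buffer cache; these names are
# illustrative only and are not QuantumOpticsBase internals.
const TMP_CACHE = Dict{Tuple{Symbol,Int,DataType},Vector}()

function get_tmp(::Type{T}, shp::NTuple{N,Int}, sym::Symbol) where {T,N}
    len = prod(shp)
    key = (sym, len, T)
    # get! calls the closure (and allocates) only if `key` is not cached yet.
    buf = get!(() -> Vector{T}(undef, len), TMP_CACHE, key)::Vector{T}
    # reshape shares memory with `buf`; it does not copy the data.
    return reshape(buf, shp)
end

# Two lookups with the same key return views of the same buffer.
a = get_tmp(ComplexF64, (2, 122), :tmp1)
b = get_tmp(ComplexF64, (2, 122), :tmp1)
a[1, 1] = 1.0 + 0im
@assert b[1, 1] == 1.0 + 0im
@assert length(TMP_CACHE) == 1
```

With a pattern like this, a cache hit avoids the large buffer allocation, though the reshape wrapper itself may still show up as a small allocation in a benchmark.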