Open Sbozzolo opened 8 months ago
Something changed in Julia 1.10 (Julia is probably tracking more granual allocations), and
ProfileCanvas.jl
is now struggling producing views for allocation. To give a sense of the incrased number of allocations, profiling the same run on 1.9 produces an HTML file that is 600 kB, on 1.10 it is 18 MB.The increased number of allocations tracked manifested the limits of
view_allocs
: some runs became virtually impossible to plot (cost of plotting > hours). I saw this behvior with julia 1.9 too when working with large allocation trees. I can prune the allocation tree (I probably don't need the very fine leaves anyway), but I think it would be still good to see if there is low-haning fruit to improve the algorithm/performance forview_alloc
.EDIT: Most time (97%) is spent in
string
~I just realized from the above flamegraph that the bottleneck is our show
method.~
Reopened with additional information.
(Sorry again for the noise).
@pfitzseb, this may be due to us removing a Base.show
method for one of our heavily typed parameters.
Can we replace this string
call with type_depth_limit
? That might be generally better anyway.
@pfitzseb, this may be due to us removing a
Base.show
method for one of our heavily typed parameters.
My tests seem to indicate that removing Base.show
does make things worse (twice as slow), but it is not the root of the problem (our CI is still using ClimaCore 0.11.7
, that has the show
method, and is suffering from this).
Yeah, that seems reasonable. I'm not sure how type printing actually works, but the low-tech solution I had in mind is using a LimitIO (there's a package I think?) for this that ends up printing 50 chars at most. We're already using that in Debugger.jl and the VS Code extension in other places and historically it works decently well.
xref: https://github.com/JuliaPerf/PProf.jl/pull/86 (thanks to @Sbozzolo)
Something changed in Julia 1.10 (Julia is probably tracking more granual allocations), and
ProfileCanvas.jl
is now struggling producing views for allocation. To give a sense of the incrased number of allocations, profiling the same run on 1.9 produces an HTML file that is 600 kB, on 1.10 it is 18 MB.The increased number of allocations tracked manifested the limits of
view_allocs
: some runs became virtually impossible to plot (cost of plotting > hours). I saw this behvior with julia 1.9 too when working with large allocation trees. I can prune the allocation tree (I probably don't need the very fine leaves anyway), but I think it would be still good to see if there is low-haning fruit to improve the algorithm/performance forview_alloc
.Most time (97%) is spent in
string
I tracked this down to https://github.com/pfitzseb/ProfileCanvas.jl/blob/08619eb1f37464b7fe1af91ffc544d8a4ad238dd/src/ProfileCanvas.jl#L353
Indeed, I verified that
string(alloc.type)
can take up to 0.05 s on a very fast machine when the type is complex. With > 10^5-10^6 allocations, this can be very expensive.It seems that
string(alloc.type)
is primarily used to compared withc.func
, so I wonder if there's a way to avoid converting to string?