krynju closed this issue 4 months ago
@krynju I think this may be a case of the incremental GC not being able to keep up. I was trying to recreate this to see if I could fix it, and what I ended up trying was running a full GC in the background every few seconds. With that change, I no longer see memory grow, even after an hour. Before the GC.gc() change, memory would grow by about 150 MB/minute.
Here's my diff:
4c4
< using OpenTelemetryExporterPrometheus
---
> using OpenTelemetryExporterPrometheus
7a8,9
> GC.enable_logging(true)
>
172a175,182
> function gc_full_occasionally()
>     while true
>         sleep(2)
>         GC.gc()
>     end
>
> end
>
198c208,217
<
---
> @async begin
>     try
>         gc_full_occasionally()
>     catch ex
>         @error(
>             "exception initializing",
>             exception = (ex, catch_backtrace())
>         )
>     end
> end
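For anyone who wants the same workaround in a form that can be switched off again, here is a minimal sketch using a repeating Timer from Julia's standard library instead of an infinite loop. The helper name start_periodic_gc and its interval argument are made up for illustration and are not part of the diff above; the only calls it relies on are Timer, GC.gc and close.

```julia
# Hypothetical helper (not part of the original diff): trigger a full GC on a
# repeating timer so the workaround can be stopped without killing a task.
function start_periodic_gc(interval_seconds::Real = 2)
    # The callback first fires after `interval_seconds`, then repeats at the
    # same interval until the timer is closed.
    return Timer(interval_seconds; interval = interval_seconds) do _
        GC.gc()  # full collection, the same call used in the diff
    end
end

timer = start_periodic_gc(2)

# Later, once the workaround is no longer wanted:
close(timer)
```

This only masks the growth in the same way the diff does; it does not explain why the incremental collections fall behind.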
Notes from today:
Some experiments with heap snapshots.
After the 4th snapshot I ran GC manually and:
I'm starting to think this is some hidden Julia issue. I would expect the process memory footprint to go down along with the cleanup of the heap.
The heap dump says the heap is only 150 MB, but process info says it's 1.5 GB.
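One rough way to check this kind of mismatch from inside the process is to compare what the GC reports as live with what the OS reports for the process. The sketch below is an assumption about how one might do that, not what was run here: memory_report is a hypothetical name, Base.gc_live_bytes() is an unexported Base function, and Sys.maxrss() returns the peak resident set size, so it is a high-water mark that will not drop after a collection.

```julia
# Hypothetical helper: compare the GC's view of the heap with the process's
# peak resident set size as reported by the OS.
function memory_report()
    GC.gc()                                      # force a full collection first
    live_mib = Base.gc_live_bytes() / 2^20       # bytes the GC considers live
    peak_rss_mib = Sys.maxrss() / 2^20           # peak RSS (high-water mark)
    return (; live_mib, peak_rss_mib)
end

report = memory_report()
println("GC live heap: ", round(report.live_mib; digits = 1), " MiB")
println("Peak RSS:     ", round(report.peak_rss_mib; digits = 1), " MiB")
```

A large, persistent gap between the two numbers would be consistent with the observation above that the heap is small while the process footprint stays high.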
I ran the reproducer on 1.10 on a custom branch of OpenTelemetry.jl that has proper 1.10 support. In two runs I could not reproduce the leak. Memory usage stays stable under load, and GC seems to clean up most of it after I run it manually.
This is potentially good news, but I'm not 100% sure of it yet. Will do more testing etc.
It's a Julia 1.9 issue, closing this.
EDIT: This is observable on 1.9, but not on 1.10
I've identified a memory leak in a long-running service coming from OpenTelemetry.
I'm attaching an MWE split off from the main service, which can reproduce the issue identically. The example attached is pretty extreme and can get you to ~2 GB memory usage in about 10 minutes.
The leak is dependent on measure activity and on capturing measures through the /metrics endpoint.
metricsspam emulates a running service, which generates some metric activity
ab is used to call the /metrics endpoint extensively
Version info: Julia 1.9.3, Linux
OpenTelemetry versions as in the attached Manifest.toml or, simpler, this repo at commit https://github.com/oolong-dev/OpenTelemetry.jl/commit/4975ecddf521f4fa930d4bc8964c0b8a91301039
Reproducer: memleak.zip
Tar contents:
Steps:
julia --project=. -e "include(\"run.jl\");init();"
ab -c 500 -n 1000000 http://localhost:9967/
Example runtime (~4 GB in 25 min)