tokio-rs / tracing

Application-level tracing for Rust.
https://tracing.rs
MIT License

memory leak?? #2839

Open yuyang-ok opened 8 months ago

yuyang-ok commented 8 months ago

Bug Report

Version

tracing = "0.1.40"

yuyang@yuyang-virtual-machine:~/data-fabric$ cargo tree | grep tracing 
│   │   │   └── tracing v0.1.40
│   │   │       ├── tracing-attributes v0.1.27 (proc-macro)
│   │   │       └── tracing-core v0.1.32
│   │   │   ├── tracing v0.1.40 (*)
│   │   └── tracing v0.1.40 (*)
    │   │   │   └── tracing v0.1.40 (*)
    │   │   ├── tracing v0.1.40 (*)
    │   │   └── tracing v0.1.40 (*)
    └── tracing v0.1.40 (*)
common-tracing v0.1.0 (/home/yuyang/data-fabric/src/common/tracing)
├── tracing v0.1.40 (*)
├── tracing-appender v0.2.3
│   └── tracing-subscriber v0.3.18
│       ├── tracing v0.1.40 (*)
│       ├── tracing-core v0.1.32 (*)
│       ├── tracing-log v0.2.0
│       │   └── tracing-core v0.1.32 (*)
│       └── tracing-serde v0.1.3
│           └── tracing-core v0.1.32 (*)
└── tracing-subscriber v0.3.18 (*)
│   │   │   └── tracing v0.1.40 (*)
│   │   └── tracing v0.1.40 (*)
├── common-tracing v0.1.0 (/home/yuyang/data-fabric/src/common/tracing) (*)
│   └── tracing v0.1.40 (*)
│   │   │   │   ├── tracing v0.1.40 (*)
│   │   │   │   │   │   ├── tracing v0.1.40 (*)
│   │   │   │   │   │   ├── tracing v0.1.40 (*)
│   │   │   │   ├── tracing v0.1.40 (*)
│   │   │       ├── tracing v0.1.40 (*)
│   │   ├── tracing v0.1.40 (*)
│   │   ├── common-tracing v0.1.0 (/home/yuyang/data-fabric/src/common/tracing) (*)
│   └── tracing v0.1.40 (*)
│   ├── tracing v0.1.40 (*)
│   └── tracing v0.1.40 (*)
└── tracing v0.1.40 (*)

Platform

x86_64-unknown-linux-gnu

Description

A memory leak??


profile.pb.gz

You can use the Go pprof tool to visualize it:

go install github.com/google/pprof@latest
~/go/bin/pprof -http=:7777  heap.pb.gz
mladedav commented 6 months ago

Can you provide an example of the code with the memory leak?

As far as we know there is no memory leak in tracing, but there may be something. It is just impossible to track down without knowing what we're looking for.

ikhin commented 5 months ago

I have the same problem.

$ cargo tree | grep tracing 
│   │       │   │   │   │   └── tracing v0.1.40
│   │       │   │   │   │       ├── tracing-attributes v0.1.27 (proc-macro)
│   │       │   │   │   │       └── tracing-core v0.1.32
│   │       │   │   │   └── tracing v0.1.40 (*)
│   │       │   │   ├── tracing v0.1.40 (*)
│   │       │   │   └── tracing v0.1.40 (*)
│   │       ├── tracing v0.1.40 (*)
│   │       └── tracing-futures v0.2.5
│   │           └── tracing v0.1.40 (*)
│   │   │   │   │       │       └── tracing v0.1.40 (*)
│   │   │   │   ├── tracing v0.1.40 (*)
│   │   │   │   ├── tracing-opentelemetry v0.21.0
│   │   │   │   │   ├── tracing v0.1.40 (*)
│   │   │   │   │   ├── tracing-core v0.1.32 (*)
│   │   │   │   │   ├── tracing-log v0.1.4
│   │   │   │   │   │   └── tracing-core v0.1.32 (*)
│   │   │   │   │   └── tracing-subscriber v0.3.17
│   │   │   │   │       ├── tracing v0.1.40 (*)
│   │   │   │   │       └── tracing-core v0.1.32 (*)
│   │   │   │   ├── tracing-subscriber v0.3.17 (*)
│   │   │   ├── tracing v0.1.40 (*)
│   │   │   └── tracing-opentelemetry v0.21.0 (*)
│   │   │   │   └── tracing v0.1.40 (*)
│   │   │   ├── tracing v0.1.40 (*)
│   │   │   ├── tracing-opentelemetry v0.21.0 (*)
│   │   │   └── tracing-subscriber v0.3.17 (*)
│   │   ├── tracing v0.1.40 (*)
│   │   └── tracing-opentelemetry v0.21.0 (*)
│   ├── tracing v0.1.40 (*)
│   ├── tracing-opentelemetry v0.21.0 (*)
│   │   │   ├── tracing v0.1.40 (*)
│   │   │   ├── tracing-opentelemetry v0.21.0 (*)

Focusing on the function reserve_for_push, I found:

// tracing-opentelemetry-0.21.0/src/layer.rs
fn on_event(&self, event: &Event<'_>, ctx: Context<'_, S>) {
...
    if let Some(file) = file {
        otel_event
            .attributes
            .push(KeyValue::new("code.filepath", file));
    }
    if let Some(module) = module {
        otel_event
            .attributes
            .push(KeyValue::new("code.namespace", module));
    }
    if let Some(line) = meta.line() {
        otel_event
            .attributes
            .push(KeyValue::new("code.lineno", line as i64));
    }
...
    // Vec::push from the Rust standard library, which is where
    // reserve_for_push is called:
    #[cfg(not(no_global_oom_handling))]
    #[inline]
    #[stable(feature = "rust1", since = "1.0.0")]
    pub fn push(&mut self, value: T) {
        // This will panic or abort if we would allocate > isize::MAX bytes
        // or if the length increment would overflow for zero-sized types.
        if self.len == self.buf.capacity() {
            self.buf.reserve_for_push(self.len);
        }
        unsafe {
            let end = self.as_mut_ptr().add(self.len);
            ptr::write(end, value);
            self.len += 1;
        }
    }

jeprof_memory_leak.pdf

mladedav commented 5 months ago

The function you found is from tracing-opentelemetry, not tracing itself.

But the way this works is that the layer collects events as they come, sends them when the span is closed, and at that point all of the buffered data should be dropped. So if you create a very long-lived span with a large number of events, it can eat a lot of memory, but that memory should not be leaked.
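
To illustrate, here is a minimal hypothetical sketch of that pattern (the worker name, timing, and setup are made up; it assumes a Tokio runtime and a tracing-opentelemetry layer already installed through tracing-subscriber):

use std::time::Duration;
use tracing::{info, info_span, Instrument};

// Hypothetical long-lived worker: the span stays open for the lifetime of the
// loop, so the OpenTelemetry layer keeps buffering every event recorded inside
// it until the span finally closes. Memory grows, but it is not leaked.
async fn long_lived_worker() {
    async {
        loop {
            // Each event is appended to the still-open span's buffered data.
            info!("handled one unit of work");
            tokio::time::sleep(Duration::from_millis(10)).await;
        }
    }
    .instrument(info_span!("worker")) // one span for the whole worker lifetime
    .await;
}

Closing the span more often (for example, one span per unit of work) lets the layer export and then drop the buffered events.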

Can you try to provide a minimal repro please?

Also, if the culprit for you is tracing-opentelemetry, then this is a different issue, since the original author did not use that crate according to their cargo tree output.

ikhin commented 5 months ago

@mladedav Thank you for the tip! Sorry, the program is very large and I don't have a minimal case that reproduces the problem. If I could reproduce it in a small program, I might be able to locate the problem myself.

Every gRPC request has a hook like .instrument(span). I turned off all Jaeger exporting in the code, and there was no memory leak during the stress test! Now I am gradually investigating which long-lived span is causing it.
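
For reference, a hypothetical sketch of that per-request pattern (the span and field names are made up): each request gets its own span, which closes when the request future completes, so events buffered for it by the OpenTelemetry layer are released at the end of the request rather than accumulating.

use tracing::{info, info_span, Instrument};

// Hypothetical per-request handler: the span is created per request and closes
// when the future completes, so data buffered for this span is dropped (after
// export) at the end of the request.
async fn handle_request(request_id: u64) {
    async {
        info!("processing request");
        // ... actual request handling ...
        info!("request done");
    }
    .instrument(info_span!("grpc_request", request_id))
    .await;
}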