Hi there,
A quick skim of the valgrind output makes me think this is user error: the runtime handle is not being properly dropped when the process exits.
If you believe this is a bug in Tokio, could you open a new issue providing a minimal repro of the leak?
You could use tokio-signal to implement a handler that performs a clean shutdown.
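A minimal sketch of what that could look like, assuming tokio 0.1, tokio-signal 0.2, and futures 0.1 (the application tasks are left as a placeholder):

```rust
use futures::{Future, Stream};

fn main() {
    let mut runtime = tokio::runtime::Runtime::new().unwrap();

    // Spawn application tasks here, e.g.:
    // runtime.spawn(my_task);

    // A future that resolves once the first ctrl-c arrives.
    let ctrl_c = tokio_signal::ctrl_c()
        .flatten_stream()
        .take(1)
        .for_each(|()| {
            println!("ctrl-c received, shutting down");
            Ok(())
        });

    runtime.block_on(ctrl_c).unwrap();

    // Shut the runtime down so valgrind sees all task memory freed.
    runtime.shutdown_now().wait().unwrap();
}
```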
@andrewbanchich did you figure it out? I'm also in a situation where a Tokio application using streams and mpsc channels appears to be leaking memory. Valgrind output looks similar to yours, and I have no idea what I'm doing wrong.
@kamek-pf Are you performing a clean shutdown? i.e. all tasks & the runtime need to be terminated before the process exits.
@kamek-pf No, I never figured it out. I refactored some stuff to look into some high CPU usage I was seeing and once that was solved, the noticeable memory leak was gone. I just don't know why.
Another possibility is that a future waker leaked somehow and is holding on to the runtime.
edit: it's a weak ref, so it shouldn't actually...
@carllerche I'm not performing a clean shutdown. My use case is a stream processing service that's meant to stay up pretty much forever. Memory usage keeps increasing over time, about 1.5–2 MB every day according to the Kubernetes monitoring tool we're using.
It has a bunch of moving pieces, tungstenite for websockets (I also tried the websocket crate, same observations), both with and without openssl via native-tls, prost for protobuf deserialization and rust-rdkafka at the end of the pipeline. Obviously it's all async and Tokio drives everything.
I couldn't find anything in the issue trackers of these libraries, tried stripping every dependency one by one to no avail.
I'll try to investigate more tomorrow but I'm running out of ideas.
> My use case is a stream processing service that's meant to stay up pretty much forever.
Interesting, that's what my program does too.
I haven't tested it running for more than an hour or so, so maybe the leak is still there but just too small to notice over an hour.
> maybe the leak is still there but just too small to notice over an hour
Maybe. This is what the past three days look like for me (since the last deployment):
The tricky part is that it takes several hours of uptime before I can get meaningful results.
@kamek-pf Did you ever find the root cause? I'm in a similar case myself.
Nope. This project runs on tokio 0.1, though. I'm waiting for the ecosystem to catch up, and then I'll update to tokio 0.2 and async/await. We'll see if the problem persists.
Are you experiencing this with tokio 0.2?
Yes, I'm currently experiencing this with tokio 0.2.6. I've been probing the application for the past week trying to find a potential memory leak in my code, without luck. Our use of Tokio is quite light, so I'm going to replace our TCP reads and writes with a custom epoll implementation; hopefully that works :crossed_fingers:.
Well, that's disappointing :( I guess you could also try a different runtime if you don't rely on Tokio much; async-std could be a good candidate.
If the leak has persisted across versions, it is highly unlikely the bug is in Tokio, as the implementations have changed drastically since the original report. We are also not aware of any leak.
If you think it is a leak in tokio, we would need a repro of some sort. I would also suggest running a valgrind report after doing a clean shutdown. If a clean shutdown is not performed, valgrind will report all running tasks as leaks.
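For example, something along these lines for tokio 0.2 (a sketch, assuming the `signal` feature is enabled):

```rust
use std::time::Duration;

fn main() {
    let mut rt = tokio::runtime::Runtime::new().unwrap();

    rt.block_on(async {
        // Spawn application tasks here ...

        // Run until ctrl-c so the process exits deliberately.
        tokio::signal::ctrl_c().await.unwrap();
    });

    // Give outstanding tasks a moment to finish, then drop the runtime
    // so valgrind does not report still-running tasks as leaks.
    rt.shutdown_timeout(Duration::from_secs(1));
}
```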
I'm not detecting any memory leaks after cleaning up all the resources on exit. My main suspect is user error (I'm doing something wrong), but after reading through the documentation and implementations, I can't narrow it down. It seems like there's a vector somewhere that is growing indefinitely, or somehow the pinned futures are sticking around (nothing custom, just using the futures library). Some combination of something is causing me problems. Maybe later I'll find the time to really narrow it down; for now, I just need to fix the problem.
I'll share if my custom epoll solution fixes the problem. If it doesn't, I'll just keep replacing components until I hopefully find out where it is. Memory profilers haven't been much help. Time to brute force.
If you know it is a vector, you should be able to track down the cause. Can you get the backtrace where the vector is allocated?
Unfortunately, I haven't been able to pin it down. I've tried to hunt it down with a few different tools.
With multiple layers of abstraction (my stuff -> `Lines` -> `AsyncReader` -> Tokio `TcpStream` -> mio), things get harder to find. That's one possible stack trace that could be responsible. I've read through the code and there isn't anything obvious. It's still likely to be my code; however, I log the size of every dynamically sized data structure and it doesn't correlate. I also don't see any circular references, but they could be hidden somewhere behind all the async/await usage.
Blaming a vector given what I know is more of a gut feeling right now.
I have the same issue with constantly growing memory usage on new connections (despite closing current connections) with `async_tungstenite` and the `tokio` runtime.
For now I have found that using `tokio::time::interval` is causing that behavior.
When using the `async_std` runtime with its `task::sleep`, memory usage stays constant at the same (max used connections) level.
I tried running the future in the current task, spawning a new task, and calling `drop` on the interval; always the same result.
tokio 0.2.10
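A simplified sketch of the two variants I'm comparing (not my exact code):

```rust
use std::time::Duration;

// Simplified per-connection timer task (tokio 0.2 style).
async fn heartbeat() {
    // Variant where memory grew as connections came and went:
    let mut interval = tokio::time::interval(Duration::from_secs(30));
    loop {
        interval.tick().await;
        // ... send a ping over the websocket ...
    }

    // Variant that stayed flat when running under the async-std runtime:
    // loop {
    //     async_std::task::sleep(Duration::from_secs(30)).await;
    //     // ... send a ping over the websocket ...
    // }
}
```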
@krzysztofgal have you noticed similar behavior with other time functions, like `timeout`?
I'm also using a few intervals, and the problem didn't go away after porting the code base to tokio 0.2.
@carllerche are you aware of anything in the interval implementation that might cause this? Also, semi-related: I'm using a bunch of unbounded channels; how does the internal buffer work? Is it supposed to shrink, or can it only grow?
I'll try valgrind and a clean shutdown again on the new implementation tomorrow.
Symptoms of memory leaks can roughly be attributed to one of three cases: an actual memory leak, leaked tasks, or the allocator holding on to freed memory.
To identify an actual memory leak, one must run the application under valgrind and then perform a clean shutdown. If a clean shutdown is not performed, then valgrind will report "leaked" memory simply because the cleanup steps were not able to free the memory.
Identifying leaked tasks is a bit harder right now, as Tokio itself doesn't provide the necessary instrumentation yet. The best strategy is to include some form of counter in your tasks before spawning and track spawns & drops, as in the sketch below.
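A minimal sketch of that idea, assuming tokio 0.2's `tokio::spawn` (the `TaskCounter` / `spawn_tracked` names are just for illustration):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Global spawn/drop counters; live tasks = SPAWNED - DROPPED.
static SPAWNED: AtomicUsize = AtomicUsize::new(0);
static DROPPED: AtomicUsize = AtomicUsize::new(0);

// Guard owned by each task: increments SPAWNED on creation and DROPPED
// when the task future is dropped, whether it completed or was cancelled.
struct TaskCounter;

impl TaskCounter {
    fn new() -> Self {
        SPAWNED.fetch_add(1, Ordering::Relaxed);
        TaskCounter
    }
}

impl Drop for TaskCounter {
    fn drop(&mut self) {
        DROPPED.fetch_add(1, Ordering::Relaxed);
    }
}

// Wrap every spawn point with this instead of calling tokio::spawn directly.
fn spawn_tracked<F>(fut: F)
where
    F: std::future::Future<Output = ()> + Send + 'static,
{
    let counter = TaskCounter::new();
    tokio::spawn(async move {
        let _counter = counter; // dropped when the task ends or is cancelled
        fut.await;
    });
}
```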
To check if the allocator is holding memory, switch to jemalloc and tune it to aggressively release memory back to the OS.
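For example (a sketch; assumes the `jemallocator` crate, and note the exact tuning variable name depends on how jemalloc was built):

```rust
// Assumes `jemallocator = "0.3"` in Cargo.toml.
use jemallocator::Jemalloc;

// Route all allocations through jemalloc instead of the system allocator.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // Tune decay at startup via jemalloc's config, e.g.:
    //   MALLOC_CONF="dirty_decay_ms:0,muzzy_decay_ms:0" ./your-app
    // (the variable may be prefixed, e.g. _RJEM_MALLOC_CONF, depending on
    // how the jemalloc build was configured). Zero decay makes jemalloc
    // return freed pages to the OS immediately instead of caching them.
}
```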
Version
0.1.18
Platform
Both:
Darwin hostname 18.5.0 Darwin Kernel Version 18.5.0: Mon Mar 11 20:40:32 PDT 2019; root:xnu-4903.251.3~3/RELEASE_X86_64 x86_64
and
Linux hostname 4.15.0-30-generic #32-Ubuntu SMP Thu Jul 26 17:42:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Description
I'm using Tokio for creating an ETL tool: https://gitlab.com/andrewbanchich/emmett
I noticed that when I run the program, memory usage steadily increases, maybe 200 KB every second, for as long as I keep it running.
I'm not sure if this is something with Tokio or if I'm just doing something wrong (I'm guessing it's the latter).
To reproduce, you can clone the repo, `cd` into it, and do `cargo run` from that directory. It needs to be run from the project directory because I'm currently reading the test config files in the `example_configs` directory.
Basic overview: each `Pipeline` has a block of several `Input`s, `Filter`s, and `Output`s, each of which is implemented as a Stream. Each one of these is a spawned task, and they communicate through Tokio's mpsc channels, except for `Output`s, which need `crossbeam-channel`. I tried removing the outputs and crossbeam entirely and I am still seeing the issue.
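A rough sketch of that shape (simplified, with hypothetical data; the real code is in the repo linked above), using futures 0.1 channels:

```rust
use futures::sync::mpsc;
use futures::{future, Future, Sink, Stream};

fn main() {
    // Input -> Filter -> Output, each stage a spawned task connected by channels.
    let (input_tx, filter_rx) = mpsc::channel::<String>(16);
    let (filter_tx, output_rx) = mpsc::channel::<String>(16);

    // Input stage: produce events and push them downstream.
    let input = futures::stream::iter_ok::<_, ()>(vec!["a".to_string(), "b".to_string()])
        .forward(input_tx.sink_map_err(|_| ()))
        .map(|_| ());

    // Filter stage: transform events and forward them.
    let filter = filter_rx
        .map(|event| event.to_uppercase())
        .forward(filter_tx.sink_map_err(|_| ()))
        .map(|_| ());

    // Output stage: consume events.
    let output = output_rx.for_each(|event| {
        println!("{}", event);
        Ok(())
    });

    tokio::run(future::lazy(|| {
        tokio::spawn(input);
        tokio::spawn(filter);
        tokio::spawn(output);
        Ok::<(), ()>(())
    }));
}
```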
Here is the valgrind output: