Closed GoogleCodeExporter closed 9 years ago
A quick investigation of the problem shows the following:
If I disable aborting the asio thread and don't clean up resources, the memory
footprint seems to be somewhat stable across runs (share 4000 torrents, remove,
share, remove and so on).
I mean, if I disable lines 2308-2313 in src/torrent.cpp like this:
2308        if (m_owning_storage.get() and 0)
2309        {
2310            m_storage->abort_disk_io();
2311            m_storage->async_release_files(
2312                boost::bind(&torrent::on_torrent_aborted, shared_from_this(), _1, _2));
2313        }
... everything seems to work much better. I have 258mb of resident memory after
10*4000 torrents shared in total and removed. Somewhat better than 12gb :). I
believe the I/O cleanup still must be done, but probably in some different way.
Original comment by mocksoul
on 15 Nov 2010 at 2:38
I temporarily solved this for myself (because I need the service working :)) by
creating a proxy method join() in storage (which calls m_io_thread.join()) and
calling it in torrent::abort.
There still seem to be some memory leaks, but it is usable with restarts every
half day.
Please investigate the problem! I guess you can find the bug much
faster than me :).
Original comment by mocksoul
on 15 Nov 2010 at 3:43
Thanks for the report, I'll try to look into this as soon as possible.
Original comment by arvid.no...@gmail.com
on 15 Nov 2010 at 5:30
Ehh.. nope, my first version didn't work correctly. I did this weird hack
instead =)
Event                      Resident memory
Initial startup            ~30mb
share 4000 torrents        ~90mb
share 4000*10 torrents     ~230mb
share 4000*100 torrents    crash
For the last test everything went "fine" for 18*4000 torrents, at that moment
eating "just" 500mb of resident memory. But suddenly RAM consumption shot up,
and it was ~3gb for the 19*4000 torrent set. And soon after, SIGSEGV :).
By the way: I tried both variants -- with boost pool allocation and without.
Same behaviour.
Original comment by mocksoul
on 15 Nov 2010 at 5:31
Attachments:
I'm afraid you might have problems with that patch. there's only one disk I/O
thread and when a torrent is removed, it cancels all of its requests. With your
patch you're counting all disk I/O jobs, from all torrents.
Original comment by arvid.no...@gmail.com
on 15 Nov 2010 at 5:39
Have you tried any leak detection tool, or heap profiling tool? If you build
with debug symbols enabled and tcmalloc (part of google performance tools) you
can generate a nice heap profile which would probably point exactly at which
buffers are leaking. Given that it grows this fast, it sounds like disk buffers
(they're 16kB each).
Do you have a lot of download/upload activity for this to happen, or do you
think I could reproduce it with entirely inactive torrents?
Original comment by arvid.no...@gmail.com
on 15 Nov 2010 at 5:41
I'm reproducing it with entirely inactive torrents. Just add a bunch of them to
the session and remove them as fast as possible. At some point each new torrent
adds 1-2mb to the heap after being added, and it blows up :)
>> have you tried any leak detection tool, or heap profiling tool
No, not yet :)
I'll try google performance tools in the next few hours.
Original comment by mocksoul
on 15 Nov 2010 at 5:49
Ehh.. it seems there is no leak on my linux gentoo x86 machine at all. The
problems I posted above were on FreeBSD 7.1 x86_64.
Does it make any sense to quickly set up a 64bit linux box and test with
tcmalloc on it? There is no way to run tcmalloc on freebsd yet.
Original comment by mocksoul
on 15 Nov 2010 at 6:56
[deleted comment]
Does this make any sense?
Leak of 2014496 bytes in 472 objects allocated from:
@ b6b77d41 libtorrent::tracker_manager::queue_request
@ b6b1e96d libtorrent::torrent::announce_with_tracker
@ b6b35638 libtorrent::torrent::stop_announcing
@ b6b241aa libtorrent::torrent::abort
@ b6acb347 libtorrent::aux::session_impl::remove_torrent
@ b6abb4d1 libtorrent::session::remove_torrent
@ b6f8544a boost::python::objects::caller_py_function_impl::operator
@ b66401d4 boost::python::objects::function::call
Leak of 4260256 bytes in 1001 objects allocated from:
@ b6ce433e libtorrent::tracker_manager::queue_request
@ b6cadc74 libtorrent::torrent::announce_with_tracker
@ b6cae2b7 libtorrent::torrent::stop_announcing
@ b6cb337f libtorrent::torrent::abort
@ b6c7d44b libtorrent::aux::session_impl::remove_torrent
@ b6c79506 libtorrent::session::remove_torrent
@ b6f6deca boost::python::objects::caller_py_function_impl::operator
@ b6aa91d4 boost::python::objects::function::call
Original comment by mocksoul
on 15 Nov 2010 at 7:26
As far as I can see this is not visible on Linux 32/64 bit. Checking FreeBSD7
32bit right now..
Original comment by mocksoul
on 16 Nov 2010 at 7:58
FreeBSD7 x86 is also OK.
The only "ouch" system is FreeBSD7 AMD64. And we have >13000 such servers
here.. uhhh..
Sadly right now I have no idea how to debug heap memory allocation on freebsd 64
:). Any thoughts?
google-perf-tools: not working
electricfence: not working
valgrind: (obviously) not working
Original comment by mocksoul
on 16 Nov 2010 at 10:17
you could try "man malloc" and see if it mentions MallocStackLogging (an
environment variable). If it supports this, you can set this env variable and
then inspect the program with "leaks"
Original comment by arvid.no...@gmail.com
on 17 Nov 2010 at 12:15
Well.
You have a shared_ptr piece_checker->torrent and an intrusive_ptr
torrent->piece_checker.
And probably they get removed at the wrong time.
See. If I remove "m_owning_storage = 0" in torrent::abort -- there seems to be
no memory leak anymore (just a little -- 4000*100 torrents shared = 250mb res mem).
If I put m_owning_storage = 0 BEFORE this snippet of code:
2309        if (m_owning_storage.get())
2310        {
2311            m_storage->abort_disk_io();
2312            m_storage->async_release_files(
2313                boost::bind(&torrent::on_torrent_aborted, shared_from_this(), _1, _2));
2314        }
... again -- everything works fine.
Any thoughts?
Original comment by mocksoul
on 17 Nov 2010 at 4:04
"m_owning_storage = 0" before abort_disk_io() and async_release_files() - 50
runs with 1000 torrents (add to session, remove soon) -- 140mb resident memory
(abort_disk_io() will never be called) (this obviously will get me into
troubles soon =))
remove "m_owning_storage = 0" from torrent::abort completely - 50 runs with
1000 torrents -- 430mb resident memory :(. abort_disk_io() gets called every
time.
interesting one: leave "m_owning_storage = 0" as is, but add a sleep of 0.1 sec
at the end of torrent::abort -- 50 runs with 1000 torrents -- 188mb res mem.
Obviously some memory still leaked, but not as much as when removing torrents
without any timeouts.
Original comment by mocksoul
on 17 Nov 2010 at 6:20
Checked object destruction. Everything works fine -- Torrent gets destructed
immediately after PieceChecker destruction, which always happens soon
after the remove_torrent() call.
Since only one platform fails, I will try various versions of gcc and
the toolchain (I was using gcc4.4.4).
Original comment by mocksoul
on 17 Nov 2010 at 5:55
I've been looking at the tracker announce objects leaking (from your first leak
dump). From your experimentation however, it seems like that's not what's
leaking. It looks like it's the storage object.
There is a long comment where m_owning_storage is declared, explaining the
intention behind those two pointers.
Thanks for your debugging efforts! I've been quite busy these last two days,
I've only had time to look at it briefly still.
I wonder if it might be an issue with the atomic operations on the reference
counter. Just as an experiment you could try building with this define:
BOOST_AC_USE_PTHREADS
That will use mutexes instead of atomic operations, which is much more likely
to work on arbitrary platforms, but is also slower. If this would fix the
problem, at least we would know that it's very likely to be a configuration
issue or bug with the specific version of GCC related to the atomic operations.
Original comment by arvid.no...@gmail.com
on 17 Nov 2010 at 7:06
I was looking at the reference counter in m_owning_storage. It was always == 3
in torrent::abort. That's probably fine as far as I can see.
After a lot of experiments I can tell you what is going on in a little more
detail:
0) start daemon
1) share 1000 torrents
2) remove them
3) share again (after a few seconds)
4) remove again
5) share again (after a few seconds)
6) remove. And here we get an oops -- memory usage jumps by 20-30mb (at this
point it is usually a jump from 60-70mb to 100mb in my setup)
7) right now everything looks not awful.. but.. if I try to add 1000 torrents
now, each addition eats a LOT of RAM, ending up at around ~600mb.
8) remove these 1000 torrents -- again memory is eaten (~800mb)
9) all subsequent additions/removals grow memory in an uncontrolled manner. It
also never shrinks. It only grows.
I have a lock around the remove_torrent call, so the calls are not made in
parallel. I'm doing that from the python bindings, although this should not be a
problem at all.
The more torrents I add/remove at a time, the higher the chance that memory
starts growing sooner. With 1000 torrents it will leak memory 100% of the time
(every run is absolutely the same) after 3 runs.
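A minimal sketch of the add/remove cycle in the steps above, for anyone who wants to try reproducing it. It only assumes that the session object exposes add_torrent() and remove_torrent(), as the libtorrent Python bindings do. The churn() helper, its parameters, and the use of peak RSS from the resource module (Unix only) are illustrative choices, not part of the original report.

```python
import resource
import time


def peak_rss():
    """Peak resident set size of this process, in ru_maxrss units
    (kilobytes on Linux; other platforms may differ)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def churn(session, params_list, runs=3, pause_s=0.0):
    """Add every torrent in params_list, remove them all, and repeat.

    `session` only needs add_torrent(params) -> handle and
    remove_torrent(handle). Returns the peak RSS sampled after each run.
    """
    samples = []
    for _ in range(runs):
        handles = [session.add_torrent(p) for p in params_list]
        for handle in handles:
            session.remove_torrent(handle)
        if pause_s:
            time.sleep(pause_s)  # the "after a few seconds" gap between runs
        samples.append(peak_rss())
    return samples
```

Because ru_maxrss is a peak value it never goes down, so this can show growth from run to run but not memory being handed back to the OS.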
I'll try with the BOOST_AC_USE_PTHREADS define in the next few minutes..
Original comment by mocksoul
on 17 Nov 2010 at 7:29
Compiling with TORRENT_DEBUG + BOOST_AC_USE_PTHREADS does not allow sharing a
torrent (it dies with SIGSEGV -- 1) I got an invalid info hash after hashing --
0000000000000000ab61b4e9c2cc4f5cacf5d321, 2) lt.torrent_info creates an object
which can't provide metadata(), failing with RuntimeError:
basic_string::_S_construct NULL not valid (and SIGSEGV here)). A quick look at
the gdb backtrace suggests some errors with strings:
(gdb) bt
#0 libtorrent::file_storage::rename_file (this=0x1f91343e8, index=Variable "index" is not available.
) at basic_string.h:273
#1 0x0000000001f6af24 in ?? ()
#2 0x00000001f74bd500 in typeinfo for void ()() () from /place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/bin/../lib/libboost_python.so.1.45.0
#3 0x00000001f73af8b0 in boost::python::converter::shared_ptr_deleter::~shared_ptr_deleter () from /place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/bin/../lib/libboost_python.so.1.45.0
#4 0x00007fffeed76b40 in ?? ()
#5 0x00000001f73af8b0 in boost::python::converter::shared_ptr_deleter::~shared_ptr_deleter () from /place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/bin/../lib/libboost_python.so.1.45.0
So, sadly, not an option :(
Original comment by mocksoul
on 17 Nov 2010 at 7:37
Heh.. it's no longer only me: 8 (!) people are now working on this problem
together with me :). One promised that he can provide a working valgrind on
freebsd amd64.. it should be ready within the next hour. I hope it will show
where the memory leak is. But personally, I think it is not just a simple memory
leak, but something stranger. I analyzed the logs libtorrent provides about
storage allocation, and they seem to be ok (not big enough, and not growing
together with memory).
Original comment by mocksoul
on 17 Nov 2010 at 7:50
actually, the BOOST_AC_USE_PTHREADS option will change the ABI of the boost
libraries. I bet you're building libtorrent with the makefiles, linking against
a pre-built boost library (which was built without the BOOST_AC_USE_PTHREADS
option). This will make them incompatible and die in very odd and subtle ways.
If you run out of other things to try, it might be worth rebuilding boost as
well. This is fairly simple if you have boost-build (bjam) installed and
working. Just do "bjam boost=source" in the libtorrent directory (or the
python binding dir).
Original comment by arvid.no...@gmail.com
on 17 Nov 2010 at 9:26
I'm building boost from source too, using jam files. But libtorrent via
configure.
Actually I have rebuilt boost a dozen times already (tried 1.41, 1.44 and
1.45beta1), but not in this case :). I'll try.
Original comment by mocksoul
on 17 Nov 2010 at 9:29
Actually, running valgrind I get a lot of these errors from libtorrent/boost:
==00:00:05:08.316 92575== 1864 errors in context 229 of 897:
==00:00:05:08.316 92575== Thread 132:
==00:00:05:08.316 92575== Conditional jump or move depends on uninitialised value(s)
==00:00:05:08.316 92575==    at 0x248659: strlen (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==00:00:05:08.316 92575==    by 0x435217C: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/lib/libstdc++.so.6)
==00:00:05:08.316 92575==    by 0x3BE7762: libtorrent::torrent::announce_with_tracker(libtorrent::tracker_request::event_t, boost::asio::ip::address const&) (address_v4.ipp:90)
==00:00:05:08.316 92575==    by 0x3BF2862: libtorrent::torrent::start_announcing() (torrent.cpp:5256)
==00:00:05:08.316 92575==    by 0x3BF2C9D: libtorrent::torrent::files_checked(boost::unique_lock<boost::mutex> const&) (torrent.cpp:4529)
==00:00:05:08.316 92575==    by 0x3BF2FBC: libtorrent::torrent::on_piece_checked(int, libtorrent::disk_io_job const&) (torrent.cpp:1161)
==00:00:05:08.316 92575==    by 0x3ADE2A1: boost::asio::detail::completion_handler<boost::_bi::bind_t<boost::_bi::unspecified, boost::function<void ()(int, libtorrent::disk_io_job const&)>, boost::_bi::list2<boost::_bi::value<int>, boost::_bi::value<libtorrent::disk_io_job> > > >::do_complete(boost::asio::detail::task_io_service*, boost::asio::detail::task_io_service_operation*, boost::system::error_code, unsigned long) (function_template.hpp:1013)
==00:00:05:08.316 92575==    by 0x3AFD065: boost::asio::detail::task_io_service::run(boost::system::error_code&) (task_io_service_operation.hpp:35)
==00:00:05:08.316 92575==    by 0x3B9B3ED: libtorrent::aux::session_impl::operator()() (io_service.ipp:64)
==00:00:05:08.316 92575==    by 0x404710E: thread_proxy (in /place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/lib/libboost_thread.so.1.45.0)
==00:00:05:08.316 92575==    by 0x11DB4D0: ??? (in /lib/libthr.so.3)
and _nothing_ else. Only this. And this is related to asio.. ehh :)
Original comment by mocksoul
on 17 Nov 2010 at 9:32
And also a lot of these:
==00:00:05:08.299 92575== Thread 132:
==00:00:05:08.299 92575== Conditional jump or move depends on uninitialised value(s)
==00:00:05:08.299 92575==    at 0x3AF7B17: boost::asio::detail::kqueue_reactor::run(bool, boost::asio::detail::op_queue<boost::asio::detail::task_io_service_operation>&) (kqueue_reactor.ipp:294)
==00:00:05:08.299 92575==    by 0x3AFCD69: boost::asio::detail::task_io_service::run(boost::system::error_code&) (task_io_service.ipp:264)
==00:00:05:08.299 92575==    by 0x3B9B3ED: libtorrent::aux::session_impl::operator()() (io_service.ipp:64)
==00:00:05:08.299 92575==    by 0x404710E: thread_proxy (in /place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/lib/libboost_thread.so.1.45.0)
==00:00:05:08.299 92575==    by 0x11DB4D0: ??? (in /lib/libthr.so.3)
Are they of no interest to us?
Original comment by mocksoul
on 17 Nov 2010 at 9:34
I don't think these warnings are very likely to be pointing towards the memory
leak, although, it does seem a bit weird, I should investigate the first one at
least.
If you have encryption enabled, you would most likely get a bunch of these
inside libcrypt as well.
Original comment by arvid.no...@gmail.com
on 17 Nov 2010 at 9:44
BOOST_AC_USE_PTHREADS didn't change the bug's behaviour.
Original comment by mocksoul
on 18 Nov 2010 at 1:03
Encryption is disabled in my setup. I only have DHT (because the python binding
is not buildable without DHT as of now ;))
Original comment by mocksoul
on 18 Nov 2010 at 1:04
The fun thing is -- under valgrind the program runs much slower (5..10x)
and... the memory leak does not occur :))))).
Tomorrow I'll write a pure C++ version with an in-program loop adding/removing
torrents. This should run much faster than the whole of python under valgrind.
Also I'll be able to give you the exact "scenario".
If even this does not help, the only option I'll have left is to leave the
torrents I want to remove in the paused state. Obviously that will have a memory
footprint, but not a very large one.
Or.. what will happen if I don't call abort_disk_io() and
async_release_files() during torrent removal?
Original comment by mocksoul
on 18 Nov 2010 at 2:25
==00:01:51:55.547 64225== LEAK SUMMARY:
==00:01:51:55.547 64225== definitely lost: 0 bytes in 0 blocks
==00:01:51:55.547 64225== indirectly lost: 0 bytes in 0 blocks
==00:01:51:55.547 64225== possibly lost: 119,838,182 bytes in 561,592 blocks
==00:01:51:55.547 64225== still reachable: 7,608,369 bytes in 89,068 blocks
==00:01:51:55.547 64225== suppressed: 24 bytes in 1 blocks
This was run under python (very very very slow). Each remove_torrent took approx
0.15sec -- even more than in my "with sleep" test in comment #15. Since
sleeping during remove_torrent also fixes the problem, I see nothing strange
here.
And I found a strange solution for the whole problem :). During the valgrind
runs I needed to remove a huge number of python-related warnings from the
valgrind output. For that I removed the pymalloc stuff (--without-pymalloc in
the python build configuration). And now.. there.. is.. NO!!!! leak!!!!!!!!! The
drawback is memory usage -- it is now ~100mb after program restart.
This is 100% true -- I tested a few times, compiling all my apps with and
without pymalloc in python (sadly, C extensions are not compatible between the
two builds of python).
Digging into the problem a little I found no direct answers, but got another
piece of news: pymalloc _never_ releases memory to the OS. This can be awful for
long-running processes (such as mine :)). I got advice to run with tcmalloc
(heh, from google-perf-tools, which you also mentioned).
Results:
- pymalloc: ~1gb memory eaten after 3x1000 torrents shared (i.e. 3 runs of 1000 torrents). ~2.86sec each run
- malloc: 150mb memory eaten after 45x1000 torrents shared. Approx ~2.9sec each run (there is a lot of my python code + files get hashed every time). Can't wait to test tcmalloc :)
- tcmalloc: (I'm in shock!) 2.65sec each run. 45x1000 torrents shared. 136mb memory eaten. Well. And no memory leaks/crashes. Btw, I linked tcmalloc not only into the python library, but also into libtorrent and all the python C extensions I have (actually two -- pycrypto is libtorrent's friend).
Don't worry (at least, not yet :)) about the small but constant memory usage I'm
experiencing. This memory is probably eaten by my custom libtorrent patches
(e.g. we need >1 IP for each peer), because I saw that in the valgrind run. I
ran without that patch with tcmalloc and the memory footprint stabilized at
~80mb for 100 runs (1000 torrents each). So, no memory leaks in libtorrent,
wohoooo!
Original comment by mocksoul
on 18 Nov 2010 at 6:06
if BOOST_AC_USE_PTHREADS didn't make a difference, I have a feeling that it's
just a race condition on my part.
it's very helpful to know that it doesn't happen when you comment those lines
out. I will look specifically for race conditions under the assumption that
it's the storage that's leaking.
abort_disk_io() does the following:
it cancels most outstanding disk jobs (waiting to be executed) for this torrent,
which includes outstanding reads and writes to the disk. It also goes through
all pieces in the disk read cache belonging to this torrent and clears them.
None of these things are critical. The disk cache buffers will be reclaimed
later anyway, when other torrents want more buffers. The outstanding disk jobs
will also complete eventually anyway.
async_release_files() will flush all pending write cache blocks for the torrent
(and free them from the cache) as well as closing all open file descriptors
associated with the torrent. This is also not strictly speaking necessary,
except possibly for the flushing of the write cache. However, if you only
remove torrents that are completely downloaded, all pieces will be flushed
already, so this wouldn't be a problem.
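To make the per-torrent cancellation concrete, here is a toy model of a single disk job queue shared by all torrents, where aborting one torrent drops only that torrent's queued jobs. This is a sketch for illustration, not libtorrent's code: the DiskQueue class, its method names and the job tuples are all hypothetical, and it cancels every matching job where the real abort_disk_io() is described as cancelling "most" of them.

```python
from collections import deque


class DiskQueue:
    """Toy model of one disk I/O queue shared by every torrent."""

    def __init__(self):
        self.jobs = deque()  # (storage_id, action) pairs, oldest first

    def add_job(self, storage_id, action):
        self.jobs.append((storage_id, action))

    def abort_jobs_for(self, storage_id):
        """Drop the queued jobs of one storage; return how many were cancelled."""
        before = len(self.jobs)
        self.jobs = deque(job for job in self.jobs if job[0] != storage_id)
        return before - len(self.jobs)


queue = DiskQueue()
queue.add_job("torrent-a", "read")
queue.add_job("torrent-b", "write")
queue.add_job("torrent-a", "hash")

cancelled = queue.abort_jobs_for("torrent-a")
print(cancelled, list(queue.jobs))  # 2 [('torrent-b', 'write')]
```

Jobs belonging to other torrents stay queued, which is why a counter over the whole queue cannot tell when one torrent's work is finished.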
Original comment by arvid.no...@gmail.com
on 18 Nov 2010 at 6:22
ok, should I stop looking?
out of curiosity, what are you building?
Original comment by arvid.no...@gmail.com
on 18 Nov 2010 at 6:29
[deleted comment]
[deleted comment]
[deleted comment]
Since I have spent 5 days digging into this problem, nobody else should have
to :). So, I want to correctly understand what is going on and save that info
for other people and you )
There is no guarantee that this specific task (bulk removal) is the only thing
that makes the whole python+libtorrent combination go crazy. I can't even say
for sure that bulk adding does not have the same effect (because after the
process goes crazy both add and remove eat a lot of memory).
Right now I want to:
1) check different python versions, 2.6.7 and 2.7.0 especially. Currently I'm using Python 2.6.6. I haven't yet read the full commit history for those python branches, but probably some work on pymalloc was done there. And maybe this problem occurs not only with libtorrent. And maybe it will be fixed :)
2) check different allocators. tcmalloc is not leaking as far as I can see, but it is not very suitable, because it is made for speed, not for low memory usage, and its memory usage is usually bigger than with the other variants. The other variants are: FreeBSD's standard libc malloc, ptmalloc2 and (maybe, if I am still not satisfied) ptmalloc3. I'll also try ptmalloc2 and tcmalloc together WITH pymalloc. Probably there is some specific bug in FreeBSD 7's libc malloc :). A small test already proved that ptmalloc gives the tiniest python process -- <4mb for the interpreter itself without business logic. This is exactly what I want (smallest memory footprint).
3) stress-test linux x86 / x86_64 and freebsd x86 again without any tweak applied (i.e. with standard pymalloc). I want stable memory consumption for a huge number of torrents being added and removed (say, 1.000.000)
4) run a speed test of libtorrent's standard functionality against different allocators. The 10% speedup I saw while adding torrents is a huge difference, isn't it?
5) ask you to add everything from comment #30 except the first two lines as comments to the abort_disk_io() and async_release_files() functions :)
For now, stop digging; don't waste your time at this moment, at least until
I provide more info.
Actually, if a different allocator does not leak, that means there is no
forgotten free() call, doesn't it? So, no leak, but something different.
> out of curiosity, what are you building?
Facebook uses torrent (pure python implementation, which I can call "awful!")
Twitter uses torrent (again, pure python implementation)
And so do we. But we are using libtorrent not only for updating code on
servers, we use it for every transfer between servers. And we have transfers
between 10 bytes and many hundred gigabytes :). Our main idea is to make a
convenient cp-like program with torrent transport behind it. And it already
works! :) Sometimes well, sometimes... it is easy to kill a 10 Gbps network for
a while :)
And I'll probably make some additions which libtorrent (you, I mean) will also
be interested in. E.g. after making it stable I'll need libtorrent to
automatically detect an mtime change made by "another process" on files which
are being downloaded or seeded, and to post a file_error alert in that case,
pausing the torrent immediately. Also the python bindings are not complete and
are sometimes a little confusing (dancing with big_number :)).
Original comment by mocksoul
on 18 Nov 2010 at 7:56
Python 2.7 has the issue too.
2.6.7 does not exist, 2.6.6 is the latest, my mistake.
ptmalloc2 is not working - sporadic segfaults.
--without-pymalloc is obviously needed when building python; just linking with
tcmalloc does not help. So, the python ABI will change :(
more soon =) thanks for your help!
Original comment by mocksoul
on 18 Nov 2010 at 1:55
Hi there.
The news is not very good :) :
1. Linux systems (at least 64bit) are also affected by this issue.
2. Mixing different allocators does not help 100% (actually, it should not)
Even if I'm using tcmalloc with boost & libtorrent, and --without-pymalloc
(i.e. libc malloc) with python -- libtorrent starts eating memory. Not very
fast, but ~1-2mb for each 1000 torrents being added/removed. On linux, sadly, it
starts eating a lot even with tcmalloc.
Also I can't use tcmalloc because of fork-from-threaded-environment problems
(you can google for that).
Thus, I tried the "last resort" -- commenting out the code block in
torrent::abort (see comment #14) completely. And.. it works perfectly compared
with my other attempts. 15 x 500 torrents added/removed == 70mb (!) resident
memory on Linux.
So, my conclusion will probably upset you a little. This problem is in
libtorrent or boost code. And obviously it is related to either
m_storage->abort_disk_io() or m_storage->async_release_files(). If I comment out
only one of them -- the trouble is still there. I think this is because I still
need to leave the m_owning_storage.get() call in that case (the working case
includes commenting that out).
As of now I really need your advice. Will I have any trouble if I comment
them out? Especially, what will happen if I remove a torrent which has not been
downloaded completely yet (e.g. some blocks not flushed to disk)?
And to help you see this for yourself -- I can make a small python script which
will reproduce the problem at least on 64 bit linux and freebsd. Probably on
32bit also, I don't know yet, because I have only 64bit machines here, except my
own notebook =). Should I?
Original comment by mocksoul
on 22 Nov 2010 at 3:57
A test that reproduces it would definitely be helpful.
If you can reproduce it with tcmalloc, could you run it with heap profiling
turned on and generate a .ps of the memory allocations after the memory leak is
obvious?
That would narrow down exactly which object is leaking.
thanks a lot for debugging this!
Original comment by arvid.no...@gmail.com
on 22 Nov 2010 at 6:21
Take a look:
Program heap usage (9 times add/remove 500 torrents):
[massif heap-usage graph: heap grows in steps over the add/remove runs, peaking at 52.37 MB; x-axis runs from 0 to 50.22 Gi]
So, it increases over time. Most of it comes from:
94.09% (51,669,155B) (heap allocation functions) malloc/new/new[], --alloc-fns,
etc.
->15.70% (8,621,208B) 0x345B87B:
libtorrent::aux::session_impl::add_torrent(libtorrent::add_torrent_params
const&, boost::system::error_code&) (in
/place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/lib/libtorre
nt-rasterbar.so.6)
| ->15.70% (8,621,208B) 0x3452BED:
libtorrent::session::add_torrent(libtorrent::add_torrent_params const&) (in
/place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/lib/libtorre
nt-rasterbar.so.6)
| ->15.70% (8,621,208B) 0x3159BCB: (anonymous
namespace)::add_torrent(libtorrent::session&, boost::python::dict)
(session.cpp:115)
| ->15.70% (8,621,208B) 0x3167EA4:
boost::python::objects::caller_py_function_impl<boost::python::detail::caller<li
btorrent::torrent_handle (*)(libtorrent::session&, boost::python::dict),
boost::python::default_call_policies,
boost::mpl::vector3<libtorrent::torrent_handle, libtorrent::session&,
boost::python::dict> > >::operator()(_object*, _object*) (invoke.hpp:75)
| ->15.70% (8,621,208B) 0x3A0B907:
boost::python::objects::function::call(_object*, _object*) const (in
/place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/lib/libboost
_python.so.1.45.0)
| ->15.70% (8,621,208B) 0x3A0BAE6:
boost::detail::function::void_function_ref_invoker0<boost::python::objects::(ano
nymous namespace)::bind_return,
void>::invoke(boost::detail::function::function_buffer&) (in
/place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/lib/libboost
_python.so.1.45.0)
| ->15.70% (8,621,208B) 0x3A13B33:
boost::python::handle_exception_impl(boost::function0<void>) (in
/place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/lib/libboost
_python.so.1.45.0)
| ->15.70% (8,621,208B) 0x3A09306: function_call (in
/place/home/mocksoul/sky/Berkanavt/supervisor/skynet-initial/python/lib/libboost
_python.so.1.45.0)
| ->15.70% (8,621,208B) 0xD45A21: PyObject_Call (abstract.c:2492)
| ->15.70% (8,621,208B) 0xDE6208: PyEval_EvalFrameEx
(ceval.c:3968)
| ->15.70% (8,621,208B) 0xDE8386: PyEval_EvalCodeEx
(ceval.c:3000)
| ->15.70% (8,621,208B) 0xDE6485: PyEval_EvalFrameEx
(ceval.c:3846)
| ->15.67% (8,604,000B) 0xDE8386: PyEval_EvalCodeEx
(ceval.c:3000)
| | ->15.67% (8,604,000B) 0xD71D7B: function_call
(funcobject.c:524)
| | ->15.67% (8,604,000B) 0xD45A21: PyObject_Call
(abstract.c:2492)
| | ->15.67% (8,604,000B) 0xDE5100:
PyEval_EvalFrameEx (ceval.c:4063)
| | ->15.67% (8,604,000B) 0xDE7A42:
PyEval_EvalFrameEx (ceval.c:3836)
| | ->15.67% (8,604,000B) 0xDE7A42:
PyEval_EvalFrameEx (ceval.c:3836)
| | ->15.67% (8,604,000B) 0xDE8386:
PyEval_EvalCodeEx (ceval.c:3000)
| | ->15.67% (8,604,000B) 0xD71C7E:
function_call (funcobject.c:524)
| | ->15.67% (8,604,000B) 0xD45A21:
PyObject_Call (abstract.c:2492)
| | ->15.67% (8,604,000B) 0xD563BD:
instancemethod_call (classobject.c:2579)
| | ->15.67% (8,604,000B) 0xD45A21:
PyObject_Call (abstract.c:2492)
| | ->15.67% (8,604,000B) 0xDE0871:
PyEval_CallObjectWithKeywords (ceval.c:3619)
| | ->15.67% (8,604,000B) 0xE16488:
t_bootstrap (threadmodule.c:428)
| | ->15.67% (8,604,000B)
0x11DB4CF: ??? (in /lib/libthr.so.3)
| |
| ->00.03% (17,208B) in 1+ places, all below ms_print's
threshold (10.00%)
Another point: I finally made it stable and not eating memory! All I need is to
comment out the stop_announcing() call in torrent::abort. The idea came from the
fact that there was no leak if I paused torrents first before removing them. A
little more research shows that just turning off the cancelling of
m_tracker_timer and m_dht_announce_timer does the trick: no huge leak after
that. But memory is still growing slowly (5000x30 == 400mb). Thus, I was playing
with valgrind massif. You can see the result of that above.
Original comment by mocksoul
on 22 Nov 2010 at 11:54
This is sort of a shot from the hip (i.e. wild guess):
Index: src/torrent.cpp
===================================================================
--- src/torrent.cpp (revision 5024)
+++ src/torrent.cpp (working copy)
@@ -2284,14 +2284,12 @@
if (m_abort) return;
- m_abort = true;
// if the torrent is paused, it doesn't need
// to announce with even=stopped again.
- if (!is_paused())
- {
- stop_announcing();
- }
+ stop_announcing();
+ m_abort = true;
+
#if defined TORRENT_VERBOSE_LOGGING || defined TORRENT_ERROR_LOGGING
for (peer_iterator i = m_connections.begin();
i != m_connections.end(); ++i)
The test if the torrent is paused is not necessary. If it's paused
stop_announcing() should return immediately anyway. However, maybe m_abort =
true makes a difference. Maybe setting m_abort to true after calling
stop_announcing() makes it work just as if you paused it before aborting.
could you try this?
Original comment by arvid.no...@gmail.com
on 23 Nov 2010 at 4:44
I guess the problem with the huge leak in stop_announcing() occurs only if there
are some tracker requests which should be cancelled. That's why this leak is
only visible if you add/remove a LOT of torrents -- and if you add/remove them
one by one, no leak occurs.
>> could you try this?
I did. Still leaks.
Original comment by mocksoul
on 23 Nov 2010 at 5:15
hm.. ok. but calling pause() right before removing the torrent makes it not
leak?
Original comment by arvid.no...@gmail.com
on 23 Nov 2010 at 6:30
I tried pausing in Python code before removing the torrent. If I wait until the
torrent is really paused (via paused_alert) -- yes, there is no leak.
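A minimal sketch of that pause-then-remove workaround, for illustration only. It assumes a session object exposing wait_for_alert(), pop_alert() and remove_torrent(), handles exposing pause() and info_hash(), and that the session's alert mask lets torrent_paused_alert through. The helper name pause_then_remove and the timeout value are made up here, and newer bindings may expose pop_alerts() instead of pop_alert().

```python
import time


def pause_then_remove(ses, handles, timeout=30.0):
    """Pause every torrent, remove each one once its paused alert arrives.

    Returns the handles that were still unconfirmed when `timeout` expired.
    """
    # Key by info-hash string so an alert can be matched back to its handle.
    waiting = dict((str(h.info_hash()), h) for h in handles)
    for h in waiting.values():
        h.pause()

    deadline = time.time() + timeout
    while waiting and time.time() < deadline:
        ses.wait_for_alert(500)  # block for up to 500 ms
        alert = ses.pop_alert()
        while alert is not None:
            # Compare by class name so the sketch does not import libtorrent.
            if type(alert).__name__ == "torrent_paused_alert":
                h = waiting.pop(str(alert.handle.info_hash()), None)
                if h is not None:
                    ses.remove_torrent(h)
            alert = ses.pop_alert()
    return list(waiting.values())
```

Anything returned by the helper never confirmed the pause in time, so the caller can decide whether to remove it anyway.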
Original comment by mocksoul
on 23 Nov 2010 at 6:56
I checked the refcount of the torrent shared pointer at the end of
session_impl::remove_torrent. It always equals 4. Is that normal?
Original comment by mocksoul
on 23 Nov 2010 at 7:05
After disabling stop_announcing() I have a memory leak of about 30..40mb for
each 5000 torrents added and removed. Since valgrind massif shows that most of
the memory is allocated during the add_torrent call, I guess some object
(maybe the torrent itself) is still referenced after the remove_torrent() call.
Although it is hard to analyze -- maybe some object from some async call is not
being removed, dunno.
Anyway, I'm no C++ expert and also don't know the internals of libtorrent much
:). So I'll keep stop_announcing() disabled here for now and will try to write
a test program which reproduces the problem. I'll try in Python first, and it
could be translated to a C++ variant if you need that.
Original comment by mocksoul
on 23 Nov 2010 at 7:15
Also, not sure whether this is notable, but switching the tracker from udp to
http and vice versa does not make any difference at all.
Original comment by mocksoul
on 23 Nov 2010 at 7:44
I think I know what's going on. (famous last words :P )
I think that what you're seeing is in fact not a memory leak, it's caused by
the torrent object being kept alive while announcing to the trackers that we
just stopped the torrent. It's quite common for trackers to not respond, or to
take a long time to respond, especially if you grab random torrents from the
wild.
For the majority of torrents you remove, it will probably stay alive for about
20-40 seconds, waiting for the tracker to time out. If my theory is correct,
you should see the memory usage go down to reasonable levels again by simply
waiting long enough after it has ballooned. Long enough is probably about a
minute or so.
The timeout of trackers is controlled by session_settings::stop_tracker_timeout
(which defaults to 5 seconds), so if you only have a single tracker, it
shouldn't take more than about 5 seconds. If you have multiple trackers, they
will probably all be tried one at a time, each having to time out in serial.
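As a rough illustration of that estimate: if each tracker is tried in turn and every one times out, the wait is bounded by the number of trackers times the timeout. The function name below is made up, and the serial-retry behaviour is the assumption stated above, not something verified against the code.

```python
def worst_case_stop_seconds(num_trackers, stop_tracker_timeout=5):
    """Upper bound on how long a removed torrent waits for 'stopped' announces,
    assuming each tracker is tried in turn and every one times out."""
    return num_trackers * stop_tracker_timeout


for n in (1, 3, 8):
    print("%d tracker(s): up to %d s" % (n, worst_case_stop_seconds(n)))
```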
At least the latest posts you have made seem to suggest that this is in fact
the case, though I'm not sure it would explain the steep memory increase you
saw on add, once in this state.
Now, there's not really any good reason to keep the torrent object alive while
announcing to the tracker when stopping a torrent, I will take a look at
optimizing that.
Original comment by arvid.no...@gmail.com
on 23 Nov 2010 at 10:10
Looking a bit closer, it seems like I've already thought about this, so it seems
less likely that this would actually be the problem. I'll dig a little bit
deeper down this path though. I'll try to figure out which objects are holding
references to the torrent objects.
Original comment by arvid.no...@gmail.com
on 23 Nov 2010 at 10:22
Ok, I feel I need to explain a little more about what I'm doing here.
1. Create 5000 unique files, 16kb each (dd if=/dev/urandom...)
2. Make a torrent for each of these files (using libtorrent sha1 hash)
3. Add those torrents to the session pointing to the real files, so they will be seeding
4. Wait until all of them are in seeding state
5. Remove the files and soon after (in 1-2 seconds) remove the torrents from the session
6. Repeat from step 1
I have a tracker on a machine near the test one -- an opentracker binary without
modifications. The tracker is set in the torrents in UDP mode, although, as I
noted above, switching to HTTP does not make any difference.
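Step 1 can be sketched in Python without dd. This is only an illustration of the file-generation step: the helper name make_random_files and the file naming scheme are made up, and steps 2-5 (creating the torrents, adding them and removing them) are not shown.

```python
import os


def make_random_files(directory, count, size=16 * 1024):
    """Write `count` files of `size` random bytes each; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, "file_%05d.bin" % i)
        with open(path, "wb") as f:
            f.write(os.urandom(size))
        paths.append(path)
    return paths
```

Calling make_random_files(some_dir, 5000) would produce one batch for a single run of the loop.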
That's all. Also I want to note that I see 2 separate memory leaks (or not
"leaks") here:
1. If I do the steps above without modifying libtorrent -- on the 2nd or 3rd try
libtorrent (or python, or boost-python, not sure yet) starts eating memory. 1
new torrent = 2-5mb (each!). The whole process works, but a lot (!) slower. Soon
it eats 10-12 gb. And if I don't stop it, the kernel soon kills the process
("out of swap space").
2. If I do the steps above without stop_announcing() -- each new run eats
+30..40mb. So the memory footprint looks like this:
1) 30mb at start
2) 120mb after 1st run
3) 145mb after 2nd run
4) 190mb after 3rd run
5) and so on.. 5000 torrents always add the same amount of memory. And I never
see any memory being returned to the operating system.
The torrent adding process can be split into subprocesses:
1) generate sha1 hash -- no visible memory increase
2) add to session -- +30..40mb
3) announcing to tracker -- no visible memory increase
4) remove from session -- probably +5mb during removal, and -5mb at the end
If these are torrent objects not being removed -- some calculation shows that
30mb for 5000 torrents is about 6kb each. Can that be true?
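The per-torrent figure can be checked with a couple of lines of Python. The function name is made up, and the conversion assumes 1 mb = 1024 kb.

```python
def per_torrent_kb(growth_mb, num_torrents):
    """Average memory growth per torrent in kb, taking 1 mb = 1024 kb."""
    return growth_mb * 1024.0 / num_torrents


for growth in (30, 40):
    print("%d mb / 5000 torrents = %.1f kb each"
          % (growth, per_torrent_kb(growth, 5000)))
```

For the observed +30..40mb per run this gives roughly 6 to 8 kb per torrent.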
Also I'm not 100% sure yet whether that memory is consumed by libtorrent, by
boost-python, by python or even by my own python logic. At least I tried to
profile memory in python bytecode -- there seems to be no leak.. but I'm not
100% sure about that.
So the only step left is to reproduce the problem with less code. I'll try that
right now in python, and if we still don't understand what's going on, we could
do the same in pure C++ linking libtorrent (so, not using python and the
bindings at all).
Original comment by mocksoul
on 23 Nov 2010 at 10:57
Ehh... I will definitely get drunk when we fix this.. ;)))
Original comment by mocksoul
on 23 Nov 2010 at 11:09
Original issue reported on code.google.com by
mocksoul
on 15 Nov 2010 at 2:33