xlgjjff / libtorrent

Automatically exported from code.google.com/p/libtorrent

Removing a lot (1000..4000) of torrent handles from the session at once -- libtorrent starts eating RAM and soon crashes (SIGSEGV) #124

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?

1. Share 2000 torrents (in my case the resident memory size after that is about 130mb)
2. Stop all of them in a loop (as quickly as possible) -- during that process libtorrent will eat ~1gb of memory and then stall. If you then try to add more torrents, it eats more and more CPU until it is killed by the kernel with SIGSEGV (FreeBSD 7) (in my case -- after eating 12gb of RAM :)). Until then it still accepts connections/new torrents, but very slowly.

Tried 0.15.4 + boost 1.44.0 / 1.45.0_beta1, and the latest from 0_15_RC + boost 1.44.0. The behaviour is exactly the same.

Original issue reported on code.google.com by mocksoul on 15 Nov 2010 at 2:33

GoogleCodeExporter commented 8 years ago
Ok, I reproduced the problem with a small script. And the problem is the thing you mentioned in #47 =).

I set active_downloads/active_seeds/active_limit = -1 for all tests.

1) Add/del 5000 torrents 5 times without (!) any trackers set == 30mb memory
2) Add/del 5000 torrents 5 times with a local tracker (udp://localhost..), "stop_announcing()" commented out in torrent::abort, 5 secs for the tracker completion, stop and receive timeouts == 100mb memory (+20mb each run)
3) Add/del 5000 torrents 5 times with a local tracker, 50 secs for the tracker timeouts == 170mb memory. And here is the problem spot -- after 50 seconds the memory is not freed. I waited 1 hour -- no result, the memory is still held.
4) Add/del 5000 torrents 5 times with a local tracker, 1 sec for the tracker timeouts ~= 60mb memory; it varies from run to run (40..80mb).

So, the faster your tracker is, the less memory you will use :).
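A back-of-the-envelope way to read those numbers (my interpretation, not measured from libtorrent internals): if each removed torrent is kept alive until its stop announce completes, the peak number of retained torrents is roughly the removal rate times the tracker timeout:

```python
def peak_retained(removal_rate_per_s, tracker_timeout_s, total):
    # Little's law: items in flight = arrival rate * time in system,
    # capped at the total number of torrents being removed.
    return min(removal_rate_per_s * tracker_timeout_s, total)

# Removing 5000 torrents at ~500/s: a 50 s timeout retains all 5000
# at once, a 1 s timeout only ~500 -- matching cases 3) and 4) above.
print(peak_retained(500, 50, 5000), peak_retained(500, 1, 5000))  # -> 5000 500
```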

And here is the script I use:

#!/usr/bin/env python

import libtorrent as lt
import sys
import tempfile
import shutil
import time
import os
import subprocess as sp

#TRACKER_LIST = [['udp://cmsearch01.<BIG_COMPANY>.ru:2399']]
TRACKER_LIST = [['udp://localhost:6969']]
#TRACKER_LIST = []

def memusage(s):
    # report this process's vsz/rss and %cpu, parsed from ps output
    proc = sp.Popen(['ps', 'auxf', '-U', 'mocksoul'], stdout=sp.PIPE)
    for line in proc.stdout.readlines()[1:]:
        items = line.split()
        items[10] = ' '.join(items[10:])
        items = items[:11]
        user, pid, pcpu, pmem, vsz, rss, tt, stat, started, time, command = items
        if int(pid) == os.getpid():
            print '%s: %s/%s mem, %s%% CPU' % (s, vsz, rss, pcpu)

def add_torrents(tmpdir, session, cnt):
    torrents = {}

    ps = []
    for i in range(cnt):
        #import time
        #time.sleep(0.001)
        testfile = 'file%d' % i
        #os.system('dd if=/dev/urandom of=%s/%s bs=16k count=1 >/dev/null 2>&1' % (tmpdir, testfile))
        fs = lt.file_storage()
        lt.add_files(fs, '%s/%s' % (tmpdir, testfile), 0)

        torrent = lt.create_torrent(fs)
        lt.set_piece_hashes(torrent, tmpdir)

        torrent_info_dict = torrent.generate()
        torrent_info_dict['announce-list'] = TRACKER_LIST

        torrent_info = lt.torrent_info(torrent_info_dict)

        add_params = {
            'save_path': tmpdir,
            'storage_mode': lt.storage_mode_t.storage_mode_sparse,
            'paused': False,
            'auto_managed': True,
            'ti': torrent_info,
        }

        ps.append(add_params)

    for add_params in ps:
        handle = session.add_torrent(add_params)
        torrents[str(add_params['ti'].info_hash())] = handle

    print 'yes', len(torrents)
    return torrents

def wait_torrents(torrents):
    while not all((t.is_seed() for t in torrents.values())):
        print 'wait', len([t for t in torrents.values() if not t.is_seed()])
        time.sleep(0.1)

def remove_torrents(torrents, session):
    for hdl in torrents.values():
        session.remove_torrent(hdl)

def read_alerts(session):
    alert = session.pop_alert()
    read = 0
    while alert:
        #print alert, alert.message()
        alert = session.pop_alert()
        read += 1
    print 'read %d alerts' % read

def main():
    memusage('start')

    try:
        #dir = tempfile.mkdtemp(prefix='tleaktmp')
        dir = '/tmp/tleaktmp6OhFYo'
        print 'Using temp directory: %s' % dir

        session = lt.session()
        session.set_alert_queue_size_limit(1024 * 1024)

        sts = lt.session_settings()
        sts.tracker_completion_timeout = 60
        sts.tracker_receive_timeout = 60
        sts.stop_tracker_timeout = 60
        sts.active_downloads = -1
        sts.active_seeds = -1
        sts.active_limit = -1
        sts.udp_tracker_token_expiry = 120
        sts.min_announce_interval = 1
        sts.inactivity_timeout = 60
        sts.connection_speed = 10
        sts.allow_multiple_connections_per_ip = True
        sts.max_out_request_queue = 128
        sts.request_queue_size = 3

        sts.use_read_cache = False
        session.set_settings(sts)

        #session.set_alert_mask(lt.alert.category_t.tracker_notification | lt.alert.category_t.status_notification)
        #session.set_alert_mask(lt.alert.category_t.status_notification)

        session.listen_on(16881, 16881, 'localhost')

        memusage('session started')

        #for i in range(5000):
        #    testfile = 'file%d' % i
        #    os.system('dd if=/dev/urandom of=%s/%s bs=16k count=1 >/dev/null 2>&1' % (dir, testfile))

        for i in range(5):
            torrents = add_torrents(dir, session, 5000)
            assert len(session.get_torrents()) == 5000
            memusage('added %d torrents' % len(torrents))

            read_alerts(session)

            wait_torrents(torrents)
            memusage('all seeding')

            time.sleep(2)

            read_alerts(session)

            remove_torrents(torrents, session)
            memusage('all removed')
            assert len(session.get_torrents()) == 0

            read_alerts(session)

        while 1:
            memusage('waiting...')
            time.sleep(3)

    finally:
        print 'remove tree'
        #shutil.rmtree(dir)

if __name__ == '__main__':
    main()

Tune "dir" in main() and create 5000 files (file0 .. file4999) there before running.
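To prepare that directory, here is a Python 3 equivalent of the commented-out dd loop from the script above (the default count and 16k file size are taken from the script; the hard-coded path is the one used in main()):

```python
import os

def make_test_files(directory, count=5000, size=16 * 1024):
    """Create file0 .. file<count-1>, each `size` bytes of random data,
    the same payload the commented-out dd invocation would produce."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        with open(os.path.join(directory, 'file%d' % i), 'wb') as f:
            f.write(os.urandom(size))

# e.g. make_test_files('/tmp/tleaktmp6OhFYo')
```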
Here is how the last few lines of output look:

all seeding: 154344/113928 mem, 86.4% CPU
read 0 alerts
all removed: 178024/137984 mem, 83.1% CPU
read 0 alerts
yes 5000
added 5000 torrents: 181360/140652 mem, 84.9% CPU
read 0 alerts
wait 3050
wait 1687
wait 144
all seeding: 181096/140708 mem, 87.4% CPU
read 0 alerts
all removed: 209640/168532 mem, 82.4% CPU
read 0 alerts
waiting...: 209640/168532 mem, 82.5% CPU
waiting...: 209640/168532 mem, 76.6% CPU
waiting...: 209640/168532 mem, 71.4% CPU
waiting...: 209640/168532 mem, 66.8% CPU
(wait a lot)
waiting...: 209640/168532 mem, 02.9% CPU
<CTRL-C>, <CTRL-Z>, kill -kill % :)

Original comment by mocksoul on 25 Nov 2010 at 5:17

GoogleCodeExporter commented 8 years ago
Tried this trick in my production code:
Instead of removing torrents explicitly, just mark them as removed and call pause(). Then, on the torrent's tracker reply -- do the real removal from the session. I need to wait for the tracker reply, not just the torrent_paused alert.

This way, memory usage grew to ~120mb and seems to be quite stable at that value. Oh, I even see the memory going down sometimes... at last!! :)

After that I was feeling brave and tried to share 20000 torrents at once, and remove them after that. Memory usage grew from 120mb to 450mb and then went down to 370mb. I tried to add another 2000 torrents and memory went down again! To 317mb. Another 2000 added torrents gave me 303mb :)). Another few runs of sharing/deleting 2000 torrents gave me an amazing 212mb :))).

Since in total I shared about 100000 torrents and the memory footprint was about 230mb at the end -- I can say that in this case there is no memory leak at all. All I need to do is pause, wait for the tracker response, then remove. Obviously, this does not allow removing torrents when the tracker is down, so this is just an experiment, not usable for me :(
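The pause-first strategy amounts to a small state machine: mark the handle, pause it, and only call remove_torrent() once a tracker alert for that torrent arrives. A minimal sketch with illustrative names (the real code would key this off libtorrent's tracker_reply_alert / tracker_error_alert; the session and handle objects here stand in for whatever your code wraps them in):

```python
class DeferredRemover:
    """Defer session.remove_torrent() until the torrent's tracker has
    acknowledged the stop announce (or the announce has errored out)."""

    def __init__(self, session):
        self.session = session
        self.pending = {}  # info_hash -> handle awaiting a tracker reply

    def request_removal(self, handle, info_hash):
        # Pause instead of removing right away: the stop announce goes
        # out while the torrent object is still alive in the session.
        handle.pause()
        self.pending[info_hash] = handle

    def on_tracker_alert(self, info_hash):
        # Call this for tracker reply/error alerts: now it is safe to
        # actually drop the torrent from the session.
        handle = self.pending.pop(info_hash, None)
        if handle is not None:
            self.session.remove_torrent(handle)
```

A duplicate alert for the same info-hash is a no-op, since the handle has already been popped from the pending map.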

Also, I have two scenarios which will be quite difficult to make stable:
1) remove a torrent and immediately add another one with the same save_path as the previous one
2) remove a torrent and immediately add the same torrent again (this can happen if a program generates some files, shares them, removes the files, generates them again, and shares them again)

Original comment by mocksoul on 25 Nov 2010 at 7:16

GoogleCodeExporter commented 8 years ago
Oh my, I didn't turn off the server from the previous run and had added a "share/remove 2000 torrents" job in a loop. A few minutes later -- memory went down to 170mb. This is probably just the way tcmalloc frees memory back to the OS.

Original comment by mocksoul on 25 Nov 2010 at 7:19

GoogleCodeExporter commented 8 years ago
I'm still trying to reproduce this. That python script either gets stuck (hung forever in the system call to close() one of the asio pipes) or fails because of too many open files. I'm trying to reproduce it on a mac, and I've been trying to avoid having to run it as root (darwin has an upper limit of just a few thousand file descriptors per process unless you're root). In the meantime, could you try this patch? It's sort of a wild guess.

Index: src/torrent.cpp
===================================================================
--- src/torrent.cpp (revision 5046)
+++ src/torrent.cpp (working copy)
@@ -1364,7 +1364,7 @@
            else
 #endif
                m_ses.m_tracker_manager.queue_request(m_ses.m_io_service, m_ses.m_half_open, req
-                   , tracker_login() , 
m_abort?boost::shared_ptr<torrent>():shared_from_this());
+                   , tracker_login() , shared_from_this());
            ae.updating = true;
            ae.next_announce = now + seconds(20);
            ae.min_announce = now + seconds(10);

Original comment by arvid.no...@gmail.com on 27 Nov 2010 at 5:52

GoogleCodeExporter commented 8 years ago
Hi. Yeah, dealing with the max-open-files limit is quite tricky.
On Linux I just run the script as root; on FreeBSD I already have a huge ulimit for open file descriptors, so it runs without a problem. On Linux it could be raised too, but that needs a login/logout.
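For what it's worth, a process can also raise its own soft limit up to the hard limit at runtime, without logging out, via the stdlib resource module (raising the hard limit itself still needs root):

```python
import resource

# Raise the soft RLIMIT_NOFILE as far as the hard limit allows.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE))
```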

Will check your patch in next few minutes.

Original comment by mocksoul on 27 Nov 2010 at 6:52

GoogleCodeExporter commented 8 years ago
Okay, amazingly, this seems to fix the problem.

With one exception :) I applied the fix and ran my test script:

mocksoul@mocksoul ~root/libtorrent-leaks % $PYTHON test.py
start: 13836/7512 mem, 0.0% CPU
Using temp directory: /tmp/tleaktmp
session started: 39452/8084 mem, 0.0% CPU
yes 5000
added 5000 torrents: 70924/31676 mem, 149% CPU
read 0 alerts
all seeding: 70660/31940 mem, 127% CPU
read 0 alerts
all removed: 91560/52196 mem, 82.8% CPU
read 0 alerts
yes 5000
added 5000 torrents: 75536/36388 mem, 100% CPU
read 0 alerts
all seeding: 75532/36412 mem, 103% CPU
read 0 alerts
zsh: segmentation fault  $PYTHON test.py

So, it was performing very well, as you can see -- it ate 50mb right after the removal and very soon released memory back down to 30mb. But, as you can also see -- a segfault at the end.

Next, I ran the same thing under gdb to find out what was going on. And... no segfault this time. Next, I ran again without gdb and the result was exactly the same as the gdb run; even the 48mb of memory eaten at the end was the same. So, I won't repeat myself, and here is the log of one last run:

mocksoul@mocksoul ~root/libtorrent-leaks % $PYTHON test.py
start: 13836/7500 mem, 0.0% CPU
Using temp directory: /tmp/tleaktmp
session started: 39452/7924 mem, 0.0% CPU
yes 5000
added 5000 torrents: 70964/31656 mem, 101% CPU
read 0 alerts
all seeding: 70700/31624 mem, 126% CPU
read 0 alerts
all removed: 91868/52224 mem, 83.1% CPU
read 0 alerts
yes 5000
added 5000 torrents: 85776/45144 mem, 99.4% CPU
read 0 alerts
all seeding: 85512/45148 mem, 99.6% CPU
read 0 alerts
all removed: 94492/55024 mem, 83.9% CPU
read 0 alerts
yes 5000
added 5000 torrents: 81556/41688 mem, 93.1% CPU
read 0 alerts
all seeding: 81556/41716 mem, 94.1% CPU
read 0 alerts
all removed: 92520/53772 mem, 84.4% CPU
read 0 alerts
yes 5000
added 5000 torrents: 82768/42988 mem, 87.7% CPU
read 0 alerts
all seeding: 82768/42988 mem, 88.6% CPU
read 0 alerts
all removed: 98432/58064 mem, 82.2% CPU
read 0 alerts
yes 5000
added 5000 torrents: 86956/46752 mem, 85.4% CPU
read 0 alerts
all seeding: 86956/46752 mem, 85.5% CPU
read 0 alerts
all removed: 97360/57276 mem, 80.7% CPU
read 0 alerts
yes 5000
added 5000 torrents: 87052/46888 mem, 84.6% CPU
read 0 alerts
all seeding: 87052/46888 mem, 85.7% CPU
read 0 alerts
all removed: 97556/57236 mem, 81.6% CPU
read 0 alerts
yes 5000
added 5000 torrents: 87884/47596 mem, 85.0% CPU
read 0 alerts
all seeding: 87880/47596 mem, 85.8% CPU
read 0 alerts
all removed: 96268/56876 mem, 84.0% CPU
read 0 alerts
yes 5000
added 5000 torrents: 91996/51780 mem, 85.1% CPU
read 0 alerts
all seeding: 91996/51780 mem, 85.9% CPU
read 0 alerts
all removed: 99952/59736 mem, 82.9% CPU
read 0 alerts
yes 5000
added 5000 torrents: 91992/51768 mem, 83.9% CPU
read 0 alerts
all seeding: 91992/51768 mem, 85.0% CPU
read 0 alerts
all removed: 97244/57404 mem, 82.3% CPU
read 0 alerts
yes 5000
added 5000 torrents: 87896/48132 mem, 84.6% CPU
read 0 alerts
all seeding: 87896/48132 mem, 85.1% CPU
read 0 alerts
all removed: 98432/59180 mem, 82.7% CPU
read 0 alerts
waiting...: 98432/59176 mem, 82.7% CPU
waiting...: 87896/48228 mem, 79.7% CPU
waiting...: 87896/48228 mem, 76.7% CPU
waiting...: 87896/48228 mem, 73.9% CPU
^Cremove tree
Traceback (most recent call last):
  File "test.py", line 150, in <module>
    main()
  File "test.py", line 143, in main
    time.sleep(3)
KeyboardInterrupt

So, the problem seems to have disappeared. I guess the segfault was not related to this issue at all, because I sometimes get segfaults during the torrent adding process and that feels unrelated to the memory problem. It is a very, very, very uncommon segfault (once a day on 1 of 13000 servers), so I haven't tried to catch it yet.

Now, I'll run my script for an hour or so under gdb; we will see what happens on a long run. :)

For now I need to point out one more thing to you -- with the stop_announcing() call commented out in torrent::abort (that was in my code before I applied your patch), the patch makes no difference at all. But if I enable stop_announcing() again, I get the output I provided above. This worries me a little -- what will happen if some torrents are already paused (so stop_announcing() will not be called at all)?

All tests were done under 32-bit Linux; I will rerun everything on all platforms once we stabilize things on linux32.

Original comment by mocksoul on 27 Nov 2010 at 7:20

GoogleCodeExporter commented 8 years ago
Ehh.. man, you seem to be perfect at "wild guessing" :)))
There have currently been 24 runs with 15000 torrents (i.e. 360000 torrents shared) and just 113mb of resident memory used. I waited a little longer -- after 30 runs it used ~140mb. But that's probably just the way the allocator works; I don't know, I need to run the script for several hours to draw a final conclusion.

Original comment by mocksoul on 27 Nov 2010 at 8:06

GoogleCodeExporter commented 8 years ago
> As of now I need to notice another one thing to you -- with commented 
stop_announcing() call in
> torrent::abort (that was in my code before I applied your patch) -- patch 
does not makes sense at
> all. But if I enable stop_announcing() again -- I have output which I provide 
above. This makes me
> worry a little -- what will happen if some torrents will be already paused 
(thus, stop_announcing()
> will not be called in any way).

I'm not sure I fully understand this. My hope is that my patch would fix the issue by itself (i.e. with your own patches reverted), which it sounds like it did, right?

I don't think you would have any problems for paused or stopped torrents, where 
stop_announcing() isn't called. I believe the problem is related to the tracker 
callback doing something useful to put the torrent object in a destructible 
state, which it would already be in if it's paused. I need to dig deeper to 
figure out what that might be, but for now this seems like a reasonable 
solution. The only reason this code was there in the first place was as an 
optimization (to allow for early destruction of the torrent object).

One guess would be that the torrent object might get destructed by another 
thread, other than the main network thread for instance, and that there is some 
race condition because of it.
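The lifetime hazard being guessed at can be illustrated with a small Python analogue (illustrative names only; weakref.ref stands in for the empty shared_ptr the old code passed when m_abort was set, and a captured strong reference stands in for shared_from_this()). If the pending tracker request holds no strong reference to its torrent, the torrent can be destroyed while the request is still in flight, and the completion callback then fires against a dead object:

```python
import weakref

class Torrent:
    def on_tracker_response(self):
        return 'handled'

def queue_request_weak(torrent):
    # Analogue of passing an empty shared_ptr: the pending request
    # does NOT keep the torrent alive.
    ref = weakref.ref(torrent)
    def callback():
        t = ref()
        return t.on_tracker_response() if t is not None else 'torrent already destroyed'
    return callback

def queue_request_strong(torrent):
    # Analogue of shared_from_this(): the closure owns a strong
    # reference, so the torrent survives until the callback runs.
    def callback():
        return torrent.on_tracker_response()
    return callback

t = Torrent()
cb = queue_request_weak(t)
del t            # last strong reference gone before the reply arrives
print(cb())      # -> torrent already destroyed
```

In C++ the weak case is worse than in this sketch: instead of observing a cleared weak reference, the callback would dereference freed memory, which matches the crashy, racy behaviour reported above.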

I'll check this fix in for now, and I might look into this more later, in order 
to re-introduce the optimization.

Thank you for all your testing and debugging! Please let me know if you run 
into any other problems.

Original comment by arvid.no...@gmail.com on 27 Nov 2010 at 10:01

GoogleCodeExporter commented 8 years ago
Tomorrow is the Big Testing Day :).

Original comment by mocksoul on 27 Nov 2010 at 10:55

GoogleCodeExporter commented 8 years ago
Quick stats from the all-night run, sharing/removing N torrents 1024 times:

N=100: 12mb memory at the end
N=1000: 31mb memory at the end
N=5000: 74mb memory at the end
N=20000: 253mb memory, still running (256th run)

So, many, many, many thanks to you -- no memory leaks at all here.
Now I'll check my full daemon on the FreeBSD & Linux platforms.

Original comment by mocksoul on 28 Nov 2010 at 7:45

GoogleCodeExporter commented 8 years ago
One more piece of good news: this patch seems to fix both problems -- the slow but constant memory usage growth and the occasional huge unstable growth. So, I can again use pymalloc in python and don't need a custom allocator like tcmalloc for boost & libtorrent. (I'm talking about the 2 separate cases from http://code.google.com/p/libtorrent/issues/detail?id=124#c49)

The full daemon run shows ~700mb memory usage after 100x5000 runs. But this is probably caused by my own code; I'll dig deeper into that.

Original comment by mocksoul on 28 Nov 2010 at 9:58

GoogleCodeExporter commented 8 years ago
I believe this ticket can be considered closed then. Feel free to re-open if 
you discover something new related to it.

Original comment by arvid.no...@gmail.com on 6 Dec 2010 at 6:54

GoogleCodeExporter commented 8 years ago
A status update on this ticket -- it *seems* there are no memory leaks at all after this and issue #130, but I need more testing.

Original comment by mocksoul on 14 Dec 2010 at 6:48