jasonmc / forked-daapd

A rewrite of the Firefly media server (mt-daapd), released under GPLv2+. Please note that this git repository is a mirror of the official one at git://git.debian.org/~jblache/forked-daapd.git
http://blog.technologeek.org/2009/06/12/217
GNU General Public License v2.0

forked-daapd bloats in memory and then gets killed #40

Open arthurlutz opened 13 years ago

arthurlutz commented 13 years ago

forked-daapd seems to take up a lot of memory during certain operations. On a small device with little memory available, this is pretty fatal.

On a device with 256 MB of RAM and 750 MB of swap, after a while I get to 80% memory usage (190 MB resident, 600 MB virtual). This fluctuates over time, but sometimes causes an out-of-memory crash; I often get back down to 5% memory usage. Is this a case of trying to load a large file (a WAV or misplaced file) into memory?

The music library is pretty large; I can give more details if needed.

top "screenshot" :

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16549 daapd 20 0 751m 186m 256 S 22.7 74.7 25:06.73 forked-daapd

x42 commented 12 years ago

Even more critical: when scanning larger collections, forked-daapd easily exceeds 2 GB of virtual memory (resident memory is still only a few hundred MB).

On an i386 system without PAE/bigmem support, the process is terminated with an out-of-memory error or a segmentation fault when it hits either 2 or 3 GB, depending on the OS.

Workaround: [re]scan the collection with

while true; do pidof forked-daapd || /etc/init.d/forked-daapd start; sleep 10; done  # :)

After the scan completes, forked-daapd stays below 2 GB of virtual memory usage.

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3446 daapd 20 0 2845m 653m 5108 S 19 32.4 7:20.18 forked-daapd

x42 commented 12 years ago

I've started tracking this down. The file scanner spawns ~100 threads using libdispatch, and all of those allocate ffmpeg data structures, etc. Any hints on how to limit the concurrency of that?

Here's a printout from valgrind's massif tool. Alas, it does not indicate exactly where in forked-daapd the memory is allocated, because it is all abstracted behind dispatch's thread creation.
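
For reference, the header below corresponds to an invocation along these lines (reconstructed from the "Command", "Massif arguments" and "ms_print arguments" fields, so treat the exact form as approximate):

valgrind --tool=massif --pages-as-heap=yes --depth=20 ./forked-daapd -f -d 3
ms_print massif.out.26866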

--------------------------------------------------------------------------------
Command:            ./forked-daapd -f -d 3
Massif arguments:   --pages-as-heap=yes --depth=20
ms_print arguments: massif.out.26866
--------------------------------------------------------------------------------

    GB
1.988^                                                                       :
     |                                                                     @:#
     |                                                                    :@:#
     |                                                                    :@:#
     |                                                           ::::@@@@::@:#
     |                                                          :::: @   ::@:#
     |                                                          :::: @   ::@:#
     |                                                        :::::: @   ::@:#
     |                                                        :::::: @   ::@:#
     |                                                      @::::::: @   ::@:#
     |                                                    ::@::::::: @   ::@:#
     |                                                   :::@::::::: @   ::@:#
     |                                                  :@::@::::::: @   ::@:#
     |                                                 ::@::@::::::: @   ::@:#
     |                                               ::::@::@::::::: @   ::@:#
     |                                               @:::@::@::::::: @   ::@:#
     |                                              :@:::@::@::::::: @   ::@:#
     |                                             ::@:::@::@::::::: @   ::@:#
     |                                           ::::@:::@::@::::::: @   ::@:#
     |         ::::::::::::::::::::::::::::::::@:::::@:::@::@::::::: @   ::@:#
   0 +----------------------------------------------------------------------->Gi
     0                                                                   9.534

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
  0              0          110,592          110,592             0            0
  1     61,610,346      102,752,256      102,752,256             0            0
  2    775,586,477      102,678,528      102,678,528             0            0

[...]

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
  6  5,988,325,410      163,643,392      163,643,392             0            0
  7  6,060,707,756      172,036,096      172,036,096             0            0
  8  6,138,592,165      197,509,120      197,509,120             0            0
  9  6,202,353,609      214,405,120      214,405,120             0            0
 10  6,249,723,091      281,788,416      281,788,416             0            0
 11  6,333,369,548      307,781,632      307,781,632             0            0
 12  6,401,781,111      374,890,496      374,890,496             0            0
 13  6,466,230,607      409,358,336      409,358,336             0            0
 14  6,558,027,788      418,054,144      418,054,144             0            0
 15  6,633,769,158      485,359,616      485,359,616             0            0
 16  6,699,389,590      620,236,800      620,236,800             0            0
 17  6,739,291,696      636,841,984      636,841,984             0            0
100.00% (636,841,984B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->92.42% (588,570,624B) 0x52A2EE7: mmap (mmap.S:62)
| ->92.25% (587,489,280B) 0x4EEE539: pthread_create@@GLIBC_2.1 (allocatestack.c:498)
| | ->83.03% (528,740,352B) 0x5D3DFEC: ??? (in /usr/lib/libpthread_workqueue.so.0.0)
| | | ->80.39% (511,954,944B) 0x5D3E67D: ??? (in /usr/lib/libpthread_workqueue.so.0.0)
| | | | ->80.39% (511,954,944B) 0x4EEDC37: start_thread (pthread_create.c:304)
| | | |   ->80.39% (511,954,944B) 0x52A696C: clone (clone.S:130)
| | | |
| | | ->02.64% (16,785,408B) 0x5D3E183: ??? (in /usr/lib/libpthread_workqueue.so.0.0)
| | |   ->02.64% (16,785,408B) 0x4EEDC37: start_thread (pthread_create.c:304)
| | |     ->02.64% (16,785,408B) 0x52A696C: clone (clone.S:130)
| | |
| | ->07.91% (50,356,224B) 0x5D3ED96: ??? (in /usr/lib/libpthread_workqueue.so.0.0)
| | | ->07.91% (50,356,224B) 0x5D3D59E: pthread_workqueue_additem_np (in /usr/lib/libpthread_workqueue.so.0.0)
| | |   ->03.95% (25,178,112B) 0x51C30CE: ??? (in /usr/lib/libdispatch.so.0.0.0)
| | |   |
| | |   ->03.95% (25,178,112B) 0x51C38A2: ??? (in /usr/lib/libdispatch.so.0.0.0)
| | |
| | ->01.32% (8,392,704B) 0x5D3EAF4: ??? (in /usr/lib/libpthread_workqueue.so.0.0)
| |   ->01.32% (8,392,704B) 0x5D3D49E: pthread_workqueue_create_np (in /usr/lib/libpthread_workqueue.so.0.0)
| |     ->01.32% (8,392,704B) 0x51BCECF: dispatch_once_f (in /usr/lib/libdispatch.so.0.0.0)
| |       ->01.32% (8,392,704B) 0x51BD0FA: ??? (in /usr/lib/libdispatch.so.0.0.0)
| |         ->01.32% (8,392,704B) 0x51BD4E9: _dispatch_wakeup (in /usr/lib/libdispatch.so.0.0.0)
| |           ->01.32% (8,392,704B) 0x51BD49C: _dispatch_queue_push_list_slow (in /usr/lib/libdispatch.so.0.0.0)
| |             ->01.32% (8,392,704B) 0x51BD541: _dispatch_wakeup (in /usr/lib/libdispatch.so.0.0.0)
| |               ->01.32% (8,392,704B) 0x51BF14B: ??? (in /usr/lib/libdispatch.so.0.0.0)
| |                 ->01.32% (8,392,704B) 0x51BDD66: dispatch_main (in /usr/lib/libdispatch.so.0.0.0)
| |                   ->01.32% (8,392,704B) 0x804FB0D: main (in /tmp/forked-daapd-0.19gcd/src/forked-daapd)
| |
| ->00.17% (1,081,344B) in 1+ places, all below ms_print's threshold (01.00%)

[...]

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 18  6,805,495,566      695,586,816      695,586,816             0            0
 19  6,898,314,707      695,734,272      695,734,272             0            0
 20  6,972,688,984      781,934,592      781,934,592             0            0
 21  7,044,635,272      850,124,800      850,124,800             0            0
 22  7,131,244,599      858,517,504      858,517,504             0            0
 23  7,208,958,248      942,444,544      942,444,544             0            0
 24  7,286,480,871      950,837,248      950,837,248             0            0
100.00% (950,837,248B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->94.78% (901,197,824B) 0x52A2EE7: mmap (mmap.S:62)
| ->94.45% (898,019,328B) 0x4EEE539: pthread_create@@GLIBC_2.1 (allocatestack.c:498)
| | ->88.27% (839,270,400B) 0x5D3DFEC: ??? (in /usr/lib/libpthread_workqueue.so.0.0)
| | | ->86.50% (822,484,992B) 0x5D3E67D: ??? (in /usr/lib/libpthread_workqueue.so.0.0)
| | | | ->86.50% (822,484,992B) 0x4EEDC37: start_thread (pthread_create.c:304)
| | | |   ->86.50% (822,484,992B) 0x52A696C: clone (clone.S:130)
| | | |

[...]

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 54  9,974,388,344    2,066,014,208    2,066,014,208             0            0
 55 10,075,926,627    2,082,799,616    2,082,799,616             0            0
 56 10,110,806,069    2,133,155,840    2,133,155,840             0            0
 57 10,184,820,549    2,134,130,688    2,134,130,688             0            0
 58 10,185,956,887    2,133,159,936    2,133,159,936             0            0
100.00% (2,133,159,936B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->97.67% (2,083,520,512B) 0x52A2EE7: mmap (mmap.S:62)
| ->97.18% (2,072,997,888B) 0x4EEE539: pthread_create@@GLIBC_2.1 (allocatestack.c:498)
| | ->94.43% (2,014,248,960B) 0x5D3DFEC: ??? (in /usr/lib/libpthread_workqueue.so.0.0)
| | | ->93.64% (1,997,463,552B) 0x5D3E67D: ??? (in /usr/lib/libpthread_workqueue.so.0.0)
| | | | ->93.64% (1,997,463,552B) 0x4EEDC37: start_thread (pthread_create.c:304)
| | | |   ->93.64% (1,997,463,552B) 0x52A696C: clone (clone.S:130)
| | | |
| | | ->00.79% (16,785,408B) in 1+ places, all below ms_print's threshold (01.00%)
| | |
| | ->02.36% (50,356,224B) 0x5D3ED96: ??? (in /usr/lib/libpthread_workqueue.so.0.0)
| | | ->02.36% (50,356,224B) 0x5D3D59E: pthread_workqueue_additem_np (in /usr/lib/libpthread_workqueue.so.0.0)
| | |   ->01.18% (25,178,112B) 0x51C30CE: ??? (in /usr/lib/libdispatch.so.0.0.0)
| | |   |
| | |   ->01.18% (25,178,112B) 0x51C38A2: ??? (in /usr/lib/libdispatch.so.0.0.0)
| | |
| | ->00.39% (8,392,704B) in 1+ places, all below ms_print's threshold (01.00%)
| |
| ->00.49% (10,522,624B) in 1+ places, all below ms_print's threshold (01.00%)
|
->01.84% (39,321,600B) 0x4016192: mmap (mmap.S:62)
| ->01.50% (31,948,800B) 0x4005E19: _dl_map_object_from_fd (dl-load.c:1189)
| | ->01.50% (31,948,800B) 0x4007232: _dl_map_object (dl-load.c:2250)
| |   ->01.49% (31,854,592B) 0x400D20A: openaux (dl-deps.c:65)
| |   | ->01.49% (31,854,592B) 0x400DB64: _dl_catch_error (dl-error.c:178)
| |   |   ->01.49% (31,854,592B) 0x400C480: _dl_map_object_deps (dl-deps.c:247)
| |   |     ->01.49% (31,711,232B) 0x4002BF3: dl_main (rtld.c:1815)
| |   |     | ->01.49% (31,711,232B) 0x401440F: _dl_sysdep_start (dl-sysdep.c:244)
| |   |     |   ->01.49% (31,711,232B) 0x4000C5B: _dl_start (rtld.c:342)
| |   |     |     ->01.49% (31,711,232B) 0x4000845: ??? (in /lib/i386-linux-gnu/ld-2.13.so)
| |   |     |
| |   |     ->00.01% (143,360B) in 1+ places, all below ms_print's threshold (01.00%)
| |   |
| |   ->00.00% (94,208B) in 1+ places, all below ms_print's threshold (01.00%)
| |
| ->00.35% (7,372,800B) in 1+ places, all below ms_print's threshold (01.00%)
|
->00.48% (10,317,824B) in 1+ places, all below ms_print's threshold (01.00%)

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 59 10,236,839,325    2,133,147,648    2,133,147,648             0            0

elwertk commented 12 years ago

The filescanner currently pushes parallelisation to its extremes by having each directory scanned independently on the global concurrent queue. That explains the massive number of spawned threads you are seeing. A solution might be to rewrite the scanner to establish and maintain a reasonably small number of managed queues that limit this excessive concurrency while still giving a noticeable speed-up to the scanning process (see the sketch after this comment).

IIRC this is on the todo list.
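
A minimal, hypothetical sketch of that idea in plain C with libdispatch (the pool size, the names and the round-robin policy are illustrative only, not forked-daapd's actual code): a small fixed pool of serial queues bounds how many directory scans run at once, while still scanning several directories in parallel.

#include <dispatch/dispatch.h>
#include <stdio.h>

#define NSCANQ 4                     /* size of the queue pool; tune to taste */

static dispatch_queue_t scan_q[NSCANQ];

static void scan_directory(void *path)
{
    printf("scanning %s\n", (const char *)path);   /* real per-directory scan goes here */
}

int main(void)
{
    static const char *dirs[] = { "/music/a", "/music/b", "/music/c", "/music/d" };
    dispatch_group_t grp = dispatch_group_create();

    for (int i = 0; i < NSCANQ; i++)
        scan_q[i] = dispatch_queue_create("scanner", NULL);  /* NULL attr = serial queue */

    /* Round-robin directories over the pool: at most NSCANQ scans run
       concurrently, instead of one worker thread per directory. */
    for (int i = 0; i < 4; i++)
        dispatch_group_async_f(grp, scan_q[i % NSCANQ], (void *)dirs[i], scan_directory);

    dispatch_group_wait(grp, DISPATCH_TIME_FOREVER);
    dispatch_release(grp);
    return 0;
}

Each serial queue drains its directories one at a time, so the worker-thread count stays near NSCANQ instead of growing with the size of the directory tree.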

x42 commented 12 years ago

Hi elwertk,

Thanks, your explanation makes sense. The top-level dir here contains >5000 folders (artists), each of which has at least one more folder (album) inside...

I've just symlinked all the files into a single folder. A directory with >100k entries performs horribly in general, but it is certainly good enough for listening to music. Doing so works around the issue with forked-daapd: it now spawns one scanner thread and stays below 20 MB resident and below 100 MB of virtual memory.

One could come up with a balanced tree structure to symlink the files into an acceptable number of folders, but the issue should rather be fixed in the daapd filescanner.

x42 commented 12 years ago

PS: does "pushes parallelisation to its extremes" really make sense here? The bottleneck is I/O, not CPU power.

I've just split the symlinks up into 4 folders. Now I get 4 scanner threads, but total CPU usage is down from 25% (quad core, 1 scanner thread) to ~15%, with the disk (RAID5) seeking loudly.

elwertk commented 12 years ago

No, not really. As stated above, a reasonable amount of concurrency would be the way to scale this properly.

ghost commented 12 years ago

I'm guessing this never got fixed, since mine keeps crashing with out-of-memory errors.

renpoo commented 11 years ago

Hi all! I'm new here on GitHub.

Today I tried to build a music server on Ubuntu Server, which was a tough task. I finally got forked-daapd to work, but then ran into this problem: I already have a massive music library in iTunes, over 8,000 tracks, and the scanner kept failing with "cannot allocate memory".

But I found a trick: first, clear the music directory and restart the daapd service; after that, re-copy the whole library with rsync. daapd then scans the files one by one as they arrive (even though there are log errors like "scan: Could not lstat() '/media/renpoo/Music HD/iTunes/iTunes Media/Music/Keith Jarrett/Solo Concerts_ Lausanne [Disc 2]/.2-01 Lausanne.wav.CZ1MSH':").

My machine is still busy with the scan job, but it seems to be successfully building a new library, since I can already connect with Rhythmbox and play music.

In conclusion, I think this problem must be fixed. Thanks!

elwertk commented 11 years ago

As the filescanner's intense resource consumption, caused by massive parallelisation, is causing issues for some, here is an idea for how one might go about limiting the concurrency.

The attached code snippet, which should apply cleanly to the latest head of the GCD version, does so by introducing a semaphore whose initial value is based on the number of file descriptors available to forked-daapd (currently a quarter of that number), as set here: semaphore = dispatch_semaphore_create(getdtablesize() / 4); dispatch_semaphore_create takes an integer, so play around with the value as you like for your tests.

https://gist.github.com/elwertk/6163092

It does limit the excesses of the scanner on my system, but I haven't done any further testing, levelling or balancing of this. I'm not saying that file descriptors make a good baseline for effectively limiting this, but it may give people an idea to get started.

This is largely untested, so use with caution.
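
For anyone who can't apply the gist directly, here is a minimal standalone sketch of the same throttling pattern (the function names and the enqueue helper are made up for illustration; only the dispatch_semaphore_create(getdtablesize() / 4) line is taken from the snippet above):

#include <dispatch/dispatch.h>
#include <stdio.h>
#include <unistd.h>

static dispatch_semaphore_t scan_sem;

static void scanner_init(void)
{
    /* Cap in-flight scan jobs at a quarter of the fd limit, as in the gist. */
    scan_sem = dispatch_semaphore_create(getdtablesize() / 4);
}

static void scan_one(void *path)
{
    printf("scanning %s\n", (const char *)path);   /* ffmpeg probe + DB insert go here */
    dispatch_semaphore_signal(scan_sem);           /* release the slot when done */
}

static void scan_enqueue(const char *path)
{
    /* Block until a slot is free, so no more than the semaphore's initial
       value of scan jobs ever run concurrently on the global queue. */
    dispatch_semaphore_wait(scan_sem, DISPATCH_TIME_FOREVER);
    dispatch_async_f(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0),
                     (void *)path, scan_one);
}

int main(void)
{
    scanner_init();
    scan_enqueue("/music/a.flac");
    scan_enqueue("/music/b.flac");
    sleep(1);                                      /* crude: give the jobs time to run */
    return 0;
}

Every enqueued job first takes a slot and gives it back when it finishes, so the number of in-flight scans (and with it the worker threads and ffmpeg contexts) stays bounded by the semaphore's initial value.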