jvirkki / dupd

CLI utility to find duplicate files
http://www.virkki.com/dupd
GNU General Public License v3.0

unable to allocate read buffer #33

Open tmsd2001 opened 3 years ago

tmsd2001 commented 3 years ago

v1.7: On one drive I get the "unable to allocate read buffer" error.

jvirkki commented 3 years ago

It appears to be running out of memory.

Try running 'top' or a similar tool to monitor the memory used by the dupd process. How large does it grow, and how much RAM does the system have?
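For example, something along these lines (just a sketch; assumes pgrep and ps are available) will print the dupd process's resident and virtual memory, in KB, once per second while the scan runs:

# sample the dupd process's memory use (RSS and VSZ, in KB) once per second
while pgrep -x dupd > /dev/null; do
    ps -o rss=,vsz= -p "$(pgrep -x -d, dupd)"
    sleep 1
done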

Another detail to observe is the buffer size percentage. During the scan, watch the %b value: how large does it grow? For example, in the sample below it shows "2%b":

Sets : 68/ 40526 217891K ( 13618K/s) 0q 2%b 1206f 16 s

Also, please post the output of the following, so we can see whether the buffer allocation is wrong: dupd scan -p $HOME -v -v -v | head -10

Finally, if these confirm it is running out of memory, you can limit the buffer size with --buflimit (see the man page). For example, this limits the buffers to 2 GB: dupd scan -p $HOME --buflimit 2G
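You can combine the limit with the verbose output above to confirm what limit is actually in effect (the early verbose output should include a "buffer limit" line):

# sketch: run with a 2 GB limit and check the verbose header for the applied buffer limit
dupd scan -p $HOME --buflimit 2G -v -v -v | head -10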

tmsd2001 commented 3 years ago

Mem 4G + 120G Swap Mem 62% CPU 23%

12563/ 13946 10576012K ( 12545K/s) 0q 94%B 4935f 843 s
13294/ 13946 17774147K ( 16673K/s) 0q 92%b 2140f 1066 s
error: unable to allocate read buffer, sorry!

Defaulting --path to [/srv/dev-disk-by-label-backup2]
Will be using_fiemap (if available): 1
Reported RAM: 3877MB
buffer limit: 1938MB
Log level: INFO
Claimed CPU cores: 4
Max open files: 1048566
Done initializing new database [/root/.dupd_sqlite]
Set path_separator from db to ()
Set scan_hidden from db to 1
database create time 116751375

buflimit 3G:
8063/ 13946 6233104K ( 30554K/s) 0q 85%b 21978f 204 s
error: unable to allocate read buffer, sorry!

buflimit 2G:
error: unable to allocate read buffer, sorry!

buflimit 1G:
Max 100%B
Total duplicates: 57278 files in 21061 groups in 1075 s

jvirkki commented 3 years ago

The default buffer limit in your setup above is 1938MB, half of total RAM. Thus setting --buflimit to anything over that will hit the same problem. The 1G limit worked, so you could (if you want to experiment) look for a sweet spot between 1G and 1938M that works. It probably won't make a big difference in speed, though.
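If you do want to experiment, a sweep along these lines (just a sketch; the path is taken from your output above, and I'm assuming --buflimit accepts an M suffix the same way it accepts G; check the man page) would let you compare a few values in one go:

# try a few buffer limits between the working 1G and the ~1938M default
for lim in 1200M 1500M 1800M; do
    echo "=== --buflimit $lim ==="
    dupd scan -p /srv/dev-disk-by-label-backup2 --buflimit "$lim"
done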

When %B is capitalized (instead of %b) it means dupd ran out of buffer space and switched to a linear scan of the files, which requires fewer buffers but can (potentially) be much slower. If the buffer percentage goes down, it can automatically switch back to the regular scan mode (back to %b).

In any case, there is likely a bug here: dupd should be resilient to running out of buffer space by switching to the slower mode, not just give up.