Closed: tmsd2001 closed this issue 3 years ago.
This may be related to the memory shortage; I'll need to do some debugging.
Also, please post the output of the scan with the additional option "--debug-size 180457". This will show state transition details only for the files of size 180457, which caused the crash above.
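For example, that is just the usual scan command with the extra option appended (the path here is only a placeholder for whatever path was scanned originally):
./dupd scan --path /your/scan/path --debug-size 180457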
The same error happens with my files. With the command ./dupd scan --path /rpool/data/nas/ and an unchanged dataset, the error is reproducible on every attempt.
Log file of ./dupd scan --path /rpool/data/nas/ --debug-size 1495743, with the "Files scanned <num>" and "Processed <num>" lines removed:
dupd.log
That's a strange log, as none of the three files appear to have been identified as unique yet. I've been changing the state processing in the development branch and it clearly still has some bugs. Will continue working on it. Do you see any problems with the release branch (1.7)?
The 1.7 release does a great job.
Same here. If there's anything I can do to help troubleshoot, do let me know. I downgraded to 1.7 and am running it now. It seems to work well so far.
Definitely related to memory limits - if I didn't set them for a specific set of files, the process would just take up all system memory and get killed. When I set a memory limit (1500M, for instance), it exits like that.
Thanks for putting this together, though - what an awesome tool.
How much RAM do you have? How large does the resident size get with default memory limits vs. when you set a --buflimit?
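(If it helps to gather those numbers, one rough way is to sample ps from another terminal while the scan runs - a sketch only, assuming a Linux procps ps and that the binary is named dupd; rss/vsz are reported in KB:)
while pgrep -x dupd > /dev/null; do ps -o rss=,vsz= -C dupd; sleep 10; done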
A problem is that --buflimit only applies to the read buffers. If RAM is tight and file count is high, it is possible to use up a significant percentage of memory before starting to fill the buffers, so total memory consumption becomes excessive. I plan to start accounting for all the data structures so that it is more aware of total memory usage.
Actually, I tried removing all large programs from memory so I could run it without buflimit. I have 4GB of RAM, on Linux. The app reaches around 2.5-3GB, the buffer seems to reach 100%, and it crashes like this immediately afterwards.
I'm happy to provide more logs, if I get some pointers. 1.7 didn't seem to work either, but I'm back to running it now with more memory, and hopefully it'll go well now.
This is the status on 1.7 so far - 2GB of memory taken by the app, still stable:
Files: 102667 0 errors 35 sms Sets : 1589/ 24779 4262213K ( 24355K/s) 0q 94%B 4222f 175 sms
Fingers crossed that now it works until the end :)
Actually no, it finished at:
Files: 102667 0 errors 35 sms Sets : 24375/ 24779 244811845K ( 45444K/s) 0q 74%b 846f 5387 serror: unable to allocate read buffer, sorry!
I'll try 1.7 with buflimit and see how it goes.
The uppercase B in "94%B" means that at that point it had already hit the buflimit and was trying to free up memory. The %b goes up as the buffers fill up. If it reaches 100%b it'll switch to %B and should start going down (slowly).
From many data points, it does seem 4GB isn't quite enough to run it unless the file/set count is low (it's more about how scattered the data is, not strictly file count, but that works as an approximation).
If you feel like experimenting, try version 1.6 with the --ssd option. I took it out in 1.7 to avoid maintaining duplicate code paths, but the --ssd scan is always sequential so it needs a lot less buffer space. It may be that I need to reintroduce a form of linear scan for when memory is tight.
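For example, something like this (a sketch only - the path is hypothetical, and since --ssd was removed in 1.7 this applies only to a 1.6 build):
./dupd scan --path /some/data --ssd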
Certainly, I shall!
To be clear, this has worked perfectly fine in most of the cases I had used it for so far, and the latest version's performance is just awesome. I'm just raising this here in case I can help troubleshoot in any way - I am very thankful for what you put together here, and how impressive it is!
Thank you.
Just to update that 1.6 with --ssd option was what worked for me. Thank you!
Master and 1.7, even with more memory, ended up failing for some reason - at least memory usage never got even close to the max. I am using a 32-bit OS; I'm unsure if that'd be a factor.
Thanks once again and have a great weekend! If there's anything I can do to help let me know, I'm game.
Another data point:
Built from https://github.com/jvirkki/dupd/commit/62f2798743b0acc0424a1b8c33603f023df6fb4b
10GB RAM free (of 32GB), 29GB of swap free
Linux version 4.9.0-8-amd64 ... SMP Debian 4.9.130-2 (2018-10-27)
Debian 9.13 (yes, I know, upgrade it already)
2.4TB folder, 219000 files
1 CPU, 6 cores
ZFS
# ./dupd scan -p /dir1
Files: 207520 0 errors 21 s
Sets : 15166/ 62814 204354689K ( 59561K/s) 0q 88%B 97062f 3431 serror: set entry state FS_UNIQUE but current state is FS_BUFFER_READY
----- dump path block list for size 1287306 -----
bad state
head: 0x7feff163a989
last_elem: 0x7feff163aa0b
list_size: 2
wanted_bufsize: 524288
buffer_ready: 1
state: PLS_NEED_DATA
hash_passes: 1
have_cached_hashes: 0
sizelist back ptr: 0x7fefe4a41250
forward ptr back to me: 0x7feff163a989
first_elem: 0x7feff163a9a9
--entry 1
file state: FS_NEED_DATA
filename_size: 18
dir: 0x7feff257d864
fd: 52322
next: 0x7feff163aa0b
buffer: 0x7fef09e41e70
bufsize: 65536
data_in_buffer: 0
file_pos: 65536
next_read_byte: 65536
next_buffer_pos: 0
next_read_block: 0
blocks: 0x7fefe4995d10
hash_ctx: 0x7fefd822f900
BLOCK LIST: count=1
[0] start_pos: 0 , len: 1287306 , block: 134778
filename (direct read): [file-with-18-character-filename.jpg]
built path: [/dir1/1/file-with-75-character-filename-and-path.jpg]
--entry 2
file state: FS_BUFFER_READY
filename_size: 18
dir: 0x7feff2580eab
fd: 8123
next: (nil)
buffer: 0x7fed26a00570
bufsize: 524288
data_in_buffer: 524288
file_pos: 589824
next_read_byte: 589824
next_buffer_pos: 524288
next_read_block: 0
blocks: 0x7fefe4a41e70
hash_ctx: 0x7fefd822f960
BLOCK LIST: count=1
[0] start_pos: 0 , len: 1287306 , block: 13617
filename (direct read): [file-with-18-character-filename.jpg]
built path: [/dir1/2/file-with-56-character-filename-and-path.jpg]
counted entries: 2
valid entries: 2
-----
For now please use the most recent released version (build from the 1.7 tag).
The dev code is a work in progress, and while I generally try to keep it in usable shape, it can have bugs. Right now 2.0-dev (the master branch) is in the middle of some state management refactoring and has the bug described in this issue.
(Good news is I ran into this on one data set on one of my machines, so I can finally reproduce it locally. Been too busy for months to have time to do so, but will debug it when I have a moment. Meanwhile it is best to use the 1.7 release.)
Thanks jvirkki - 1.7 is working. It's at 20% after ~40 minutes, using >16GB of RAM, so I'm not sure this program fits this use case (I'm sure I'll find other uses for it). It was using some swap (SSD), and less CPU than jdupes (probably related).
# ./dupd scan -p /smallset
Files: 207812 0 errors 20 s
Sets : 22201/ 62824 234327418K ( 98955K/s) 0q 38%b 43310f 2368 s
It's a 2.4TB data set (is the reported size in KB, or in units of 10 bytes?) - something seems off with that number.
jdupes, on the other hand, took 143 minutes but topped out at ~50MB of RAM:
# time jdupes -r /smallset > dupes
Scanning: 216672 files, 2625 items (in 1 specified)
real 143m3.863s
user 12m54.680s
sys 20m2.492s
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
me 32110 23.3 0.1 49996 47016 pts/4 D+ 07:45 30:15 jdupes -r /smallset
Thanks again,
Yes, dupd uses a lot of RAM for buffer caches (up to 50% of RAM by default for the cache, not including other internal data structures). So over 16GB is expected on your 32GB machine.
You can use --buflimit (e.g. --buflimit 8G) to constrain the buffer cache size, but unless you're memory constrained it's more often best to let it use the memory.
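For example, on the scan shown above that would look something like this (the 8G value is just the example figure from above):
./dupd scan -p /smallset --buflimit 8G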
For anyone who has experienced the "set entry state FS_UNIQUE but current state is FS_BUFFER_READY" error, I just added a change which fixes it for (at least) the data set where I was able to reproduce the behavior. If you see this error on your data sets going forward please update this bug.
I believe this is fixed as noted in previous comment (in dev code for now, not in a released version yet). If anyone encounters it in the future please file new bug with details.
Version 1.7 works fine.
Version 2.0-dev gives the error: set entry state FS_UNIQUE but current state is FS_BUFFER_READY