zevv / duc

Dude, where are my bytes: Duc, a library and suite of tools for inspecting disk usage
GNU Lesser General Public License v3.0

duc index on large directory attempts to mmap() 16 Exabyte of memory #300

Open stuartthebruce opened 2 years ago

stuartthebruce commented 2 years ago

Attempting to index a large directory with 13,943,248 text files (and no sub-directories) generates an "out of memory" error after calling lstat() on all of the files, growing its RES memory to ~2.2GB, and making a call to mmap() requesting 16 exabytes of memory.

[root@zfs1 ~]# duc --version
duc version: 1.4.4
options: cairo x11 ui tokyocabinet

[root@zfs1 ~]# file $(which duc)
/usr/bin/duc: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=b502043469477f66e70a06449be4b4afe131ff3a, stripped

[root@zfs1 ~]# cat /etc/redhat-release 
Rocky Linux release 8.6 (Green Obsidian)

[root@zfs1 ~]# time duc index -xvp /home2/bhawana.sedhai/fall2022/multiclass/detection_2/Train_wave/bkg/O3b_LH_BKG_mulaseNoNorm_ANN_J_train/spectrograms
Writing to database "/root/.duc.db"
fatal error: out of memory in 13.9M files and 1 directories

real    0m36.429s
user    0m2.114s
sys 0m34.111s

Attaching strace to the duc process once its RES size reaches 2GB,

[root@zfs1 ~]# strace -p $(pgrep duc) |& cat -n
...
2106838 lstat("877003420_type_0_factor_0.000000_rho_9.082606_job_1139_lag_113_start0_1259823547.468750_stop0_1259823547.562500_start1_1259823434.468750_stop1_1259823434.562500.png", {st_mode=S_IFREG|0644, st_size=11008, ...}) = 0
2106839 lstat("88464986_type_0_factor_0.000000_rho_8.293481_job_448_lag_84_start0_1258570761.375000_stop0_1258570761.421875_start1_1258570677.375000_stop1_1258570677.421875.png", {st_mode=S_IFREG|0644, st_size=11520, ...}) = 0
2106840 lstat("124456539_type_0_factor_0.000000_rho_8.726855_job_136_lag_117_start0_1258080950.890625_stop0_1258080950.937500_start1_1258080833.890625_stop1_1258080833.937500.png", {st_mode=S_IFREG|0644, st_size=11378, ...}) = 0
2106841 lstat("75752741_type_0_factor_0.000000_rho_12.795018_job_718_lag_78_start0_1259009502.000000_stop0_1259009502.250000_start1_1259009424.000000_stop1_1259009424.250000.png", {st_mode=S_IFREG|0644, st_size=12098, ...}) = 0
2106842 getdents64(4, 0x561219cbfca0 /* 0 entries */, 32768) = 0
2106843 chdir("..")                             = 0
2106844 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
2106845 write(1, "\33[K[#-------] Indexed 321.2Gb in"..., 63) = 63
2106846 mmap(NULL, 18446744071800393728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
2106847 mmap(NULL, 18446744071800524800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
2106848 write(2, "fatal error: out of memory\n", 27) = 27
2106849 exit_group(1)                           = ?
2106850 +++ exited with 1 +++

Note, duc is compiled as a 64-bit ELF binary on this large-memory system with 1TB of RAM, and the mmap() ENOMEM happens while there is plenty of system memory available. However, the size_t length argument to mmap() is asking for 16 exabytes.

Perhaps there is some legacy 32-bit integer in the duc code?
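FWIW, the requested length looks exactly like a negative 32-bit value sign-extended to 64 bits: 2^64 minus 18446744071800393728 is about 1.9 billion, right in the neighborhood of the ~2.2GB RES size. A minimal sketch of that failure mode (the specific value is illustrative, not taken from the duc source):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Hypothetical: an allocation size that overflowed a signed 32-bit
         * counter somewhere in the stack and went negative. */
        int32_t size = -1909157888;
        size_t len = (size_t)size;  /* converts modulo 2^64, i.e. sign extension */
        printf("%zu\n", len);       /* prints 18446744071800393728, ~16 EiB */
        return 0;
    }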

l8gravely commented 2 years ago

"stuartthebruce" == stuartthebruce @.***> writes:

Attempting to index a large directory with 13,943,248 text files (and no sub-directories) generates an "out of memory" error after calling lstat() on all of the files, growing its RES memory to ~2.2GB, and making a call to mmap() requesting 16 exabytes of memory.

Yeah, not unexpected. You have an insanely large number of files in that directory and I suspect a lot of tools will have problems with it. Is there any way you can create sub-directories and move files down into them? It will help you a lot.

Could you at least break them up by job number? I can see from the filenames (which are absurdly long too!) that they include a job number.

As for duc, maybe you can try pulling down the source and compiling it yourself? It might also be that tokyocabinet doesn't handle stuff quite that big, but I'm not even sure how easy it would be for me to set up a test case for this.

See my comments at the bottom, but maybe check 'ulimit -a' as well, and unlimit everything if you can.


Note, duc is compiled as a 64-bit ELF binary on this large-memory system with 1TB of RAM, and the mmap() ENOMEM happens while there is plenty of system memory available. However, the size_t length argument to mmap() is asking for 16 exabytes.

Perhaps there is some legacy 32-bit integer in the duc code?

Could be. If you could pull it down from github and compile it with debugging info, that would help. Or even run it with 'gdb' and get a backtrace when it fails so we can look at exactly where it's located in the code.

git clone https://github.com/zevv/duc

duc itself doesn't do mmap() calls directly, but tokyocabinet does.

Also, do you have any limits defined? You might need to raise them in the process before you call duc. What do you get when you do:

ulimit -a

You might also try putting in a call to tcbdbsetxmsiz() in the file src/libduc/db-tokyo.c before the DB is opened.

We do set the flag BDBTLARGE, which should give you large memory support, but it's hard to know.
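Concretely, something like this before the tcbdbopen() call (the 4GiB figure is an arbitrary starting point, not a tested value):

    /* Ask for up to 4GiB of extra mapped memory for the B+ tree;
     * tcbdbsetxmsiz() has to be called before the database is opened. */
    if(!tcbdbsetxmsiz(db->hdb, 0x100000000LL)) {
        /* map the tokyocabinet error into duc's error handling here */
    }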

stuartthebruce commented 2 years ago

Yeah, not unexpected. You have an insanely large number of files in that directory and I suspect a lot of tools will have problems with it. Is there any way you can create sub-directories and move files down into them? It will help you a lot. Could you at least break them up by job number? I can see from the filenames (which are absurdly long too!) that they include a job number.

Agreed on all points, and the end user responsible for this has been contacted. FWIW, I am impressed by how well ZFS is handling this extreme directory. However, given that the 16 exabyte number probably reflects a software bug somewhere in the stack, I thought I would pass it along.

As for duc, maybe you can try pulling down the source and compiling it yourself?

We already do that.

It might also be that tokyocabinet doesn't handle stuff quite that big, but I'm not even sure how easy it would be for me to set up a test case for this. See my comments at the bottom, but maybe check 'ulimit -a' as well, and unlimit everything if you can.

This is not a ulimit issue. The error occurs when the process memory size is only 2-2.5GB on a 1TB machine with ulimit settings of "unlimited". And I have seen a running duc process grow several GB beyond that without crashing.

If you could pull it down from github and compile it with debugging info, that would help. Or even run it with 'gdb' and get a backtrace when it fails so we can look at exactly where it's located in the code. duc itself doesn't do mmap() calls directly, but tokyocabinet does.

Indeed.

(gdb) break mmap
Breakpoint 1 at 0x7f377b827490
(gdb) continue
Continuing.

Breakpoint 1, 0x00007f377b827490 in mmap64 () from /lib64/libc.so.6
(gdb) where
#0  0x00007f377b827490 in mmap64 () from /lib64/libc.so.6
#1  0x00007f377b797941 in sysmalloc () from /lib64/libc.so.6
#2  0x00007f377b798659 in _int_malloc () from /lib64/libc.so.6
#3  0x00007f377b7996ce in malloc () from /lib64/libc.so.6
#4  0x00007f377c634110 in tcbdbputimpl () from /lib64/libtokyocabinet.so.9
#5  0x00007f377c635137 in tcbdbput () from /lib64/libtokyocabinet.so.9
#6  0x00005564cc7d5560 in db_put ()
#7  0x00005564cc7d7681 in scanner_free ()
#8  0x00005564cc7d9566 in duc_index ()
#9  0x00005564cc7dfc7a in index_main ()
#10 0x00005564cc7d4698 in main ()
(gdb) 

Also, do you have any limits defined? You might need to raise them in the process before you call duc. What do you get when you do: ulimit -a

[root@zfs1 ~]# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 4124858
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4124858
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

[root@zfs1 ~]# cat /proc/meminfo
MemTotal:       1056007592 kB
MemFree:        141926468 kB
MemAvailable:   152664688 kB
Buffers:           45000 kB
Cached:          2214596 kB
SwapCached:            0 kB
Active:          1085212 kB
Inactive:        3897024 kB
Active(anon):     862732 kB
Inactive(anon):  3765448 kB
Active(file):     222480 kB
Inactive(file):   131576 kB
Unevictable:       40124 kB
Mlocked:           37052 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                56 kB
Writeback:             0 kB
AnonPages:       2761684 kB
Mapped:           447052 kB
Shmem:           1894936 kB
KReclaimable:   15254084 kB
Slab:           801800740 kB
SReclaimable:   15254084 kB
SUnreclaim:     786546656 kB
KernelStack:       92608 kB
PageTables:        22672 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    528003796 kB
Committed_AS:    7252948 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:           912384 kB
HardwareCorrupted:     0 kB
AnonHugePages:    143360 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1682224 kB
DirectMap2M:    101607424 kB
DirectMap1G:    969932800 kB

You might also try putting in a call to tcbdbsetxmsiz() in the file src/libduc/db-tokyo.c before the DB is opened. We do set the flag BDBTLARGE, which should give you large memory support, but it's hard to know.

I added tcbdbsetxmsiz(db->hdb, 10485760000); and that did not help. I am going to stick with convincing the end user to change their behavior.

Thanks.

l8gravely commented 2 years ago

"stuartthebruce" == stuartthebruce @.***> writes:

Yeah, not unexpected. You have an insanely large number of files in that directory and I suspect a lot of tools will have problems with it. Is there any way you can create sub-directories and move files down into them? It will help you a lot. Could you at least break them up by job number? I can see from the filenames (which are absurdly long too!) that they include a job number.

Agreed on all points, and the end user responsible for this has been contacted. FWIW, I am impressed by how well ZFS is handling this extreme directory. However, given that the 16 exabyte number probably reflects a software bug somewhere in the stack, I thought I would pass it along.

Yup, it's a bug somewhere, just not sure how to handle it. It might also be something at a different level. Maybe I can build a test directory on a Netapp and see what happens...

As for duc, maybe you can try pulling down the source and compiling it yourself?

We already do that.

Great. Makes it simpler.

It might also be that tokyocabinet doesn't handle stuff quite that big, but I'm not even sure how easy it would be for me to set up a test case for this. See my comments at the bottom, but maybe check 'ulimit -a' as well, and unlimit everything if you can.

This is not a ulimit issue. The error occurs when the process memory size is only 2-2.5GB on a 1TB machine with ulimit settings of "unlimited". And I have seen a running duc process grow several GB beyond that without crashing.

Yeah, I'm wondering if it could be tuned in some other ways too.

If you look in src/libduc/db-tokyo.c:

int ret = tcbdbtune(db->hdb, 256, 512, 131072, 9, 11, opts);

It might be worthwhile trying to tweak some of those numbers as well. I suspect you're hitting bucket limits somewhere in the Tokyocabinet code, but I'm not really sure.

I'd probably double all those numbers and see if that makes a difference.
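For reference, here is what those six arguments mean, as I read the Tokyo Cabinet docs (the annotations are mine, not from the duc source):

    int ret = tcbdbtune(db->hdb,
                        256,     /* lmemb: members per leaf page */
                        512,     /* nmemb: members per non-leaf page */
                        131072,  /* bnum: bucket array size; small next to ~14M records */
                        9,       /* apow: record alignment of 2^9 bytes */
                        11,      /* fpow: free block pool of 2^11 elements */
                        opts);   /* BDBTLARGE (files >2GB), BDBTDEFLATE (compression) */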

If you could pull it down from github and compile it with debugging info, that would help. Or even run it with 'gdb' and get a backtrace when it fails so we can look at exactly where it's located in the code. duc itself doesn't do mmap() calls directly, but tokyocabinet does.

Indeed.


So that does imply to me that the problem really is in tokyocabinet, and where it's putting items into the B+ tree and dying.

Also, do you have any limits defined? You might need to raise them in the process before you
call duc. What do you get when you do: ulimit -a


That all looks great to me.

[root@zfs1 ~]# cat /proc/meminfo
MemTotal:       1056007592 kB
MemFree:        141926468 kB

Sweet machine. :-)

You might also try putting in a call to tcbdbsetxmsiz() in the file src/libduc/db-tokyo.c
before the DB is opened. We do set the flag BDBTLARGE, which should give you large memory
support, but it's hard to know.

I added tcbdbsetxmsiz(db->hdb, 10485760000); and that did not help. I am going to stick with convincing the end user to change their behavior.

Yeah, I was going to offer the following patch to try:

    int opts = BDBTLARGE;
    if(compress) opts |= BDBTDEFLATE;
    int ret = tcbdbtune(db->hdb, 256, 512, 131072, 9, 11, opts);
    if(ret == 0) {
        *e = tcdb_to_errno(db->hdb);
        goto err2;
    }

    /* Hack to see if this is the problem with Stuart */
    unsigned long long bignum = 0x100000000ULL;
    ret = tcbdbsetxmsiz(db->hdb, bignum);
    if (ret == 0) {
        *e = tcdb_to_errno(db->hdb);
        goto err2;
    }

But you beat me to it. Try poking at the tune numbers, maybe that will change where the error happens and give us more info.

Thanks for being so responsive! John

l8gravely commented 2 years ago

Stuart, How long does it take for duc to blow up on your directory with 13 million files?

John

stuartthebruce commented 2 years ago

Stuart, How long does it take for duc to blow up on your directory with 13 million files? John

36.5 seconds (see the output of time in the original problem description); kudos to ZFS. Note, I will not have any more time this week to poke at this further.

l8gravely commented 2 years ago

"stuartthebruce" == stuartthebruce @.***> writes:

Stuart, How long does it take for duc to blow up on your directory with 13 million files? John

36.5 seconds (see the output of time in the original problem description); kudos to ZFS. Note, I will not have any more time this week to poke at this further.

Sure, I'll wait for you to try some of the other suggested changes and we'll see how it goes. You are an extreme case which we should handle more gracefully, but I'm not sure how yet. Maybe counting the number of files found in a directory and gracefully closing and re-opening the DB with some tuning might be the trick.

But not really sure.

Good luck with the rest of your week. John

stuartthebruce commented 2 years ago

This problematic directory has been deleted since it was on a production system. If this comes up again I can try some of the suggestions.

stuartthebruce commented 1 year ago

This has happened again with another user and a 17.6M file directory. Were you able to reproduce this problem on a test setup, or should I try tweaking tcbdbtune()?

Note, if this isn't easy to sort out, it might be worth having duc just stop after 10M files (or some similar default threshold, possibly with a command-line option to change it, since this presumably depends on the database engine) and throw a warning to stderr that duc index stopped scanning directory X after Y files.
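Something along these lines, purely as a sketch: the function, names, and option plumbing below are invented for illustration, not duc's actual internals.

    #include <stdio.h>

    /* Hypothetical guard for duc's directory scanner: give up on a single
     * directory after a configurable file count instead of blowing up later. */
    static unsigned long max_dir_files = 10000000UL;  /* e.g. a --max-dir-files option */

    static int scan_limit_reached(const char *path, unsigned long files_seen)
    {
        if(files_seen <= max_dir_files)
            return 0;
        fprintf(stderr, "warning: duc index stopped scanning %s after %lu files\n",
                path, files_seen);
        return 1;
    }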

l8gravely commented 1 year ago

"stuartthebruce" == stuartthebruce @.***> writes:

This has happened again with another user and a 17.6M file directory. Were you able to reproduce this problem on a test setup, or should I try tweaking tcbdbtune()?

You should try tweaking it since you have a good test directory. I haven't bothered to do that, but with a bit of nudging I could do something, especially now that I have more SSD space, so I'm not beating up a RAID1 SATA disk pair with all those IOPS....

Note, if this isn't easy to sort out, it might be worth having duc just stop after 10M files (or some similar default threshold, possibly with a command-line option to change it, since this presumably depends on the database engine) and throw a warning to stderr that duc index stopped scanning directory X after Y files.

It would be good to handle this more gracefully, but since we also want to make sure we don't lose any sub-directories if at all possible... I'm not sure what to do here.

Now, I don't think many tools will handle 17+ million files in a single directory, nor will a lot of filesystems be very happy either.

This problem really reminds me of what the old NNTP servers used to do: create multiple levels of directories to split up the files so they didn't overload things.

Just doing a simple

 DIR *d = opendir(".");
 struct dirent *ent;
 while((ent = readdir(d)) != NULL)
     printf("%s\n", ent->d_name);
 closedir(d);

type of loop will run for a long time with that many files in a directory.

In any case, once I get some time I'll try to A) build a test structure and B) see if I can wrangle some better solutions here.

No promises, and since I'm taking a 3 hour class twice a week now, my time is kinda limited.

Cheers, John

stuartthebruce commented 7 months ago

I had another user dump 39.8M files in a single directory, so I tried doubling the tcbdbtune() arguments and adding the above tcbdbsetxmsiz() call with bignum in src/libduc/db-tokyo.c,

    int opts = BDBTLARGE;
    if(compress) opts |= BDBTDEFLATE;
    int ret = tcbdbtune(db->hdb, 512, 1024, 262144, 18, 22, opts);
    if(ret == 0) {
        *e = tcdb_to_errno(db->hdb);
        goto err2;
    }

    /* Hack to see if this is the problem with Stuart */
    unsigned long long bignum = 0x100000000ULL;
    ret = tcbdbsetxmsiz(db->hdb, bignum);
    if (ret == 0) {
        *e = tcdb_to_errno(db->hdb);
        goto err2;
    }

and that still failed with,

[root@zfs1 duc]# /usr/bin/time ./duc index -xvp -d /dev/shm/tst.duc /home2/adrian.macquet/pystampas_workspace/O4/bkg_O4a_burstegard_polarized_final/stage2/ftmaps
Writing to database "/dev/shm/tst.duc"
fatal error: out of memory in 39.8M files and 1 directories
Command exited with non-zero status 1
6.77user 199.44system 3:27.33elapsed 99%CPU (0avgtext+0avgdata 2347580maxresident)k
416inputs+0outputs (4major+584949minor)pagefaults 0swaps

l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

Can I suggest you take that user out back and LART them? grin

I had another user dump 39.8M files in a single directory, so I tried doubling the tcbdbtune() arguments and adding the above tcbdbsetxmsiz() call with bignum in src/libduc/db-tokyo.c, and that still failed.

Ouch, this is not good. Did it write anything to the tst.duc file? Or is it corrupted completely? And to refresh my memory, you're running this on linux, right? Do you have any limits in place which might be blocking stuff? What does:

  $ ulimit -a

say on this system you're running the scan on. And what OS is it running? I might be able to spin up a test case and see what happens, but it's not going to happen soon unfortunately.

John


stuartthebruce commented 6 months ago

Can I suggest you take that user out back and LART them? grin

I got them to delete this directory without needing any such tool, but I learned a new word 😄

Ouch, this is not good. Did it write anything to the tst.duc file?

Yes

Or is it corrupted completely?

[root@zfs1 ~]# duc info -d /dev/shm/tst.duc
Date       Time       Files    Dirs    Size Path
[root@zfs1 ~]# echo $?
0
[root@zfs1 ~]# ls -l /dev/shm/tst.duc 
-rw-r--r-- 1 root root 1188864 Apr 28 16:50 /dev/shm/tst.duc

And to refresh my memory, you're running this on linux, right?

Yes.

Do you have any limits in place which might be blocking stuff?

No.

What does: $ ulimit -a say on this system you're running the scan on.

[root@zfs1 ~]# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 4124123
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4124123
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

And what OS is it running?

Rocky Linux 8.9

I might be able to spin up a test case and see what happens, but it's not going to happen soon unfortunately. John

No rush from my perspective. If I see this again before your testing I will try to remember to test with a different database backend to confirm this is a bug in Tokyocabinet.

l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

Can I suggest you take that user out back and LART them? grin

I got them to delete this directory without needing any such tool, but I learned a new word 😄

Ouch, this is not good. Did it write anything to the tst.duc file?

Yes

Or is it corrupted completely?


Yeah, that's probably not finished writing properly. Ugh.

And to refresh my memory, you're running this on linux, right?

Yes.

Do you have any limits in place which might be blocking stuff?

No.

What does: $ ulimit -a say on this system you're running the scan on.


Your stack size is limited... but that shouldn't do anything bad.

And what OS is it running?

Rocky Linux 8.9

I might be able to spin up a test case and see what happens, but
it's not going to happen soon unfortunately. John

No rush from my perspective. If I see this again before your testing I will try to remember to test with a different database backend to confirm this is a bug in Tokyocabinet.

I've just found this new library called tkrzw which might be a good replacement for Tokyocabinet and other tools. I'm going to try and find some time to poke at using it in a new branch and see what I can find.

I'd also suggest you try using leveldb or kyotocabinet if you get a chance. I know, a pain.

John

stuartthebruce commented 6 months ago

I've just found this new library called tkrzw which might be a good replacement for Tokyocabinet and other tools. I'm going to try and find some time to poke at using it in a new branch and see what I can find.

That looks very interesting, and is already packaged/distributed in EPEL for my systems.

I'd also suggest you try using leveldb or kyotocabinet if you get a chance. I know, a pain. John

Will do.

l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

I've just found this new library called tkrzw which might be a good replacement for
Tokyocabinet and other tools. I'm going to try and find some time to poke at using it in a new
branch and see what I can find.

That looks very interesting, and is already packaged/distributed in EPEL for my systems.

I'd also suggest you try using leveldb or kyotocabinet if you get a chance. I know, a pain.
John

Will do.

So I took the time to add support for tkrzw to duc and pushed it as the branch 'tkrzw' on github; please feel free to pull it and try to compile with it. It will show up as version 1.5.0 of duc.

I plan on running tests myself with large filesystems this coming week, along with making a stupid number of files in one directory (NFS mounted though...) and running duc with tkrzw on it to see how it handles it.

John

stuartthebruce commented 6 months ago

So I took the time to add support for tkrzw to duc and pushed it as the branch 'tkrzw' on github; please feel free to pull it and try to compile with it.

Many thanks. It builds cleanly on RL8 and RL9, however, it segfaults on both when trying to index the duc source tree,

[root@zfs1 duc]# cat /etc/redhat-release 
Rocky Linux release 8.9 (Green Obsidian)

[root@zfs1 duc]# uname -a
Linux zfs1 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Apr 4 18:13:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

[root@zfs1 duc]# ./duc --version
duc version: 1.5.0
options: cairo x11 ui tkrzw

[root@zfs1 duc]# ls
aclocal.m4      config.h.in    depcomp   INSTALL      missing    todo
autom4te.cache  config.log     doc       install-sh   README.md  TODO.md
build           config.status  duc       LICENSE      src        valgrind-suppressions
ChangeLog       configure      examples  Makefile     stamp-h1
compile         configure.ac   gentoo    Makefile.am  testing
config.h        debian         img       Makefile.in  test.sh

[root@zfs1 duc]# gdb --args ./duc index -xvp .
GNU gdb (GDB) Rocky Linux 8.2-20.el8.0.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./duc...done.
(gdb) run
Starting program: /root/duc/duc index -xvp .
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Writing to database "/root/.cache/duc/duc.db"

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b61849 in tkrzw_dbm_get () from /lib64/libtkrzw.so.1
Missing separate debuginfos, use: yum debuginfo-install bzip2-libs-1.0.6-26.el8.x86_64 cairo-1.15.12-6.el8.x86_64 expat-2.2.5-11.el8_9.1.x86_64 fontconfig-2.13.1-4.el8.x86_64 freetype-2.9.1-9.el8.x86_64 fribidi-1.0.4-9.el8.x86_64 glib2-2.56.4-161.el8.x86_64 glibc-2.28-236.el8_9.12.x86_64 gmp-6.1.2-10.el8.x86_64 gnutls-3.6.16-8.el8_9.3.x86_64 harfbuzz-1.7.5-3.el8.x86_64 libX11-1.6.8-6.el8.x86_64 libXau-1.0.9-3.el8.x86_64 libXext-1.3.4-1.el8.x86_64 libXrender-0.9.10-7.el8.x86_64 libdatrie-0.2.9-7.el8.x86_64 libffi-3.1-24.el8.x86_64 libgcc-8.5.0-20.el8.x86_64 libidn2-2.2.0-1.el8.x86_64 libpng-1.6.34-5.el8.x86_64 libstdc++-8.5.0-20.el8.x86_64 libtasn1-4.13-4.el8_7.x86_64 libthai-0.1.27-2.el8.x86_64 libunistring-0.9.9-3.el8.x86_64 libuuid-2.32.1-44.el8_9.1.x86_64 libxcb-1.13.1-1.el8.x86_64 ncurses-libs-6.1-10.20180224.el8.x86_64 nettle-3.4.1-7.el8.x86_64 p11-kit-0.23.22-1.el8.x86_64 pango-1.42.4-8.el8.x86_64 pcre-8.42-6.el8.x86_64 pixman-0.38.4-3.el8_9.x86_64 tkrzw-libs-1.0.27-1.el8.x86_64 zlib-1.2.11-25.el8.x86_64
(gdb) where
#0  0x00007ffff7b61849 in tkrzw_dbm_get () at /lib64/libtkrzw.so.1
#1  0x0000000000404619 in db_get
    (val_len=, key_len=14, key=0x411649, db=0x625c60)
    at src/libduc/db-tkrzw.c:102
#2  0x0000000000404619 in db_open
    (path_db=path_db@entry=0x7fffffff60a0 "/root/.cache/duc/duc.db", flags=flags@entry=6, e=e@entry=0x625de8) at src/libduc/db-tkrzw.c:64
#3  0x00000000004051b1 in duc_open
    (duc=duc@entry=0x625de0, path_db=0x7fffffff60a0 "/root/.cache/duc/duc.db", flags=flags@entry=(DUC_OPEN_RW | DUC_OPEN_COMPRESS)) at src/libduc/duc.c:127
#4  0x000000000040e281 in index_main (duc=0x625de0, argc=1, argv=0x7fffffffe2c0)
    at src/duc/cmd-index.c:94
#5  0x00000000004039f6 in main (argc=, argv=) at src/duc/main.c:179
(gdb)

l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

Many thanks. It builds cleanly on RL8 and RL9, however, it segfaults on both when trying to index the duc source tree,

Do me a favor and look inside src/libduc/db-tkrzw.c and check that I did the malloc of the 'db' pointer properly? I've been fighting work issues all day (and was away most of the weekend and Monday) so I'm just getting back to this and my testing, which is happening on Debian-based systems.
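What I'm hoping to see there is a zeroed allocation, something like this (the struct and error names are from memory, so treat it as a sketch):

    struct db *db = calloc(1, sizeof *db);  /* zero-init so db->hdb starts out NULL */
    if(db == NULL) {
        *e = DUC_E_UNKNOWN;                 /* invented error code */
        return NULL;
    }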

I'll have to try and see if I can get an AlmaLinux 8.x and/or 9.x system set up to test against. Or can you just get me a backtrace from gdb as well when it crashes?

What version of tkrzw do you have on your system? I've only tested with 1.0.25 (the Debian-packaged version), which did pass some basic tests, but nothing major.

l8gravely commented 6 months ago

Stuart, when you run your tests, make sure you turn off compression. I have it enabled, but it's not testing things properly, which might be why you are crashing.

./duc index -d /path/to.db --uncompressed -xvr /path/to/index

I'll push an update later today to turn it off unless we find the proper compression libraries on the system. So far I've got around 2 million files in my one directory and it's happy. Working on adding more files into a single directory to stress things.

l8gravely commented 6 months ago

Just a quick update: I've created an NFS filesystem with 21.7 million files in two directories; I kept hitting Netapp maxdirsize limits and I can't tweak the system too much.

But now I've got a test filesystem to scan and see how it all works. So far quite well. But I need to run a bunch more tests to see how things end up working out for tkrzw as a new backend.

And of course purging backends like tokyocabinet and kyotocabinet that aren't useful anymore because they are completely unmaintained.

But the next question is to try and make an XFS filesystem on block storage and see how many files I can create inside a single directory. Filling up the space with hundreds of gigabytes is going to be harder to arrange, though.

But I do have some large 10TB filesystems with 20+ million files which I plan on scanning to see how well they work using various DB backends.

So the more info you can provide on your edge case, the better!

John

stuartthebruce commented 6 months ago

Stuart, When you run your tests, make sure you turn off compression. I have it enabled, but it's not testing things properly, which might be why you are crashing. ./duc index -d /path/to.db --uncompressed -xvr /path/to/index I'll push an update later today to turn it off unless we find the proper compression libraries on the system. So far I've got around 2 million files in my one directory and it's happy. Working on adding more files into a single directory to stress things.

[root@zfs1 duc]# ./duc index --uncompressed -xvr .
./duc: invalid option -- 'r'
Try 'duc --help' for more information.

In case you meant -p instead of -r, that still generates a SEGV even with --uncompressed,

[root@zfs1 duc]# ./duc index --uncompressed -xvp .
Writing to database "/root/.cache/duc/duc.db"
Segmentation fault (core dumped)

However, if I first manually create /root/.cache/duc and use --uncompressed it runs,

[root@zfs1 duc]# ./duc index --uncompressed -xvp .
Writing to database "/root/.cache/duc/duc.db"
Indexed 222 files and 39 directories, (13.0MB apparent, 13.1MB actual) in 0.00 secs.

In summary I think I have found 3 ways to SEGV:

  1. Destination directory for db file does not exist
  2. Run with default compression
  3. Attempt to update the partial database file from a previous SEGV

Here is an example of the third,

[root@zfs1 duc]# ./duc index -xvp .
Writing to database "/root/.cache/duc/duc.db"
Segmentation fault (core dumped)

[root@zfs1 duc]# ./duc index --uncompressed -xvp .
Writing to database "/root/.cache/duc/duc.db"
Segmentation fault (core dumped)

[root@zfs1 duc]# /bin/rm /root/.cache/duc/duc.db 

[root@zfs1 duc]# ./duc index --uncompressed -xvp .
Writing to database "/root/.cache/duc/duc.db"
Indexed 222 files and 39 directories, (13.0MB apparent, 13.1MB actual) in 0.00 secs.

stuartthebruce commented 6 months ago

As a side question, is it expected that tkrzw will also solve #310, which I have recently run into again?

stuartthebruce commented 6 months ago

I ran a quick performance comparison between tokyocabinet and tkrzw on a modest filesystem with ~19M files in ~78k directories. The system was busy and I have not run multiple passes to measure the uncertainty, but tkrzw with no compression used less user CPU time, more wall-clock time, and created a much larger output file (post facto zstd compression was able to compensate for most, but not all, of the increase).

tkrzw

[root@zfs1 ~]# time duc/duc index -xvp /home2/cbc -d /home2/duc.new --uncompressed
Writing to database "/home2/duc.new"
Indexed 18839756 files and 78284 directories, (2.2TB apparent, 1.6TB actual) in 2 minutes, and 19.04 seconds.

real    2m19.060s
user    0m4.216s
sys     2m13.849s

[root@zfs1 ~]# ls -lh /home2/duc.new
-rw-r--r-- 1 root root 1.2G May 15 16:26 /home2/duc.new

[root@zfs1 ~]# du -h /home2/duc.new
309M    /home2/duc.new

tokyocabinet

[root@zfs1 ~]# time duc index -xvp /home2/cbc -d /home2/duc.old
Writing to database "/home2/duc.old"
Indexed 18839756 files and 78284 directories, (2.2TB apparent, 1.6TB actual) in 1 minutes, and 31.73 seconds.

real    1m32.964s
user    0m20.851s
sys     1m11.315s

[root@zfs1 ~]# ls -lh /home2/duc.new
-rw-r--r-- 1 root root 1.2G May 15 16:26 /home2/duc.new

[root@zfs1 ~]# du -h /home2/duc.new
309M    /home2/duc.new

The /home2 filesystem is ZFS with compression=zstd, which is responsible for the difference between ls and du.

And I have now started a larger scan of 1.2B files across 40M directories with tkrzw.

l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

I ran a quick performance comparison between tokyocabinet and tkrzw on a modest filesystem with ~19M files in ~78k directories. The system was busy and I have not run multiple passes to measure the uncertainty, but tkrzw with no compression used less user CPU time, more wall-clock time, and created a much larger output file (post facto zstd compression was able to compensate for most, but not all, of the increase).

Yeah, right now tkrzw makes huge DB files, much larger than anything else, but I've also not implemented testing and support for compression yet. Should be easy enough to do, mostly checking for and adding support for lz4 libraries.

I guess my interest in this library is because it's maintained, and should ideally be able to handle the case you ran into with monster numbers of files in a single directory and/or in a single tree. But we will see if it's really worth it.

But I do want to start cutting back the number of backend DBs supported, just to make life simpler.

Thanks for all your help here! John


stuartthebruce commented 6 months ago

My first large run ran for several hours before hanging about halfway through after scanning 665M files,

[root@zfs1 ~]# time duc/duc index -vp /home2 -d /home2/duc.new --uncompressed
Writing to database "/home2/duc.new"
[-------#] Indexed 217.1Tb in 665.1M files and 19.6M directories

with no db file updates in the last 18+ hours,

[root@zfs1 ~]# stat /home2/duc.new && date
  File: /home2/duc.new
  Size: 34382856040 Blocks: 17557146   IO Block: 131072 regular file
Device: 2eh/46d Inode: 772         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-05-15 21:24:01.633988170 -0700
Modify: 2024-05-15 21:24:01.632988161 -0700
Change: 2024-05-15 21:24:01.632988161 -0700
 Birth: 2024-05-15 16:54:05.406875015 -0700
Thu May 16 15:58:54 PDT 2024

and the process continuing to use 100% of a cpu-core according to /bin/top,

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND 
2890596 root      20   0  150080  37940   3908 R  99.4   0.0   1316:36 duc     

reading the 33GB db file a few hundred bytes at a time,

[root@zfs1 ~]# strace -p 2890596 |& head
strace: Process 2890596 attached
lseek(3, 6468256512, SEEK_SET)          = 6468256512
read(3, "0_609_O1_BKG_C02_R1_13052018_sla"..., 109) = 109
lseek(3, 6870895032, SEEK_SET)          = 6870895032
read(3, "playMDC_llhoft-1360848744-1.gwf\372"..., 48) = 48
lseek(3, 6870895040, SEEK_SET)          = 6870895040
read(3, "llhoft-1360848744-1.gwf\372\3\325?\372\3\332\0\1"..., 135) = 135
lseek(3, 14546618984, SEEK_SET)         = 14546618984
read(3, ".000000_stop0_1258796126.500000_"..., 48) = 48
lseek(3, 14546618992, SEEK_SET)         = 14546618992

where fd=3 is the database file,

[root@zfs1 ~]# lsof -p 2890596
COMMAND     PID USER   FD   TYPE DEVICE    SIZE/OFF     NODE NAME
duc     2890596 root  cwd    DIR  0,284          21  8807904 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc/iwave-1.2
duc     2890596 root  rtd    DIR  253,0         290      128 /
duc     2890596 root  txt    REG  253,0      515368 51391141 /root/duc/duc
duc     2890596 root  mem    REG  253,0      186280 16797125 /usr/lib64/libgraphite2.so.3.0.1
duc     2890596 root  mem    REG  253,0      685392 16797128 /usr/lib64/libharfbuzz.so.0.10705.0
duc     2890596 root  mem    REG  253,0      629568 16788055 /usr/lib64/libgmp.so.10.3.2
duc     2890596 root  mem    REG  253,0      197728 16788243 /usr/lib64/libhogweed.so.4.5
duc     2890596 root  mem    REG  253,0      239360 16788245 /usr/lib64/libnettle.so.6.5
duc     2890596 root  mem    REG  253,0       78816 16788255 /usr/lib64/libtasn1.so.6.5.5
duc     2890596 root  mem    REG  253,0     1580488 16787908 /usr/lib64/libunistring.so.2.1.0
duc     2890596 root  mem    REG  253,0      123360 16787911 /usr/lib64/libidn2.so.0.3.6
duc     2890596 root  mem    REG  253,0     1246168 16787962 /usr/lib64/libp11-kit.so.0.3.0
duc     2890596 root  mem    REG  253,0       33728 17734973 /usr/lib64/libdatrie.so.1.3.2
duc     2890596 root  mem    REG  253,0       16360 16789513 /usr/lib64/libXau.so.6.0.0
duc     2890596 root  mem    REG  253,0       73008 16787901 /usr/lib64/libbz2.so.1.0.6
duc     2890596 root  mem    REG  253,0       33488 16787818 /usr/lib64/libuuid.so.1.3.0
duc     2890596 root  mem    REG  253,0      248344 17162947 /usr/lib64/libexpat.so.1.6.7
duc     2890596 root  mem    REG  253,0       19128 16787443 /usr/lib64/libdl-2.28.so
duc     2890596 root  mem    REG  253,0       95496 16797143 /usr/lib64/libpangoft2-1.0.so.0.4200.3
duc     2890596 root  mem    REG  253,0       37248 16813323 /usr/lib64/libffi.so.6.0.2
duc     2890596 root  mem    REG  253,0      464936 16787790 /usr/lib64/libpcre.so.1.2.10
duc     2890596 root  mem    REG  253,0     2051640 16805530 /usr/lib64/libgnutls.so.30.28.2
duc     2890596 root  mem    REG  253,0      115104 16797123 /usr/lib64/libfribidi.so.0.4.0
duc     2890596 root  mem    REG  253,0       44320 17734975 /usr/lib64/libthai.so.0.3.0
duc     2890596 root  mem    REG  253,0       42744 16787651 /usr/lib64/librt-2.28.so
duc     2890596 root  mem    REG  253,0       99656 16787738 /usr/lib64/libz.so.1.2.11
duc     2890596 root  mem    REG  253,0       80728 16789449 /usr/lib64/libXext.so.6.4.0
duc     2890596 root  mem    REG  253,0       45536 16789453 /usr/lib64/libXrender.so.1.3.0
duc     2890596 root  mem    REG  253,0       56848 16789541 /usr/lib64/libxcb-render.so.0.0.0
duc     2890596 root  mem    REG  253,0      170216 16789443 /usr/lib64/libxcb.so.1.1.0
duc     2890596 root  mem    REG  253,0       15904 16789549 /usr/lib64/libxcb-shm.so.0.0.0
duc     2890596 root  mem    REG  253,0      220992 16788516 /usr/lib64/libpng16.so.16.34.0
duc     2890596 root  mem    REG  253,0      783112 16788517 /usr/lib64/libfreetype.so.6.16.1
duc     2890596 root  mem    REG  253,0      289800 16794342 /usr/lib64/libfontconfig.so.1.12.0
duc     2890596 root  mem    REG  253,0      695320 16813370 /usr/lib64/libpixman-1.so.0.38.4
duc     2890596 root  mem    REG  253,0       99664 17205035 /usr/lib64/libgcc_s-8-20210514.so.1
duc     2890596 root  mem    REG  253,0      149976 16787455 /usr/lib64/libpthread-2.28.so
duc     2890596 root  mem    REG  253,0     1661448 16787779 /usr/lib64/libstdc++.so.6.0.25
duc     2890596 root  mem    REG  253,0     2089936 16787440 /usr/lib64/libc-2.28.so
duc     2890596 root  mem    REG  253,0     1598848 16787445 /usr/lib64/libm-2.28.so
duc     2890596 root  mem    REG  253,0      187552 16787199 /usr/lib64/libtinfo.so.6.1
duc     2890596 root  mem    REG  253,0      259192 16787191 /usr/lib64/libncursesw.so.6.1
duc     2890596 root  mem    REG  253,0     1344056 16786745 /usr/lib64/libX11.so.6.3.0
duc     2890596 root  mem    REG  253,0       62232 16797141 /usr/lib64/libpangocairo-1.0.so.0.4200.3
duc     2890596 root  mem    REG  253,0     1171912 17243211 /usr/lib64/libglib-2.0.so.0.5600.4
duc     2890596 root  mem    REG  253,0      347416 17243215 /usr/lib64/libgobject-2.0.so.0.5600.4
duc     2890596 root  mem    REG  253,0      297816 16797139 /usr/lib64/libpango-1.0.so.0.4200.3
duc     2890596 root  mem    REG  253,0     1202552 17734955 /usr/lib64/libcairo.so.2.11512.0
duc     2890596 root  mem    REG  253,0     1930824 16792173 /usr/lib64/libtkrzw.so.1.70.0
duc     2890596 root  mem    REG  253,0     1062416 16786784 /usr/lib64/ld-2.28.so
duc     2890596 root    0u   CHR  136,3         0t0        6 /dev/pts/3
duc     2890596 root    1u   CHR  136,3         0t0        6 /dev/pts/3
duc     2890596 root    2u   CHR  136,3         0t0        6 /dev/pts/3
duc     2890596 root    3u   REG   0,46 34382856040      772 /home2/duc.new
duc     2890596 root    4r   DIR   0,46         324       34 /home2
duc     2890596 root    5r   DIR  0,284         428       34 /home2/michael.coughlin
duc     2890596 root    6r   DIR  0,284          16      898 /home2/michael.coughlin/STAMP
duc     2890596 root    7r   DIR  0,284           9  2866908 /home2/michael.coughlin/STAMP/O3
duc     2890596 root    8r   DIR  0,284           4  8444738 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618
duc     2890596 root    9r   DIR  0,284           4  8786783 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500
duc     2890596 root   10r   DIR  0,284          10  8786785 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2
duc     2890596 root   11r   DIR  0,284          53  8811306 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src
duc     2890596 root   12r   DIR  0,284           8  8807879 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc
duc     2890596 root   13r   DIR  0,284          21  8807904 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc/iwave-1.2
duc     2890596 root   14r   DIR  0,284           8  8807911 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc/iwave-1.2/m4

Here is the stack trace before I killed it,

(gdb) where
#0  0x00007f86b43c4d45 in read () at /lib64/libc.so.6
#1  0x00007f86b3fc8d0f in std::__basic_file::xsgetn(char*, long) () at /lib64/libstdc++.so.6
#2  0x00007f86b4005921 in std::basic_filebuf >::xsgetn(char*, long) () at /lib64/libstdc++.so.6
#3  0x00007f86b40131f1 in std::istream::read(char*, long) () at /lib64/libstdc++.so.6
#4  0x00007f86b5f3646b in tkrzw::StdFileImpl::ReadImpl(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#5  0x00007f86b5f3657f in tkrzw::StdFileImpl::Read(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#6  0x00007f86b5f36665 in tkrzw::StdFile::Read(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#7  0x00007f86b5f5cd07 in tkrzw::HashRecord::ReadBody() () at /lib64/libtkrzw.so.1
#8  0x00007f86b5f5d605 in tkrzw::HashRecord::ReadMetadataKey(long, int) () at /lib64/libtkrzw.so.1
#9  0x00007f86b5f628be in tkrzw::HashDBMImpl::ProcessImpl(std::basic_string_view >, long, tkrzw::DBM::RecordProcessor*, bool, bool) () at /lib64/libtkrzw.so.1
#10 0x00007f86b5f65cc8 in tkrzw::HashDBMImpl::Process(std::basic_string_view >, tkrzw::DBM::RecordProcessor*, bool, bool) ()
    at /lib64/libtkrzw.so.1
#11 0x00007f86b5f663be in tkrzw::HashDBM::Set(std::basic_string_view >, std::basic_string_view >, bool, std::__cxx11::basic_string, std::allocator >*) () at /lib64/libtkrzw.so.1
#12 0x00007f86b5fc344e in tkrzw::PolyDBM::Set(std::basic_string_view >, std::basic_string_view >, bool, std::__cxx11::basic_string, std::allocator >*) () at /lib64/libtkrzw.so.1
#13 0x00007f86b5fe59fc in tkrzw_dbm_set () at /lib64/libtkrzw.so.1
#14 0x0000000000404772 in db_put
    (db=, key=key@entry=0x7ffe9cafeed0, key_len=, val=, val_len=) at src/libduc/db-tkrzw.c:94
#15 0x000000000040653b in scanner_free (scanner=scanner@entry=0x1f35fb0) at src/libduc/index.c:593
#16 0x00000000004073b6 in scanner_scan (scanner_dir=scanner_dir@entry=0x1f39f30) at src/libduc/index.c:518
#17 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f39eb0) at src/libduc/index.c:517
#18 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f37ee0) at src/libduc/index.c:517
#19 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f37e00) at src/libduc/index.c:517
#20 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f39e30) at src/libduc/index.c:517
#21 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f35290) at src/libduc/index.c:517
#22 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f340f0) at src/libduc/index.c:517
#23 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f35420) at src/libduc/index.c:517
#24 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f122d0) at src/libduc/index.c:517
#25 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f09e00) at src/libduc/index.c:517
#26 0x00000000004083ba in duc_index (req=0x1f05640, path=, flags=flags@entry=(unknown: 0)) at src/libduc/index.c:676
#27 0x000000000040e2d8 in index_main (duc=0x1ecdde0, argc=, argv=) at src/duc/cmd-index.c:106
#28 0x00000000004039f6 in main (argc=, argv=) at src/duc/main.c:179

While the terminate signal appears to have been caught, the resulting db file is not of much use,

[root@zfs1 ~]# time duc/duc index -vp /home2 -d /home2/duc.new --uncompressed
Writing to database "/home2/duc.new"
Terminated Indexed 217.1Tb in 665.1M files and 19.6M directories

real    1395m10.489s
user    125m4.960s
sys     1199m20.065s

[root@zfs1 ~]# duc/duc info -d /home2/duc.new /home2
Date       Time       Files    Dirs    Size Path
[root@zfs1 ~]# 

Note, during the same time interval duc version 1.4.5 was able to completely index the full 1.16B files in ~7.5h using tokyocabinet,

[root@zfs1 ~]# duc --version
duc version: 1.4.5
options: cairo x11 ui tokyocabinet

Indexed 1158160971 files and 39722571 directories, (570.7TB apparent, 393.4TB actual) in 7 hours, 33 minutes, and 41.38 seconds.

l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

My first large run ran for several hours before hanging about halfway through after scanning 665M files,

Blech, not fun. Sorry this has been such a hassle! I did find my problem with compression and I have to say it's because I'm a moron. You just need to change line 49 of db-tkrzw.c from:

char options[] = "dbm=HashDBM,file=StdFile";

to instead be:

char options[256] = "dbm=HashDBM,file=StdFile";

because I've obviously forgotten all my C string handling. Then compression will work with duc and tkrzw on the backend. For some values of "work". Obviously you've got a great case for hammering on various backends.
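To spell out why: the original declaration sizes the array exactly to its initializer, so appending the compression options later walks off the end of the buffer. Roughly (the appended option string below is illustrative, not the exact one in the branch):

    #include <string.h>

    char options[] = "dbm=HashDBM,file=StdFile";   /* sizeof(options) == 25: no slack */
    /* strcat(options, ",...") here would write past the end: undefined behavior */

    char fixed[256] = "dbm=HashDBM,file=StdFile";  /* fixed buffer leaves room to grow */
    /* strcat(fixed, ",record_comp_mode=zlib") is now safe (illustrative option name) */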


with no db file updates in the last 18+ hours,


and the process continuing to use 100% of a cpu-core according to /bin/top,


reading the 33GB db file a few hundred bytes at a time,

Yeah, we probably need to close and sync the DB and re-optimize it. And if it's 33GB, then maybe you really need to break it down into smaller chunks? You're really pushing the limits, which is great, but it's so hard for me to really help, since I don't have any filesystems nearly that size.

Can you use the command:

tkrzw_dbm_util inspect /path/to/db

and see what it says? I suspect it needs to be rebuilt to be more optimal. And I really don't know how we should make this automatic in terms of tuning.

You might need to do:

tkrzw_dbm_util rebuild /path/to/db

as well, but not sure if duc will then be able to handle it. There are certainly ways to tune tkrzw for larger setups, I just don't know if we want to do this for all cases, or just for big setups like you have.

Possibly we do a 'df -k' and 'df -i' on the path we're going to index, and if we see large numbers there, especially in terms of file counts, then we need to bump up some of the defaults or something.

I'll have to think about how to do this...
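statvfs(3) would give us those numbers without shelling out to df; a rough sketch (the threshold and the scaling factor are made-up placeholders):

    #include <sys/statvfs.h>

    /* Estimate how many inodes the target filesystem actually uses so the
     * DB backend could scale its bucket count before opening the database. */
    static long long guess_bucket_count(const char *path)
    {
        struct statvfs vfs;
        if(statvfs(path, &vfs) != 0)
            return 131072;                       /* fall back to the current default */
        unsigned long long used = vfs.f_files - vfs.f_ffree;  /* total - free inodes */
        return used > 10000000ULL ? 4LL * (long long)used : 131072;
    }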

@.*** ~]# strace -p 2890596 |& head strace: Process 2890596 attached lseek(3, 6468256512, SEEK_SET) = 6468256512 read(3, "0_609_O1_BKG_C02_R1_13052018_sla"..., 109) = 109 lseek(3, 6870895032, SEEK_SET) = 6870895032 read(3, "playMDC_llhoft-1360848744-1.gwf\372"..., 48) = 48 lseek(3, 6870895040, SEEK_SET) = 6870895040 read(3, "llhoft-1360848744-1.gwf\372\3\325?\372\3\332\0\1"..., 135) = 135 lseek(3, 14546618984, SEEK_SET) = 14546618984 read(3, ".000000_stop01258796126.500000"..., 48) = 48 lseek(3, 14546618992, SEEK_SET) = 14546618992

where fd=3 is the database file,

[root@zfs1 ~]# lsof -p 2890596
COMMAND     PID USER  FD TYPE DEVICE    SIZE/OFF     NODE NAME
duc     2890596 root cwd  DIR  0,284          21  8807904 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc/iwave-1.2
duc     2890596 root rtd  DIR  253,0         290      128 /
duc     2890596 root txt  REG  253,0      515368 51391141 /root/duc/duc
duc     2890596 root mem  REG  253,0      186280 16797125 /usr/lib64/libgraphite2.so.3.0.1
duc     2890596 root mem  REG  253,0      685392 16797128 /usr/lib64/libharfbuzz.so.0.10705.0
duc     2890596 root mem  REG  253,0      629568 16788055 /usr/lib64/libgmp.so.10.3.2
duc     2890596 root mem  REG  253,0      197728 16788243 /usr/lib64/libhogweed.so.4.5
duc     2890596 root mem  REG  253,0      239360 16788245 /usr/lib64/libnettle.so.6.5
duc     2890596 root mem  REG  253,0       78816 16788255 /usr/lib64/libtasn1.so.6.5.5
duc     2890596 root mem  REG  253,0     1580488 16787908 /usr/lib64/libunistring.so.2.1.0
duc     2890596 root mem  REG  253,0      123360 16787911 /usr/lib64/libidn2.so.0.3.6
duc     2890596 root mem  REG  253,0     1246168 16787962 /usr/lib64/libp11-kit.so.0.3.0
duc     2890596 root mem  REG  253,0       33728 17734973 /usr/lib64/libdatrie.so.1.3.2
duc     2890596 root mem  REG  253,0       16360 16789513 /usr/lib64/libXau.so.6.0.0
duc     2890596 root mem  REG  253,0       73008 16787901 /usr/lib64/libbz2.so.1.0.6
duc     2890596 root mem  REG  253,0       33488 16787818 /usr/lib64/libuuid.so.1.3.0
duc     2890596 root mem  REG  253,0      248344 17162947 /usr/lib64/libexpat.so.1.6.7
duc     2890596 root mem  REG  253,0       19128 16787443 /usr/lib64/libdl-2.28.so
duc     2890596 root mem  REG  253,0       95496 16797143 /usr/lib64/libpangoft2-1.0.so.0.4200.3
duc     2890596 root mem  REG  253,0       37248 16813323 /usr/lib64/libffi.so.6.0.2
duc     2890596 root mem  REG  253,0      464936 16787790 /usr/lib64/libpcre.so.1.2.10
duc     2890596 root mem  REG  253,0     2051640 16805530 /usr/lib64/libgnutls.so.30.28.2
duc     2890596 root mem  REG  253,0      115104 16797123 /usr/lib64/libfribidi.so.0.4.0
duc     2890596 root mem  REG  253,0       44320 17734975 /usr/lib64/libthai.so.0.3.0
duc     2890596 root mem  REG  253,0       42744 16787651 /usr/lib64/librt-2.28.so
duc     2890596 root mem  REG  253,0       99656 16787738 /usr/lib64/libz.so.1.2.11
duc     2890596 root mem  REG  253,0       80728 16789449 /usr/lib64/libXext.so.6.4.0
duc     2890596 root mem  REG  253,0       45536 16789453 /usr/lib64/libXrender.so.1.3.0
duc     2890596 root mem  REG  253,0       56848 16789541 /usr/lib64/libxcb-render.so.0.0.0
duc     2890596 root mem  REG  253,0      170216 16789443 /usr/lib64/libxcb.so.1.1.0
duc     2890596 root mem  REG  253,0       15904 16789549 /usr/lib64/libxcb-shm.so.0.0.0
duc     2890596 root mem  REG  253,0      220992 16788516 /usr/lib64/libpng16.so.16.34.0
duc     2890596 root mem  REG  253,0      783112 16788517 /usr/lib64/libfreetype.so.6.16.1
duc     2890596 root mem  REG  253,0      289800 16794342 /usr/lib64/libfontconfig.so.1.12.0
duc     2890596 root mem  REG  253,0      695320 16813370 /usr/lib64/libpixman-1.so.0.38.4
duc     2890596 root mem  REG  253,0       99664 17205035 /usr/lib64/libgcc_s-8-20210514.so.1
duc     2890596 root mem  REG  253,0      149976 16787455 /usr/lib64/libpthread-2.28.so
duc     2890596 root mem  REG  253,0     1661448 16787779 /usr/lib64/libstdc++.so.6.0.25
duc     2890596 root mem  REG  253,0     2089936 16787440 /usr/lib64/libc-2.28.so
duc     2890596 root mem  REG  253,0     1598848 16787445 /usr/lib64/libm-2.28.so
duc     2890596 root mem  REG  253,0      187552 16787199 /usr/lib64/libtinfo.so.6.1
duc     2890596 root mem  REG  253,0      259192 16787191 /usr/lib64/libncursesw.so.6.1
duc     2890596 root mem  REG  253,0     1344056 16786745 /usr/lib64/libX11.so.6.3.0
duc     2890596 root mem  REG  253,0       62232 16797141 /usr/lib64/libpangocairo-1.0.so.0.4200.3
duc     2890596 root mem  REG  253,0     1171912 17243211 /usr/lib64/libglib-2.0.so.0.5600.4
duc     2890596 root mem  REG  253,0      347416 17243215 /usr/lib64/libgobject-2.0.so.0.5600.4
duc     2890596 root mem  REG  253,0      297816 16797139 /usr/lib64/libpango-1.0.so.0.4200.3
duc     2890596 root mem  REG  253,0     1202552 17734955 /usr/lib64/libcairo.so.2.11512.0
duc     2890596 root mem  REG  253,0     1930824 16792173 /usr/lib64/libtkrzw.so.1.70.0
duc     2890596 root mem  REG  253,0     1062416 16786784 /usr/lib64/ld-2.28.so
duc     2890596 root   0u CHR  136,3         0t0        6 /dev/pts/3
duc     2890596 root   1u CHR  136,3         0t0        6 /dev/pts/3
duc     2890596 root   2u CHR  136,3         0t0        6 /dev/pts/3
duc     2890596 root   3u REG   0,46 34382856040      772 /home2/duc.new
duc     2890596 root   4r DIR   0,46         324       34 /home2
duc     2890596 root   5r DIR  0,284         428       34 /home2/michael.coughlin
duc     2890596 root   6r DIR  0,284          16      898 /home2/michael.coughlin/STAMP
duc     2890596 root   7r DIR  0,284           9  2866908 /home2/michael.coughlin/STAMP/O3
duc     2890596 root   8r DIR  0,284           4  8444738 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618
duc     2890596 root   9r DIR  0,284           4  8786783 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500
duc     2890596 root  10r DIR  0,284          10  8786785 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2
duc     2890596 root  11r DIR  0,284          53  8811306 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src
duc     2890596 root  12r DIR  0,284           8  8807879 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc
duc     2890596 root  13r DIR  0,284          21  8807904 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc/iwave-1.2
duc     2890596 root  14r DIR  0,284           8  8807911 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc/iwave-1.2/m4

Here is the stack trace before I killed it,

(gdb) where
#0  0x00007f86b43c4d45 in read () at /lib64/libc.so.6
#1  0x00007f86b3fc8d0f in std::__basic_file<char>::xsgetn(char*, long) () at /lib64/libstdc++.so.6
#2  0x00007f86b4005921 in std::basic_filebuf<char, std::char_traits<char> >::xsgetn(char*, long) () at /lib64/libstdc++.so.6
#3  0x00007f86b40131f1 in std::istream::read(char*, long) () at /lib64/libstdc++.so.6
#4  0x00007f86b5f3646b in tkrzw::StdFileImpl::ReadImpl(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#5  0x00007f86b5f3657f in tkrzw::StdFileImpl::Read(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#6  0x00007f86b5f36665 in tkrzw::StdFile::Read(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#7  0x00007f86b5f5cd07 in tkrzw::HashRecord::ReadBody() () at /lib64/libtkrzw.so.1
#8  0x00007f86b5f5d605 in tkrzw::HashRecord::ReadMetadataKey(long, int) () at /lib64/libtkrzw.so.1
#9  0x00007f86b5f628be in tkrzw::HashDBMImpl::ProcessImpl(std::basic_string_view<char, std::char_traits<char> >, long, tkrzw::DBM::RecordProcessor*, bool, bool) () at /lib64/libtkrzw.so.1
#10 0x00007f86b5f65cc8 in tkrzw::HashDBMImpl::Process(std::basic_string_view<char, std::char_traits<char> >, tkrzw::DBM::RecordProcessor*, bool, bool) () at /lib64/libtkrzw.so.1
#11 0x00007f86b5f663be in tkrzw::HashDBM::Set(std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) () at /lib64/libtkrzw.so.1
#12 0x00007f86b5fc344e in tkrzw::PolyDBM::Set(std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) () at /lib64/libtkrzw.so.1
#13 0x00007f86b5fe59fc in tkrzw_dbm_set () at /lib64/libtkrzw.so.1
#14 0x0000000000404772 in db_put (db=<optimized out>, key=key@entry=0x7ffe9cafeed0, key_len=<optimized out>, val=<optimized out>, val_len=<optimized out>) at src/libduc/db-tkrzw.c:94
#15 0x000000000040653b in scanner_free (scanner=scanner@entry=0x1f35fb0) at src/libduc/index.c:593
#16 0x00000000004073b6 in scanner_scan (scanner_dir=scanner_dir@entry=0x1f39f30) at src/libduc/index.c:518
#17 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f39eb0) at src/libduc/index.c:517
#18 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f37ee0) at src/libduc/index.c:517
#19 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f37e00) at src/libduc/index.c:517
#20 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f39e30) at src/libduc/index.c:517
#21 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f35290) at src/libduc/index.c:517
#22 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f340f0) at src/libduc/index.c:517
#23 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f35420) at src/libduc/index.c:517
#24 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f122d0) at src/libduc/index.c:517
#25 0x00000000004073ae in scanner_scan (scanner_dir=scanner_dir@entry=0x1f09e00) at src/libduc/index.c:517
#26 0x00000000004083ba in duc_index (req=0x1f05640, path=<optimized out>, flags=flags@entry=(unknown: 0)) at src/libduc/index.c:676
#27 0x000000000040e2d8 in index_main (duc=0x1ecdde0, argc=<optimized out>, argv=<optimized out>) at src/duc/cmd-index.c:106
#28 0x00000000004039f6 in main (argc=<optimized out>, argv=<optimized out>) at src/duc/main.c:179

While the terminate signal appears to have been caught, the resulting db file is not of much use,

[root@zfs1 ~]# time duc/duc index -vp /home2 -d /home2/duc.new --uncompressed
Writing to database "/home2/duc.new"
Terminated
Indexed 217.1Tb in 665.1M files and 19.6M directories

real    1395m10.489s
user    125m4.960s
sys     1199m20.065s

[root@zfs1 ~]# duc/duc info -d /home2/duc.new /home2
Date       Time       Files    Dirs    Size Path
[root@zfs1 ~]#

Note, during the same time interval duc version 1.4.5 was able to completely index the full 1.16B files in ~7.5h using tokyocabinet,

Nice! How big is the DB file?

[root@zfs1 ~]# duc --version
duc version: 1.4.5
options: cairo x11 ui tokyocabinet

Indexed 1158160971 files and 39722571 directories, (570.7TB apparent, 393.4TB actual) in 7 hours, 33 minutes, and 41.38 seconds.

You have a nice fast filesystem for sure. Maybe you can try building duc with 'lmdb' or 'leveldb' as the backend?

l8gravely commented 6 months ago

Stuart,

Can you try changing db-tkrzw.c to have the following? Sorry it's not a real patch to apply, this just increases the number of buckets for the initial DB. The num_buckets is just my guess...

struct db *db_open(const char *path_db, int flags, duc_errno *e)
{
	struct db *db;
	int compress = 0;
	int writeable = 0;
	char options[256] = "dbm=HashDBM,file=StdFile,num_buckets=100000000";

   if (flags & DUC_OPEN_FORCE) {

Have to think about how using statvfs() would work to help tune things. Can you give me the output of 'df -i' on your big filesystem, just so I can compare it with some others I have to try and come up with a scaling factor? Because I think we need to automatically tune things for really large filesystems, no matter which backend DB we use.
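Here is a minimal sketch of that statvfs() idea, assuming num_buckets would be derived from the used inode count of the filesystem being indexed; the 1/8 scaling factor, the floor value, and the helper name are illustrative guesses, not duc code:

#include <stdio.h>
#include <sys/statvfs.h>

/* Hypothetical helper: derive a tkrzw num_buckets value from the
 * number of inodes currently in use on the filesystem at 'path'. */
static long long guess_num_buckets(const char *path)
{
	struct statvfs vfs;

	if (statvfs(path, &vfs) != 0)
		return 1000000;  /* fall back to a modest default */

	/* inodes in use = total inodes minus free inodes */
	long long used = (long long)vfs.f_files - (long long)vfs.f_ffree;

	/* assumed scaling: one bucket per 8 inodes, with a sane floor */
	long long buckets = used / 8;
	return buckets < 1000000 ? 1000000 : buckets;
}

int main(int argc, char **argv)
{
	if (argc > 1)
		printf("num_buckets=%lld\n", guess_num_buckets(argv[1]));
	return 0;
}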

John

stuartthebruce commented 6 months ago

Blech, not fun. Sorry this has been such a hassle! I did find my problem with compression and I have to say it's because I'm a moron. You just need to change line 49 of db-tkrzw.c from:

char options[] = "dbm=HashDBM,file=StdFile";

to instead be:

char options[256] = "dbm=HashDBM,file=StdFile";

because I've forgotten all my C string handling obviously. Then compression will work with duc and tkrzw on the backend. For some values of work. Obviously you've got a great case for hammering on various backends.
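For context, a minimal illustration of why that one-character change matters, assuming (as the quote implies) that the compression parameter gets appended to options with C string functions; the appended string here is hypothetical:

#include <string.h>

void broken(void)
{
	/* sized exactly to fit the literal: 25 bytes including the NUL */
	char options[] = "dbm=HashDBM,file=StdFile";
	/* appending anything writes past the end: undefined behavior */
	strcat(options, ",record_comp_mode=lz4");
}

void fixed(void)
{
	/* 256 bytes leaves room to append tuning parameters safely */
	char options[256] = "dbm=HashDBM,file=StdFile";
	strcat(options, ",record_comp_mode=lz4");
}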

That works if I have a pre-existing db file, otherwise it segfaults in db_get without --uncompressed,

(gdb) where
#0  0x00007ffff7b61849 in tkrzw_dbm_get () at /lib64/libtkrzw.so.1
#1  0x000000000040462e in db_get
    (val_len=<optimized out>, key_len=14, key=0x411669, db=0x625c60)
    at src/libduc/db-tkrzw.c:102
#2  0x000000000040462e in db_open
    (path_db=path_db@entry=0x7fffffff60a0 "/root/.cache/duc/duc.db", flags=flags@entry=6, e=e@entry=0x625de8) at src/libduc/db-tkrzw.c:64
#3  0x00000000004051d1 in duc_open
    (duc=duc@entry=0x625de0, path_db=0x7fffffff60a0 "/root/.cache/duc/duc.db", flags=flags@entry=(DUC_OPEN_RW | DUC_OPEN_COMPRESS)) at src/libduc/duc.c:127
#4  0x000000000040e2a1 in index_main (duc=0x625de0, argc=1, argv=0x7fffffffe2c0)
    at src/duc/cmd-index.c:94
#5  0x00000000004039f6 in main (argc=<optimized out>, argv=<optimized out>) at src/duc/main.c:179
[root@zfs1 duc]# rm /root/.cache/duc/duc.db
rm: cannot remove '/root/.cache/duc/duc.db': No such file or directory

[root@zfs1 duc]# ./duc index -xvp .
Writing to database "/root/.cache/duc/duc.db"
Segmentation fault (core dumped)

[root@zfs1 duc]# ls -l /root/.cache/duc/duc.db
-rw-r--r-- 1 root root 4198400 May 17 09:23 /root/.cache/duc/duc.db

[root@zfs1 duc]# ./duc index -xvp .
Writing to database "/root/.cache/duc/duc.db"
Segmentation fault (core dumped)

[root@zfs1 duc]# ls -l /root/.cache/duc/duc.db
-rw-r--r-- 1 root root 4198400 May 17 09:23 /root/.cache/duc/duc.db

[root@zfs1 duc]# rm /root/.cache/duc/duc.db
rm: remove regular file '/root/.cache/duc/duc.db'? y

[root@zfs1 duc]# ./duc index -xvp . --uncompressed
Writing to database "/root/.cache/duc/duc.db"
Indexed 222 files and 39 directories, (13.0MB apparent, 13.1MB actual) in 0.00 secs.

[root@zfs1 duc]# ./duc index -xvp .
Writing to database "/root/.cache/duc/duc.db"
Indexed 222 files and 39 directories, (13.0MB apparent, 13.1MB actual) in 0.00 secs.
stuartthebruce commented 6 months ago

Note, during the same time interval duc version 1.4.5 was able to completely index the full 1.16B files in ~7.5h using tokyocabinet,

Nice! How big is the DB file?

11.7GB

[root@zfs1 ~]# duc --version
duc version: 1.4.5
options: cairo x11 ui tokyocabinet

Indexed 1158160971 files and 39722571 directories, (570.7TB apparent, 393.4TB actual) in 7 hours, 33 minutes, and 41.38 seconds.

You have a nice fast filesystem for sure

In aggregate, I scan a set of ~6B home directory files with tokyocabinet every night spread over 5 large ZFS servers. Note, currently the largest individual nightly tokyocabinet file is 15.9GB to index 1.9B files (66.7M directories).

Maybe you can try building duc with 'lmdb' or 'leveldb' as the backend?

I would first like to see how far we can get with tkrzw.

stuartthebruce commented 6 months ago

Stuart, Can you try changing db-tkrzw.c to have the following? Sorry it's not a real patch to apply, this just increases the number of buckets for the initial DB. The num_buckets is just my guess...

struct db *db_open(const char *path_db, int flags, duc_errno *e)
{
	struct db *db;
	int compress = 0;
	int writeable = 0;
	char options[256] = "dbm=HashDBM,file=StdFile,num_buckets=100000000";

	if (flags & DUC_OPEN_FORCE) {

Running (without compression for now)

Have to think about how using statvfs() would work to help tune things. Can you give me the output of 'df -i' on your big filesystem, just so I can compare it with some others I have to try and come up with a scaling factor?

This file server splits its 1.2B files across a few hundred separate ZFS filesystems in the same zpool with 1.2B dnodes. Currently duc scans the entire collection into one DB file,

[root@zfs1 ~]# df -i | awk '$1 ~ /^home2/ {sum+=$3;cnt++}END{printf "sum=%\047d sum=%d\n", sum, cnt}'
sum=1,153,967,511 sum=322
l8gravely commented 6 months ago

Guys,

I've pushed some updates to the tkrzw branch to fix some problems and to try and auto-scale things according to the size of the filesystem. Can you run your tests with this?

You should NOT need to disable compression, it's now using LZ4, but maybe it should be changed over if it's too CPU intensive.

But ideally if you run with:

duc index -v -d /path/to/db /godawful/large

it will hopefully A) report that it's using a big, bigger, or biggest setting, and B) run with compression properly now too.

Let me know how it goes.

John

stuartthebruce commented 6 months ago

Guys, I've pushed some updates to the tkrzw branch to fix some problems and to try and auto-scale things according to the size of the filesystem. Can you run your tests with this? You should NOT need to disable compression, it's now using LZ4, but maybe it should be changed over if it's too CPU intensive. But ideally if you run with: duc index -v -d /path/to/db /godawful/large it will hopefully A) report it's using a big, biggest or biggest setting, and B) run with compression properly now too. Let me know how it goes. John

I still get a SEGV without disabling compression when trying to index just the duc code itself,

[root@zfs1 duc]# git branch
  master
* tkrzw

[root@zfs1 duc]# grep db_open src/libduc/db-tkrzw.c
struct db *db_open(const char *path_db, int flags, duc_errno *e)

[root@zfs1 duc]# ./duc index -v -d /tmp/duc.tst .
Writing to database "/tmp/duc.tst"
Segmentation fault (core dumped)

[root@zfs1 duc]# /bin/rm /tmp/duc.tst

[root@zfs1 duc]# gdb --args ./duc index -v -d /tmp/duc.tst .
GNU gdb (GDB) Rocky Linux 8.2-20.el8.0.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
    .

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./duc...done.
(gdb) run
Starting program: /root/duc/duc index -v -d /tmp/duc.tst .
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Writing to database "/tmp/duc.tst"

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b61849 in tkrzw_dbm_get () from /lib64/libtkrzw.so.1
Missing separate debuginfos, use: yum debuginfo-install bzip2-libs-1.0.6-26.el8.x86_64 cairo-1.15.12-6.el8.x86_64 expat-2.2.5-11.el8_9.1.x86_64 fontconfig-2.13.1-4.el8.x86_64 freetype-2.9.1-9.el8.x86_64 fribidi-1.0.4-9.el8.x86_64 glib2-2.56.4-161.el8.x86_64 glibc-2.28-236.el8_9.13.x86_64 gmp-6.1.2-10.el8.x86_64 gnutls-3.6.16-8.el8_9.3.x86_64 harfbuzz-1.7.5-3.el8.x86_64 libX11-1.6.8-6.el8.x86_64 libXau-1.0.9-3.el8.x86_64 libXext-1.3.4-1.el8.x86_64 libXrender-0.9.10-7.el8.x86_64 libffi-3.1-24.el8.x86_64 libgcc-8.5.0-20.el8.x86_64 libidn2-2.2.0-1.el8.x86_64 libpng-1.6.34-5.el8.x86_64 libstdc++-8.5.0-20.el8.x86_64 libtasn1-4.13-4.el8_7.x86_64 libthai-0.1.27-2.el8.x86_64 libunistring-0.9.9-3.el8.x86_64 libuuid-2.32.1-44.el8_9.1.x86_64 libxcb-1.13.1-1.el8.x86_64 ncurses-libs-6.1-10.20180224.el8.x86_64 nettle-3.4.1-7.el8.x86_64 p11-kit-0.23.22-1.el8.x86_64 pango-1.42.4-8.el8.x86_64 pcre-8.42-6.el8.x86_64 pixman-0.38.4-3.el8_9.x86_64 tkrzw-libs-1.0.27-1.el8.x86_64 zlib-1.2.11-25.el8.x86_64
(gdb) where
#0  0x00007ffff7b61849 in tkrzw_dbm_get () at /lib64/libtkrzw.so.1
#1  0x0000000000404699 in db_get
    (val_len=<optimized out>, key_len=14, key=0x411849, db=0x625ba0)
    at src/libduc/db-tkrzw.c:122
#2  0x0000000000404699 in db_open
    (path_db=path_db@entry=0x625c60 "/tmp/duc.tst", flags=flags@entry=6, e=e@entry=0x625de8)
    at src/libduc/db-tkrzw.c:84
#3  0x00000000004052d1 in duc_open
    (duc=duc@entry=0x625de0, path_db=0x625c60 "/tmp/duc.tst", flags=flags@entry=(DUC_OPEN_RW | DUC_OPEN_COMPRESS)) at src/libduc/duc.c:127
#4  0x000000000040e469 in index_main (duc=0x625de0, argc=<optimized out>, argv=0x7fffffffe2b0)
    at src/duc/cmd-index.c:119
#5  0x0000000000403a46 in main (argc=<optimized out>, argv=<optimized out>) at src/duc/main.c:179
l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

Guys, I've pushed some updates to the tkrzw branch to fix some problems and to try and auto-scale things according to the size of the filesystem. Can you run your tests with this? You should NOT need to disable compression, it's now using LZ4, but maybe it should be changed over if it's too CPU intensive. But ideally if you run with: duc index -v -d /path/to/db /godawful/large it will hopefully A) report it's using a big, biggest or biggest setting, and B) run with compression properly now too. Let me know how it goes. John

I still get a SEGV without disabling compression when trying to index just the duc code itself,

[root@zfs1 duc]# grep db_open src/libduc/db-tkrzw.c
struct db *db_open(const char *path_db, int flags, duc_errno *e)

[root@zfs1 duc]# ./duc index -v -d /tmp/duc.tst .
Writing to database "/tmp/duc.tst"
Segmentation fault (core dumped)

Can you try with a completely blank database please? Something that doesn't exist at all?

I'll try to run some tests on mixing compressed vs non-compressed DBs, but I probably need to fix the error handling for when the DB gets opened to handle cases like this.

Should I just:

  1. if opening compressed or non-compressed fails, try the opposite way? And warn? (See the sketch below.)

  2. I should probably fail more gracefully. Need to double-check errors better for sure.
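A minimal sketch of option 1 using tkrzw's C API, assuming the same PolyDBM parameter-string convention db-tkrzw.c already uses; the wrapper name and the exact compression parameter value are assumptions, not duc's actual code:

#include <stdbool.h>
#include <stdio.h>
#include <tkrzw_langc.h>

/* Try a compressed open first; on failure, warn and retry without
 * compression. Hypothetical wrapper, not duc's real db_open(). */
TkrzwDBM *open_with_fallback(const char *path, bool writable)
{
	TkrzwDBM *dbm = tkrzw_dbm_open(path, writable,
	    "dbm=HashDBM,file=StdFile,record_comp_mode=lz4");
	if (dbm == NULL) {
		fprintf(stderr,
		    "warning: compressed open failed, retrying uncompressed\n");
		dbm = tkrzw_dbm_open(path, writable, "dbm=HashDBM,file=StdFile");
	}
	return dbm;
}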

[root@zfs1 duc]# /bin/rm /tmp/duc.tst

@.*** duc]# gdb --args ./duc index -v -d /tmp/duc.tst . GNU gdb (GDB) Rocky Linux 8.2-20.el8.0.1 Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: . Find the GDB manual and other documentation resources online at: .

For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from ./duc

After you get the 'gdb' prompt and run the code and it faults, giving the 'bt' command to get a backtrace might be helpful. But I need to sit down and spend some time checking errors more closely when opening the DB for sure; that was just hacked in without much thought, honestly.

John

stuartthebruce commented 6 months ago

Can you try with a completely blank database please? Something that doesn't exist at all?

The above test did that by running /bin/rm first.

After you get the 'gdb' prompt and run the code and it faults, giving 'bt' command to get a back trace might be helpful.

That is what the 'where' command shows.

stuartthebruce commented 6 months ago

Guys, I've pushed some updates to the tkrzw branch to fix some problems and to try and auto-scale things according to the size of the filesystem. Can you run your tests with this? You should NOT need to disable compression, it's now using LZ4, but maybe it should be changed over if it's too CPU intensive.

Running with the latest updates and compression disabled still gets "stuck" after a few hours reading a 33GB DB file at less than 100 bytes per read(),

[root@zfs1 ~]# time duc/duc index -vp /home2 -d /home2/duc.new --uncompressed                     
Writing to database "/home2/duc.new"
Error statting .nfs0000000001e1d672000d0cab: No such file or directory
[-------#] Indexed 215.7Tb in 664.5M files and 19.2M directories  

[root@zfs1 ~]# ls -lh /home2/duc.new && date
-rw-r--r-- 1 root root 33G May 17 21:40 /home2/duc.new
Sat May 18 09:06:25 PDT 2024

[root@zfs1 ~]# strace -p 2283864 |& head
strace: Process 2283864 attached
lseek(3, 15627686648, SEEK_SET)         = 15627686648
read(3, "ctor_0.000000_rho_8.002681_job_1"..., 48) = 48
lseek(3, 15627686656, SEEK_SET)         = 15627686656
read(3, "00000_rho_8.002681_job_156_lag_6"..., 94) = 94
lseek(3, 15627686648, SEEK_SET)         = 15627686648
read(3, "ctor_0.000000_rho_8.002681_job_1"..., 48) = 48
lseek(3, 15627686656, SEEK_SET)         = 15627686656
read(3, "00000_rho_8.002681_job_156_lag_6"..., 94) = 94
lseek(3, 15627686648, SEEK_SET)         = 15627686648

(gdb) where
#0  0x00007f5a865ebd45 in read () at /lib64/libc.so.6
#1  0x00007f5a861efd0f in std::__basic_file<char>::xsgetn(char*, long) () at /lib64/libstdc++.so.6
#2  0x00007f5a8622c921 in std::basic_filebuf<char, std::char_traits<char> >::xsgetn(char*, long) ()
    at /lib64/libstdc++.so.6
#3  0x00007f5a8623a1f1 in std::istream::read(char*, long) () at /lib64/libstdc++.so.6
#4  0x00007f5a8815d46b in tkrzw::StdFileImpl::ReadImpl(long, void*, unsigned long) ()
    at /lib64/libtkrzw.so.1
#5  0x00007f5a8815d57f in tkrzw::StdFileImpl::Read(long, void*, unsigned long) ()
    at /lib64/libtkrzw.so.1
#6  0x00007f5a8815d665 in tkrzw::StdFile::Read(long, void*, unsigned long) ()
    at /lib64/libtkrzw.so.1
#7  0x00007f5a88183ead in tkrzw::HashRecord::ReadMetadataKey(long, int) () at /lib64/libtkrzw.so.1
#8  0x00007f5a881898be in tkrzw::HashDBMImpl::ProcessImpl(std::basic_string_view<char, std::char_traits<char> >, long, tkrzw::DBM::RecordProcessor*, bool, bool) () at /lib64/libtkrzw.so.1
#9  0x00007f5a8818ccc8 in tkrzw::HashDBMImpl::Process(std::basic_string_view<char, std::char_traits<char> >, tkrzw::DBM::RecordProcessor*, bool, bool) () at /lib64/libtkrzw.so.1
#10 0x00007f5a8818d3be in tkrzw::HashDBM::Set(std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) () at /lib64/libtkrzw.so.1
#11 0x00007f5a881ea44e in tkrzw::PolyDBM::Set(std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) () at /lib64/libtkrzw.so.1
#12 0x00007f5a8820c9fc in tkrzw_dbm_set () at /lib64/libtkrzw.so.1
#13 0x0000000000404892 in db_put
    (db=<optimized out>, key=key@entry=0x7ffed5d1ad50, key_len=<optimized out>, val=<optimized out>, val_len=<optimized out>) at src/libduc/db-tkrzw.c:114
#14 0x000000000040665b in scanner_free (scanner=scanner@entry=0x1c20dd0) at src/libduc/index.c:593
#15 0x00000000004074d6 in scanner_scan (scanner_dir=scanner_dir@entry=0x1baeee0)
    at src/libduc/index.c:518
#16 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bb0e10)
    at src/libduc/index.c:517
#17 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bacc50)
    at src/libduc/index.c:517
#18 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bd35a0)
    at src/libduc/index.c:517
#19 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bac420)
    at src/libduc/index.c:517
#20 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bab0f0)
    at src/libduc/index.c:517
#21 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bacfb0)
    at src/libduc/index.c:517
#22 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1b892d0)
    at src/libduc/index.c:517
#23 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1b80e00)
    at src/libduc/index.c:517
#24 0x00000000004084da in duc_index
    (req=0x1b7c640, path=<optimized out>, flags=flags@entry=(unknown: 0)) at src/libduc/index.c:676
#25 0x000000000040e4a8 in index_main (duc=0x1b44de0, argc=<optimized out>, argv=<optimized out>)
    at src/duc/cmd-index.c:131
#26 0x0000000000403a46 in main (argc=<optimized out>, argv=<optimized out>) at src/duc/main.c:179

And

[root@zfs1 ~]# tkrzw_dbm_util inspect /home2/duc.new
APPLICATION_ERROR: Unknown DBM implementation: new

Renaming duc.new to duc.tkh:

[root@zfs1 ~]# tkrzw_dbm_util inspect /home2/duc.tkh 
Inspection:
  class=HashDBM
  healthy=false
  auto_restored=false
  path=/home2/duc.tkh
  cyclic_magic=2
  pkg_major_version=1
  pkg_minor_version=0
  static_flags=1
  offset_width=4
  align_pow=3
  closure_flags=0
  num_buckets=1048583
  num_records=0
  eff_data_size=0
  file_size=34363698568
  timestamp=1715992536.232470
  db_type=0
  max_file_size=34359738368
  record_base=4198400
  update_mode=in-place
  record_crc_mode=none
  record_comp_mode=none
Actual File Size: 34363698568
Number of Records: 0
Healthy: false
Should be Rebuilt: true
l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

Guys, I've pushed some updates to the tkrzw branch to fix some problems and to try and
auto-scale things according to the size of the filesystem. Can you run your tests with this?
You should NOT need to disable compression, it's now using LZ4, but maybe it should be changed
over if it's too CPU intensive.

Running with the latest updates and compression disabled still gets "stuck" after a few hours reading a 33GB DB file at less than 100 bytes per read(),

So I think the problem here is that the DB is out of whack and I don't detect it properly in the code, so I've just pushed a new change to build the DB with a bigger offset width, so it can handle DB files up to almost 1TB now. I hope this will fix it.

I suspect this might be the issue from reading this page:

https://dbmx.net/tkrzw/#tips

and closely reading the section on tuning HashDBM. It might also turn out that we need to use DirectIO or other tricks, since you are really pushing the size of things, which is awesome!

@.*** ~]# tkrzw_dbm_util inspect /home2/duc.new APPLICATION_ERROR: Unknown DBM implementation: new

Rename duc.new to duc.tkh

No need, you can just do:

tkrzw_dbm_util inspect --dbm hash /home2/duc.db

to force the type. Why it doesn't just read the header on the disk to discover the format I don't know.

@.*** ~]# tkrzw_dbm_util inspect /home2/duc.tkh Inspection: class=HashDBM healthy=false auto_restored=false path=/home2/duc.tkh cyclic_magic=2 pkg_major_version=1 pkg_minor_version=0 static_flags=1 offset_width=4

This is the change I made; the offset_width is now 5, which should handle nice large DB files.
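For the record, this is an inference from the two inspect outputs in this thread plus tkrzw's tuning docs, not something stated explicitly here: HashDBM stores record offsets in offset_width bytes, shifted left by align_pow, so

\[
\text{max\_file\_size} = 2^{\,8 \cdot \text{offset\_width} + \text{align\_pow}}
\]

With offset_width=4 and align_pow=3 that gives 2^35 = 34359738368 bytes (32 GiB), exactly the old max_file_size and right around the 33 GB point where the earlier index run got stuck; with offset_width=5 it becomes 2^43 = 8796093022208 bytes (8 TiB), matching the inspect output later in this thread.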

align_pow=3 closure_flags=0 num_buckets=1048583 num_records=0 eff_data_size=0 file_size=34363698568 timestamp=1715992536.232470 db_type=0 max_file_size=34359738368 record_base=4198400 update_mode=in-place record_crc_mode=none record_comp_mode=none Actual File Size: 34363698568 Number of Records: 0 Healthy: false Should be Rebuilt: true

It might be possible to fix this with:

tkrzw_dbm_util rebuild --dbm hash /home/duc.db

on the borked version of the DB, which will take another 32GB of space since it will copy things to a new file. Not sure it's worth testing, but it might be interesting to see if it does anything.

Thanks again for all your testing!

stuartthebruce commented 6 months ago

So I think the problem here is that the DB is out of wack, and I don't detect it properly in the code, so I've just pushed a new change to build the DB with a bigger offset width, so it can handle DB files upto almost 1tb now. I hope this will fix it. I suspect this might be the issue from reading this page: https://dbmx.net/tkrzw/#tips and closely reading the section on tuning HashDBM. It might also turn out that we need to use DirectIO or other tricks, since you are really pushing the size of things, which is awesome!

I have just recompiled and started another large scan test.

Note, the simple in-tree duc scan with compression still fails

[root@zfs1 duc]# /bin/rm -f /tmp/duc.db && ./duc index -v -d /tmp/duc.db .
Writing to database "/tmp/duc.db"
Segmentation fault (core dumped)

It might be possible to fix this with: tkrzw_dbm_util rebuild --dbm hash /home/duc.db on the borked version of the DB, which will take another 32gb of space since it will copy things to a new file. Not sure it's worth testing, but it might be interesting to see if it does anything.

I deleted the last large scan that I aborted, but if the current one gets "stuck" and I have to kill it I will try this rebuild command.

Thanks again for all your testing!

No problem. This tool has been extremely helpful in keeping track of all the "interesting" things the users of my filesystems do, https://xkcd.com/2582.

Thank you for continuing to support it.

stuartthebruce commented 6 months ago

FYI, my first large tkrzw index completed successfully,

[root@zfs1 ~]# time duc/duc index -vp /home2 -d /home2/duc.tkh --uncompressed
Writing to database "/home2/duc.tkh"
Indexed 1162502265 files and 39739363 directories, (570.6TB apparent, 393.3TB actual) in 9 hours, 29 minutes, and 4.56 seconds.

real    569m4.630s
user    13m43.765s
sys     551m13.239s

[root@zfs1 ~]# ls -lh /home2/duc.tkh
-rw-r--r-- 1 root root 54G May 19 03:22 /home2/duc.tkh

[root@zfs1 ~]# duc/duc info -d /home2/duc.tkh 
Date       Time       Files    Dirs    Size Path
2024-05-18 17:53:17    1.2G   39.7M  393.3T /home2

[root@zfs1 ~]# tkrzw_dbm_util inspect /home2/duc.tkh
Inspection:
  class=HashDBM
  healthy=true
  auto_restored=false
  path=/home2/duc.tkh
  cyclic_magic=3
  pkg_major_version=1
  pkg_minor_version=0
  static_flags=1
  offset_width=5
  align_pow=3
  closure_flags=1
  num_buckets=1048583
  num_records=39739366
  eff_data_size=56514265202
  file_size=57036515920
  timestamp=1716114141.965382
  db_type=0
  max_file_size=8796093022208
  record_base=5246976
  update_mode=in-place
  record_crc_mode=none
  record_comp_mode=none
Actual File Size: 57036515920
Number of Records: 39739366
Healthy: true
Should be Rebuilt: true
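A quick back-of-the-envelope reading of those numbers (my arithmetic, not from the thread): the table was still at its default size, so the hash load factor was

\[
\frac{\text{num\_records}}{\text{num\_buckets}} = \frac{39739366}{1048583} \approx 37.9 \text{ records per bucket,}
\]

which means long collision chains on every lookup and explains the "Should be Rebuilt: true" verdict. The rebuild below raises num_buckets to 79478743, dropping the load factor to roughly 0.5.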

And it appears to continue to work after rebuilding,

[root@zfs1 ~]# time tkrzw_dbm_util rebuild --dbm hash /home2/duc.tkh
Old Number of Records: 39739366
Old File Size: 57036515920
Old Effective Data Size: 56514265202
Old Number of Buckets: 1048583
Optimizing the database: ... ok (elapsed=786.757621)
New Number of Records: 39739366
New File Size: 57428666960
New Effective Data Size: 56514265202
New Number of Buckets: 79478743

real    13m6.765s
user    1m20.338s
sys     11m7.772s

[root@zfs1 ~]# tkrzw_dbm_util inspect /home2/duc.tkh
Inspection:
  class=HashDBM
  healthy=true
  auto_restored=false
  path=/home2/duc.tkh
  cyclic_magic=7
  pkg_major_version=1
  pkg_minor_version=0
  static_flags=1
  offset_width=5
  align_pow=3
  closure_flags=1
  num_buckets=79478743
  num_records=39739366
  eff_data_size=56514265202
  file_size=57428666960
  timestamp=1716217654.727336
  db_type=0
  max_file_size=8796093022208
  record_base=397398016
  update_mode=in-place
  record_crc_mode=none
  record_comp_mode=none
Actual File Size: 57428666960
Number of Records: 39739366
Healthy: true
Should be Rebuilt: false

Note, the runtime is to be compared to the following tokyocabinet run that generated a 12GB file,

Writing to database "/dev/shm/duc/ducdb/filesystem/zfshome2.duc"
Indexed 1163411199 files and 39739981 directories, (570.8TB apparent, 393.4TB actual) in 8 hours, 7 minutes, and 39.22 seconds.

And the larger tkrzw file compresses down to 15GB with filesystem compression (ZFS zstd),

[root@zfs1 ~]# du -h /home2/duc.tkh 
15G /home2/duc.tkh
l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

FYI, my first large tkrzw index completed successfully,

Sweet! So tweaking the offset_width was a good thing to do. I was thinking more about this last night and wondering if there was a way to make duc check everything once in a while and rebuild the tkrzw DB.
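A sketch of that idea against tkrzw's C API; tkrzw_dbm_should_be_rebuilt() and tkrzw_dbm_rebuild() are the relevant tkrzw calls as I read tkrzw_langc.h, while the wrapper and the point at which duc would invoke it are assumptions:

#include <stdbool.h>
#include <stdio.h>
#include <tkrzw_langc.h>

/* Hypothetical post-index hook: rebuild the hash DB when tkrzw
 * itself reports that the bucket layout is no longer a good fit. */
void maybe_rebuild(TkrzwDBM *dbm)
{
	if (tkrzw_dbm_should_be_rebuilt(dbm)) {
		fprintf(stderr, "rebuilding database to rebalance buckets\n");
		/* "" = keep the current tuning parameters (assumed) */
		if (!tkrzw_dbm_rebuild(dbm, ""))
			fprintf(stderr, "rebuild failed\n");
	}
}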

@.*** ~]# time duc/duc index -vp /home2 -d /home2/duc.tkh --uncompressed Writing to database "/home2/duc.tkh" Indexed 1162502265 files and 39739363 directories, (570.6TB apparent, 393.3TB actual) in 9 hours, 29 minutes, and 4.56 seconds.

real 569m4.630s user 13m43.765s sys 551m13.239s

Man, that takes a long time to run. I was playing with a test system at $WORK to try and find a good place to run my tests against some large systems, but it didn't get very far since my systems are old and I only have limited hardware with 10g links right now. I'll keep poking at it here.

@.*** ~]# ls -lh /home2/duc.tkh -rw-r--r-- 1 root root 54G May 19 03:22 /home2/duc.tkh

Also, you shouldn't need to bother using the .tkh file extension at all; I force the filetype on creation.

@.*** ~]# duc/duc info -d /home2/duc.tkh Date Time Files Dirs Size Path 2024-05-18 17:53:17 1.2G 39.7M 393.3T /home2

That's a crap load of data!

@.*** ~]# tkrzw_dbm_util inspect /home2/duc.tkh Inspection: class=HashDBM healthy=true auto_restored=false path=/home2/duc.tkh cyclic_magic=3 pkg_major_version=1 pkg_minor_version=0 static_flags=1 offset_width=5 align_pow=3 closure_flags=1 num_buckets=1048583 num_records=39739366 eff_data_size=56514265202 file_size=57036515920 timestamp=1716114141.965382 db_type=0 max_file_size=8796093022208 record_base=5246976 update_mode=in-place record_crc_mode=none record_comp_mode=none Actual File Size: 57036515920 Number of Records: 39739366 Healthy: true Should be Rebuilt: true

And it appears to continue to work after rebuilding,

@.*** ~]# time tkrzw_dbm_util rebuild --dbm hash /home2/duc.tkh Old Number of Records: 39739366 Old File Size: 57036515920 Old Effective Data Size: 56514265202 Old Number of Buckets: 1048583 Optimizing the database: ... ok (elapsed=786.757621) New Number of Records: 39739366 New File Size: 57428666960 New Effective Data Size: 56514265202 New Number of Buckets: 79478743

real 13m6.765s user 1m20.338s sys 11m7.772s

@.*** ~]# tkrzw_dbm_util inspect /home2/duc.tkh Inspection: class=HashDBM healthy=true auto_restored=false path=/home2/duc.tkh cyclic_magic=7 pkg_major_version=1 pkg_minor_version=0 static_flags=1 offset_width=5 align_pow=3 closure_flags=1 num_buckets=79478743 num_records=39739366 eff_data_size=56514265202 file_size=57428666960 timestamp=1716217654.727336 db_type=0 max_file_size=8796093022208 record_base=397398016 update_mode=in-place record_crc_mode=none record_comp_mode=none Actual File Size: 57428666960 Number of Records: 39739366 Healthy: true Should be Rebuilt: false

Interestingly enough, my attempt to use larger buckets for larger filesystems didn't work. I'm going to make some changes and push them up for you to test so I can find out what I'm doing wrong. Or maybe I'll just have to run a dedicated C program instead as a test.

Note, the runtime is to be compared to the following tokyocabinet run that generated a 12GB file,

Writing to database "/dev/shm/duc/ducdb/filesystem/zfshome2.duc" Indexed 1163411199 files and 39739981 directories, (570.8TB apparent, 393.4TB actual) in 8 hours, 7 minutes, and 39.22 seconds.

And the larger tkrzw file compresses down to 15GB with filesystem compression (ZFS zstd),

I have to say, compression should be working now too. I wonder what's going wrong here. Can you give me the output of 'ldd duc' after building with tkrzw?

And can you check if you have the lz4 library installed on the system? It shouldn't build if you don't have it... but something funky is going on.

Can you get the output of:

$ tkrzw_build_util config
PACKAGE_VERSION: 1.0.29
LIBRARY_VERSION: 1.72.0
OS_NAME: Linux
IS_BIG_ENDIAN: 0
PAGE_SIZE: 4096
TYPES: void*=8 short=2 int=4 long=8 long_long=8 size_t=8 float=4 double=8 long_double=16
COMPRESSORS: lz4, zstd, zlib, lzma
PROCESS_ID: 1041434
MEMORY: total=131156028000 free=2537424000 cached=104927216000 rss=4480000
prefix: /usr/local
includedir: /usr/local/include
libdir: /usr/local/lib
bindir: /usr/local/bin
libexecdir: /usr/local/libexec
appinc: -I/usr/local/include
applibs: -L/usr/local/lib -ltkrzw -llzma -llz4 -lzstd -lz -lstdc++ -lrt -latomic -lpthread -lm -lc

so we can compare it to my setup on my Debian 12 box? I built tkrzw from source and installed into /usr/local/bin as a test.

@.*** ~]# du -h /home2/duc.tkh 15G /home2/duc.tkh


stuartthebruce commented 6 months ago

I have to say, compression should be working now too. I wonder what's going wrong here. Can you give me the output of 'ldd duc' after building wtih tkrzw?

[root@zfs1 ~]# ldd duc/duc
    linux-vdso.so.1 (0x00007ffd1c17c000)
    libtkrzw.so.1 => /lib64/libtkrzw.so.1 (0x00007f9f3a980000)
    libcairo.so.2 => /lib64/libcairo.so.2 (0x00007f9f3a660000)
    libpango-1.0.so.0 => /lib64/libpango-1.0.so.0 (0x00007f9f3a418000)
    libgobject-2.0.so.0 => /lib64/libgobject-2.0.so.0 (0x00007f9f3a1c5000)
    libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 (0x00007f9f39eab000)
    libpangocairo-1.0.so.0 => /lib64/libpangocairo-1.0.so.0 (0x00007f9f39c9c000)
    libX11.so.6 => /lib64/libX11.so.6 (0x00007f9f39958000)
    libncursesw.so.6 => /lib64/libncursesw.so.6 (0x00007f9f3971a000)
    libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007f9f394ed000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f9f3916b000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f9f38da6000)
    libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f9f38a11000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9f387f1000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f9f3ad53000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f9f385d9000)
    libpixman-1.so.0 => /lib64/libpixman-1.so.0 (0x00007f9f38331000)
    libfontconfig.so.1 => /lib64/libfontconfig.so.1 (0x00007f9f380ec000)
    libfreetype.so.6 => /lib64/libfreetype.so.6 (0x00007f9f37e30000)
    libpng16.so.16 => /lib64/libpng16.so.16 (0x00007f9f37bfb000)
    libxcb-shm.so.0 => /lib64/libxcb-shm.so.0 (0x00007f9f379f7000)
    libxcb.so.1 => /lib64/libxcb.so.1 (0x00007f9f377ce000)
    libxcb-render.so.0 => /lib64/libxcb-render.so.0 (0x00007f9f375c0000)
    libXrender.so.1 => /lib64/libXrender.so.1 (0x00007f9f373b5000)
    libXext.so.6 => /lib64/libXext.so.6 (0x00007f9f371a2000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f9f36f8a000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f9f36d82000)
    libthai.so.0 => /lib64/libthai.so.0 (0x00007f9f36b78000)
    libfribidi.so.0 => /lib64/libfribidi.so.0 (0x00007f9f3695c000)
    libgnutls.so.30 => /lib64/libgnutls.so.30 (0x00007f9f3656b000)
    libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f9f362fa000)
    libffi.so.6 => /lib64/libffi.so.6 (0x00007f9f360f1000)
    libpangoft2-1.0.so.0 => /lib64/libpangoft2-1.0.so.0 (0x00007f9f35eda000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f9f35cd6000)
    libexpat.so.1 => /lib64/libexpat.so.1 (0x00007f9f35a9a000)
    libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f9f35892000)
    libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f9f35681000)
    libXau.so.6 => /lib64/libXau.so.6 (0x00007f9f3547d000)
    libdatrie.so.1 => /lib64/libdatrie.so.1 (0x00007f9f35275000)
    libp11-kit.so.0 => /lib64/libp11-kit.so.0 (0x00007f9f34f4b000)
    libidn2.so.0 => /lib64/libidn2.so.0 (0x00007f9f34d2d000)
    libunistring.so.2 => /lib64/libunistring.so.2 (0x00007f9f349ac000)
    libtasn1.so.6 => /lib64/libtasn1.so.6 (0x00007f9f34799000)
    libnettle.so.6 => /lib64/libnettle.so.6 (0x00007f9f3455f000)
    libhogweed.so.4 => /lib64/libhogweed.so.4 (0x00007f9f3432f000)
    libgmp.so.10 => /lib64/libgmp.so.10 (0x00007f9f34097000)
    libharfbuzz.so.0 => /lib64/libharfbuzz.so.0 (0x00007f9f33df2000)
    libgraphite2.so.3 => /lib64/libgraphite2.so.3 (0x00007f9f33bc6000)

And can you check if you have the lz4 library installed on the system?

[root@zfs1 ~]# rpm -q lz4-libs
lz4-libs-1.8.3-3.el8_4.x86_64

Note, I have found zstd to be very efficient and effective.

It shouldn't build if you don't have it... but something funky is going on. Can you get the output of: $ tkrzw_build_util config

[root@zfs1 ~]# tkrzw_build_util config
PACKAGE_VERSION: 1.0.27
LIBRARY_VERSION: 1.70.0
OS_NAME: Linux
IS_BIG_ENDIAN: 0
PAGE_SIZE: 4096
TYPES: void*=8 short=2 int=4 long=8 long_long=8 size_t=8 float=4 double=8 long_double=16
PROCESS_ID: 3081805
MEMORY: total=1055819504000 free=81360808000 cached=745144000 rss=2524000
prefix: /usr
includedir: /usr/include
libdir: /usr/lib64
bindir: /usr/bin
libexecdir: /usr/libexec
appinc: -I/usr/include
applibs: -L/usr/lib64 -ltkrzw -lstdc++ -lrt -lpthread -lm -lc

l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

I have to say, compression should be working now too. I wonder what's going wrong here. Can
you give me the output of 'ldd duc' after building wtih tkrzw?

@.*** ~]# ldd duc/duc linux-vdso.so.1 (0x00007ffd1c17c000) libtkrzw.so.1 => /lib64/libtkrzw.so.1 (0x00007f9f3a980000) libcairo.so.2 => /lib64/libcairo.so.2 (0x00007f9f3a660000) libpango-1.0.so.0 => /lib64/libpango-1.0.so.0 (0x00007f9f3a418000) libgobject-2.0.so.0 => /lib64/libgobject-2.0.so.0 (0x00007f9f3a1c5000) libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 (0x00007f9f39eab000) libpangocairo-1.0.so.0 => /lib64/libpangocairo-1.0.so.0 (0x00007f9f39c9c000) libX11.so.6 => /lib64/libX11.so.6 (0x00007f9f39958000) libncursesw.so.6 => /lib64/libncursesw.so.6 (0x00007f9f3971a000) libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007f9f394ed000) libm.so.6 => /lib64/libm.so.6 (0x00007f9f3916b000) libc.so.6 => /lib64/libc.so.6 (0x00007f9f38da6000) libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f9f38a11000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9f387f1000) /lib64/ld-linux-x86-64.so.2 (0x00007f9f3ad53000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f9f385d9000) libpixman-1.so.0 => /lib64/libpixman-1.so.0 (0x00007f9f38331000) libfontconfig.so.1 => /lib64/libfontconfig.so.1 (0x00007f9f380ec000) libfreetype.so.6 => /lib64/libfreetype.so.6 (0x00007f9f37e30000) libpng16.so.16 => /lib64/libpng16.so.16 (0x00007f9f37bfb000) libxcb-shm.so.0 => /lib64/libxcb-shm.so.0 (0x00007f9f379f7000) libxcb.so.1 => /lib64/libxcb.so.1 (0x00007f9f377ce000) libxcb-render.so.0 => /lib64/libxcb-render.so.0 (0x00007f9f375c0000) libXrender.so.1 => /lib64/libXrender.so.1 (0x00007f9f373b5000) libXext.so.6 => /lib64/libXext.so.6 (0x00007f9f371a2000) libz.so.1 => /lib64/libz.so.1 (0x00007f9f36f8a000) librt.so.1 => /lib64/librt.so.1 (0x00007f9f36d82000) libthai.so.0 => /lib64/libthai.so.0 (0x00007f9f36b78000) libfribidi.so.0 => /lib64/libfribidi.so.0 (0x00007f9f3695c000) libgnutls.so.30 => /lib64/libgnutls.so.30 (0x00007f9f3656b000) libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f9f362fa000) libffi.so.6 => /lib64/libffi.so.6 (0x00007f9f360f1000) libpangoft2-1.0.so.0 => /lib64/libpangoft2-1.0.so.0 (0x00007f9f35eda000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f9f35cd6000) libexpat.so.1 => /lib64/libexpat.so.1 (0x00007f9f35a9a000) libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f9f35892000) libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f9f35681000) libXau.so.6 => /lib64/libXau.so.6 (0x00007f9f3547d000) libdatrie.so.1 => /lib64/libdatrie.so.1 (0x00007f9f35275000) libp11-kit.so.0 => /lib64/libp11-kit.so.0 (0x00007f9f34f4b000) libidn2.so.0 => /lib64/libidn2.so.0 (0x00007f9f34d2d000) libunistring.so.2 => /lib64/libunistring.so.2 (0x00007f9f349ac000) libtasn1.so.6 => /lib64/libtasn1.so.6 (0x00007f9f34799000) libnettle.so.6 => /lib64/libnettle.so.6 (0x00007f9f3455f000) libhogweed.so.4 => /lib64/libhogweed.so.4 (0x00007f9f3432f000) libgmp.so.10 => /lib64/libgmp.so.10 (0x00007f9f34097000) libharfbuzz.so.0 => /lib64/libharfbuzz.so.0 (0x00007f9f33df2000) libgraphite2.so.3 => /lib64/libgraphite2.so.3 (0x00007f9f33bc6000)

And can you check if you have the lz4 library installed on the system?

@.*** ~]# rpm -q lz4-libs lz4-libs-1.8.3-3.el8_4.x86_64

Note, I have found zstd to be very efficient and effective.

I'll look into that as an option as well and run some tests here. It would be nice (maybe) to have the ability to tell duc which compression to use, but that's another change to do down the line here.

It shouldn't build if you don't have it... but something funky is going on. Can you get the
output of: $ tkrzw_build_util config

@.** ~]# tkrzw_build_util config PACKAGE_VERSION: 1.0.27 LIBRARY_VERSION: 1.70.0 OS_NAME: Linux IS_BIG_ENDIAN: 0 PAGE_SIZE: 4096 TYPES: void=8 short=2 int=4 long=8 long_long=8 size_t=8 float=4 double=8 long_double=16 PROCESS_ID: 3081805 MEMORY: total=1055819504000 free=81360808000 cached=745144000 rss=2524000 prefix: /usr includedir: /usr/include libdir: /usr/lib64 bindir: /usr/bin libexecdir: /usr/libexec appinc: -I/usr/include applibs: -L/usr/lib64 -ltkrzw -lstdc++ -lrt -lpthread -lm -lc

I think this is the problem: you don't have support for those libraries compiled into your setup. Hmm... now the question is how to make this work reliably?

Here's my output:

$ tkrzw_build_util config
PACKAGE_VERSION: 1.0.29
LIBRARY_VERSION: 1.72.0
OS_NAME: Linux
IS_BIG_ENDIAN: 0
PAGE_SIZE: 4096
TYPES: void*=8 short=2 int=4 long=8 long_long=8 size_t=8 float=4 double=8 long_double=16
COMPRESSORS: lz4, zstd, zlib, lzma
PROCESS_ID: 1052258
MEMORY: total=131156028000 free=2438516000 cached=104863736000 rss=4608000
prefix: /usr/local
includedir: /usr/local/include
libdir: /usr/local/lib
bindir: /usr/local/bin
libexecdir: /usr/local/libexec
appinc: -I/usr/local/include
applibs: -L/usr/local/lib -ltkrzw -llzma -llz4 -lzstd -lz -lstdc++ -lrt -latomic -lpthread -lm -lc

I seem to recall you said you were RHEL8, right? But that AlmaLinux 8.x works, even though it's supposed to be the same? Time to try and install RHEL8 at home if I can for testing. I only have OracleLinux 8.x available easily so this might take some time.

l8gravely commented 6 months ago

So I spun up a Rocky Linux 8.x VM today, pulled down duc and tkrzw and compiled them, putting tkrzw into /usr/local on install, the default.

I was then able to build and test duc 1.5.0 with compression without a problem. So I suspect the issue is that:

  1. you didn't build tkrzw with any compression libraries

  2. we need to be better at checking for this when opening the library, since it obviously will break things.

So now that I have a test box, I'll see what I can do here.

I've also got a RockyLinux 9.x system setup as well and I'll try to do some testing there.

I'm getting quite the build farm these days! :-)

l8gravely commented 6 months ago

Stuart, I've pushed some updates to the tkrzw branch.

  1. I fixed the crash when you don't have compression libs enabled on the backend tkrzw library.

  2. put in some more debugging and auto-tuning of tkrzw backend, more to be done.

If you want, once you have tkrzw re-compiled with all the compression libraries, you can try tweaking the "record_comp_mode" in src/libduc/db-tkrzw.c to use one of the following options:

RECORD_COMP_LZ4 (current default)
RECORD_COMP_ZLIB
RECORD_COMP_ZSTD
RECORD_COMP_LZMA
RECORD_COMP_RC4
RECORD_COMP_AES

though I suspect the last two aren't worthwhile implementing. I'm starting to think about how I can support mixing different compression types into a duc setup.

Probably 'duc index -C ...' would be the option, with various supported types listed. Any thoughts?
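A hypothetical sketch of the plumbing for such a '-C' option: map the user-supplied name onto a tkrzw record_comp_mode parameter fragment. The option names and parameter values here are assumptions, not duc's implementation:

#include <stddef.h>
#include <string.h>

/* Return the tkrzw tuning fragment for a compression name, or NULL
 * if the name is unknown (caller should report an error). */
static const char *comp_param(const char *name)
{
	static const struct { const char *name, *param; } table[] = {
		{ "lz4",  "record_comp_mode=lz4"  },
		{ "zlib", "record_comp_mode=zlib" },
		{ "zstd", "record_comp_mode=zstd" },
		{ "lzma", "record_comp_mode=lzma" },
	};
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(name, table[i].name) == 0)
			return table[i].param;
	return NULL;
}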

We've always pushed flexibility, but sometimes it's a pain.

I'm also not sure how portable tkrzw DBs are between systems, but since most systems are now linux... I'm not sure it's as big a deal any more. But I can see how it might be nice to have a central display system (like I do at $WORK) and multiple data collection systems closer to where the filesystems are located at various sites.

John

stuartthebruce commented 6 months ago

Stuart, I've pushed some updates to the tkrzw branch.

This is failing to compile for me on Rocky Linux 8.9

gcc -DHAVE_CONFIG_H -I.    -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/uuid  -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/fribidi  -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/fribidi -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/uuid -I/usr/include/harfbuzz -Isrc/libduc -Isrc/libduc-graph -Isrc/glad -g -O2 -MT src/libduc/db-tkrzw.o -MD -MP -MF $depbase.Tpo -c -o src/libduc/db-tkrzw.o src/libduc/db-tkrzw.c &&\
mv -f $depbase.Tpo $depbase.Po
src/libduc/db-tkrzw.c: In function ‘tkrzwdb_to_errno’:
src/libduc/db-tkrzw.c:34:7: error: ‘TKRZW_STATUS_INVALID_ARGUEMENT_ERROR’ undeclared (first use in this function); did you mean ‘TKRZW_STATUS_INVALID_ARGUMENT_ERROR’?
  case TKRZW_STATUS_INVALID_ARGUEMENT_ERROR: return DUC_E_NOT_IMPLEMENTED;
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       TKRZW_STATUS_INVALID_ARGUMENT_ERROR
src/libduc/db-tkrzw.c:34:7: note: each undeclared identifier is reported only once for each function it appears in
make[1]: *** [Makefile:639: src/libduc/db-tkrzw.o] Error 1
make[1]: Leaving directory '/root/duc'
make: *** [Makefile:401: all] Error 2

[root@zfs1 duc]# gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

Stuart, I've pushed some updates to the tkrzw branch.

This is failing to compile for me on Rocky Linux 8.9

Duh... I thought I had done a test-compile, but obviously not. It's fixed now. Please try again.

gcc -DHAVE_CONFIG_H -I. -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/uuid -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/fribidi -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/fribidi -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/uuid -I/usr/include/harfbuzz -Isrc/libduc -Isrc/libduc-graph -Isrc/glad -g -O2 -MT src/libduc/db-tkrzw.o -MD -MP -MF $depbase.Tpo -c -o src/libduc/db-tkrzw.o src/libduc/db-tkrzw.c &&\ mv -f $depbase.Tpo $depbase.Po src/libduc/db-tkrzw.c: In function ‘tkrzwdb_to_errno’: src/libduc/db-tkrzw.c:34:7: error: ‘TKRZW_STATUS_INVALID_ARGUEMENT_ERROR’ undeclared (first use in this function); did you mean ‘TKRZW_STATUS_INVALID_ARGUMENT_ERROR’? case TKRZW_STATUS_INVALID_ARGUEMENT_ERROR: return DUC_E_NOT_IMPLEMENTED; ^~~~~~~~ TKRZW_STATUS_INVALID_ARGUMENT_ERROR src/libduc/db-tkrzw.c:34:7: note: each undeclared identifier is reported only once for each function it appears in make[1]: [Makefile:639: src/libduc/db-tkrzw.o] Error 1 make[1]: Leaving directory '/root/duc' make: [Makefile:401: all] Error 2

@.*** duc]# gcc --version gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20) Copyright (C) 2018 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


stuartthebruce commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes: Stuart, I've pushed some updates to the tkrzw branch. This is failing to compile for me on Rocky Linux 8.9 Duh... I thought I had done a test-compile, but obviously not. It's fixed now. Please try again.

It compiles now, and generates an error for compression (rather than a segfault),

[root@zfs1 duc]# /bin/rm -f /tmp/duc.db && ./duc index -v -d /tmp/duc.db .
Writing to database "/tmp/duc.db"
  tkrzw_get_last_status() = unsupported compression
Error opening: /tmp/duc.db - unsupported DB type Tokyo Cabinet, compiled for tkrzw
Unknown error, contact the author

Note, the error message referring to Tokyo Cabinet is a bit odd.

My next step is to get it compiled with compression enabled and re-run a large index.

l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

                "stuartthebruce" == stuartthebruce @.***> writes: Stuart, I've pushed some
                updates to the tkrzw branch. This is failing to compile for me on Rocky Linux
                8.9
                Duh... I thought I had done a test-compile, but obviously not. It's fixed now.
                Please try again.

It compiles now, and generates an error for compression (rather than segfault),

@.*** duc]# /bin/rm -f /tmp/duc.db && ./duc index -v -d /tmp/duc.db . Writing to database "/tmp/duc.db" tkrzw_get_last_status() = unsupported compression Error opening: /tmp/duc.db - unsupported DB type Tokyo Cabinet, compiled for tkrzw Unknown error, contact the author

My next step is to get it compiled with compression enabled and re-run a large index.

Awesome! Thanks for all your testing.

And on a side note, I've got initial histogram support (just text output so far) in the info command.

duc info -H

will show a histogram in addition to the regular info. Needs work, but it's a start.

l8gravely commented 6 months ago

"John" == John Stoffel @.***> writes:

"stuartthebruce" == stuartthebruce @.***> writes:

"stuartthebruce" == stuartthebruce @.***> writes: Stuart, I've pushed some updates to the tkrzw branch. This is failing to compile for me on Rocky Linux 8.9 Duh... I thought I had done a test-compile, but obviously not. It's fixed now. Please try again.

It compiles now, and generates an error for compression (rather than segfault),

@.*** duc]# /bin/rm -f /tmp/duc.db && ./duc index -v -d /tmp/duc.db . Writing to database "/tmp/duc.db" tkrzw_get_last_status() = unsupported compression Error opening: /tmp/duc.db - unsupported DB type Tokyo Cabinet, compiled for tkrzw Unknown error, contact the author

My next step is to get it compiled with compression enabled and re-run a large index.

Awesome! Thanks for all your testing.

And on a side note, I've got the initial support for histogram support (just text output so far) in the cmd info.

duc info -H

will show a histogram in addition to the regular info. Needs work, but it's a start.

Oh yeah, it's under the 'histogram' branch on GitHub, and it depends on the tkrzw stuff since it's built on top of that branch right now. I'm hoping to push a new release in a week or so and call it v1.5.0a as a test.

John

stuartthebruce commented 6 months ago

I seem to recall you said you were RHEL8, right?

Rocky Linux 8.9, but I also fail to see a COMPRESSORS line in the output of tkrzw_build_util config on Rocky Linux 9.4. I have tried both the pre-packaged EPEL packages and building tkrzw locally with ./configure --enable-most-features.

But that AlmaLinux 8.x works, even though it's supposed to be the same?

Are you using the EPEL 8 package tkrzw-1.0.27-1.el8.x86_64 or something else?

Time to try and install RHEL8 at home if I can for testing. I only have OracleLinux 8.x available easily so this might take some time.

There should be no need to install RHEL8. If you send me the package or configure/build steps you are using on OracleLinux 8, I should be able to reproduce that on RL8.

Note, this will presumably end up in a request to the EPEL 8 package maintainer to update their build to enable compression.

l8gravely commented 6 months ago

"stuartthebruce" == stuartthebruce @.***> writes:

I seem to recall you said you were RHEL8, right?

Rocky Linux 8.9, but I also fail to see a COMPRESSORS line in the output of tkrzw_build_util config on Rocky Linux 9.4. I have tried both the pre-packaged EPEL packages and building tkrzw locally with ./configure --enable-most-features.

I suspect you need to actually install the correct -devel parts for the various compression libraries. On RockyLinux 9 I have the following installed. You won't need all of them, and I'll have to update the docs for Rocky8 and Rocky9 to do installs.

$ rpm -qa | grep devel | sort
brotli-devel-1.0.9-6.el9.x86_64
bzip2-devel-1.0.8-8.el9.x86_64
cairo-devel-1.17.4-7.el9.x86_64
fontconfig-devel-2.14.0-2.el9_1.x86_64
freetype-devel-2.10.4-9.el9.x86_64
fribidi-devel-1.0.10-6.el9.2.x86_64
glib2-devel-2.68.4-14.el9.x86_64
glibc-devel-2.34-100.el9.x86_64
graphite2-devel-1.3.14-9.el9.x86_64
harfbuzz-devel-2.7.4-10.el9.x86_64
libblkid-devel-2.37.4-18.el9.x86_64
libdatrie-devel-0.2.13-4.el9.x86_64
libffi-devel-3.4.2-8.el9.x86_64
libicu-devel-67.1-9.el9.x86_64
libmount-devel-2.37.4-18.el9.x86_64
libpng-devel-1.6.37-12.el9.x86_64
libselinux-devel-3.6-1.el9.x86_64
libsepol-devel-3.6-1.el9.x86_64
libstdc++-devel-11.4.1-3.el9.x86_64
libthai-devel-0.1.28-8.el9.x86_64
libX11-devel-1.7.0-9.el9.x86_64
libXau-devel-1.0.9-8.el9.x86_64
libxcb-devel-1.13.1-9.el9.x86_64
libxcrypt-devel-4.4.18-3.el9.x86_64
libXext-devel-1.3.4-8.el9.x86_64
libXft-devel-2.3.3-8.el9.x86_64
libxml2-devel-2.9.13-6.el9_4.x86_64
libXrender-devel-0.9.10-16.el9.x86_64
libzstd-devel-1.5.1-2.el9.x86_64
lmdb-devel-0.9.29-3.el9.x86_64
lz4-devel-1.9.3-5.el9.x86_64
ncurses-devel-6.2-10.20210508.el9.x86_64
pango-devel-1.48.7-3.el9.x86_64
pcre2-devel-10.40-5.el9.x86_64
pcre-devel-8.44-3.el9.3.x86_64
pixman-devel-0.40.0-6.el9_3.x86_64
sysprof-capture-devel-3.40.1-3.el9.x86_64
tokyocabinet-devel-1.4.48-19.el9.x86_64
xorg-x11-proto-devel-2022.2-1.el9.noarch
xz-devel-5.2.5-8.el9_0.x86_64
zlib-devel-1.2.11-40.el9.x86_64
But that AlmaLinux 8.x works, even though it's supposed to be the same?

Are you using the EPEL 8 package tkrzw-1.0.27-1.el8.x86_64 or something else?

Time to try and install RHEL8 at home if I can for testing. I only have OracleLinux 8.x
available easily so this might take some time.

There should be no need to install RHEL8. If you send me the package or configure/build steps you are using on OracleLinux 8, I should be able to reproduce that on RL8.

I ended up installing Rocky Linux 8 and 9 at home; it was simple.

Note, this will presumably end up in a request to the EPEL 8 package maintainer to update their build to enable compression.

Sweet! It would be nice if they actually supported more compression schemes. And I'll see if I can tweak duc to support more of them by default, and report errors when they're not found.

On RockyLinux 8 I have the following -devel packages installed:

$ rpm -qa | grep -- -devel | sort
bzip2-devel-1.0.6-26.el8.x86_64
cairo-devel-1.15.12-6.el8.x86_64
elfutils-debuginfod-client-devel-0.189-3.el8.x86_64
elfutils-devel-0.189-3.el8.x86_64
elfutils-libelf-devel-0.189-3.el8.x86_64
expat-devel-2.2.5-11.el8_9.1.x86_64
fontconfig-devel-2.13.1-4.el8.x86_64
freetype-devel-2.9.1-9.el8.x86_64
fribidi-devel-1.0.4-9.el8.x86_64
gettext-common-devel-0.19.8.1-17.el8.noarch
gettext-devel-0.19.8.1-17.el8.x86_64
glib2-devel-2.56.4-161.el8.x86_64
glibc-devel-2.28-236.el8_9.13.x86_64
graphite2-devel-1.3.10-10.el8.x86_64
harfbuzz-devel-1.7.5-3.el8.x86_64
kernel-devel-4.18.0-513.24.1.el8_9.x86_64
keyutils-libs-devel-1.5.10-9.el8.x86_64
krb5-devel-1.18.2-26.el8.x86_64
libcom_err-devel-1.45.6-5.el8.x86_64
libicu-devel-60.3-2.el8_1.x86_64
libpng-devel-1.6.34-5.el8.x86_64
libselinux-devel-2.9-8.el8.x86_64
libsepol-devel-2.9-3.el8.x86_64
libstdc++-devel-8.5.0-20.el8.x86_64
libuuid-devel-2.32.1-44.el8_9.1.x86_64
libverto-devel-0.3.2-2.el8.x86_64
libX11-devel-1.6.8-6.el8.x86_64
libXau-devel-1.0.9-3.el8.x86_64
libxcb-devel-1.13.1-1.el8.x86_64
libxcrypt-devel-4.1.1-6.el8.x86_64
libXext-devel-1.3.4-1.el8.x86_64
libXft-devel-2.3.3-1.el8.x86_64
libXrender-devel-0.9.10-7.el8.x86_64
libzstd-devel-1.4.4-1.el8.x86_64
lz4-devel-1.8.3-3.el8_4.x86_64
ncurses-devel-6.1-10.20180224.el8.x86_64
openssl-devel-1.1.1k-12.el8_9.x86_64
pango-devel-1.42.4-8.el8.x86_64
pcre2-devel-10.32-3.el8_6.x86_64
pcre-devel-8.42-6.el8.x86_64
pixman-devel-0.38.4-3.el8_9.x86_64
systemtap-devel-4.9-3.el8.x86_64
valgrind-devel-3.21.0-8.el8.x86_64
xorg-x11-proto-devel-2020.1-3.el8.noarch
xz-devel-5.2.4-4.el8_6.x86_64
zlib-devel-1.2.11-25.el8.x86_64
stuartthebruce commented 6 months ago

I was able to build a local copy of tkrzw with compression enabled and link duc against that, so I have filed an RFE bugreport against Fedora requesting that the EPEL 8/9 builds enable compression: https://bugzilla.redhat.com/show_bug.cgi?id=2283237
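For anyone landing here later, the working local build presumably amounted to something like the following; this is my reconstruction from the thread (the -devel packages John listed plus the --enable-most-features flag quoted above), and the exact package set is an assumption. If it worked, the config check should list the compressors:

# dnf install lz4-devel libzstd-devel zlib-devel xz-devel
# ./configure --enable-most-features && make && make install
# tkrzw_build_util config | grep COMPRESSORS
COMPRESSORS: lz4, zstd, zlib, lzma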

For bonus points, it would be great if duc was also packaged/distributed via EPEL.

And please consider asking the tkrzw maintainer to upload a version (with compression enabled) to conda-forge so that https://anaconda.org/conda-forge/duc will make this readily accessible to conda users.