Closed cschlick closed 7 years ago
Which build version are you using? Let me know the output of 'dupd version'.
Try it out on a few of the test files first as a starting point. Run the following in the dupd source directory:
cd tests/files4
dupd scan
dupd rmsh --link
Paste the output here. The script should have a few rm commands each followed by a corresponding ln -s.
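For longer scripts, the same check can be done mechanically. The following is only a sketch: it assumes the layout seen in the rmsh output in this thread (each rm "<path>" line immediately followed by ln -s "<kept>" "<path>"), and the function name unlinked_removals is made up for illustration, not part of dupd.

```python
import shlex


def unlinked_removals(script_text):
    """Return paths that are removed but not re-created as a symlink on the next line."""
    lines = [ln.strip() for ln in script_text.splitlines()]
    missing = []
    for i, line in enumerate(lines):
        if not line.startswith("rm "):
            continue
        removed = shlex.split(line)[1]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        # Only tokenize the next line if it looks like an ln command;
        # comment lines may contain unbalanced quotes.
        parts = shlex.split(nxt) if nxt.startswith("ln ") else []
        # Expected shape: ln -s <kept> <removed>
        if not (len(parts) == 4 and parts[:2] == ["ln", "-s"] and parts[3] == removed):
            missing.append(removed)
    return missing


# Hypothetical sample: the second rm has no matching ln -s.
sample = '''rm "/tmp/a2"
ln -s "/tmp/a1" "/tmp/a2"
#
rm "/tmp/b2"
'''
print(unlinked_removals(sample))
```

An empty list means every removed file gets a link back to the kept copy; any path listed would be deleted without a replacement.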
[cschlick@hm013 files4]$ dupd scan
Files: 0 0 errors 0 ms
Round 1: 1 groups of duplicates confirmed 65 ms
Round 2: 0 groups of duplicates confirmed 0 ms
Round 3: 2 groups of duplicates confirmed 80 ms
Total duplicates: 6 files in 3 groups in 996 ms
Run 'dupd report' to list duplicates.
Note: This is a development version of dupd (1.5-dev) (bfeec3e2545dd97d86e5d29d00fa4323a6e32781)
May contain known bugs or unstable work in progress!
If stability is desired, use a release version of dupd.
[cschlick@hm013 files4]$ dupd rmsh --link
#
# WARNING: Auto-generated by dupd to blindly delete duplicates.
# Only one file in each duplicate set is kept and it might not
# be the one you wanted! Review carefully before running this!
#
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
This appears to be an issue in the 1.4 release as well:
[cschlick@hm013 files4]$ dupd version
1.4
[cschlick@hm013 files4]$ dupd scan
Files scanned: 18 (0ms)
Done processing 6 sets
Duplicate processing completed in 28ms
Total duplicates: 6
Run 'dupd report' to list duplicates.
[cschlick@hm013 files4]$ dupd report
Duplicate report from database /N/u/cschlick/Karst/.dupd_sqlite:
508 total bytes used by duplicates:
/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2
70306 total bytes used by duplicates:
/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3
70354 total bytes used by duplicates:
/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC
Total used: 141168 bytes (137 KiB, 0 MiB, 0 GiB)
[cschlick@hm013 files4]$ dupd rmsh --link
#
# WARNING: Auto-generated by dupd to blindly delete duplicates.
# Only one file in each duplicate set is kept and it might not
# be the one you wanted! Review carefully before running this!
#
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
Interesting. Would you mind attaching the database ($HOME/.dupd_sqlite) created when you run 'dupd scan' on the src/dupd/tests/files4 directory?
Ok, I compiled it on my personal machine and now it seems to work fine. Before, I was running it on a shared cluster, which is where I need to use it. Here is the db file from the cluster (renamed .txt because of GitHub):
And here is the one from my personal machine, which works fine:
Using your cluster db file on my machine produces the expected output, below. We can conclude the scan was fine and the db content is correct. So the problem should be in the script generation code, but only in the binary compiled on your cluster.
Do you get the same problem with the --hardlink option?
What OS/CPU/hardware is the cluster where you compiled it?
Is it an unmodified dupd source or did you change anything either in code or the Makefile?
% dupd rmsh --db dupd_sqlite_cluster.sqlite --link
#
# WARNING: Auto-generated by dupd to blindly delete duplicates.
# Only one file in each duplicate set is kept and it might not
# be the one you wanted! Review carefully before running this!
#
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
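A coarse way to compare two scan databases without going through dupd is sketched below. The thread does not describe dupd's schema, so this relies only on SQLite's generic sqlite_master catalog and compares table names and row counts. Matching counts do not prove the contents are identical. The function names and the second file name in the usage note are assumptions for illustration.

```python
import sqlite3


def table_row_counts(db_path):
    """Map each table name in an SQLite file to its row count."""
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            # Table names come from the file itself; quote them as identifiers.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                "SELECT COUNT(*) FROM " + quoted).fetchone()[0]
        return counts
    finally:
        conn.close()


def summaries_match(db_a, db_b):
    """True when both files have the same tables with the same row counts."""
    return table_row_counts(db_a) == table_row_counts(db_b)
```

For example, summaries_match("dupd_sqlite_cluster.sqlite", "dupd_sqlite_personal.sqlite") would compare the cluster file against a (hypothetically named) copy from the other machine.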
It's unmodified source, so I'm not sure what could be wrong. The --hardlink flag doesn't do anything either. I tried copying over a binary from my personal machine, but it fails with an error (/lib64/libc.so.6: version `GLIBC_2.14' not found). Here are the two systems. Maybe later I can try compiling with a different version of gcc on the cluster. Anyway, thanks for taking a look.
Cluster: Kernel 2.6.32-696.3.2.el6.x86_64, Make 3.81, GCC 4.9.4, ldd (GNU libc) 2.12, OpenSSL 1.0.1e-fips, SQLite version 3.6.20
Personal: Kernel 3.10.0-514.21.2.el7.x86_64, Make 3.82, GCC 4.8.5, ldd (GNU libc) 2.17, OpenSSL 1.0.2k, SQLite version 3.13.0
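The GLIBC_2.14 error is consistent with these numbers: a binary linked against glibc 2.17 can reference symbols versioned GLIBC_2.14, which glibc 2.12 does not provide. A minimal sketch of that comparison follows. The helper names are invented for illustration, and local_glibc_version is a best-effort lookup that only returns a value on glibc-based systems.

```python
import os


def parse_version(text):
    """Turn '2.14' into (2, 14) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(available, required):
    """True if the installed glibc is at least the version the binary asks for."""
    return parse_version(available) >= parse_version(required)


def local_glibc_version():
    """Return the running system's glibc version string, or None if unavailable."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")  # typically like 'glibc 2.17'
    except (AttributeError, ValueError, OSError):
        return None
    return value.split()[-1] if value else None


# Versions quoted in this thread: cluster has 2.12, the binary wants 2.14.
print(glibc_satisfies("2.12", "2.14"))
print(glibc_satisfies("2.17", "2.14"))
```

Comparing tuples rather than strings matters here, since "2.9" would otherwise sort after "2.12".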
Strange. Later tonight I can add some diagnostics to a different branch to see if that sheds some light.
Try the code in the rmsh_test branch and paste the output of 'dupd rmsh --link' with it.
Ok, I compiled both dupd-rmsh-test and the master and now they are both working perfectly on both machines. Not sure why I was having problems, but it must have been on my end. Thanks for the help and thanks for the software.
Master:
[cschlick@hm013 files4]$ dupd rmsh --link
#
# WARNING: Auto-generated by dupd to blindly delete duplicates.
# Only one file in each duplicate set is kept and it might not
# be the one you wanted! Review carefully before running this!
#
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
rmsh-test:
[cschlick@hm013 files4]$ dupd-rmsh_test/dupd rmsh --link
# opt_link: 1
# opt_hardlink: (null)
#
# WARNING: Auto-generated by dupd to blindly delete duplicates.
# Only one file in each duplicate set is kept and it might not
# be the one you wanted! Review carefully before running this!
# rmsh_link: 1
#
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
Weird. Let me know if it happens again...
Hi, I am trying to use the "dupd rmsh --link" option to create symlinks for duplicates. However, I see nothing about creating links in the resulting shell script, only a lot of rm commands. Perhaps I am not using it correctly?
Thanks