jvirkki / dupd

CLI utility to find duplicate files
http://www.virkki.com/dupd
GNU General Public License v3.0

Help creating links #14

Closed cschlick closed 7 years ago

cschlick commented 7 years ago

Hi, I am trying to use the "dupd rmsh --link" option to create symlinks for duplicates. However, I see nothing about creating links in the resulting shell script, only a lot of rm commands. Perhaps I am not using it correctly?

Thanks

jvirkki commented 7 years ago

Which build version? Let me know the output of 'dupd version'.

Try it out on a few of the test files first as a starting point. Run the following in the dupd source directory:

cd tests/files4
dupd scan
dupd rmsh --link

Paste the output here. The script should have a few rm commands each followed by a corresponding ln -s.
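
To check that expectation mechanically rather than by eye, here is a minimal sketch. The helper name count_unlinked_rm and the file name dedupe.sh are made up for illustration and are not part of dupd. It assumes the script layout shown in this thread, where each ln -s line directly follows its rm line.

```shell
# Hypothetical helper (not part of dupd): print how many rm lines in a
# generated script are NOT immediately followed by an ln line.
count_unlinked_rm() {
    awk '
        # The previous line was an rm, but this line is not an ln.
        prev_rm && $1 != "ln" { missing++ }
        # Remember whether this line is an rm for the next iteration.
        { prev_rm = ($1 == "rm") }
        # An rm on the very last line has no ln after it either.
        END { if (prev_rm) missing++; print missing + 0 }
    ' "$1"
}
```

Usage would be something like saving the script with dupd rmsh --link > dedupe.sh and then running count_unlinked_rm dedupe.sh. A result of 0 means every rm has its link command; anything else means the link lines are missing.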

cschlick commented 7 years ago
[cschlick@hm013 files4]$ dupd scan
Files:        0                           0 errors                         0 ms
Round 1:        1 groups of duplicates confirmed                          65 ms
Round 2:        0 groups of duplicates confirmed                           0 ms
Round 3:        2 groups of duplicates confirmed                          80 ms
Total duplicates: 6 files in 3 groups in      996 ms
Run 'dupd report' to list duplicates.

Note: This is a development version of dupd (1.5-dev) (bfeec3e2545dd97d86e5d29d00fa4323a6e32781)
May contain known bugs or unstable work in progress!
If stability is desired, use a release version of dupd.

[cschlick@hm013 files4]$ dupd rmsh --link
#
# WARNING: Auto-generated by dupd to blindly delete duplicates.
# Only one file in each duplicate set is kept and it might not
# be the one you wanted! Review carefully before running this!
#

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
cschlick commented 7 years ago

This appears to be an issue in the 1.4 release as well....

[cschlick@hm013 files4]$ dupd version
1.4

[cschlick@hm013 files4]$ dupd scan
Files scanned: 18 (0ms)
Done processing 6 sets                             
Duplicate processing completed in 28ms
Total duplicates: 6
Run 'dupd report' to list duplicates.
[cschlick@hm013 files4]$ dupd report
Duplicate report from database /N/u/cschlick/Karst/.dupd_sqlite:

508 total bytes used by duplicates:
  /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
  /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2

70306 total bytes used by duplicates:
  /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
  /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3

70354 total bytes used by duplicates:
  /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
  /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC

Total used: 141168 bytes (137 KiB, 0 MiB, 0 GiB)

[cschlick@hm013 files4]$ dupd rmsh --link
#
# WARNING: Auto-generated by dupd to blindly delete duplicates.
# Only one file in each duplicate set is kept and it might not
# be the one you wanted! Review carefully before running this!
#

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
jvirkki commented 7 years ago

Interesting. Would you mind attaching the database ($HOME/.dupd_sqlite) created when you run 'dupd scan' on the src/dupd/tests/files4 directory?

cschlick commented 7 years ago

Ok, I compiled it on my personal machine and now it seems to work fine. Before I was running it on a shared cluster, which is where I need to use it. Here is the db file from the cluster (renamed .txt because of github):

dupd_sqlite_cluster.txt

cschlick commented 7 years ago

And here is the one from my personal machine that works fine

dupd_sqlite_personal.txt

jvirkki commented 7 years ago

Using your cluster db file on my machine produces the expected output, below. So we can conclude the scan was fine and the db content is correct. The problem should therefore be in the script generation code, but only in the binary compiled on your cluster.

Do you get the same problem with the --hardlink option?

What OS/CPU/hardware is the cluster where you compiled it?

Is it an unmodified dupd source or did you change anything either in code or the Makefile?

% dupd rmsh  --db dupd_sqlite_cluster.sqlite  --link
#
# WARNING: Auto-generated by dupd to blindly delete duplicates.
# Only one file in each duplicate set is kept and it might not
# be the one you wanted! Review carefully before running this!
#

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
cschlick commented 7 years ago

It's unmodified source, so I'm not sure what could be wrong. The --hardlink flag doesn't do anything either. I tried copying over a binary from my personal machine, but it results in an error (/lib64/libc.so.6: version `GLIBC_2.14' not found). Here are the two systems. Maybe later I can try compiling a different version of gcc on the cluster. Anyway, thanks for taking a look.

Cluster:
  Kernel 2.6.32-696.3.2.el6.x86_64
  Make 3.81
  GCC 4.9.4
  ldd (GNU libc) 2.12
  OpenSSL 1.0.1e-fips
  SQLite version 3.6.20

Personal:
  Kernel 3.10.0-514.21.2.el7.x86_64
  Make 3.82
  GCC 4.8.5
  ldd (GNU libc) 2.17
  OpenSSL 1.0.2k
  SQLite version 3.13.0
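
The GLIBC_2.14 error is consistent with the numbers above: the copied binary asks for a symbol version newer than the cluster's glibc 2.12. As a rough sketch of that comparison, the helper below (version_ge is a made-up name, and it relies on GNU sort -V being available) tests whether one dotted version is at least another.

```shell
# Hypothetical helper: succeed if dotted version $1 is >= dotted version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Cluster glibc (2.12) against the version named in the error (2.14).
if version_ge 2.12 2.14; then
    echo "glibc is new enough"
else
    echo "glibc is too old for this binary"
fi

# To list the glibc symbol versions a binary requires, something like
#   objdump -T ./dupd | grep GLIBC_
# can be used (assuming binutils is installed).
```

This only explains why the copied binary cannot run on the cluster. It says nothing about why the binary compiled on the cluster omitted the link lines.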

jvirkki commented 7 years ago

Strange. Later tonight I can add some diagnostics to a different branch to see if that sheds some light.

jvirkki commented 7 years ago

Try the code in rmsh_test branch and paste the output of dupd rmsh --link with it.

cschlick commented 7 years ago

Ok, I compiled both dupd-rmsh-test and the master and now they are both working perfectly on both machines. Not sure why I was having problems, but it must have been on my end. Thanks for the help and thanks for the software.

Master:

[cschlick@hm013 files4]$ dupd rmsh --link
#
# WARNING: Auto-generated by dupd to blindly delete duplicates.
# Only one file in each duplicate set is kept and it might not
# be the one you wanted! Review carefully before running this!
#

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"

rmsh-test:

[cschlick@hm013 files4]$ dupd-rmsh_test/dupd rmsh --link
# opt_link: 1
# opt_hardlink: (null)
#
# WARNING: Auto-generated by dupd to blindly delete duplicates.
# Only one file in each duplicate set is kept and it might not
# be the one you wanted! Review carefully before running this!
# rmsh_link: 1
#

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/z2"

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/1" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/3"

#
# KEEPING: /gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB
#
rm "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
ln -s "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffB" "/gpfs/home/c/s/cschlick/Karst/Software/dupd/tests/files4/three1diffC"
jvirkki commented 7 years ago

Weird. Let me know if it happens again...