tailhook / vagga

Vagga is a containerization tool without daemons
http://vagga.readthedocs.org
MIT License

Implement _hardlink & _verify commands, global hardlinking #433

Closed: anti-social closed this 6 years ago

tailhook commented 7 years ago

I ran _hardlink --global on my system:

WARN:vagga::wrapper::hardlink: Found and linked 1462331 (43GB) identical files

...in 27 minutes. It freed 18 GB of disk space, which is pretty awesome.

The question is: can it report the amount of space freed rather than the amount of files scanned (or both)?
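For readers unfamiliar with the idea, here is a minimal sketch of content-based hardlinking in Python. It is not vagga's implementation (which is written in Rust and works from its own image index); it simply assumes a flat list of paths on one filesystem, and the helper names and the ".hardlink-tmp" temporary-name scheme are hypothetical. Files are grouped by size, permission bits, owner and SHA-256 of their contents, because hardlinked paths share all inode metadata. The return value mirrors the log line above: number of files linked and their total size. Note that this size is counted once per link created, so it can exceed the space actually freed when some duplicates were already hardlinked to each other.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_identical(paths):
    """Replace duplicate regular files with hardlinks to one canonical copy.

    Returns (files_linked, bytes_linked). Assumes all paths live on one
    filesystem, since hardlinks cannot cross devices.
    """
    groups = defaultdict(list)
    for p in paths:
        st = os.lstat(p)
        if not stat.S_ISREG(st.st_mode):
            continue  # skip symlinks, directories, devices
        # Same size, mode, owner and content hash => safe to share an inode.
        key = (st.st_size, st.st_mode, st.st_uid, st.st_gid, file_digest(p))
        groups[key].append(p)

    files_linked = bytes_linked = 0
    for key, group in groups.items():
        canon = os.lstat(group[0])  # first path seen is the canonical copy
        for p in group[1:]:
            st = os.lstat(p)
            if (st.st_dev, st.st_ino) == (canon.st_dev, canon.st_ino):
                continue  # already the same inode, nothing to do
            tmp = p + ".hardlink-tmp"  # hypothetical temp-name scheme
            os.link(group[0], tmp)
            os.replace(tmp, p)  # atomically swap the duplicate for the link
            files_linked += 1
            bytes_linked += key[0]  # key[0] is the file size
    return files_linked, bytes_linked
```

Running it twice over the same paths links nothing the second time, because the duplicates already share the canonical inode.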


I'll play around with it a little bit before merging.

anti-social commented 7 years ago

> in 27 minutes

I think you used a debug build. I tested a release build and it was quite fast.

tailhook commented 7 years ago

It's a release build. For 43 GB and 1.5 million files, I think that's reasonable. Also, some images (I don't know how many) were indexed in the process.

anti-social commented 7 years ago

> The question is: can it report the amount of space freed rather than the amount of files scanned (or both)?

This is because you were already using hardlinking. For example, if you have 3 identical files with inodes ino1, ino2, ino2, we create 2 hardlinks and count the size twice, even though only one copy of the data (inode ino2) is actually freed. I will try to fix that.
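The overcount can be illustrated with a small Python sketch. This is a hypothetical accounting helper, not vagga's code: for one group of identical files it counts the size once per hardlink created (the behavior described above) and, separately, once per distinct inode that gets replaced. The `Entry` record and the function name are made up for illustration, and the freed figure assumes every link to a replaced inode is inside the group.

```python
from collections import namedtuple

# Hypothetical record for one path found during a scan.
Entry = namedtuple("Entry", "path inode size")


def linked_vs_freed(group):
    """For one group of identical files, return (bytes_linked, bytes_freed).

    bytes_linked counts the size once per hardlink created.
    bytes_freed counts it once per distinct inode that is replaced,
    assuming every link to that inode is inside the group.
    """
    canonical = group[0]
    replaced = set()
    linked = 0
    for e in group[1:]:
        if e.inode == canonical.inode:
            continue  # already shares the canonical inode
        linked += e.size
        replaced.add(e.inode)
    return linked, len(replaced) * canonical.size


# The ino1, ino2, ino2 case from the comment above, with 100-byte files:
group = [Entry("a", 1, 100), Entry("b", 2, 100), Entry("c", 2, 100)]
print(linked_vs_freed(group))  # (200, 100)
```

A stricter version could count how many links to each inode it replaced and only credit the inode as freed once that count reaches the inode's `st_nlink`, since links outside the scanned set keep the data alive.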

anti-social commented 7 years ago

> It's a release build. For 43 GB and 1.5 million files, I think that's reasonable. Also, some images (I don't know how many) were indexed in the process.

Yeah, I tested that back when we didn't check the real file hash before hardlinking.
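Checking the real file hash means reading every candidate file in full, which is consistent with the slower run reported above. A hypothetical Python sketch of such a check: candidates come from a precomputed index of digests (the index and the function names are assumptions, not vagga's API), and before linking, both files are re-hashed and compared with the indexed digest so that a stale index entry never causes two different files to be merged.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's actual contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def link_if_verified(canonical, duplicate, indexed_digest):
    """Hardlink duplicate to canonical only if both files still match the
    digest recorded in a (hypothetical) index. Returns True if linked."""
    if sha256_of(canonical) != indexed_digest:
        return False  # index entry is stale; leave the files alone
    if sha256_of(duplicate) != indexed_digest:
        return False
    tmp = duplicate + ".hardlink-tmp"  # hypothetical temp-name scheme
    os.link(canonical, tmp)
    os.replace(tmp, duplicate)  # atomically swap the duplicate for the link
    return True
```

The trade-off is safety over speed: trusting the index alone avoids the extra reads but risks linking files whose contents changed after indexing.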

tailhook commented 7 years ago

The second run took 204 seconds (when no files needed linking), but there were errors.

anti-social commented 6 years ago

Updated.

tailhook commented 6 years ago

Okay, this time it ran for ~6 minutes on a storage dir with 358224 dirs and 3002993 files:

 WARN 2018-05-19T15:09:15Z: vagga::wrapper::hardlink: Found and linked 2345494 (55GB) identical files

It freed almost 52 GB of disk space. (Note: I had previously scanned the whole storage dir for hashes, so the cache was presumably pretty hot; on the other hand, the system has only 16 GB of RAM, so it can't cache everything.)

Still, I have one concern, which I'll share with you privately.

tailhook commented 6 years ago

Okay, I think my concern has been answered. Merged. Sorry for the long delay!