Open kousu opened 3 years ago
duke has a huge amount of storage attached. We've been promised that the storage on the git server can be expanded as needed, but for now it is 1TB, so we need to do some reconnaissance.
I'm starting here:
$ df -h
Filesystem Size Used Avail Use% Mounted on
[...]
//132.207.65.200/histology 8.9T 7.4T 1.5T 84% /home/GRAMES.POLYMTL.CA/me/duke/histology
//132.207.65.200/mri 8.9T 7.4T 1.5T 84% /home/GRAMES.POLYMTL.CA/me/duke/mri
//132.207.65.200/projects 4.4T 4.3T 68G 99% /home/GRAMES.POLYMTL.CA/me/duke/projects
//132.207.65.200/public 4.4T 4.3T 68G 99% /home/GRAMES.POLYMTL.CA/me/duke/public
//132.207.65.200/sct_testing 4.4T 4.3T 68G 99% /home/GRAMES.POLYMTL.CA/me/duke/sct_testing
//132.207.65.200/temp 4.4T 4.3T 68G 99% /home/GRAMES.POLYMTL.CA/me/duke/temp
Okay so it looks like the CIFS mounts are shared from two disks: a 4.4T disk and an 8.9T disk, and that we've used about 12T in all (7.4T + 4.3T). But I've gotta think that most of that is junk.
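For reference, here is a rough sketch of how that "two disks" guess can be checked mechanically. It groups the network shares by their reported size/used/avail figures, assuming (this is only a heuristic) that shares reporting identical figures sit on the same backing disk. The function name `summarize_disks` is made up for illustration.

```shell
# Group //host/share mounts that report identical size/used/avail figures.
# ASSUMPTION: identical figures mean the same backing disk. This is a
# heuristic, not something df guarantees.
summarize_disks() {
    awk '$1 ~ /^\/\// {
        key = $2 " " $3 " " $4              # size, used, avail as reported
        shares[key] = shares[key] " " $1    # collect share names per key
    }
    END {
        for (k in shares) print k ":" shares[k]
    }' | sort
}
```

Something like `df -P | summarize_disks` should then print one line per suspected disk, followed by the shares that live on it. `-P` keeps each filesystem on a single line so the columns stay aligned.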
I'm starting here by locating duplicate files in one of the shares:
$ time fdupes -r -H /home/GRAMES.POLYMTL.CA/me/duke/projects 2>&1 | tee ~/duke-projects-duplicates.txt
[TO BE FILLED IN WHEN IT FINISHES]
^ first attempt reset halfway through. it's a lot of data to process. trying again now.
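If the full scan keeps getting interrupted, one option is a much cheaper first pass that only looks at file sizes, since two files can only be duplicates if their sizes match. This is a sketch, not what was actually run here: `same_size_candidates` is a made-up name, it relies on GNU find's `-printf`, and it will mis-handle filenames containing newlines.

```shell
# List files whose byte size is shared with at least one other file under $1.
# Only these can possibly be duplicates; everything else can be skipped.
# ASSUMPTIONS: GNU find (for -printf); no newlines in filenames.
same_size_candidates() {
    find "$1" -type f -printf '%s\t%p\n' |
    awk -F'\t' '
        { count[$1]++; line[NR] = $0; size[NR] = $1 }
        END {
            # second pass over the remembered lines: keep non-unique sizes
            for (i = 1; i <= NR; i++)
                if (count[size[i]] > 1) print line[i]
        }'
}
```

Redirecting the output to a file (for example `same_size_candidates ~/duke/projects > ~/duke-projects-same-size.txt`) leaves something to work from even if a later, heavier comparison gets cut off again.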
We need to migrate datasets off smb://duke.neuro.polymtl.ca and onto git+ssh://data.neuro.polymtl.ca.
I imagine both will live on for a while, but we want to prefer the git server to:
a. save space by using branching instead of duplicating entire datasets
b. have provenance tracking
To do this we need to (I think):