replikation / What_the_Phage

WtP: Phage identification via nextflow and docker or singularity
https://mult1fractal.github.io/wtp-documentation/
GNU General Public License v3.0

Where is docker setup writing files to? #136

Closed. Neato-Nick closed this issue 2 years ago.

Neato-Nick commented 3 years ago

I have a small SSD and a large HDD that I use for storage. I keep all my databases on the large HDD, so I'm trying to download them there. But while the databases download, free space in $HOME on my SSD shrinks much faster than space is used where I'm downloading the databases on the HDD (/data/databases). I have added every parameter I saw on the wiki for specifying download locations, even --cachedir, which I think only applies to Singularity.

cd /data/databases
nextflow run replikation/What_the_Phage --databases /data/labwork/ref_proteomes/nextflow-autodownload-databases -r v1.0.2 -profile local,docker --cores 4 --setup --workdir ./wtp_setup_work --cachedir ./wtp_setup_cache --output ./wtp_setup_results

I had to restart/resume the setup a few times after clearing other files, but the setup keeps consuming more and more space. During my most recent run, it consumed 7 GB in $HOME while adding just 1 GB to /data. Interestingly, when the setup process successfully finished, 7 GB were added to /data very quickly, with no new storage used in $HOME.

# setup started
$ date
Wed 30 Jun 2021 12:30:39 PM PDT
$ df -h $HOME
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  468G  416G   30G  94% /
$ df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       1.7T  1.2T  429G  73% /data
$ 
$ date
Wed 30 Jun 2021 12:36:37 PM PDT
$ df -h $HOME
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  468G  422G   23G  95% /
$ df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       1.7T  1.2T  428G  73% /data
$ 
$ date
Wed 30 Jun 2021 12:42:01 PM PDT
$ df -h $HOME
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  468G  422G   23G  95% /
$ df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       1.7T  1.2T  424G  74% /data
$ 
# setup successfully finished
$ date
Wed 30 Jun 2021 12:44:22 PM PDT
$ df -h $HOME
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p2  468G  422G   23G  95% /data
$ df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       1.7T  1.2T  417G  74% /data

Maybe something is being written invisibly to my $HOME and copied to the workdir without removing the source? I'm new to Docker, so I wouldn't be surprised if this is a config issue, but I checked /var and the Docker files look small. I double-checked that no files were being written to /tmp/nextflow-*$USER, and it looks like --workdir is correctly preventing that. What else could be using that space?
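One plausible explanation, offered as a hedge rather than a diagnosis: df -h $HOME reports the whole filesystem that contains $HOME, which in the output above is the root filesystem / on the SSD. Docker's usual storage directory on Linux, /var/lib/docker, sits on that same filesystem, so pulling images would show up as shrinking free space "in $HOME" without any files landing in the home directory. That directory is also normally readable only by root, so a du run as a regular user can under-report it. The sketch below asks the daemon where it stores data (docker info --format '{{.DockerRootDir}}') and prints which mount point holds a given path. The helper names are hypothetical, the fallback to /var/lib/docker is an assumption, and mount_of relies on GNU df --output.

```shell
#!/usr/bin/env bash

# Print the mount point of the filesystem that holds PATH
# (uses GNU coreutils `df --output`).
mount_of() {
  df --output=target "$1" | tail -n 1
}

# Print Docker's storage directory as reported by the daemon.
# Falls back to /var/lib/docker (the usual Linux default, an assumption
# here) when the docker CLI is missing or the daemon cannot be reached.
docker_root() {
  local dir=""
  if command -v docker >/dev/null 2>&1; then
    dir=$(docker info --format '{{.DockerRootDir}}' 2>/dev/null) || dir=""
  fi
  printf '%s\n' "${dir:-/var/lib/docker}"
}
```

For example, if mount_of "$HOME" and mount_of "$(docker_root)" both print /, images are being stored on the SSD; docker system df then summarizes how much of that space goes to images, containers and the build cache.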

replikation commented 3 years ago

Hi

So, for Docker: the images are stored on your boot disk, where /var is located (I think).

You can inspect these images and their sizes with docker images and remove them individually with docker rmi <imagename>. For a complete cleanup (removes all Docker images, containers, etc.), run:

docker stop $(docker ps -a -q)   # stop all running containers
docker rm $(docker ps -a -q)     # remove all containers
docker rmi $(docker images -f "dangling=true" -q)   # remove dangling (untagged) image layers
docker rmi -f $(docker images -a -q)   # remove all images
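Removing images frees space only until the next pull. To keep images off the SSD altogether, Docker's storage location can be moved with the data-root option in the daemon configuration file, /etc/docker/daemon.json on Linux. A minimal sketch that writes such a file; the function name and the target /data/docker are hypothetical examples, and because the function overwrites the file, an existing daemon.json would need to be merged by hand.

```shell
#!/usr/bin/env bash

# Write a minimal Docker daemon config that moves image and container
# storage to DATA_ROOT. WARNING: overwrites FILE (default:
# /etc/docker/daemon.json); merge manually if the file already exists.
write_daemon_json() {
  local data_root=$1
  local file=${2:-/etc/docker/daemon.json}
  printf '{\n  "data-root": "%s"\n}\n' "$data_root" > "$file"
}
```

Roughly: stop the daemon (for example sudo systemctl stop docker on systemd hosts), write the file as root (sudo bash -c "$(declare -f write_daemon_json); write_daemon_json /data/docker"), optionally copy the old contents with sudo rsync -a /var/lib/docker/ /data/docker/, then start the daemon again. docker info --format '{{.DockerRootDir}}' should then report the new path.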

Regarding disk space:

Every file that WtP produces is stored here:

work/       # temporary files; can be removed after a run
databases/  # where the databases are downloaded, or provide a path to existing ones
results/    # the output directory set by --output
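One caution before deleting work/: Nextflow's publishDir defaults to symlink mode, so depending on how a pipeline configures publishing, files in an output or databases directory can be symlinks into the work directory. Whether WtP's setup copies or links its databases is not stated here, so the sketch below (a hypothetical helper, needing GNU realpath for -m) lists any symlinks under a directory that resolve into a given work directory; empty output suggests nothing published depends on it.

```shell
#!/bin/sh

# List symlinks under DB_DIR whose fully resolved target lies inside
# WORK_DIR. Empty output: no published file points into WORK_DIR.
# Assumes GNU realpath (-m) and file names without newlines.
links_into() {
  local work_dir
  work_dir=$(realpath -m "$2") || return
  find "$1" -type l | while IFS= read -r link; do
    target=$(realpath -m "$link")
    case "$target" in
      "$work_dir"/*) printf '%s\n' "$link" ;;
    esac
  done
}
```

For the setup command above, that would be links_into /data/labwork/ref_proteomes/nextflow-autodownload-databases ./wtp_setup_work, and only if it prints nothing would rm -rf ./wtp_setup_work be reasonably safe.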

Hope this helps. Best,

Neato-Nick commented 3 years ago

Great! After the databases are downloaded, can we delete the --workdir from the setup command? For me that ended up being 31 GB.

docker images is really helpful. Could the total space required for the container images be added to the wiki? Totals: 31 GB workdir, 41 GB databases directory, plus the sum of the images below. I think they were all added during setup, except maybe dfam/tetools?

$ docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.ID}}\t{{.CreatedAt}}\t{{.Size}}"
REPOSITORY                                    TAG                                        IMAGE ID       CREATED AT                      SIZE
papanikos/marvel                              0.2-29b3c73                                451b8b7f09d9   2021-04-19 04:47:48 -0700 PDT   6.42GB
papanikos/virsorter-2                         2.2.1--fa935f8                             52548ff35f49   2021-04-16 01:21:00 -0700 PDT   1.19GB
multifractal/seeker                           0.1                                        efe57801fbb8   2020-10-26 06:13:16 -0700 PDT   1.66GB
multifractal/phigaro                          0.5.2                                      1c86698f8bf2   2020-09-14 04:26:27 -0700 PDT   2.6GB
dfam/tetools                                  1.2                                        9aa97b75d2c3   2020-09-09 09:57:34 -0700 PDT   3.63GB
multifractal/virnet-hack                      0.1                                        f67323ac9dcc   2020-07-31 01:04:15 -0700 PDT   1.62GB
nanozoo/emboss                                6.6.0--418c521                             66fa21650fb8   2020-07-30 05:10:10 -0700 PDT   1.07GB
multifractal/ppr-meta                         0.3.1                                      a7c3728f5bb4   2020-07-30 02:19:38 -0700 PDT   5.3GB
multifractal/virfinder                        0.2                                        383a9764ebda   2020-07-30 00:53:38 -0700 PDT   3.91GB
multifractal/vibrant                          0.5                                        50a55ac7e616   2020-07-30 00:42:23 -0700 PDT   1.42GB
nanozoo/sourmash                              3.4.1--16a8db7                             5fa1c8f40842   2020-07-25 23:55:37 -0700 PDT   788MB
nanozoo/hmmer                                 3.3--3db9dd1                               b94ab6d4b970   2020-07-17 01:22:38 -0700 PDT   484MB
nanozoo/checkv                                0.6.0--e97f45e                             992e7f903edf   2020-06-02 03:19:58 -0700 PDT   1.72GB
nanozoo/altair                                4.1.0--086b80e                             2e4909a308d8   2020-05-12 06:41:31 -0700 PDT   1.02GB
nanozoo/samtools                              1.9--76b9270                               84525e422138   2020-04-03 06:57:16 -0700 PDT   487MB
multifractal/virsorter                        0.1.2                                      807f233d65a0   2020-04-03 02:23:48 -0700 PDT   2.84GB
nanozoo/r_fungi                               0.1--097b1bb                               8059f16d6755   2020-02-16 06:13:02 -0800 PST   3.11GB
nanozoo/template                              3.8--ccd0653                               4c5ca72d30b0   2020-01-25 10:29:49 -0800 PST   681MB
hello-world                                   latest                                     bf756fb1ae65   2020-01-02 17:21:37 -0800 PST   13.3kB
nanozoo/basics                                1.0--962b907                               e6db71c4b54a   2019-12-13 02:53:58 -0800 PST   79.1MB
nanozoo/upsetr                                1.4.0--0ea25b3                             903ee61f2d93   2019-11-16 15:12:47 -0800 PST   3.21GB
nanozoo/r_ggplot2                             0.1--6405f6d                               a978b0bd253e   2019-11-14 06:23:01 -0800 PST   3.26GB
multifractal/metaphinder                      0.1                                        f6878e657670   2019-09-05 03:47:54 -0700 PDT   767MB
multifractal/deepvirfinder                    0.1                                        14e271fb6b8e   2019-09-05 02:53:05 -0700 PDT   2.37GB
nanozoo/seqkit                                0.10.1--360dd6d                            19de1e6b6911   2019-08-06 07:23:11 -0700 PDT   561MB
nanozoo/prodigal                              2.6.3--2769024                             9a394ae3c748   2019-08-06 00:27:24 -0700 PDT   531MB
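A rough total for the wiki can be computed from the SIZE column. docker images prints sizes in decimal units (kB, MB, GB), so the sketch below, a hypothetical helper, converts each value to bytes and prints the sum in GB. Images share layers, so this sum can overstate real disk usage; docker system df reports the deduplicated figure.

```shell
#!/bin/sh

# Sum sizes as printed by `docker images --format '{{.Size}}'`
# (one per line, decimal units B/kB/MB/GB) and print the total in GB.
sum_sizes_gb() {
  awk '
    { v = $1; m = 1 }          # default multiplier: plain bytes
    v ~ /GB$/ { m = 1e9 }
    v ~ /MB$/ { m = 1e6 }
    v ~ /kB$/ { m = 1e3 }
    { sub(/[kMG]?B$/, "", v); total += v * m }
    END { printf "%.2f GB\n", total / 1e9 }
  '
}
```

Usage: docker images --format '{{.Size}}' | sum_sizes_gb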
replikation commented 3 years ago