lbr38 / repomanager

A web UI to mirror rpm or deb packages repositories.
GNU General Public License v3.0
50 stars, 7 forks

Using multiple disks / mount points to split up repo data #138

Closed: protek-jon closed this issue 6 months ago

protek-jon commented 7 months ago

Hello. I attempted to split up my repo data "per-distro" as follows:

One disk is mounted at /var/lib/docker/volumes. Another disk is mounted at /var/lib/docker/volumes/repomanager-repo/debian. The plan was to add another disk for each additional distro, e.g. /var/lib/docker/volumes/repomanager-repo/centos.

After all of the packages are downloaded, I get this error: "Error: Could not rename working directory /home/repo/download-mirror-debian-bookworm-main-1701319569". All of the downloaded files are then erased. I checked the permissions, and they are consistent across the various paths.

I am running this on a CentOS Stream 9 system with the following docker run command:

docker run -d --restart always --name repomanager \
        -e FQDN=(redacted) \
        -e MAX_UPLOAD_SIZE=32M \
        -p 8080:8080 \
        -v /etc/localtime:/etc/localtime:ro \
        -v /var/lib/docker/volumes/repomanager-data:/var/lib/repomanager \
        -v /var/lib/docker/volumes/repomanager-repo:/home/repo \
        lbr38/repomanager:latest

Please let me know if you have any ideas on this. Thanks for this application! It appears to be exactly what I needed to complete the task at hand, which is staging deployments.

lbr38 commented 7 months ago

Hello

I'm not sure I fully understand, and I've never done this kind of split with Repomanager. I doubt it would work, because Repomanager creates its repositories in /home/repo/ (inside the container) and does not distinguish between Debian and CentOS. Even if you create dedicated 'centos' and 'debian' directories, it will still write to /home/repo/.
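One plausible cause of the "Could not rename working directory" error (an assumption; the thread does not confirm it): the working directory is created directly under /home/repo, which sits on the first disk, while the repo's final directory, assuming it lands under /home/repo/debian, is on a separate mount. On Linux, rename(2) cannot move anything between mounts and fails with EXDEV. The hypothetical helper `same_filesystem` below compares the device IDs that GNU `stat -c %d` reports for two paths.

```bash
#!/bin/bash
# Hypothetical helper: return 0 if both paths are on the same filesystem,
# 1 if they are not, and 2 if either path cannot be stat'ed.
same_filesystem() {
  local dev_a dev_b
  # GNU stat: %d is the device number of the filesystem holding the path
  dev_a=$(stat -c %d -- "$1") || return 2
  dev_b=$(stat -c %d -- "$2") || return 2
  [ "$dev_a" = "$dev_b" ]
}

# Example, run on the host with the paths from this issue
rc=0
same_filesystem /var/lib/docker/volumes/repomanager-repo \
                /var/lib/docker/volumes/repomanager-repo/debian 2>/dev/null || rc=$?
case $rc in
  0) echo "same filesystem" ;;
  1) echo "different filesystems: rename(2) between them fails with EXDEV" ;;
  *) echo "could not stat one of the paths" ;;
esac
```

Device IDs are only a heuristic: rename(2) also refuses to cross two different mount points of the same filesystem (for example two bind mounts), which this check would not detect.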

The only way to achieve such separation would be to run two Repomanager instances (two Docker containers), one dedicated to Debian and the other to CentOS, and allocate a separate disk to each instance.
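The two-instance layout suggested above could look roughly like this. It is a sketch: the mount points /mnt/disk-debian and /mnt/disk-centos, the host port 8081 and the FQDN values are hypothetical placeholders, while the remaining options are copied from the docker run command in the original post. By default it only prints the commands; set DRY_RUN=0 to execute them.

```bash
#!/bin/bash
# Hypothetical sketch: one Repomanager container per distro, each on its own disk.
# Prints the commands unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s ' "$@"; echo
  else
    "$@"
  fi
}

# start_instance NAME HOST_PORT DISK_MOUNT_POINT
start_instance() {
  local name=$1 port=$2 disk=$3
  # Each instance gets its own data and repo volumes on its own disk
  run docker run -d --restart always --name "repomanager-$name" \
    -e FQDN="repo-$name.example.com" \
    -e MAX_UPLOAD_SIZE=32M \
    -p "$port:8080" \
    -v /etc/localtime:/etc/localtime:ro \
    -v "$disk/repomanager-data:/var/lib/repomanager" \
    -v "$disk/repomanager-repo:/home/repo" \
    lbr38/repomanager:latest
}

start_instance debian 8080 /mnt/disk-debian
start_instance centos 8081 /mnt/disk-centos
```

Since both containers listen on 8080 internally, only the host-side port has to differ.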

protek-jon commented 7 months ago

Thanks @lbr38! The split was mostly for organization and for breaking the repo data into smaller pieces.

I have another question, if you could please help. When I update a snapshot, it consumes the repo's full size on disk again, right? I want to try backing it with BTRFS to see whether I can duplicate a snapshot so that, when it "copies" the repo data for the update, only the new data is written and the old data is just shared by the BTRFS volume, saving a LOT of space. I'm dealing with some large repos, so this could make a drastic difference for my setup.
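The copy-on-write sharing this plan relies on can be tried by hand with GNU cp's --reflink option. This is a sketch for experimenting outside Repomanager (whether Repomanager's own copy step produces reflinks is not established in this thread); the function name `clone_snapshot` and the example paths are hypothetical. With --reflink=auto, cp shares data extents on filesystems that support it, such as BTRFS, and falls back to a regular full copy elsewhere.

```bash
#!/bin/bash
# Hypothetical helper: clone a snapshot directory, sharing data extents where possible.
# --reflink=auto: copy-on-write clone on BTRFS/XFS, ordinary full copy elsewhere.
clone_snapshot() {
  local src=$1 dst=$2
  if [ ! -d "$src" ]; then
    echo "clone_snapshot: not a directory: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    # Otherwise cp would copy SRC *into* the existing DST directory
    echo "clone_snapshot: destination already exists: $dst" >&2
    return 1
  fi
  # -a preserves permissions, timestamps, symlinks (and ownership when run as root)
  cp -a --reflink=auto -- "$src" "$dst"
}

# Example (hypothetical paths on a BTRFS volume):
# clone_snapshot /srv/btrfs/debian/bookworm-snap1 /srv/btrfs/debian/bookworm-snap2
```

Both directories read identically afterwards; on BTRFS, only blocks that are later rewritten in either copy should consume new space.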

lbr38 commented 7 months ago

> I have another question if you could please help. When I update a snapshot, it will consume the space needed by the repo on the disk an additional time, right?

I can confirm that's right. I have already tried to think of a way to avoid redownloading a file if it already exists on the system (using the inode principle, for example), but it requires technical skills that I don't have. It would likely introduce a lot of complexity, plus risks of bugs and data loss if not handled properly, so for now it is not a feature I intend to implement.
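For illustration, the "inode principle" is usually implemented with hard links: two directory entries pointing at the same inode, so the data is stored only once. The sketch below is not how Repomanager works, and the function name `link_identical` is hypothetical. It hard-links each file in a new snapshot to its byte-identical counterpart at the same relative path in an old snapshot. Both snapshots must be on the same filesystem, because hard links cannot cross mounts.

```bash
#!/bin/bash
# Hypothetical sketch of hard-link deduplication between two snapshot directories.
link_identical() {
  local old=$1 new=$2 rel
  while IFS= read -r -d '' rel; do
    rel=${rel#./}
    # Skip files that already share an inode (GNU ln -f refuses to relink them)
    [ "$old/$rel" -ef "$new/$rel" ] && continue
    # Only link regular files whose contents match byte for byte
    if [ -f "$old/$rel" ] && cmp -s -- "$old/$rel" "$new/$rel"; then
      ln -f -- "$old/$rel" "$new/$rel"
    fi
  done < <(cd -- "$new" && find . -type f -print0)
}

# Caveat: after linking, editing either path in place changes both snapshots.
```

The closing caveat is one concrete example of the data-loss risk mentioned above: any tool that later modifies a linked file in place, instead of writing a new file, silently alters every snapshot that shares it.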

> I want to try backing it on BTRFS to see if I can then duplicate a snapshot and when it "copies" the repo data for the update, it only writes the new data, and the old data is just linked by the BTRFS volume, saving a LOT of space. I'm dealing with some large repos, so this could drastically make a difference for my setup.

I would be curious to hear about your experience if you try the BTRFS volume system!

protek-jon commented 6 months ago

It turns out Repomanager works great with BTRFS. I set up a nightly duperemove job that finds data with matching hashes and has BTRFS deduplicate it, freeing up disk space.

Crontab:

# Run remove-dupes at 3am PST (4am PDT) / 11am UTC
0 11 * * * /root/remove-dupes.sh > /root/logs/remove-dupes-$(date +\%Y-\%m-\%d-\%H-\%M).log 2>&1

/root/remove-dupes.sh:

#!/bin/bash

# Clear logs older than 14 days
find /root/logs -type f -mtime +14 -exec rm {} \; 2>&1

# Find duplicate data and submit it to BTRFS for deduplication, freeing up space
duperemove -rdhv --hashfile=/root/duperemove-hashes.db /var/lib/docker/volumes/repomanager-repo 2>&1

lbr38 commented 6 months ago

Good to know! Glad you found a suitable solution for your needs :)

I could add this to the documentation, if you'll allow me. I think it could be useful for other users looking to optimize disk usage.

protek-jon commented 6 months ago

Please feel free! I am working with some large repos and this is helping keep the storage requirement to a reasonable size. I’m very happy with the solution.

Best regards,

Jon Packard

lbr38 commented 6 months ago

Thanks! I just added it to: https://github.com/lbr38/repomanager/wiki/12.-Miscellaneous-and-tips#disk-usage-optimization

I consider this closed.