
Seedbox data fetcher & extractor

Dockerised service periodically pulling data from a remote seedbox & extracting archived files.

Note data is synced unidirectionally, and if an already-downloaded & processed asset gets deleted on the remote, then it's also nuked locally. This is generally the preferred behaviour, as the *arr service (or whatever other media manager you happen to use) should be responsible for torrent removal upon successful import anyway.

It's also important to know that if an already-downloaded source file is modified on the remote (as opposed to deleted), those modifications will not be reflected locally.

Rationale

This service aims to solve a common issue with the servarr projects around data import (and also provides extraction), described here. The tl;dr: if *arr is monitoring a directory and expecting, say, a full season's worth of files, but by the time it goes to check/import only half of the episodes have been downloaded from your remote seedbox, then only the episodes already present would be imported.

We solve this by using rclone to first download assets into an intermediary directory not monitored by the *arr services, optionally process them (e.g. extract archives), and then move them atomically into the destination directory that *arr expects them in.
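For illustration, here's a minimal shell sketch of that fetch-then-atomic-move idea. It is not the service's actual sync script; the asset name is hypothetical, the paths match the docker example further down, and it assumes DEST_INITIAL and DEST_FINAL live on the same filesystem so the final move is an atomic rename:

# hypothetical asset name; remote & paths taken from the docker example below
rclone copy 'seedbox:files/complete/Some.Release' /data/rclone-tmp/Some.Release
# ...optionally extract archives inside /data/rclone-tmp/Some.Release here...
mv /data/rclone-tmp/Some.Release /data/complete/   # atomic rename -- *arr only ever sees the finished directory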

Servarr's completed download handling is documented/described here; archived asset handling isn't described in much detail, but can be found here.

Configuration

Required environment variables

Optional environment variables

Required mountpoints & files

Example docker command:

docker run -d \
    --name seedbox-fetcher \
    -e REMOTE=seedbox \
    -e SRC_DIR=files/complete \
    -e DEST_INITIAL=/data/rclone-tmp \
    -e DEST_FINAL=/data/complete \
    -e PUID=1000 \
    -v /host/dir/downloads/torrents:/data \
    -v $HOME/.config/seedbox-fetcher:/config \
    layr/seedbox-rclone-fetch-extract
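
Once the container is up, you can sanity-check it roughly like this (assuming the container name, remote name and config mount from the example above):

# follow the service logs to watch sync runs
docker logs -f seedbox-fetcher

# one-off check that the configured remote is reachable from inside the container
docker exec -u abc seedbox-fetcher rclone lsd --config /config/rclone.conf seedbox: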

On syncing logic and DEPTH env var

The DEPTH env var selects the depth level, relative to SRC_DIR, at which files & dirs are downloaded and removed. If any of the replicated/downloaded nodes get deleted on the remote server, they will also be deleted from DEST_FINAL.

If an additional file or dir gets written into an already-downloaded directory, this addition won't be downloaded, as downloaded nodes are considered finalized -- no changes to them are replicated, only their removal. The same applies to child removals, i.e. if a child file in an already-replicated directory is removed on the remote, this removal won't be reflected in our local copy.

In other words, a download/removal happens only if the addition/removal is detected at the given DEPTH.
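As a rough illustration (not how the service is implemented internally), with the default DEPTH=1 the set of tracked nodes is essentially what rclone lists one level below SRC_DIR -- dir1/, dir2/ and file3 in the example below. Assuming the remote and SRC_DIR from the docker example above:

su abc -s /bin/sh -c 'rclone lsf --max-depth 1 --config /config/rclone.conf seedbox:files/complete'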

Say your SRC_DIR on the remote server looks like:

$ tree SRC_DIR
SRC_DIR/
├── dir1
│   ├── dir12
│   │   └── file121
│   └── file1
├── dir2
│   └── file2
└── file3

DEPTH=1 (default)

If DEPTH=1, then dir1/, dir2/ & file3 would be replicated to DEST_FINAL. If any of them gets deleted on the remote server, it will also be deleted from DEST_FINAL. If an additional file or dir gets written into or removed from dir1/ or dir2/, then this addition or removal wouldn't be reflected.

Replicated copy would look like an exact copy of the remote:

$ tree DEST_FINAL
DEST_FINAL/
├── dir1
│   ├── dir12
│   │   └── file121
│   └── file1
├── dir2
│   └── file2
└── file3

Now let's say file3 and file2 were removed on the remote, and newfile was written into dir1/. After sync our local copy would look like:

$ tree DEST_FINAL
DEST_FINAL/
├── dir1
│   ├── dir12
│   │   └── file121
│   └── file1
└── dir2
    └── file2

Note the file3 removal was reflected in our local copy as expected, but neither the newfile addition nor the file2 removal was. This is because their parent directories (dir1/ and dir2/ respectively) had already been replicated, and are thus considered finalized.

DEPTH=2

If DEPTH=2, then dir12/, file1 & file2 would be replicated to DEST_FINAL while preserving the original directory structure - meaning the parent directories from the SRC_DIR root are also created in DEST_FINAL. If any of them gets deleted on the remote server, it will also be deleted from DEST_FINAL. If an additional file or dir gets written into or removed from dir12/, then this addition or removal wouldn't be reflected. Note file3 is completely ignored by the service, as it sits at the depth=1 level.

Replicated copy would look like:

$ tree DEST_FINAL
DEST_FINAL/
├── dir1
│   ├── dir12
│   │   └── file121
│   └── file1
└── dir2
    └── file2

Now let's say file121 and file2 were removed on the remote, and newfile was written into dir1/dir12/. After sync our local copy would look like:

$ tree DEST_FINAL
DEST_FINAL/
├── dir1
│   ├── dir12
│   │   └── file121
│   └── file1
└── dir2

Note the file2 removal was reflected in our local copy as expected, but neither the newfile addition nor the file121 removal was. This is because their parent directory dir1/dir12/ had already been replicated, and is thus considered finalized.

If you want empty parent directories (dir2/ in the above example) to be cleaned up, then set the RM_EMPTY_PARENT_DIRS env var to a non-empty value.
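For reference, that cleanup is roughly equivalent to pruning directories left empty under DEST_FINAL -- illustrative only, not the service's actual implementation:

# remove now-empty parent dirs under DEST_FINAL (/data/complete in the docker example above)
find /data/complete -mindepth 1 -type d -empty -delete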

Debugging

Sometimes it's useful to debug rclone/config issues directly from the container shell. If you do so, make sure to run all commands as the abc user, otherwise you may accidentally mess up some files' ownership, e.g.:

su abc -s /bin/sh -c 'rclone lsf -vvv --max-depth 1 --config /config/rclone.conf your-remote:'
su abc -s /bin/sh -c /sync.sh

TODO