tuna / tunasync

Mirror job management tool.
GNU General Public License v3.0

bug: shell file is not mapped into docker volume while using "command provider" #164

Closed r00t1900 closed 2 years ago

r00t1900 commented 2 years ago

Env

Description

tunasync fails to run a custom shell script even though its path is configured correctly:

tunasync worker -c worker.conf -v --debug:

[22-01-01 11:06:18][DEBUG][runner.go:53] volume: /tmp/tunasync/pypi:/tmp/tunasync/pypi                                        
[22-01-01 11:06:18][DEBUG][runner.go:127] Command start: [docker run --rm -a STDOUT -a STDERR --name tunasync-job-pypi -w /tmp/tunasync/pypi -u 0:0 -v /tmp/tunasync/log/tunasync/pypi:/tmp/tunasync/log/tunasync/pypi -v /tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log:/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log -v /tmp/tunasync/pypi:/tmp/tunasync/pypi -e TUNASYNC_MIRROR_NAME=pypi -e TUNASYNC_WORKING_DIR=/tmp/tunasync/pypi -e TUNASYNC_UPSTREAM_URL=https://pypi.python.org/ -e TUNASYNC_LOG_DIR=/tmp/tunasync/log/tunasync/pypi -e TUNASYNC_LOG_FILE=/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log tunathu/bandersnatch:latest /home/scripts/pypi.sh]
[22-01-01 11:06:18][DEBUG][cmd_provider.go:145] set isRunning to true: pypi                                                   
[22-01-01 11:06:18][DEBUG][base_provider.go:168] calling Wait: pypi                                                          
[22-01-01 11:06:18][DEBUG][job.go:169] provider started                                                                 
[22-01-01 11:06:18][DEBUG][worker.go:469] reporting on manager url: http://localhost:12345/workers/test_worker/schedules      
[22-01-01 11:06:18][DEBUG][worker.go:448] reporting on manager url: http://localhost:12345/workers/test_worker/jobs/pypi      
[22-01-01 11:06:18][DEBUG][worker.go:469] reporting on manager url: http://localhost:12345/workers/test_worker/schedules
[22-01-01 11:06:18][DEBUG][base_provider.go:165] set isRunning to false: pypi
[22-01-01 11:06:18][DEBUG][job.go:180] syncing done
[22-01-01 11:06:18][WARNIN][job.go:213] failed syncing pypi: exit status 127
[22-01-01 11:06:18][DEBUG][job.go:215] post-fail hooks

/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log:

root@tuna-docker-supported:~# cat /tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_20.log.fail                        
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "/home/scripts/pypi.sh": stat /home/scripts/pypi.sh: no such file or directory: unknown.
time="2022-01-01T11:20:33+08:00" level=error msg="error waiting for container: context canceled"
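
The exit status 127 reported in the worker log is the conventional "command not found" code: the `docker run` reference documents it for a contained command that cannot be found, and POSIX shells use it the same way. A minimal sketch that reproduces the same status locally without Docker; the path below is hypothetical and is assumed not to exist on the machine:

```shell
#!/bin/sh
# Hypothetical path, assumed not to exist on this machine.
missing="/nonexistent-dir-for-demo/pypi.sh"

# Run it in a child shell so the failure does not abort this script.
sh -c "$missing" 2>/dev/null
status=$?

echo "exit status: $status"
```

Seeing 127 from a tunasync job is therefore a hint to check whether the configured command exists inside the container, rather than to debug the script itself.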

Analysis

From this debug output, I noticed that the docker command does not map pypi.sh into the container's filesystem, which is probably the cause of the "no such file or directory" error.
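
The same check can be done mechanically. The sketch below is not part of tunasync; `is_mounted` is a hypothetical helper that takes a container path and a list of `host:container[:mode]` volume specs, and reports whether the path lies under any container-side mount point. The specs are copied from the debug log above.

```shell
#!/bin/sh
# is_mounted CONTAINER_PATH VOLUME_SPEC...
# Succeeds if CONTAINER_PATH equals, or lies under, the container side of
# any "host:container[:mode]" volume spec. Runs in a subshell so its
# variables do not leak into the caller.
is_mounted() (
    target=$1
    shift
    for spec in "$@"; do
        # Field 2 of host:container[:mode] is the container-side path.
        dest=$(printf '%s\n' "$spec" | cut -d: -f2)
        dest=${dest%/}
        case "$target" in
            "$dest"|"$dest"/*) exit 0 ;;
        esac
    done
    exit 1
)

# Volume specs taken from the "Command start" line of the debug log.
set -- \
    "/tmp/tunasync/log/tunasync/pypi:/tmp/tunasync/log/tunasync/pypi" \
    "/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log:/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log" \
    "/tmp/tunasync/pypi:/tmp/tunasync/pypi"

if is_mounted /home/scripts/pypi.sh "$@"; then
    echo "script path is covered by a mount"
else
    echo "script path is NOT covered by any mount"
fi
```

A "not covered" result only means the script is not mapped in from the host; the path could still exist if the script were built into the image.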

Solution

I appended -v /home/scripts/pypi.sh:/home/scripts/pypi.sh to the docker command and ran it manually, and it worked:

# The first -v line below is the added volume mapping.
docker run --rm -a STDOUT -a STDERR --name tunasync-job-pypi -w /tmp/tunasync/pypi -u 0:0 \
-v /home/scripts/pypi.sh:/home/scripts/pypi.sh \
-v /tmp/tunasync/log/tunasync/pypi:/tmp/tunasync/log/tunasync/pypi \
-v /tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log:/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log \
-v /tmp/tunasync/pypi:/tmp/tunasync/pypi \
-e TUNASYNC_MIRROR_NAME=pypi \
-e TUNASYNC_WORKING_DIR=/tmp/tunasync/pypi \
-e TUNASYNC_UPSTREAM_URL=https://pypi.python.org/ \
-e TUNASYNC_LOG_DIR=/tmp/tunasync/log/tunasync/pypi \
-e TUNASYNC_LOG_FILE=/tmp/tunasync/log/tunasync/pypi/pypi_2022-01-01_11_06.log \
tunathu/bandersnatch:latest /home/scripts/pypi.sh

command output:

Syncing to /tmp/tunasync/pypi
2022-01-01 04:06:26,421 INFO: Selected storage backend: filesystem (configuration.py:128)
2022-01-01 04:06:26,421 INFO: Selected compare method: stat (configuration.py:174)
2022-01-01 04:06:26,740 INFO: Initialized project plugin allowlist_project, filtering ['tf-nightly-cpu'] (allowlist_name.py:31)
2022-01-01 04:06:26,744 INFO: Initialized project plugin blocklist_project, filtering [] (blocklist_name.py:27)
2022-01-01 04:06:26,800 INFO: Status file /tmp/tunasync/pypi/status missing. Starting over. (mirror.py:601)
2022-01-01 04:06:26,800 INFO: Syncing with https://pypi.python.org/. (mirror.py:56)
2022-01-01 04:06:26,800 INFO: Current mirror serial: 0 (mirror.py:267)
2022-01-01 04:06:26,800 INFO: Syncing all packages. (mirror.py:282)
2022-01-01 04:06:43,845 INFO: Package 'tf-nightly-cpu' is allowlisted (allowlist_name.py:88)                                 
2022-01-01 04:06:43,955 INFO: Trying to reach serial: 12451048 (mirror.py:299)                                               
2022-01-01 04:06:43,955 INFO: 1 packages to sync. (mirror.py:301)                                                            
2022-01-01 04:06:43,978 INFO: No metadata filters are enabled. Skipping metadata filtering (mirror.py:75)                    
2022-01-01 04:06:43,978 INFO: No release filters are enabled. Skipping release filtering (mirror.py:77)                      
2022-01-01 04:06:43,978 INFO: No release file filters are enabled. Skipping release file filtering (mirror.py:79)            
2022-01-01 04:06:43,981 INFO: Fetching metadata for package: tf-nightly-cpu (serial 12447857) (package.py:57)                
2022-01-01 04:06:44,648 INFO: Downloading: https://files.pythonhosted.org/packages/46/2a/07af15a0d8ca3f75a53621dab60f92f72d7046c511dbeeee303cb947b187/tf_nightly_cpu-2.7.0.dev20210701-cp36-cp36m-macosx_10_14_x86_64.whl (mirror.py:933)

Further

worker.conf:

[docker]
enable = true

[manager]
api_base = "http://localhost:12345"
token = ""
ca_cert = ""

[cgroup]
enable = false
base_path = "/sys/fs/cgroup"
group = "tunasync"

[server]
hostname = "localhost"
listen_addr = "127.0.0.1"
listen_port = 6000
ssl_cert = ""
ssl_key = ""

[[mirrors]]
name = "pypi"
provider = "command"
upstream = "https://pypi.tuna.tsinghua.edu.cn/"
command = "/home/scripts/pypi.sh"
docker_image = "tunathu/bandersnatch:latest"
interval = 5

manager.conf:

debug = false

[server]
addr = "127.0.0.1"
port = 12345
ssl_cert = ""
ssl_key = ""

[files]
db_type = "bolt"
db_file = "/tmp/tunasync/manager.db"
ca_cert = ""

/home/scripts/pypi.sh:

#!/bin/bash
set -e
BANDERSNATCH=${BANDERSNATCH:-"/usr/local/bin/bandersnatch"}
TUNASYNC_UPSTREAM=${TUNASYNC_UPSTREAM_URL:-"https://pypi.tuna.tsinghua.edu.cn/"}
CONF="/tmp/bandersnatch.conf"
INIT=${INIT:-"0"}

# Create the working directory if it is missing and mark this run as an initial sync.
if [ ! -d "$TUNASYNC_WORKING_DIR" ]; then
        mkdir -p "$TUNASYNC_WORKING_DIR"
        INIT="1"
fi

echo "Syncing to $TUNASYNC_WORKING_DIR"

if [[ $INIT == "0" ]]; then
(
        cat << EOF
[mirror]
directory = ${TUNASYNC_WORKING_DIR}
master = ${TUNASYNC_UPSTREAM}
json = true
timeout = 300
workers = 5
hash-index = false
stop-on-error = false
delete-packages = true
compare-method = stat

[plugins]
enabled =
    blocklist_project
    allowlist_project

[allowlist]
packages =
    tf-nightly-cpu

[blocklist]
packages =
EOF
        # Append each excluded package under [blocklist], not [allowlist].
        for i in $PYPI_EXCLUDE; do
                echo "    $i"
        done
) > "$CONF"
        exec "$BANDERSNATCH" -c "$CONF" mirror
else
        cat > $CONF << EOF
[mirror]
directory = ${TUNASYNC_WORKING_DIR}
master = ${TUNASYNC_UPSTREAM}
json = true
timeout = 15
workers = 10
hash-index = false
stop-on-error = false
delete-packages = false
EOF

        exec $BANDERSNATCH -c $CONF mirror
fi

Thanks for viewing.

shankerwangmiao commented 2 years ago

Your analysis is correct. It is not a bug but a feature, because tunasync cannot know how to set up the mapping. The script configured in the command field is executed inside the Docker container, so it must either be built directly into the image or mapped in from another location. The mapping can be declared once in the [docker] section, so it does not need to be repeated in each mirror's config. For example:

[docker]
volumes = [
        "/path/to/tunasync-scripts:/home/scripts:ro",
]

[[mirrors]]
name = "foo"
provider = "command"
upstream = "xxxxx"
command = "/home/scripts/foo.sh"
docker_image = "foo_image:latest"
docker_volumes = [
  "/path/to/additional_volume1:/path/to/mountpoint:ro",
  "/path/to/additional_volume2:/path/to/mountpoint2:ro"
]
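
To see what such a configuration amounts to on the command line, here is a dry-run sketch. It assumes that the worker passes each entry of `volumes` and `docker_volumes` through as its own `-v` argument, which matches the `-v` flags visible in the debug log but is not confirmed here against the tunasync source. `build_volume_flags` is a hypothetical helper, not a tunasync command.

```shell
#!/bin/sh
# build_volume_flags VOLUME_SPEC...
# Prints one "-v SPEC" pair per line for each configured volume.
build_volume_flags() {
    for spec in "$@"; do
        printf '%s %s\n' -v "$spec"
    done
}

# Global [docker] volumes first, then the mirror's docker_volumes
# (the ordering is an assumption made for this illustration).
flags=$(build_volume_flags \
    "/path/to/tunasync-scripts:/home/scripts:ro" \
    "/path/to/additional_volume1:/path/to/mountpoint:ro" \
    "/path/to/additional_volume2:/path/to/mountpoint2:ro")

# Dry run: print the command instead of executing it.
# $flags is left unquoted on purpose so it splits into separate words;
# this breaks for paths containing whitespace.
echo docker run --rm $flags foo_image:latest /home/scripts/foo.sh
```

With the first volume in place, /home/scripts/foo.sh resolves inside the container to /path/to/tunasync-scripts/foo.sh on the host.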

shankerwangmiao commented 2 years ago

Bandersnatch relies on the XML-RPC interface provided by the official pypi.org, and as a result cannot sync the PyPI repository from an alternative source. However, its latest release adds a new config option named download-mirror, which fetches package metadata from the RPC interface on pypi.org and the actual packages from an alternative source.
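
A minimal sketch of how the script's generated configuration might use this option, written in the same heredoc style as pypi.sh. The output path and the choice of the TUNA mirror as the download source are assumptions for illustration, and the fragment contains only the keys relevant here; a real config also needs the other [mirror] settings shown in the script.

```shell
#!/bin/sh
# Hypothetical output location and download source, for illustration only.
CONF=${CONF:-"${TMPDIR:-/tmp}/bandersnatch-example.conf"}
DOWNLOAD_MIRROR=${DOWNLOAD_MIRROR:-"https://pypi.tuna.tsinghua.edu.cn/"}

# master stays on the official index, which provides the metadata;
# download-mirror names the alternative source for the package files.
cat > "$CONF" << EOF
[mirror]
directory = ${TUNASYNC_WORKING_DIR:-/tmp/tunasync/pypi}
master = https://pypi.org
download-mirror = ${DOWNLOAD_MIRROR}
EOF

cat "$CONF"
```

Whether the installed bandersnatch version supports download-mirror should be checked against its documentation before relying on it.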

r00t1900 commented 2 years ago

Thank you for replying. This really helps a lot, bravo!

r00t1900 commented 2 years ago

Thank you, your reasoning and steps are both right. Problem solved :)