openzim / zimfarm

Farm operated by bots to grow and harvest new zim files
https://farm.openzim.org
GNU General Public License v3.0

Platform limits for WMF not respected (or not configured properly) #680

Closed kelson42 closed 2 years ago

kelson42 commented 2 years ago

There should be no more than 3 scrapers (actually 2 might be even better) running at the same time from the same IP against the WMF backend cluster.

But here I see 4 of them! image

rgaudin commented 2 years ago

Workers have the ability to change the default (1) limit. That's how we allow our mwoffliner workers to run 2.

You should contact the worker owner.

kevinmcmurtrie commented 2 years ago

The last two, wikipedia_sv_all and wiktionary_lt, don't exist. A UPS died during home repairs so those two jobs were lost.

kelson42 commented 2 years ago

@rgaudin Why didn't these tasks vanish after a few hours? If they are not running on the worker, the scheduler should stop considering them as running, IMO.

rgaudin commented 2 years ago

With a crashed server, the task-worker monitoring the scraper container is gone, so there's nothing left to report a missing scraper. Upon manager (re)start, we could imagine it reporting which tasks are already running, letting the scheduler mark as dead any task assigned to this worker but not in that list.
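
A minimal sketch of that reconciliation idea, not actual Zimfarm code: `find_dead_tasks` is a hypothetical helper, and passing task ids as space-separated strings is an assumption made for brevity. It prints every task the scheduler believes is assigned to the worker but that the restarted manager did not report.

```bash
#!/bin/bash
# Hypothetical sketch (not part of Zimfarm): find tasks the scheduler still
# considers running on a worker that the restarted manager did not report.
# Usage: find_dead_tasks "<ids known to scheduler>" "<ids reported by manager>"
find_dead_tasks() {
    local scheduler_tasks="$1"
    local reported_tasks="$2"
    local tid
    # Word splitting on $scheduler_tasks is intentional: ids contain no spaces.
    for tid in $scheduler_tasks; do
        case " $reported_tasks " in
            *" $tid "*) ;;        # reported by the manager: still running
            *) echo "$tid" ;;     # assigned but not reported: candidate "dead"
        esac
    done
}

# Example with shortened task ids as shown by `zimfarm ps`:
find_dead_tasks "8fb6f 34974 204bb" "8fb6f 204bb"   # prints 34974
```

The scheduler would then have to decide what "dead" means for each printed id (fail it, requeue it, etc.); that policy is outside this sketch.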

kelson42 commented 2 years ago

Today, pixelmemory has 3 of them (and I believe a few hours ago, it was again 4). image

I keep thinking about this and I don't see why a worker should be able to specify this limit. It should be centralised. Maybe there is a scenario where a worker defines a limit lower than the maximum specified by the platform, but IMO it should never be higher.
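
To make the proposal concrete, here is a small sketch of the rule "a worker may lower the limit but never raise it". `effective_limit` is a hypothetical name and the values are illustrative; this is not how the scheduler currently works.

```bash
#!/bin/bash
# Hypothetical sketch: clamp a worker-defined limit to the platform maximum.
# Usage: effective_limit <worker_limit or ""> <platform_max>
effective_limit() {
    local worker_limit="$1"
    local platform_max="$2"
    if [ -z "$worker_limit" ]; then
        # Worker expressed no preference: the platform maximum applies.
        echo "$platform_max"
    elif [ "$worker_limit" -lt "$platform_max" ]; then
        # Worker asks for fewer tasks than allowed: honour it.
        echo "$worker_limit"
    else
        # Worker asks for more than allowed: cap at the platform maximum.
        echo "$platform_max"
    fi
}

effective_limit 4 2    # prints 2
effective_limit 1 2    # prints 1
effective_limit "" 2   # prints 2
```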

rgaudin commented 2 years ago
kevinmcmurtrie commented 2 years ago

What do you want me to change?

#!/bin/bash

### MANDATORY

# Zimfarm username
ZIMFARM_USERNAME="pixelmemory"

# Zimfarm folder. You have to create it. Put your `id_rsa` private key
# directly at its root. Will be used as well for other Zimfarm
# temporary data.
ZIMFARM_ROOT=/Yosemite/zimfarm

### OPTIONAL

# Worker name (your choice, can be different from the username)
ZIMFARM_WORKER_NAME="$ZIMFARM_USERNAME"

# Set to `"y"` if you need `sudo` for `docker` command (`""` otherwise)
SUDO_DOCKER=""

# Whether to display debug-level logs (`"y"` or `""`)
ZIMFARM_DEBUG="y"

# Maximum amount of RAM you want your worker to use
ZIMFARM_MAX_RAM="50GiB"

# Disk space you are dedicating to the worker. worker needs this space avail to work
# /!\ disk usage is not enforced (might exceed this limit)
ZIMFARM_DISK="3TiB"

# Artificial number to configure the level of CPU load you want to
# allocate. Put `"3"` if you want to have around one task at a time,
# `"6"` if you want to have around two task in parallel, etc.
ZIMFARM_CPU="12"
# ZIMFARM_CPU="0"

# Number of CPU/core to assign to each scraper. This is a hard limit.
# Leave it unset to allow as much CPU as available.
# Value of "1" means one core. "2" is two cores. ".5" means half a core
# ZIMFARM_TASK_CPUS="1"
ZIMFARM_TASK_CPUS=""

# Which CPU/core scraper containers should use (use numbers from /proc/cpuinfo)
# Leave it unset to let docker spread resources itself.
# Seems to override cpu-quota (ZIMFARM_TASK_CPUS). Doc is unclear.
# See https://docs.docker.com/config/containers/resource_constraints/#cpu
# ZIMFARM_TASK_CPUSET="0-3"
ZIMFARM_TASK_CPUSET=""

# Comma-separated list of offliners to run or `""` for all of them. If
# you want to run `youtube` tasks, you need to be whitelisted, contact
# us.
# - zimit due to Google malware
ZIMFARM_OFFLINERS="mwoffliner,youtube,sotoki,phet,nautilus,ted,openedx,kolibri,wikihow,gutenberg"

# Set to `"y"` to only run task specifically assigned to this worker
# (`""` otherwise)
ZIMFARM_SELFISH=""

# Set to `"y"` to use a public (Cloudflare, Google) DNS instead of your
# system (Internet provider) one. `""` otherwise.
USE_PUBLIC_DNS=""

DNSCACHE_IMAGE="pixelmemory/notcache:latest"

# change default maximum nb of tasks for your worker over a specific platform
# PLATFORM_wikimedia_MAX_TASKS=2
# PLATFORM_youtube_MAX_TASKS=2
# PLATFORM_wikihow_MAX_TASKS=2
#!/bin/bash

# Zimfarm worker manager script                                     
#
# README at https://github.com/openzim/zimfarm/blob/master/workers/README.md
#
# this script is provided for your comfort only.
# DONT RUN IT unless you've read it and understood its behavior.

######## DEFAULT VALUES #
# change them in your zimfarm.config file
SUDO_DOCKER=
ZIMFARM_USERNAME="unknown"
ZIMFARM_WORKER_NAME="unknown"
ZIMFARM_DEBUG=
ZIMFARM_MAX_RAM="2G"
ZIMFARM_DISK="10G"
ZIMFARM_CPU="3"
ZIMFARM_TASK_CPUS=""
ZIMFARM_TASK_CPUSET=""
ZIMFARM_ROOT=/tmp
ZIMFARM_OFFLINERS=
ZIMFARM_SELFISH=
USE_PUBLIC_DNS=
MANAGER_IMAGE="ghcr.io/openzim/zimfarm-worker-manager:latest"
TASK_WORKER_IMAGE=""
DNSCACHE_IMAGE=""
UPLOADER_IMAGE=""
CHECKER_IMAGE=""
MONITOR_IMAGE=""
WEB_API_URIS="https://api.farm.openzim.org/v1"
POLL_INTERVAL="180"
MONITORING_DEST=""  # IP:PORT
MONITORING_KEY=""  # UUID
#########################
SOURCE_URL="https://raw.githubusercontent.com/openzim/zimfarm/master/workers/contrib/zimfarm.sh"
WORKER_MANAGER_NAME="zimfarm-manager"
SCRIPT_VERSION="1.0.0"

function die() {
    echo "$1"
    exit 1
}

# find this script's path
if [[ $(uname -s) == "Darwin" ]]; then
    # brew install coreutils
    parentdir=$(dirname "$(greadlink -f "$0")")
    scriptname=$(basename "$(greadlink -f "$0")")
else
    parentdir=$(dirname "$(readlink -f "$0")")
    scriptname=$(basename "$(readlink -f "$0")")
fi

# select and read config file
configfname="zimfarm.config"
search_paths=( "${parentdir}/${configfname}" "${HOME}/.${configfname}" "${HOME}/${configfname}" "/etc/${configfname}" )
function display_search_paths() {
    echo ""
    echo "Search paths:"
    for path in "${search_paths[@]}"
    do
       :
       echo "  - ${path}"
    done
}
configpath=
for path in "${search_paths[@]}"
do
   :
   if [ -f $path ] ; then
    configpath=$path
    break
   fi
done

# fail if we have no config file
if [[ "$configpath" == "" ]]; then
    echo "unable to find ${configfname} in known locations"
    display_search_paths
    die
fi

# load config variables
source $configpath || die "failed to source ${configpath}"
datadir=$ZIMFARM_ROOT/data

# display config file path
function configfile() {
    echo "Using: ${configpath}"
    display_search_paths
}

# display options list
function usage() {
    echo "Usage: $0 [help|config|ps|logs|inspect|prune|restart|stop|shutdown|update|version]"
    echo ""
    echo "  configfile      show the config file path in use"
    echo "  config          show the config file's content"
    echo ""
    echo "  restart         start or restart the manager. reloads config."
    echo "  logs <name> [n] display logs of task or 'manager' using its name"
    echo "  inspect <name>  inspect details of the 'manager' or container"
    echo "  stop <name>     stop a task or the 'manager' using its name"
    echo "  shutdown        stops the manager and all running tasks"
    echo ""
    echo "  ps              list of running containers with zimfarm labels"
    echo "  prune           remove all docker containers/images/volumes"
    echo "  update          display commands to update this script (apply with 'update do')"
    echo "  version         display version of this script"
    echo ""
}

# run docker commands directly or via sudo if SUDO_DOCKER is set
function run() {
    if [ ! -z $SUDO_DOCKER ]; then
        sudo "$@"
    else
        "$@"
    fi
}

function config() {
    cat $configpath
}

# display a list of running containers with some zimfarm labels
function ps() {
    run docker ps --filter label=zimfarm --format 'table {{.ID}}\t{{.Label "tid"}}\t{{.Label "schedule_name"}}\t{{.Label "task_id"}}\t{{.RunningFor}}\t{{.Names}}' $1
}

# cleanup disk usage (to be run in cron)
function prune() {
    # remove all unreferenced images and containers created by zimfarm
    run docker system prune --all --force --filter label=zimfarm
    # remove all unreferenced images and containers
    run docker system prune --all --force
}

# stop container, extending timeout so task can stop scrapers and dnscache
function stop() {
    target=$1
    if [[ "$target" == "manager" ]]; then
        target=$WORKER_MANAGER_NAME
    fi
    echo "stopping container ${target}..."
    run docker stop -t 120 $target
}

# start or restart the manager using config values
function restart() {
    tok=  -censored-
    echo $tok | docker login -u kmcmurtrie --password-stdin

    echo "(re)starting zimfarm worker manager..."
    echo ":: stopping ${WORKER_MANAGER_NAME}"
    run docker stop $WORKER_MANAGER_NAME || true
    run docker rm $WORKER_MANAGER_NAME || true

    echo ":: starting ${WORKER_MANAGER_NAME}"
    if [[ $MANAGER_IMAGE =~ ":" ]]; then
        run docker pull $MANAGER_IMAGE
    fi

    run docker run \
        --name $WORKER_MANAGER_NAME \
        --label=zimfarm \
        --restart=always \
        --detach \
        --log-driver json-file \
        --log-opt max-size="100m" \
        -v $datadir:/data \
        -v /var/run/docker-userns.sock:/var/run/docker.sock:ro \
        -v $ZIMFARM_ROOT/id_rsa:/etc/ssh/keys/zimfarm:ro \
        --env ZIMFARM_MEMORY=$ZIMFARM_MAX_RAM \
        --env ZIMFARM_DISK=$ZIMFARM_DISK \
        --env ZIMFARM_CPUS=$ZIMFARM_CPU \
        --env ZIMFARM_TASK_CPUS=$ZIMFARM_TASK_CPUS \
        --env ZIMFARM_TASK_CPUSET=$ZIMFARM_TASK_CPUSET \
        --env SELFISH=$ZIMFARM_SELFISH \
        --env USERNAME=$ZIMFARM_USERNAME \
        --env DEBUG=$ZIMFARM_DEBUG \
        --env WORKER_NAME=$ZIMFARM_WORKER_NAME \
        --env WEB_API_URIS=$WEB_API_URIS \
        --env UPLOAD_URI=$UPLOAD_URI \
        --env USE_PUBLIC_DNS=$USE_PUBLIC_DNS \
        --env OFFLINERS=$ZIMFARM_OFFLINERS \
        --env TASK_WORKER_IMAGE=$TASK_WORKER_IMAGE \
        --env PLATFORM_wikimedia_MAX_TASKS=$PLATFORM_wikimedia_MAX_TASKS \
        --env PLATFORM_youtube_MAX_TASKS=$PLATFORM_youtube_MAX_TASKS \
        --env PLATFORM_wikihow_MAX_TASKS=$PLATFORM_wikihow_MAX_TASKS \
        --env POLL_INTERVAL=$POLL_INTERVAL \
        --env DNSCACHE_IMAGE=$DNSCACHE_IMAGE \
        --env UPLOADER_IMAGE=$UPLOADER_IMAGE \
        --env CHECKER_IMAGE=$CHECKER_IMAGE \
        --env MONITOR_IMAGE=$MONITOR_IMAGE \
        --env MONITORING_DEST=$MONITORING_DEST \
        --env MONITORING_KEY=$MONITORING_KEY \
    $MANAGER_IMAGE worker-manager
}

# stop the manager and all the workers
function shutdown() {
    echo "shutting down manager and all the workers..."
    run docker kill -s SIGQUIT $WORKER_MANAGER_NAME
}

# display logs of a container or the manager, using --tail and -f
function logs() {
    target=$1
    tail=$2
    if [[ "$target" == "manager" ]]; then
        target=$WORKER_MANAGER_NAME
    fi
    if [[ "${tail}" == "" ]]; then
        tail="100"
    fi
    run docker logs --tail $tail -f $target
}

# display details of a container or the manager
function inspect() {
    target=$1
    if [[ "$target" == "manager" ]]; then
        target=$WORKER_MANAGER_NAME
    fi
    run docker inspect $2 $target
}

# display the command needed to update this script from the repo
# add 'do' parameter to attempt to run it
function update() {
    echo "updating $1..."
    dest="${parentdir}/${scriptname}"
    update_cmd="sudo wget -O ${dest} ${SOURCE_URL} && sudo chmod +x ${dest}"
    if [[ "$2" == "do" ]]; then
        bash -c "${update_cmd}"
    else
        echo $update_cmd
    fi
}

function usage_if_missing() {
    if [ -z $1 ]; then
        usage
        exit 1
    fi
}

# script entrypoint
function main() {
    action=$1
    target=$2

    case $action in
      "configfile")
        configfile
        ;;

      "config")
        config
        ;;

      "ps")
        # optional: pass params to ps (-a, -nX)
        ps $target $3
        ;;

      "prune")
        prune
        ;;

      "restart")
        restart
        ;;

      "start")
        restart
        ;;

      "stop")
        usage_if_missing $target
        stop $target
        ;;

      "logs")
        usage_if_missing $target
        logs $target $3
        ;;

      "inspect")
        usage_if_missing $target
        inspect $target $3
        ;;

      "shutdown")
        shutdown
        ;;

      "update")
        update $0 $2
        ;;

      "version")
        echo "version ${SCRIPT_VERSION}"
        exit 0
        ;;

      *)
        usage $0
        ;;
    esac

}

main "$@"
kevinmcmurtrie commented 2 years ago

I do adjust ZIMFARM_CPU, so the configured value may sometimes differ a bit from what is actually running. I set it to zero to drain for updates. I turn it down on hot days when the heat is undesirable, or on stormy days to reduce battery drain if the power goes out.
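
For reference, a rough sketch of what a given ZIMFARM_CPU value means in tasks, assuming the rule of thumb from the config comments ("3" for around one task, "6" for around two) extends linearly. `approx_parallel_tasks` is a hypothetical helper; the real scheduler may weigh tasks differently.

```bash
#!/bin/bash
# Hypothetical sketch: approximate number of parallel tasks for a given
# ZIMFARM_CPU value, assuming roughly 3 CPU "units" per task.
approx_parallel_tasks() {
    local zimfarm_cpu="$1"
    echo $(( zimfarm_cpu / 3 ))
}

approx_parallel_tasks 12   # prints 4
approx_parallel_tasks 0    # prints 0 (draining)
```

Under this reading, ZIMFARM_CPU="12" allows around four tasks overall; it says nothing about how many of them may target a single platform.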

rgaudin commented 2 years ago

Here's another occurrence of this:

Screen Shot 2022-05-11 at 08 21 21
kiwix@athena18:~$ zimfarm ps
CONTAINER ID   tid       schedule name       task id                    CREATED          NAMES
269df247d6ec   8fb6f     wikihow_zh_maxi     8fb6fcb7f57cd99235b6b726   12 minutes ago   zimscraper_wikihow_8fb6f
20f0a39fc742   8fb6f     wikihow_zh_maxi     8fb6fcb7f57cd99235b6b726   12 minutes ago   dnscache_8fb6f
007d7470b3eb   8fb6f     wikihow_zh_maxi     8fb6fcb7f57cd99235b6b726   12 minutes ago   zimtask_8fb6f
d5dd03b7ceb3   34974     wikipedia_tr_top    349740b3497c980e69a39726   2 hours ago      zimscraper_mwoffliner_34974
d84eee3af0eb   34974     wikipedia_tr_top    349740b3497c980e69a39726   2 hours ago      dnscache_34974
67493b19ace5   34974     wikipedia_tr_top    349740b3497c980e69a39726   2 hours ago      zimtask_34974
127cba3ce511   204bb     wikipedia_min_all   204bbcc4bd832a7976019726   7 hours ago      zimscraper_mwoffliner_204bb
59af92002a47   204bb     wikipedia_min_all   204bbcc4bd832a7976019726   7 hours ago      dnscache_204bb
12a9fdf066f2   204bb     wikipedia_min_all   204bbcc4bd832a7976019726   7 hours ago      zimtask_204bb
707d7d36f1ce   3fa93     wikipedia_ur_all    3fa9399f8049163bd44f8726   8 hours ago      zimscraper_mwoffliner_3fa93
328302f2288f   3fa93     wikipedia_ur_all    3fa9399f8049163bd44f8726   8 hours ago      dnscache_3fa93
8fe241500093   3fa93     wikipedia_ur_all    3fa9399f8049163bd44f8726   8 hours ago      zimtask_3fa93
89d9ff455f42   778fe     wikipedia_ta_all    778fe13df44734c851ac8726   11 hours ago     zimscraper_mwoffliner_778fe
d4b68756242f   778fe     wikipedia_ta_all    778fe13df44734c851ac8726   11 hours ago     dnscache_778fe
39e2be01c467   778fe     wikipedia_ta_all    778fe13df44734c851ac8726   11 hours ago     zimtask_778fe
e2aa7f913002   3a6ca     wiktionary_en       3a6ca9582dd9520f37986726   3 days ago       zimscraper_mwoffliner_3a6ca
f72e5684f5ba   3a6ca     wiktionary_en       3a6ca9582dd9520f37986726   3 days ago       dnscache_3a6ca
7519e43bf717   3a6ca     wiktionary_en       3a6ca9582dd9520f37986726   3 days ago       zimtask_3a6ca
67a6de18e1e5                                                            3 weeks ago      zimfarm-manager
#!/bin/bash

# set to `1` if you need sudo for docker command
SUDO_DOCKER=
# zimfarm username
ZIMFARM_USERNAME="athena18"
# worker name (different than username, you choose this)
ZIMFARM_WORKER_NAME=$ZIMFARM_USERNAME
# whether to display debug-level logs
ZIMFARM_DEBUG="y"
# the max amount of RAM you want your worker to use
ZIMFARM_MAX_RAM="100GiB"
# the max amount of disk you want your worker to use (/!\ not enforced)
ZIMFARM_DISK="10TiB"
# multiply by 3 the nb of concurrent zimfarm task you'd want to be able to run
ZIMFARM_CPU="18"
# zimfarm folder. will contain your private key and a data subfolder with all data
# place your RSA private key there as 'id_rsa'
ZIMFARM_ROOT=/mnt/data/kiwix/zimfarm2
# a comma-separated list of offliners to run or "" (all).
# if you want to run youtube task, you need to be whitelisted, contact us
# ZIMFARM_OFFLINERS="mwoffliner,sotoki"
ZIMFARM_OFFLINERS=
# whether to only run task specifically assigned to this worker. set to "y"
ZIMFARM_SELFISH=
# whether to use public (cloudflare, google) DNS instead of your host's. set to "y"
USE_PUBLIC_DNS="y"
PLATFORM_wikihow_MAX_TASKS=1

athena has not configured a platform limit for wikimedia, so it should be subject to the default (2).
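
A quick way to check this on a worker is to count scraper containers per offliner from the NAMES column. This is only a sketch: `count_scrapers` is a hypothetical helper, it relies on the `zimscraper_<offliner>_<tid>` naming visible in the listing above, and it assumes mwoffliner scrapers are the ones counted against the wikimedia platform.

```bash
#!/bin/bash
# Hypothetical sketch: count scraper containers for one offliner.
# Reads container names (one per line) on stdin.
count_scrapers() {
    local offliner="$1"
    # grep -c prints the number of matching lines (0 if none).
    grep -c "^zimscraper_${offliner}_"
}

# Self-contained example with a few names in the same format:
printf '%s\n' \
    zimscraper_wikihow_8fb6f dnscache_8fb6f zimtask_8fb6f \
    zimscraper_mwoffliner_34974 dnscache_34974 zimtask_34974 \
    zimscraper_mwoffliner_204bb dnscache_204bb zimtask_204bb \
    | count_scrapers mwoffliner   # prints 2

# On a real worker, something like this should work (last column is NAMES):
#   zimfarm ps | awk '{print $NF}' | count_scrapers mwoffliner
```

Applied to the full listing above, this counts 5 mwoffliner scrapers, well over the default of 2.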