seqeralabs / nf-tower

Nextflow Tower system
https://tower.nf
Mozilla Public License 2.0

Unable to access config file -- Unknown host: java.net.UnknownHostException: api.tower.nf #393

Closed: NickSwainston closed this issue 11 months ago

NickSwainston commented 11 months ago

I'm trying to run nf-tower on a SLURM platform. The launch fails because Nextflow cannot download an SCM config file, even though I haven't set any secrets. I have checked that I can reach api.tower.nf with curl, and I can run nextflow -with-tower commands on the nodes I am trying to use as the head node. Here is an example of the launch.sh:

#!/bin/bash
#SBATCH -D /fred/oz005/users/nswainst/tower
#SBATCH -J nf-workflow-1k0K4WlGep8ZTM
#SBATCH -o /fred/oz005/users/nswainst/tower/nf-1k0K4WlGep8ZTM.log
#SBATCH --no-requeue
#SBATCH -p trevor
#SBATCH --mem=8GB
set -e
set -o pipefail

# Input variables:
#
# - NXF_UUID: nextflow session id generated by tower
# - NXF_LOG_FILE: nextflow log file name
# - NXF_OUT_FILE: nextflow output file name
# - NXF_IGNORE_RESUME_HISTORY: do not stop for missing nextflow history file
# - NXF_CONFIG_BASE64: nextflow config file encoded as base64 string
# - NXF_SCM_BASE64: nextflow scm file encoded as base64 string
# - NXF_DEBUG: enable debugging mode
# - TOWER_ACCESS_TOKEN: Tower access token
# - TOWER_WORKFLOW_ID: Workflow ID generated by Tower
# - TOWER_CONFIG_BASE64: tower config file encoded as base64 string
# - TOWER_CONFIG_FILE: tower config file name

export NXF_IGNORE_RESUME_HISTORY=true
export NXF_WORK=/fred/oz005/users/nswainst/work
export TOWER_REPORTS_FILE=nf-1k0K4WlGep8ZTM-reports.tsv
export NXF_EXIT_FILE=nf-1k0K4WlGep8ZTM.exit
export NXF_CONFIG_FILE=nf-1k0K4WlGep8ZTM.config
export NXF_OUT_FILE=nf-1k0K4WlGep8ZTM.txt
export NXF_ASSETS=/fred/oz005/users/nswainst/work/.nextflow/pipelines/56a2786f
export NXF_UUID=40f3f3fb-db1e-4736-a606-8beb680628b1
export NXF_TML_FILE=timeline-1k0K4WlGep8ZTM.html
export TOWER_WORKFLOW_ID=1k0K4WlGep8ZTM
export NXF_ANSI_LOG=false
export NXF_PLUGINS_DEFAULT=nf-tower
export NXF_PRERUN_BASE64=bW9kdWxlIHVzZSAvYXBwcy91c2Vycy9wdWxzYXIvb3BlbnN0YWNrL2djYy0xMS4zLjAvbW9kdWxlZmlsZXMKbW9kdWxlIGxvYWQgcHNyaG9tZS9sYXRlc3QKZXhwb3J0IFRPV0VSX0FDQ0VTU19UT0tFTj1leUpoYkdjaU9pSklVekkxTmlKOS5leUp6ZFdJaU9pSTRNak0wSWl3aWJtSm1Jam94Tmprd09UWTFOelEyTENKeWIyeGxjeUk2V3lKMWMyVnlJbDBzSW1semN5STZJblJ2ZDJWeUxXRndjQ0lzSW1WNGNDSTZNVFk1TURrMk9UTTBOaXdpYVdGMElqb3hOamt3T1RZMU56UTJmUS4xdnBBaDdpUU1rT3pjU1p1ZnF1RmphWWlIbDVNNnhRNE1Ba2RTeVhMMkNzCmV4cG9ydCBUT1dFUl9SRUZSRVNIX1RPS0VOPWV5SmhiR2NpT2lKSVV6STFOaUo5Lk56RmtZVFU0TXpRdE5XRTJNQzAwT0dRNExUZ3daR0l0WXpka1pqVXlZakF3WXpGaS40STl1a3FfWWU5enNPZWoxTFczVEpNb0Q4VTZWaGZheGR2UUdpZWVOcjRNCmV4cG9ydCBOWEZfU0NNX0ZJTEU9aHR0cHM6Ly9hcGkudG93ZXIubmYvZXBoZW1lcmFsL3hERndOSkpDWWhoNUpmSERMakp3VGcK
export NXF_LOG_FILE=nf-1k0K4WlGep8ZTM.log
export NXF_CONFIG_BASE64=dGltZWxpbmUuZW5hYmxlZCA9IHRydWUKdGltZWxpbmUuZmlsZSA9ICIkTlhGX1RNTF9GSUxFIgpwcm9jZXNzLmV4ZWN1dG9yID0gJ3NsdXJtJwpwcm9jZXNzLnF1ZXVlID0gJ21pbGFuJwp3b3JrRGlyID0gJy9mcmVkL296MDA1L3VzZXJzL25zd2FpbnN0L3dvcmsnCi8vLS0tIHVzZXIgY3VzdG9tIGNvbmZpZwppbmNsdWRlQ29uZmlnICdodHRwczovL2FwaS50b3dlci5uZi9lcGhlbWVyYWwvc0h5MV9LY3pDaGxMLThNVmVWMkpOQSc=

[[ $NXF_DEBUG ]] && (env | sort) && set -x
cache_path=".nextflow/cache/$NXF_UUID"

function save_exit() {
    # Save exit code to file: note NXF_EXIT_FILE is expected to always be set; otherwise, the script will fail (return a non-zero exit code) at this point.
    [[ $NXF_EXIT_FILE ]] && printf $1 > $NXF_EXIT_FILE
}

function pre_run() {
    if [[ $NXF_PRERUN_BASE64 ]]; then
      source /dev/stdin <<<"$(cat <(echo $NXF_PRERUN_BASE64 | base64 -d))" > >(tee -a $NXF_OUT_FILE) 2>&1
    fi
}

function post_run() {
    if [[ $NXF_POSTRUN_BASE64 ]]; then
      bash <(echo $NXF_POSTRUN_BASE64 | base64 -d) > >(tee -a $NXF_OUT_FILE) 2>&1 || true
    fi
}

function on_exit() {
    NXF_EXIT_STATUS=$?
    save_exit $NXF_EXIT_STATUS
    rm -rf $NXF_SECRETS_FILE
    export NXF_EXIT_STATUS
    post_run
    exit $NXF_EXIT_STATUS
}

function load_cache() {
    if [[ $TOWER_RESUME_DIR ]]; then
      mkdir -p "$cache_path"
      [ -e "$TOWER_RESUME_DIR/$cache_path" ] && rsync -r "$TOWER_RESUME_DIR/$cache_path"/ "$cache_path" || true
    fi
}

function term_run() {
  kill -TERM $nf_pid
  wait $nf_pid
}

trap 'save_exit $?' EXIT

pre_run
load_cache

if [[ $NXF_CONFIG_BASE64 ]]; then
  echo $NXF_CONFIG_BASE64 | base64 -d > ${NXF_CONFIG_FILE:-nextflow.config}
  unset NXF_CONFIG_BASE64
fi

# save tower config file
if [[ $TOWER_CONFIG_BASE64 ]]; then
  echo $TOWER_CONFIG_BASE64 | base64 -d > $TOWER_CONFIG_FILE
fi

# save secrets
if [[ $NXF_SECRETS_BASE64 ]]; then
  export NXF_SECRETS_FILE=$PWD/nf-${TOWER_WORKFLOW_ID}.secrets.json
  echo $NXF_SECRETS_BASE64 | base64 -d > $NXF_SECRETS_FILE
  chmod 600 $NXF_SECRETS_FILE
fi

[[ $NXF_DEBUG ]] && nextflow -Dcapsule.log=verbose info -dd

trap term_run TERM INT USR2
trap on_exit EXIT
trap '' USR1

nextflow run https\://github.com/OZGrav/nf-core-meerpipe -name cranky_woese -params-file https\://api.tower.nf/ephemeral/H9W5NkAJHCIi-VgCVOO7LA.json -with-tower -r dev -latest > >(tee -a $NXF_OUT_FILE) 2>&1 &
nf_pid=$!
wait $nf_pid

And here is the log:

Aug-02 18:42:46.225 [main] DEBUG nextflow.cli.Launcher - $> nextflow run 'https://github.com/OZGrav/nf-core-meerpipe' -name cranky_woese -params-file 'https://api.tower.nf/ephemeral/H9W5NkAJHCIi-VgCVOO7LA.json' -with-tower -r dev -latest
Aug-02 18:42:46.261 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 22.10.6
N E X T F L O W  ~  version 22.10.6
Aug-02 18:42:46.309 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/nswainst/.nextflow/plugins; core-plugins: nf-amazon@1.11.3,nf-azure@0.14.2,nf-codecommit@0.1.2,nf-console@1.0.4,nf-ga4gh@1.0.4,nf-google@1.4.5,nf-tower@1.5.6,nf-wave@0.5.3
Aug-02 18:42:46.337 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Aug-02 18:42:46.338 [main] INFO  org.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Aug-02 18:42:46.341 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Aug-02 18:42:46.367 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Aug-02 18:42:46.380 [main] DEBUG nextflow.scm.ProviderConfig - Detected SCM custom path: https://api.tower.nf/ephemeral/xDFwNJJCYhh5JfHDLjJwTg
Aug-02 18:42:46.641 [main] ERROR nextflow.cli.Launcher - Unable to access config file 'https://api.tower.nf/ephemeral/xDFwNJJCYhh5JfHDLjJwTg' -- Unknown host: java.net.UnknownHostException: api.tower.nf

  api.tower.nf

java.net.UnknownHostException: api.tower.nf
    at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:229)
    at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.base/java.net.Socket.connect(Socket.java:609)
    at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:305)
    at java.base/sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
    at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:182)
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:508)
    at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:603)
    at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266)
    at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:373)
    at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:207)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1187)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1081)
    at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:193)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1592)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1520)
    at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:334)
    at nextflow.file.http.XFileSystemProvider.toConnection0(XFileSystemProvider.groovy:222)
    at nextflow.file.http.XFileSystemProvider.toConnection(XFileSystemProvider.groovy:210)
    at nextflow.file.http.XFileSystemProvider.newInputStream(XFileSystemProvider.groovy:359)
    at java.base/java.nio.file.Files.newInputStream(Files.java:156)
    at java.base/java.nio.file.Files.newBufferedReader(Files.java:2839)
    at org.apache.groovy.nio.extensions.NioExtensions.newReader(NioExtensions.java:1404)
    at org.apache.groovy.nio.extensions.NioExtensions.getText(NioExtensions.java:397)
    at nextflow.scm.ProviderConfig.getFromFile(ProviderConfig.groovy:286)
    at nextflow.scm.ProviderConfig.getDefault(ProviderConfig.groovy:332)
    at nextflow.scm.AssetManager.<init>(AssetManager.groovy:103)
    at nextflow.cli.CmdRun.getScriptFile0(CmdRun.groovy:503)
    at nextflow.cli.CmdRun.getScriptFile(CmdRun.groovy:444)
    at nextflow.cli.CmdRun.run(CmdRun.groovy:300)
    at nextflow.cli.Launcher.run(Launcher.groovy:487)
    at nextflow.cli.Launcher.main(Launcher.groovy:646)
Unknown network host: api.tower.nf

Any help fixing this bug would be greatly appreciated.
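
Editorial sketch for readers of the script above: the NXF_*_BASE64 variables are plain base64, so decoding them shows what the job will load before Nextflow starts. For instance, the tail of NXF_PRERUN_BASE64 appears to decode to an `export NXF_SCM_FILE=...` line, which would explain why an SCM file is fetched even with no secrets set. The helper name `decode_b64` and the sample value are assumptions made for illustration, not part of Tower; the sketch also assumes a `base64` that accepts `-d` (GNU coreutils).

```shell
#!/bin/bash
# Hypothetical helper: print the decoded content of a base64 string.
decode_b64() {
    printf '%s' "$1" | base64 -d
}

# Illustrative value only, not taken from the job above.
sample=$(printf "process.queue = 'demo'\n" | base64)

# Show the decoded text.
decode_b64 "$sample"
```

On a real launch script the same helper could be pointed at "$NXF_CONFIG_BASE64" or "$NXF_PRERUN_BASE64". Note that the pre-run payload can contain access tokens, so the decoded output should not be pasted into public threads.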

pditommaso commented 11 months ago

It looks like a network problem on your side; that's a valid host name.

curl https://api.tower.nf/service-info

NickSwainston commented 11 months ago

It turns out the Nextflow head job was being submitted to the wrong partition, one that doesn't have internet access. That is a separate issue.
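
Editorial sketch: since the root cause was a head job landing on a partition without internet access, a quick name-resolution check run on the target partition can surface this kind of problem before launching. The function name `can_resolve` and the file name `check_dns.sh` are hypothetical, and the sketch assumes `getent` is available on the node (as on typical glibc-based Linux systems).

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given host name resolves on this node.
can_resolve() {
    getent hosts "$1" > /dev/null 2>&1
}

# Check every host name passed on the command line and report per node.
for host in "$@"; do
    if can_resolve "$host"; then
        echo "resolved: $host (node: $(uname -n))"
    else
        echo "NOT resolved: $host (node: $(uname -n))"
    fi
done
```

Assuming the script is saved as check_dns.sh, it could be run on a given partition with something like `srun -p <partition> bash check_dns.sh api.tower.nf`, then repeated for the partition the head job is actually submitted to. A successful lookup only shows that DNS works on that node; it does not prove that outbound HTTPS is allowed.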