Open jhaezebr opened 2 days ago
Can you check mounting a volume in `test.hcl`, please?
I'm not sure (yet) how ACLs work, but the host_volume in your example is set to "deny" and the nf-task needs to mount the volume.
I've tested against the local cluster created in the `validation` folder (see https://github.com/nextflow-io/nf-nomad/pull/57).
When the `--secure` flag is provided, the cluster is bootstrapped with ACLs and the NOMAD_TOKEN is required to run the pipelines.
So to use CSI volumes you need at least the plugin read permission and the csi-list-volume capability.
Other than that, there is still a problem with volumes that are read-only:
```
capability {
  access_mode     = "multi-node-reader-only"
  attachment_mode = "file-system"
}

mount_options {
  mount_flags = ["ro"]
}
```
We're mounting (all) the volumes as writable:
```
taskDef.config.mount = [ type : "volume", target : destinationDir, source : config.jobOpts().dockerVolume, readonly : false ]
```
So we probably need to extend our DSL spec with more features.
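For reference, a read-only CSI mount in a plain Nomad job spec looks roughly like the sketch below. It is independent of what nf-nomad currently generates: the job, group, task, and volume names are placeholders, and how this would map onto the plugin's DSL is still an open question.

```hcl
# Sketch only: every name here is a placeholder, not something nf-nomad emits.
job "example" {
  group "example" {
    # Claim the CSI volume for the group in read-only mode.
    volume "references" {
      type            = "csi"
      source          = "reference_volume" # hypothetical CSI volume ID
      read_only       = true
      access_mode     = "multi-node-reader-only"
      attachment_mode = "file-system"
    }

    task "sleep" {
      driver = "docker"

      config {
        image      = "busybox:latest"
        entrypoint = ["/bin/sleep", "300"]
      }

      # Expose the claimed volume inside the task, also read-only.
      volume_mount {
        volume      = "references"
        destination = "/references"
        read_only   = true
      }
    }
  }
}
```

The `read_only` flag appears twice on purpose: once on the group-level volume claim and once on the task-level mount.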
@jhaezebr what's the overall use-case for `read-only` file systems in your setup?
@jagedn we use a read-only mount for our reference store. This isn't strictly needed, but we want this mount to be read-only so a rogue process can't go about deleting or changing any of the references.
I've made a separate issue for the read-only use-case: https://github.com/nextflow-io/nf-nomad/issues/60 I'll focus on the ACL part here :)
For the moment this ACL seems to work for nextflow:
```
namespace "nextflow" {
  policy = "write"
  capabilities = [
    "csi-write-volume",
    "csi-read-volume",
    "csi-list-volume",
    "csi-mount-volume"
  ]
}

agent {
  policy = "deny"
}

operator {
  policy = "deny"
}

quota {
  policy = "deny"
}

node {
  policy = "deny"
}

host_volume "*" {
  policy = "deny"
}

plugin {
  policy = "read"
}
```
Gotcha - thanks @jhaezebr !
Quick question, did you test with a `fusionfs` setup or just CSI?
Judging from the following, I think that since `fusionfs` requires the use of `tmp`, this could be a blocker.
```
host_volume "*" {
  policy = "deny"
}
```
Ideally, we want to keep feature parity with both 🤝
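If the fusionfs setup does end up mounting a host volume, one possible way to unblock it without opening up every host volume would be an extra rule for that single volume. This is only a sketch under that assumption: the volume name `tmp` is hypothetical, and nobody in this thread has tested it against fusionfs.

```hcl
# Assumption: fusionfs mounts a host volume; "tmp" is a hypothetical volume name.
host_volume "tmp" {
  policy = "write"
}

# All other host volumes stay denied, as in the policy above.
host_volume "*" {
  policy = "deny"
}
```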
No, I didn't test fusionfs, just csi. We don't use fusionfs in our cluster and I'm not familiar with it.
Nextflow seems to be unable to submit jobs when ACL is enabled, but using the same token I can submit a job using the nomad CLI.
Nextflow log
```
Jul-03 12:13:27.492 [main] DEBUG nextflow.cli.Launcher - $> nextflow run hello -c nomad.config -w ./work
Jul-03 12:13:27.870 [main] DEBUG nextflow.cli.CmdRun - N E X T F L O W ~ version 24.04.2
Jul-03 12:13:27.930 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/research/.nextflow/plugins; core-plugins: nf-amazon@2.5.2,nf-azure@1.6.0,nf-cloudcache@0.4.1,nf-codecommit@0.2.0,nf-console@1.1.3,nf-ga4gh@1.3.0,nf-google@1.13.2,nf-tower@1.9.1,nf-wave@1.4.2
Jul-03 12:13:28.014 [main] INFO o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Jul-03 12:13:28.016 [main] INFO o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Jul-03 12:13:28.025 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.10.0 in 'deployment' mode
Jul-03 12:13:28.189 [main] INFO org.pf4j.AbstractPluginManager - No plugins
Jul-03 12:13:28.232 [main] DEBUG nextflow.scm.ProviderConfig - Using SCM config path: /home/research/.nextflow/scm
Jul-03 12:13:28.253 [main] DEBUG nextflow.scm.AssetManager - Listing projects in folder: /home/research/.nextflow/assets
Jul-03 12:13:30.130 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/research/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Jul-03 12:13:30.344 [main] DEBUG nextflow.scm.RepositoryFactory - Found Git repository result: [RepositoryFactory]
Jul-03 12:13:30.389 [main] DEBUG nextflow.scm.AssetManager - Git config: /home/research/.nextflow/assets/nextflow-io/hello/.git/config; branch: master; remote: origin; url: https://github.com/nextflow-io/hello.git
Jul-03 12:13:32.835 [main] DEBUG nextflow.config.ConfigBuilder - Found config home: /home/research/.nextflow/config
Jul-03 12:13:32.837 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /home/research/.nextflow/assets/nextflow-io/hello/nextflow.config
Jul-03 12:13:32.849 [main] DEBUG nextflow.config.ConfigBuilder - User config file: /scratch/nf-nomad/nomad.config
Jul-03 12:13:32.852 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/research/.nextflow/config
Jul-03 12:13:32.853 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/research/.nextflow/assets/nextflow-io/hello/nextflow.config
Jul-03 12:13:32.854 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /scratch/nf-nomad/nomad.config
Jul-03 12:13:32.892 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/research/.nextflow/secrets/store.json
Jul-03 12:13:32.900 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@2b736fee] - activable => nextflow.secret.LocalSecretsProvider@2b736fee
Jul-03 12:13:32.912 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Jul-03 12:13:33.202 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Jul-03 12:13:33.274 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Jul-03 12:13:33.744 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 by global default
Jul-03 12:13:33.751 [main] DEBUG nextflow.cli.CmdRun - Launching `https://github.com/nextflow-io/hello` [disturbed_shannon] DSL2 - revision: 7588c46ffe [master]
Jul-03 12:13:33.756 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins declared=[nf-nomad@0.1.1]
Jul-03 12:13:33.758 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Jul-03 12:13:33.760 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[nf-nomad@0.1.1]
Jul-03 12:13:33.761 [main] DEBUG nextflow.plugin.PluginUpdater - Installing plugin nf-nomad version: 0.1.1
Jul-03 12:13:33.798 [main] INFO org.pf4j.AbstractPluginManager - Plugin 'nf-nomad@0.1.1' resolved
Jul-03 12:13:33.798 [main] INFO org.pf4j.AbstractPluginManager - Start plugin 'nf-nomad@0.1.1'
Jul-03 12:13:33.862 [main] DEBUG nextflow.plugin.BasePlugin - Plugin started nf-nomad@0.1.1
Jul-03 12:13:34.025 [main] DEBUG nextflow.Session - Session UUID: 52aae5fc-1036-4f86-af10-e5633ac019f5
Jul-03 12:13:34.026 [main] DEBUG nextflow.Session - Run name: disturbed_shannon
Jul-03 12:13:34.026 [main] DEBUG nextflow.Session - Executor pool size: 80
Jul-03 12:13:34.047 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
Jul-03 12:13:34.063 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=240; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jul-03 12:13:34.134 [main] DEBUG nextflow.cli.CmdRun -
  Version: 24.04.2 build 5914
  Created: 29-05-2024 06:19 UTC
  System: Linux 5.4.0-150-generic
  Runtime: Groovy 4.0.21 on OpenJDK 64-Bit Server VM 11.0.23-internal+0-adhoc..src
  Encoding: UTF-8 (UTF-8)
  Process: 59747@compute-87hs7j2 [127.0.1.1]
  CPUs: 80 - Mem: 629.8 GB (13.6 GB) - Swap: 4 GB (3.6 GB)
Jul-03 12:13:34.273 [main] DEBUG nextflow.Session - Work-dir: /scratch/nf-nomad/work [ceph]
Jul-03 12:13:34.274 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/research/.nextflow/assets/nextflow-io/hello/bin
Jul-03 12:13:34.331 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[NomadExecutor]
Jul-03 12:13:34.369 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Jul-03 12:13:34.506 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Jul-03 12:13:34.545 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 81; maxThreads: 1000
Jul-03 12:13:34.749 [main] DEBUG nextflow.Session - Session start
Jul-03 12:13:35.455 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Jul-03 12:13:35.736 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: nomad
Jul-03 12:13:35.736 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'nomad'
Jul-03 12:13:35.744 [main] DEBUG nextflow.executor.Executor - [warm up] executor > nomad
Jul-03 12:13:35.765 [main] DEBUG n.processor.TaskPollingMonitor - Creating task monitor for executor 'nomad' > capacity: 100; pollInterval: 5s; dumpInterval: 5m
Jul-03 12:13:35.771 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: nomad)
Jul-03 12:13:36.185 [main] DEBUG n.nomad.executor.NomadService - [NOMAD] Client Address: http://nomad.ops.cmgg.be/v1
Jul-03 12:13:36.186 [main] DEBUG n.nomad.executor.NomadService - [NOMAD] Client Token: 4465a..
Jul-03 12:13:36.549 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: sayHello
Jul-03 12:13:36.550 [main] DEBUG nextflow.Session - Igniting dataflow network (2)
Jul-03 12:13:36.552 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > sayHello
Jul-03 12:13:36.564 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_45e06ae60646ee81: /home/research/.nextflow/assets/nextflow-io/hello/main.nf
Jul-03 12:13:36.565 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
Jul-03 12:13:36.565 [main] DEBUG nextflow.Session - Session await
Jul-03 12:13:38.298 [Actor Thread 8] INFO nextflow.processor.TaskProcessor - [sayHello (4)] cache hash: 233d257343efe6e16bd7c6104c229955; mode: STANDARD; entries:
  264bf2d524d18f4ce02bfcc59170f616 [java.util.UUID] 52aae5fc-1036-4f86-af10-e5633ac019f5
  3a5266cb2487ca6ddc8c22a42478f272 [java.lang.String] sayHello
  ee0a1d23a8c26fdf4d1575310833774f [java.lang.String] """
    echo '$x world!'
    """
  20edf49cb4b22a20a5e05a9d1144bf0f [java.lang.String] quay.io/nextflow/bash
  769f897d21d56476ad01edc930becff0 [java.lang.String] x
  f5e76d4e64af0c5d859ff08ab3b720b7 [java.lang.String] Hola
  4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $
  16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true
Jul-03 12:13:38.275 [Actor Thread 7] INFO nextflow.processor.TaskProcessor - [sayHello (3)] cache hash: 7121055b03c0817999f33638f4237c5d; mode: STANDARD; entries:
  264bf2d524d18f4ce02bfcc59170f616 [java.util.UUID] 52aae5fc-1036-4f86-af10-e5633ac019f5
  3a5266cb2487ca6ddc8c22a42478f272 [java.lang.String] sayHello
  ee0a1d23a8c26fdf4d1575310833774f [java.lang.String] """
    echo '$x world!'
    """
  20edf49cb4b22a20a5e05a9d1144bf0f [java.lang.String] quay.io/nextflow/bash
  769f897d21d56476ad01edc930becff0 [java.lang.String] x
  0ab6632d52e811e9ef7c044666ac496a [java.lang.String] Hello
  4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $
  16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true
Jul-03 12:13:38.357 [Actor Thread 4] INFO nextflow.processor.TaskProcessor - [sayHello (1)] cache hash: 5c5ceeed61a78867efbf73384c00380e; mode: STANDARD; entries:
  264bf2d524d18f4ce02bfcc59170f616 [java.util.UUID] 52aae5fc-1036-4f86-af10-e5633ac019f5
  3a5266cb2487ca6ddc8c22a42478f272 [java.lang.String] sayHello
  ee0a1d23a8c26fdf4d1575310833774f [java.lang.String] """
    echo '$x world!'
    """
  769f897d21d56476ad01edc930becff0 [java.lang.String] x
  c9273e5a7ac3508ef910437c4bb35a90 [java.lang.String] Bonjour
  4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $
  16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true
Jul-03 12:13:38.298 [Actor Thread 6] INFO nextflow.processor.TaskProcessor - [sayHello (2)] cache hash: c607458338b72c0746d6fcac6772aa62; mode: STANDARD; entries:
  264bf2d524d18f4ce02bfcc59170f616 [java.util.UUID] 52aae5fc-1036-4f86-af10-e5633ac019f5
  3a5266cb2487ca6ddc8c22a42478f272 [java.lang.String] sayHello
  ee0a1d23a8c26fdf4d1575310833774f [java.lang.String] """
    echo '$x world!'
    """
  20edf49cb4b22a20a5e05a9d1144bf0f [java.lang.String] quay.io/nextflow/bash
  769f897d21d56476ad01edc930becff0 [java.lang.String] x
  442e002ddd8b0a2b10ed51352f8c0488 [java.lang.String] Ciao
  4f9d4b0d22865056c37fb6d9c2a04a67 [java.lang.String] $
  16fe7483905cce7a85670e43e4678877 [java.lang.Boolean] true
Jul-03 12:13:38.649 [Task submitter] DEBUG n.nomad.executor.NomadTaskHandler - [NOMAD] Submitting task sayHello (2) - work-dir=/scratch/nf-nomad/work/70/ecf3dfb7e0c167b38d4183e81c87fa
Jul-03 12:13:39.197 [Task submitter] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=sayHello (2); work-dir=/scratch/nf-nomad/work/70/ecf3dfb7e0c167b38d4183e81c87fa
  error [nextflow.exception.ProcessSubmitException]: [NOMAD] Failed to submit sayHello (2) -- Cause: Forbidden
Jul-03 12:13:39.256 [Task submitter] DEBUG nextflow.processor.TaskRun - Unable to dump error of process 'null' -- Cause: java.nio.file.NoSuchFileException: /scratch/nf-nomad/work/70/ecf3dfb7e0c167b38d4183e81c87fa/.command.log
Jul-03 12:13:39.269 [Task submitter] ERROR nextflow.processor.TaskProcessor - Error executing process > 'sayHello (2)'

Caused by:
  Forbidden

Command executed:

  echo 'Ciao world!'

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /scratch/nf-nomad/work/70/ecf3dfb7e0c167b38d4183e81c87fa

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Jul-03 12:13:39.274 [Task submitter] DEBUG nextflow.Session - Session aborted -- Cause: [NOMAD] Failed to submit sayHello (2) -- Cause: Forbidden
Jul-03 12:13:39.360 [Task submitter] DEBUG nextflow.Session - The following nodes are still active:
  [operator] view
Jul-03 12:13:39.409 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: nomad) - terminating tasks monitor poll loop
Jul-03 12:13:39.428 [main] DEBUG nextflow.Session - Session await > all processes finished
Jul-03 12:13:39.428 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jul-03 12:13:39.446 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=4; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=0; peakCpus=0; peakMemory=0; ]
Jul-03 12:13:39.697 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jul-03 12:13:39.745 [main] INFO org.pf4j.AbstractPluginManager - Stop plugin 'nf-nomad@0.1.1'
Jul-03 12:13:39.745 [main] DEBUG nextflow.plugin.BasePlugin - Plugin stopped nf-nomad
Jul-03 12:13:39.753 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
```
Nextflow config
```
dumpHashes = true

plugins {
    id 'nf-nomad@0.1.1'
}

process {
    executor = "nomad"
    docker.enabled = true
}

nomad {
    client {
        address = "http://nomad.example.com"
        token = "XXXXXXXXXXXXXXXXXXX"
    }
    jobs {
        deleteOnCompletion = false
        namespace = "nextflow"
        datacenters = ['dc']
        volumes = [
            { type "csi" name "nf_scratch_volume" path "/scratch" },
            { type "csi" name "nf_reference_volume" path "/references" }
        ]
    }
}
```
Nomad log
```
2024-07-03T12:13:39.167Z [TRACE] nomad.job: job mutate results: mutator=canonicalize warnings=[] error=
```
Manual run
```
$ export NOMAD_TOKEN='XXXXXXXXXXXX'
$ export NOMAD_ADDR="http://nomad.example.com"
$ export NOMAD_NAMESPACE=nextflow
$ export NOMAD_DC=s10
$ nomad job run test.hcl
==> Monitoring evaluation "02b6eef0"
    Evaluation triggered by job "example"
    Evaluation within deployment: "4d4d3f64"
    Allocation "984a1dcb" created: node "57dfcfcd", group "example"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "02b6eef0" finished with status "complete"
$ nomad job status
ID       Type     Priority  Status   Submit Date
example  service  50        running  2024-07-03T12:12:42Z
$ cat test.hcl
job "example" {
  group "example" {
    task "sleep" {
      driver = "docker"

      config {
        image      = "busybox:latest"
        entrypoint = ["/bin/sleep", "300"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
```
Nomad nextflow ACL
```
namespace "nextflow" {
  policy = "write"
}

agent {
  policy = "deny"
}

operator {
  policy = "deny"
}

quota {
  policy = "deny"
}

node {
  policy = "deny"
}

host_volume "*" {
  policy = "deny"
}

plugin {
  policy = "deny"
}
```