nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

moduleBinaries in toplevel module/projectDir not found, (sharp edge) #5288

Open feiloo opened 4 weeks ago

feiloo commented 4 weeks ago

Bug report

Expected behavior and actual behavior

I expected module binaries under projectDir/resources/usr/bin to be usable in projectDir/main.nf, but they aren't found. It is possible to use projectDir/bin instead, but that creates some sharp-edged behavior: when the project is included as a module instead of run directly, resources/usr/bin is used instead of bin/. The workflow's behavior therefore depends on how it is invoked, and scripts have to be copied to both directories to get the expected behavior. Correct me if I'm wrong, but this doesn't look like an intended feature.

Steps to reproduce the problem

projectDir/main.nf:

process process1 {
    script:
    """
    hello1.sh
    """
}

workflow {
    main:
    process1()
}

projectDir/resources/usr/bin/hello1.sh:

#!/bin/bash
echo hello1

projectDir/nextflow.config:

nextflow.enable.dsl=2
nextflow.enable.moduleBinaries = true

run with:

nextflow run ../projectDir -c nextflow.config

Program output

.nextflow.log

```
Sep-05 13:53:44.677 [main] DEBUG nextflow.cli.Launcher - $> nextflow run ../projectDir/ -c nextflow.config
Sep-05 13:53:44.773 [main] DEBUG nextflow.cli.CmdRun - N E X T F L O W ~ version 24.04.4
Sep-05 13:53:44.790 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/user/.nextflow/plugins; core-plugins: nf-amazon@2.5.3,nf-azure@1.6.1,nf-cloudcache@0.4.1,nf-codecommit@0.2.1,nf-console@1.1.3,nf-ga4gh@1.3.0,nf-google@1.13.2-patch1,nf-tower@1.9.1,nf-wave@1.4.2-patch1
Sep-05 13:53:44.800 [main] INFO o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Sep-05 13:53:44.801 [main] INFO o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Sep-05 13:53:44.803 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.12.0 in 'deployment' mode
Sep-05 13:53:44.814 [main] INFO org.pf4j.AbstractPluginManager - No plugins
Sep-05 13:53:45.200 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /data/user/miscellaneous/issues/projectDir/nextflow.config
Sep-05 13:53:45.202 [main] DEBUG nextflow.config.ConfigBuilder - User config file: /data/user/miscellaneous/issues/projectDir/nextflow.config
Sep-05 13:53:45.204 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /data/user/miscellaneous/issues/projectDir/nextflow.config
Sep-05 13:53:45.204 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /data/user/miscellaneous/issues/projectDir/nextflow.config
Sep-05 13:53:45.221 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/user/.nextflow/secrets/store.json
Sep-05 13:53:45.224 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@4649d70a] - activable => nextflow.secret.LocalSecretsProvider@4649d70a
Sep-05 13:53:45.252 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Sep-05 13:53:45.275 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Sep-05 13:53:45.299 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from config declaration
Sep-05 13:53:45.311 [main] DEBUG nextflow.cli.CmdRun - Launching `../projectDir/main.nf` [cranky_rutherford] DSL2 - revision: ada3cad76b
Sep-05 13:53:45.313 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
Sep-05 13:53:45.313 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
Sep-05 13:53:45.360 [main] DEBUG nextflow.Session - Session UUID: a1abebb7-fa07-451a-a136-9f9bf4f13393
Sep-05 13:53:45.360 [main] DEBUG nextflow.Session - Run name: cranky_rutherford
Sep-05 13:53:45.361 [main] DEBUG nextflow.Session - Executor pool size: 12
Sep-05 13:53:45.368 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
Sep-05 13:53:45.373 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=36; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Sep-05 13:53:45.389 [main] DEBUG nextflow.cli.CmdRun -
  Version: 24.04.4 build 5917
  Created: 01-08-2024 07:05 UTC (09:05 CEST)
  System: Linux 5.14.21-150500.55.68-default
  Runtime: Groovy 4.0.21 on OpenJDK 64-Bit Server VM 17.0.12+7-suse-150400.3.45.1-x8664
  Encoding: UTF-8 (UTF-8)
  Process: 81278@ukb2580 [10.14.25.80]
  CPUs: 12 - Mem: 62.7 GB (10.1 GB) - Swap: 2 GB (1.9 GB)
Sep-05 13:53:45.407 [main] DEBUG nextflow.Session - Work-dir: /data/user/miscellaneous/issues/projectDir/work [xfs]
Sep-05 13:53:45.408 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /data/user/miscellaneous/issues/projectDir/bin
Sep-05 13:53:45.416 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
Sep-05 13:53:45.424 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Sep-05 13:53:45.443 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Sep-05 13:53:45.451 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 13; maxThreads: 1000
Sep-05 13:53:45.500 [main] DEBUG nextflow.Session - Session start
Sep-05 13:53:45.637 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Sep-05 13:53:45.687 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
Sep-05 13:53:45.688 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Sep-05 13:53:45.693 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
Sep-05 13:53:45.698 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=12; memory=62.7 GB; capacity=12; pollInterval=100ms; dumpInterval=5m
Sep-05 13:53:45.700 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: local)
Sep-05 13:53:45.780 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: process1
Sep-05 13:53:45.781 [main] DEBUG nextflow.Session - Igniting dataflow network (1)
Sep-05 13:53:45.781 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > process1
Sep-05 13:53:45.782 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_b6df2e05938cee55: /data/user/miscellaneous/issues/projectDir/main.nf
Sep-05 13:53:45.782 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
Sep-05 13:53:45.782 [main] DEBUG nextflow.Session - Session await
Sep-05 13:53:45.922 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
Sep-05 13:53:45.925 [Task submitter] INFO nextflow.Session - [ba/4756a3] Submitted process > process1
Sep-05 13:53:45.957 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: process1; status: COMPLETED; exit: 127; error: -; workDir: /data/user/miscellaneous/issues/projectDir/work/ba/4756a300431b89b0a1c8b6ae7361ae]
Sep-05 13:53:45.958 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'TaskFinalizer' minSize=10; maxSize=36; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Sep-05 13:53:45.965 [TaskFinalizer-1] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for task: name=process1; work-dir=/data/user/miscellaneous/issues/projectDir/work/ba/4756a300431b89b0a1c8b6ae7361ae
  error [nextflow.exception.ProcessFailedException]: Process `process1` terminated with an error exit status (127)
Sep-05 13:53:45.985 [TaskFinalizer-1] ERROR nextflow.processor.TaskProcessor - Error executing process > 'process1'

Caused by:
  Process `process1` terminated with an error exit status (127)

Command executed:
  hello1.sh

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: hello1.sh: command not found

Work dir:
  /data/user/miscellaneous/issues/projectDir/work/ba/4756a300431b89b0a1c8b6ae7361ae

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
Sep-05 13:53:45.992 [main] DEBUG nextflow.Session - Session await > all processes finished
Sep-05 13:53:45.996 [TaskFinalizer-1] DEBUG nextflow.Session - Session aborted -- Cause: Process `process1` terminated with an error exit status (127)
Sep-05 13:53:46.012 [main] DEBUG nextflow.Session - Session await > all barriers passed
Sep-05 13:53:46.012 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
Sep-05 13:53:46.017 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=21ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=0; ]
Sep-05 13:53:46.196 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Sep-05 13:53:46.211 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
```

Environment

bentsherman commented 1 week ago

An interesting point. The module binaries feature was intended for scripts that are only ever included as modules, so if you have some script that is used both as a module and an entrypoint, it is better to move the "include-able" definitions into a separate module.

On the other hand, I was thinking it would be useful to have an entry workflow in each module that simply wraps the process or subworkflow, as a way to test as well as to show example usage. But I think that would also run into this issue.
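For reference, the separate-module layout suggested above could look roughly like the following. This is only a sketch, written as a shell script that creates the files: the modules/process1 path and the temporary project root are hypothetical names chosen for illustration, while the process, the helper script, and the config lines are copied from the report. It has not been run against Nextflow here.

```shell
#!/bin/bash
set -eu

# Hypothetical project root; a fresh temporary directory keeps the sketch self-contained.
root="$(mktemp -d)/projectDir"
# Hypothetical module directory (the name is an assumption, not from the report).
module="$root/modules/process1"
mkdir -p "$module/resources/usr/bin"

# The helper script from the report, now bundled with the module.
cat > "$module/resources/usr/bin/hello1.sh" <<'EOF'
#!/bin/bash
echo hello1
EOF
chmod +x "$module/resources/usr/bin/hello1.sh"

# The process moves out of the entry script and into the module.
cat > "$module/main.nf" <<'EOF'
process process1 {
    script:
    """
    hello1.sh
    """
}
EOF

# The entry script only includes the module and calls the process.
cat > "$root/main.nf" <<'EOF'
include { process1 } from './modules/process1/main.nf'

workflow {
    process1()
}
EOF

# Same settings as in the report.
cat > "$root/nextflow.config" <<'EOF'
nextflow.enable.dsl=2
nextflow.enable.moduleBinaries = true
EOF

# Print the generated project directory.
echo "$root"
```

With this layout the entry script no longer defines the process itself, so hello1.sh only has to live in one place. The cost is the extra layer of indirection, and a module that is also run directly as an entrypoint would presumably still hit the behavior described in this issue.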

feiloo commented 1 week ago

I sometimes run modules directly instead of from a workflow, so I agree that it would be useful. I also try to keep the number of layers and indirections in the code down for simplicity.

My understanding is that the main difference between module binaries and projectDir/bin is that projectDir/bin binaries are visible everywhere in the workflows, subworkflows and modules.

Since I don't see a particular downside to it, I would be in favor of making module binaries visible to the entry workflow, with either an error or priority over projectDir/bin binaries in case the script names collide.
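Until something like that exists, name collisions between the two directories can be checked by hand. The following is a rough sketch of such a check and not part of Nextflow: the function name list_collisions and the demo directory are made up for illustration, and the script only compares file names, it does not decide which copy should win.

```shell
#!/bin/bash
set -eu

# list_collisions BIN_DIR MODULE_BIN_DIR
# Hypothetical helper: prints every file name present in both directories, one per line.
list_collisions() {
    for path in "$1"/*; do
        # Skip the unexpanded glob pattern when BIN_DIR is empty.
        [ -e "$path" ] || continue
        name="$(basename "$path")"
        if [ -e "$2/$name" ]; then
            echo "$name"
        fi
    done
}

# Demo on a throwaway project tree: hello1.sh exists in both locations.
demo="$(mktemp -d)"
mkdir -p "$demo/bin" "$demo/resources/usr/bin"
touch "$demo/bin/hello1.sh" "$demo/bin/only_in_bin.sh"
touch "$demo/resources/usr/bin/hello1.sh" "$demo/resources/usr/bin/only_in_module.sh"

collisions="$(list_collisions "$demo/bin" "$demo/resources/usr/bin")"
echo "$collisions"
```

In the demo only hello1.sh is reported, since the other two scripts each exist in just one directory. In a real project the two arguments would be projectDir/bin and projectDir/resources/usr/bin.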