nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

collectFile with sort: false fails when storeDir points to GCS #3993

Closed Puumanamana closed 4 months ago

Puumanamana commented 1 year ago

Bug report

Hello,

I've run into an issue when calling collectFile() with sort: false and storeDir pointing to a GCS location. This works as expected when writing locally, but writing to GCS fails with the error "Cloud Storage objects are immutable." Is collectFile() supposed to work with a remote GCS path in storeDir?

Thanks!

Steps to reproduce the problem

main.nf

workflow {
    values = Channel.of('a', 'b')
    values.collectFile(name: "file.txt", storeDir: "gs://nxf-work/trash", sort: false, newLine: true)
}

nextflow.config

google {
    project = "rome-pipeline-engine"
    region = "us-central1"
}

Program output

stdout

$ nextflow main.nf -w gs://nf-tower-public/scratch
N E X T F L O W  ~  version 23.05.0-edge
Launching `main.nf` [disturbed_bardeen] DSL2 - revision: 2468ee65c9
Jun 01, 2023 8:59:26 PM com.google.auth.oauth2.DefaultCredentialsProvider warnAboutProblematicCredentials
WARNING: Your application has authenticated using end user credentials from Google Cloud SDK. We recommend that most server applications use service accounts instead. If your application continues to use end user credentials from Cloud SDK, you might receive a "quota exceeded" or "API not enabled" error. For more information about service accounts, see https://cloud.google.com/docs/authentication/.
ERROR ~ Cloud Storage objects are immutable.

 -- Check '.nextflow.log' file for details

.nextflow.log

$ cat .nextflow.log
Jun-01 20:59:24.782 [main] DEBUG nextflow.cli.Launcher - $> nextflow main.nf -w 'gs://nf-tower-public/scratch'
Jun-01 20:59:24.898 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 23.05.0-edge
Jun-01 20:59:24.920 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/cedric/.nextflow/plugins; core-plugins: nf-amazon@2.0.0,nf-azure@1.1.0,nf-codecommit@0.1.5,nf-console@1.0.6,nf-ga4gh@1.0.6,nf-google@1.7.4,nf-tower@1.5.13,nf-wave@0.9.0
Jun-01 20:59:24.930 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
Jun-01 20:59:24.931 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
Jun-01 20:59:24.935 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
Jun-01 20:59:24.945 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
Jun-01 20:59:24.967 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /home/cedric/sandbox/nxf-immutable-bug/nextflow.config
Jun-01 20:59:24.968 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /home/cedric/sandbox/nxf-immutable-bug/nextflow.config
Jun-01 20:59:24.989 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Jun-01 20:59:25.640 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 by global default
Jun-01 20:59:25.659 [main] INFO  nextflow.cli.CmdRun - Launching `main.nf` [disturbed_bardeen] DSL2 - revision: 2468ee65c9
Jun-01 20:59:25.660 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[nf-google@1.7.4]
Jun-01 20:59:25.661 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[nf-google@1.7.4, nf-tower@1.5.13]
Jun-01 20:59:25.661 [main] DEBUG nextflow.plugin.PluginUpdater - Installing plugin nf-google version: 1.7.4
Jun-01 20:59:25.673 [main] INFO  org.pf4j.AbstractPluginManager - Plugin 'nf-google@1.7.4' resolved
Jun-01 20:59:25.673 [main] INFO  org.pf4j.AbstractPluginManager - Start plugin 'nf-google@1.7.4'
Jun-01 20:59:25.703 [main] DEBUG nextflow.plugin.BasePlugin - Plugin started nf-google@1.7.4
Jun-01 20:59:25.704 [main] DEBUG nextflow.plugin.PluginUpdater - Installing plugin nf-tower version: 1.5.13
Jun-01 20:59:25.705 [main] INFO  org.pf4j.AbstractPluginManager - Plugin 'nf-tower@1.5.13' resolved
Jun-01 20:59:25.705 [main] INFO  org.pf4j.AbstractPluginManager - Start plugin 'nf-tower@1.5.13'
Jun-01 20:59:25.713 [main] DEBUG nextflow.plugin.BasePlugin - Plugin started nf-tower@1.5.13
Jun-01 20:59:25.726 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/cedric/.nextflow/secrets/store.json
Jun-01 20:59:25.731 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@a2341c6] - activable => nextflow.secret.LocalSecretsProvider@a2341c6
Jun-01 20:59:25.804 [main] DEBUG nextflow.Session - Session UUID: 52aec8f2-2976-44da-a753-90dd69f92463
Jun-01 20:59:25.805 [main] DEBUG nextflow.Session - Run name: disturbed_bardeen
Jun-01 20:59:25.805 [main] DEBUG nextflow.Session - Executor pool size: 8
Jun-01 20:59:26.045 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
Jun-01 20:59:26.051 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=24; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
Jun-01 20:59:26.083 [main] DEBUG nextflow.cli.CmdRun -
  Version: 23.05.0-edge build 5861
  Created: 15-05-2023 04:13 UTC
  System: Linux 5.15.0-1022-gcp
  Runtime: Groovy 3.0.17 on OpenJDK 64-Bit Server VM 17.0.3-internal+0-adhoc..src
  Encoding: UTF-8 (UTF-8)
  Process: 1539813@nf-tower-main [10.128.0.2]
  CPUs: 8 - Mem: 62.8 GB (28.3 GB) - Swap: 0 (0)
Jun-01 20:59:26.112 [main] DEBUG nextflow.file.FileHelper - Can't check if specified path is NFS (1): gs://nf-tower-public/scratch
Jun-01 20:59:26.112 [main] DEBUG nextflow.Session - Work-dir: gs://nf-tower-public/scratch [null]
Jun-01 20:59:26.112 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/cedric/sandbox/nxf-immutable-bug/bin
Jun-01 20:59:26.140 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[GoogleLifeSciencesExecutor, GoogleBatchExecutor]
Jun-01 20:59:26.159 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Jun-01 20:59:26.162 [main] DEBUG nextflow.Session - Observer factory: TowerFactory
Jun-01 20:59:26.184 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
Jun-01 20:59:26.196 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 9; maxThreads: 1000
Jun-01 20:59:26.278 [main] DEBUG nextflow.Session - Session start
Jun-01 20:59:26.468 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Jun-01 20:59:27.085 [main] DEBUG nextflow.Session - Workflow process names [dsl2]:
Jun-01 20:59:27.086 [main] DEBUG nextflow.Session - Igniting dataflow network (1)
Jun-01 20:59:27.087 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
Jun-01 20:59:27.109 [main] DEBUG nextflow.Session - Session await
Jun-01 20:59:27.109 [main] DEBUG nextflow.Session - Session await > all processes finished
Jun-01 20:59:27.109 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jun-01 20:59:27.457 [Actor Thread 2] ERROR nextflow.extension.OperatorImpl - @unknown
com.google.cloud.storage.contrib.nio.CloudStorageObjectImmutableException: Cloud Storage objects are immutable.
        at com.google.cloud.storage.contrib.nio.CloudStorageFileAttributeView.setTimes(CloudStorageFileAttributeView.java:65)
        at java.base/java.nio.file.CopyMoveHelper.copyToForeignTarget(CopyMoveHelper.java:135)
        at java.base/java.nio.file.CopyMoveHelper.moveToForeignTarget(CopyMoveHelper.java:157)
        at java.base/java.nio.file.Files.move(Files.java:1435)
        at nextflow.file.SimpleFileCollector.saveFile(SimpleFileCollector.groovy:102)
        at nextflow.file.FileCollector.saveTo0(FileCollector.groovy:228)
        at nextflow.file.FileCollector.saveTo(FileCollector.groovy:251)
        at nextflow.file.FileCollector$saveTo.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
        at nextflow.extension.CollectFileOp.emitItems(CollectFileOp.groovy:187)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1258)
        at groovy.lang.MetaClassImpl.invokeMethodClosure(MetaClassImpl.java:1047)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1132)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
        at groovy.lang.Closure.call(Closure.java:412)
        at groovy.lang.Closure.call(Closure.java:428)
        at groovy.lang.Closure$call.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
        at nextflow.extension.DataflowHelper$3.afterStop(DataflowHelper.groovy:254)
        at groovyx.gpars.dataflow.operator.DataflowProcessor.fireAfterStop(DataflowProcessor.java:324)
        at groovyx.gpars.dataflow.operator.DataflowProcessorActor.afterStop(DataflowProcessorActor.java:59)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1258)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
        at org.codehaus.groovy.runtime.InvokerHelper.invokePojoMethod(InvokerHelper.java:1024)
        at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:1015)
        at groovyx.gpars.actor.Actor.callDynamic(Actor.java:369)
        at groovyx.gpars.actor.Actor.handleTermination(Actor.java:320)
        at groovyx.gpars.actor.AbstractLoopingActor.terminate(AbstractLoopingActor.java:204)
        at groovyx.gpars.dataflow.operator.DataflowProcessor.terminate(DataflowProcessor.java:147)
        at groovyx.gpars.dataflow.operator.DataflowProcessorActor.checkPoison(DataflowProcessorActor.java:115)
        at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:83)
        at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
        at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
        at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
        at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
        at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Jun-01 20:59:27.468 [Actor Thread 2] DEBUG nextflow.Session - Session aborted -- Cause: Cloud Storage objects are immutable.
Jun-01 20:59:27.492 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=0; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=0ms; failedDuration=0ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=0; peakCpus=0; peakMemory=0; ]
Jun-01 20:59:27.518 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Jun-01 20:59:27.519 [main] INFO  org.pf4j.AbstractPluginManager - Stop plugin 'nf-tower@1.5.13'
Jun-01 20:59:27.519 [main] DEBUG nextflow.plugin.BasePlugin - Plugin stopped nf-tower
Jun-01 20:59:27.519 [main] INFO  org.pf4j.AbstractPluginManager - Stop plugin 'nf-google@1.7.4'
Jun-01 20:59:27.519 [main] DEBUG nextflow.plugin.BasePlugin - Plugin stopped nf-google
Jun-01 20:59:27.564 [main] DEBUG nextflow.file.FileCollector - Deleting file collector temp dir: /tmp/nxf-7720900685142813873
Jun-01 20:59:27.569 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

Environment

ejseqera commented 4 months ago

I'm encountering the same issue in Nextflow version 23.10.1 when using the collectFile operator with storeDir pointing to a GCS path, only when using the sort: false option.

With the following main.nf:

workflow {
    values = Channel.of('alpha', 'beta', 'gamma')
    values.collectFile(name: 'combined.txt', storeDir: "${params.outdir}/combined", sort: false)
}

Program output

This results in the same error as above:

Apr-30 02:51:18.794 [Actor Thread 2] ERROR nextflow.extension.OperatorImpl - @unknown
com.google.cloud.storage.contrib.nio.CloudStorageObjectImmutableException: Cloud Storage objects are immutable.
    at com.google.cloud.storage.contrib.nio.CloudStorageFileAttributeView.setTimes(CloudStorageFileAttributeView.java:65)
    at java.base/java.nio.file.CopyMoveHelper.copyToForeignTarget(CopyMoveHelper.java:135)
    at java.base/java.nio.file.CopyMoveHelper.moveToForeignTarget(CopyMoveHelper.java:157)
    at java.base/java.nio.file.Files.move(Files.java:1435)
    at nextflow.file.SimpleFileCollector.saveFile(SimpleFileCollector.groovy:102)
    at nextflow.file.FileCollector.saveTo0(FileCollector.groovy:228)
    at nextflow.file.FileCollector.saveTo(FileCollector.groovy:251)
    at nextflow.file.FileCollector$saveTo.call(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
    at nextflow.extension.CollectFileOp.emitItems(CollectFileOp.groovy:187) 
...

Removing the sort: false option from collectFile avoids the error and results in a successful run. This is currently breaking nf-core/sarek v3.4.1 on Google Batch.
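
For context, the trace shows Files.move going through CopyMoveHelper.moveToForeignTarget and copyToForeignTarget, and then failing when setTimes is called on the GCS target. That suggests the move across file system providers is being carried out as a copy that also tries to copy timestamps, which the GCS provider rejects. Below is a minimal Java sketch of a copy-then-delete alternative that does not request attribute copying. The class and method names are hypothetical, local temp files stand in for the gs:// target, and this has not been verified against GCS; it is only an illustration of the idea, not the fix adopted in Nextflow.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyThenDelete {

    // Hypothetical helper (not Nextflow code): transfer a staged file to its
    // destination without asking the destination to accept timestamps.
    static void copyThenDelete(Path source, Path target) throws IOException {
        // COPY_ATTRIBUTES is deliberately not passed, so the copy does not
        // require file attributes (such as timestamps) to be set on the target.
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        // Remove the staged file only after the copy has succeeded.
        Files.delete(source);
    }

    public static void main(String[] args) throws IOException {
        // Local stand-ins for the collector temp file and the storeDir
        // (assumption: a real run would resolve the target from a gs:// URI).
        Path staged = Files.createTempFile("nxf-collect", ".txt");
        Files.write(staged, "a\nb\n".getBytes(StandardCharsets.UTF_8));
        Path storeDir = Files.createTempDirectory("store-dir");
        Path target = storeDir.resolve("file.txt");

        copyThenDelete(staged, target);

        System.out.println("target exists: " + Files.exists(target));
        System.out.println("staged exists: " + Files.exists(staged));
    }
}
```

The trade-off in this sketch is that the operation is no longer a single move: if the delete fails after a successful copy, the staged file is left behind in the temp directory.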

Environment

  Version: 23.10.1 build 5891
  Created: 12-01-2024 22:01 UTC 
  System: Linux 6.1.75+
  Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 17.0.10+7-LTS
  Encoding: UTF-8 (UTF-8)
  Process: 110@304b07c45cca [172.17.0.2]
  CPUs: 2 - Mem: 1.9 GB (67.2 MB) - Swap: 0 (0)