Open jfy133 opened 4 days ago
The same issue occurred a while ago in the charliecloud driver, and I am pretty sure that it is caused by nextflow's FileMutex()
, but I never figured out how / why.
Please include the full log file
Please include the full log file
Sorry I couldn't include that at the time, was having internet issues so the only thing I could get was the console.
I'm away from a computer until Monday, @nschan if you can share a log from your side that'll would be appreciated.
I'll also ask on nf-core if anyone else gets the same thing with other pipelines
I do not have logs for this anymore, the problem in charliecloud was fixed a while ago.
I was able to to distract my son for 5 minutes to try again, and I got the same error again (although interestingly @mahesh-panchal was trying independently also on gitpod and did not get it)
Log file here:
Thanks, the relevant information is this
java.lang.IllegalStateException: java.nio.channels.OverlappingFileLockException
at nextflow.conda.CondaCache.getCachePathFor(CondaCache.groovy:395)
at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
at nextflow.processor.TaskRun.getCondaEnv0(TaskRun.groovy:647)
at jdk.internal.reflect.GeneratedMethodAccessor158.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
at groovy.lang.MetaClassImpl.doInvokeMethod(MetaClassImpl.java:1333)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1088)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnCurrentN(ScriptBytecodeAdapter.java:96)
at nextflow.processor.TaskRun.this$dist$invoke$1(TaskRun.groovy)
at nextflow.processor.TaskRun$2.methodMissing(TaskRun.groovy)
at jdk.internal.reflect.GeneratedMethodAccessor157.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
at groovy.lang.MetaClassImpl.invokeMissingMethod(MetaClassImpl.java:924)
at groovy.lang.MetaClassImpl.invokePropertyOrMissing(MetaClassImpl.java:1413)
at groovy.lang.MetaClassImpl.doInvokeMethod(MetaClassImpl.java:1335)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1088)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
at nextflow.processor.TaskRun$2.apply(TaskRun.groovy:638)
at nextflow.processor.TaskRun$2.apply(TaskRun.groovy)
at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
at nextflow.processor.TaskRun.getCondaEnv(TaskRun.groovy:635)
at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
at nextflow.processor.TaskProcessor.createTaskHashKey(TaskProcessor.groovy:2224)
at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
at nextflow.processor.TaskProcessor.invokeTask(TaskProcessor.groovy:654)
at nextflow.processor.InvokeTaskAdapter.call(InvokeTaskAdapter.groovy:52)
at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor.access$001(ForkingDataflowOperatorActor.java:35)
at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor$1.run(ForkingDataflowOperatorActor.java:58)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.nio.channels.OverlappingFileLockException: null
at java.base/sun.nio.ch.FileLockTable.checkList(FileLockTable.java:229)
at java.base/sun.nio.ch.FileLockTable.add(FileLockTable.java:123)
at java.base/sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:1321)
at java.base/java.nio.channels.FileChannel.tryLock(FileChannel.java:1191)
at nextflow.file.FileMutex.lock(FileMutex.groovy:90)
at nextflow.conda.CondaCache.createLocalCondaEnv(CondaCache.groovy:267)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
at groovy.lang.MetaClassImpl.doInvokeMethod(MetaClassImpl.java:1333)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1088)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:645)
at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:628)
at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:82)
at nextflow.conda.CondaCache$_getLazyImagePath_closure3.doCall(CondaCache.groovy:372)
at nextflow.conda.CondaCache$_getLazyImagePath_closure3.call(CondaCache.groovy)
at groovyx.gpars.dataflow.LazyDataflowVariable$1.run(LazyDataflowVariable.java:70)
Note it happens less if I re-run repeatedly (Without resume). So it (subjectively) appears to only happen more often on the first run in a given directory.
Makes sense if it's due to Nextflow concurrently creating new Conda environments. Re-runs will use existing envs from previous runs.
So far I've done 5 runs (of taxprofiler) in Gitpod, sometimes with fresh Gitpod, and other times doing rm -rf .nextflow* work results
, and haven't got it yet. It's not an easy one to reproduce.
I have the impression it could be a race condition between two processes using the same conda file.
Based on my observations with charliecloud, I think this error will occur when two processes that use the same environment try to create it simultaneously and somehow both manage to create the FileMutex()
file. I am obviously not an expert, but with charliecloud I had 0 success in avoiding this (or implementing any other locking mechanism), I am mainly surprised this only surfaced now in conda and wonder why this does not happen with e.g. singularity, which uses exactly the same code in the cache.
See discussion of this issue here: https://github.com/nextflow-io/nextflow/issues/3367 and here https://github.com/nextflow-io/nextflow/pull/4712#issuecomment-1934009475
I think the synchronized in this part of code should include the whole method https://github.com/nextflow-io/nextflow/blob/e025a846ecc471c42277ac17e800a4d59c15611b/modules/nextflow/src/main/groovy/nextflow/conda/CondaCache.groovy#L364-L374
Indeed, likely the sync "key" needs to be reviewed
I was wrong, the issue is not the sync. It is happening when the path in the conda directive is the same but it is passed as a different string. I mean "/nf-core/taxprofiler/workflows/../subworkflows/local/../../modules/nf-core/fastqc/environment.yml" is the same as "/nf-core/taxprofiler/workflows/modules/nf-core/fastqc/environment.yml" but as strings are different. I think it is happening with these three processes: 'NFCORE_TAXPROFILER:TAXPROFILER:FASTQC' , 'NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_PREPROCESSING:FASTQC_PROCESSED' and 'NFCORE_TAXPROFILER:TAXPROFILER:LONGREAD_PREPROCESSING:FASTQC_PROCESSED'.
Then using path.toAbsolutePath().normalize()
should solve
Bug report
I am running nf-core/taxprofiler on gitpod with conda.
When I run the basic test profile:
nextflow run nf-core/taxprofiler -profile test,conda --outdir results/
, I get the OverlappingFileLockException error again.This is despite using the latest patch version of Nextflow, as reported at the top of the console on execution
Expected behavior and actual behavior
Nextflow can build multiple conda environments simulteanously without any overlapping lock files
Steps to reproduce the problem
Open nf-core/taxprofiler repo on gitpod.io
Run
Program output
Environment
Additional context
(Add any other context about the problem here)