Open bskubi opened 4 days ago
See the warning at the end of this section: https://nextflow.io/docs/latest/script.html#closures
Unfortunately, Nextflow allows you to declare variables without def, but inside a function such a variable becomes a global (i.e. script-scope) variable. This doesn't happen in processes and workflows, because there the variable is scoped to the process or workflow instead.
I'm not certain, but I have a feeling you are declaring a variable in your function without def and it is causing a race condition.
That's correct, I wasn't using def inside of the function. That explanation makes sense. Thank you!
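For readers hitting the same symptom, the effect described above can be reproduced in a few lines of plain Groovy. This is a minimal sketch, not the sqljoin from the report: `leakyKeyFn`, `safeKeyFn`, `joinKey`, `localKey` and the sample row are hypothetical names chosen for illustration, and it assumes the classic Groovy script semantics in which an undeclared variable is stored in the script binding.

```groovy
// Hypothetical helpers for illustration; not the sqljoin from the report.

// No `def`: `joinKey` is written to the script binding, so every call
// to this function reads and writes the same shared variable.
def leakyKeyFn(by) {
    joinKey = by
    return { row -> row[joinKey] }   // joinKey is looked up when the closure runs
}

// With `def`: `localKey` is local to this call and captured by the closure.
def safeKeyFn(by) {
    def localKey = by
    return { row -> row[localKey] }
}

def row = [sample_id: 's1', assembly: 'hg38']

def first  = leakyKeyFn('sample_id')
def second = leakyKeyFn('assembly')   // overwrites the shared joinKey

// Both closures now see the value set by the second call.
assert first(row)  == 'hg38'
assert second(row) == 'hg38'

def third  = safeKeyFn('sample_id')
def fourth = safeKeyFn('assembly')

// Each closure keeps the value from its own call.
assert third(row)  == 's1'
assert fourth(row) == 'hg38'
```

In a pipeline the overwrite is not this tidy. Closures handed to channel operators run later and concurrently, so which caller's value wins can depend on timing, which would be consistent with the value swapping described in the report. Adding def to every variable inside the function removes the shared state.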
Bug report
This is a very peculiar bug involving an apparent breakdown of scope when the same Groovy method is called by two subworkflows in parallel. The smallest example I've been able to produce is unfortunately still 291 lines long, but I have isolated the problematic lines and have an example that reproducibly demonstrates the error.
I've done my best to explain the problem clearly, but I understand that 291 lines of code is a lot for a 'minimal' reproducible example, so I'm happy to answer clarifying questions.
Expected behavior and actual behavior
I have a Groovy method, sqljoin, called from two separate places:
Hence, we have:
RefsToFile -> sqljoin
MakeMissingChromsizes -> MakeResourceFile -> sqljoin
For illustrating this example, there are two key arguments to sqljoin:
by = "sample_id"
by = [assembly]
The expected behavior of the call to sqljoin from "RefsToFile" is observed if the call from "MakeResourceFile" is commented out. When both calls are active, the behavior is still correct up through line 99 of the example code, but two problems arise on lines 100 and 102.
This appears to represent a breakdown of scope between the two calls to sqljoin.
This is the first time I've managed to isolate this issue with 100% reproducibility and pinpoint the line at which the problem occurs, but I have observed similar errors in the past that all involved calling a custom Groovy method repeatedly in parallel. For example, in the past, I had a custom Groovy method that was called from within the shell section of a process where calls would appear to have the same variable value-swapping problem. The problem was eliminated by moving the code contained within the Groovy method directly into the shell section.
Steps to reproduce the problem
Extract and run the following code with
nextflow run hich.nf
error.zip
This will illustrate the erroneous output. Then comment out the call to MakeResourceFile and rerun to show how this yields different results from the preceding call to sqljoin from the RefsToFile subworkflow.
Program output
(skubi) benjamin@Odysseus:~/Documents/error$ nextflow run hich.nf
Erroneous result illustrating mixup of variable values when call to sqljoin from MakeResourceFile is active
Expected result when call to sqljoin from MakeResourceFile is commented out.
Environment
Additional context
The aim of the sqljoin method is to implement left, right and inner joins on channels that contain a single LinkedHashMap item. I am aware that Nextflow provides operators providing similar features, but this function uses them to implement additional new functionality not provided by Nextflow's basic operators.
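To make the kind of join described here concrete, below is a minimal sketch of an inner and left join over lists of maps in plain Groovy. It is not the sqljoin implementation from the attached code, which works on channels and uses Nextflow operators. The `joinRows` helper, its parameters and the sample data (including the `hg38.sizes` file name) are assumptions made for illustration. Every variable is declared with def, so nothing leaks into script scope.

```groovy
// Hypothetical helper; not the sqljoin from the report.
// Joins two lists of maps on the key named by `by`.
def joinRows(List<Map> left, List<Map> right, String by, String how = 'inner') {
    def rightByKey = right.groupBy { it[by] }    // index right rows by join key
    def joined = []
    left.each { l ->
        def matches = rightByKey[l[by]]
        if (matches) {
            // One output row per matching pair; right-hand values win on key clashes.
            matches.each { r -> joined << (l + r) }
        } else if (how == 'left') {
            joined << l                          // keep unmatched left rows
        }
    }
    return joined
}

def samples = [
    [sample_id: 's1', assembly: 'hg38'],
    [sample_id: 's2', assembly: 'mm10'],
]
def sizes = [[assembly: 'hg38', chromsizes: 'hg38.sizes']]

assert joinRows(samples, sizes, 'assembly').size() == 1
assert joinRows(samples, sizes, 'assembly', 'left').size() == 2
assert joinRows(samples, sizes, 'assembly')[0].chromsizes == 'hg38.sizes'
```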