fbartusch closed this issue 4 months ago.
I think I naively assumed that there would not be a race condition, but that is likely what is happening, since files can be published concurrently. I will update nf-prov to handle concurrent publish events correctly.
If you want, you can test my theory by either making the `onFilePublish` method synchronized or using a `ConcurrentHashMap` for `workflowOutputs`.
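To illustrate the `ConcurrentHashMap` option, here is a minimal Java sketch (a Groovy plugin would use the same JVM class). The class name and the `simulate` helper are hypothetical, and this is not the nf-prov code: `onFilePublish(Path destination, Path source)` only mirrors the observer callback discussed here. Several threads publish distinct files at the same time, and every entry should survive. With a plain `HashMap` in its place, the count can come out lower on some runs, though not reliably.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentPublishSketch {
    // Hypothetical stand-in for ProvObserver.workflowOutputs (destination -> source).
    // ConcurrentHashMap supports concurrent put() calls without losing entries.
    private final Map<Path, Path> workflowOutputs = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the observer's onFilePublish callback.
    public void onFilePublish(Path destination, Path source) {
        workflowOutputs.put(destination, source);
    }

    public int publishedCount() {
        return workflowOutputs.size();
    }

    // Runs `threads` publishers in parallel, each publishing `filesPerThread` distinct files,
    // and returns how many entries ended up in the map.
    public static int simulate(int threads, int filesPerThread) {
        ConcurrentPublishSketch observer = new ConcurrentPublishSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < filesPerThread; i++) {
                    String name = "task" + id + "/file" + i + ".txt";
                    observer.onFilePublish(Paths.get("results", name), Paths.get("work", name));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observer.publishedCount();
    }

    public static void main(String[] args) {
        // 8 publishers x 1000 files: every entry should be present.
        System.out.println(simulate(8, 1000));
    }
}
```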
Making the methods synchronized solved the issue. I also needed to add `@Synchronized` to `onProcessComplete` and `onProcessCached`, because tasks were sometimes missing from the task list passed to the plugin.
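A minimal Java sketch of the synchronized-handler approach, assuming hypothetical handlers that take plain `String`s instead of `TaskRun` objects. Java `synchronized` methods lock on `this`, whereas Groovy's `@Synchronized` locks on a generated private field; for these handlers the effect is the same: only one handler runs at a time per observer, so the plain `HashSet` and `HashMap` are never mutated concurrently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedObserverSketch {
    // Plain, non-thread-safe collections; safety comes from the synchronized methods below.
    private final Set<String> tasks = new HashSet<>();
    private final Map<String, String> workflowOutputs = new HashMap<>();

    // Hypothetical handlers mirroring onProcessComplete / onProcessCached / onFilePublish.
    // All of them lock on `this`, so only one runs at a time for a given observer.
    public synchronized void onProcessComplete(String task) { tasks.add(task); }
    public synchronized void onProcessCached(String task) { tasks.add(task); }
    public synchronized void onFilePublish(String destination, String source) {
        workflowOutputs.put(destination, source);
    }

    // Readers synchronize too, so they see all completed writes.
    public synchronized int taskCount() { return tasks.size(); }
    public synchronized int outputCount() { return workflowOutputs.size(); }

    // Fires completed, cached and publish events from several threads at once.
    public static SynchronizedObserverSketch simulate(int threads, int eventsPerThread) {
        SynchronizedObserverSketch observer = new SynchronizedObserverSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < eventsPerThread; i++) {
                    String task = "task-" + id + "-" + i;
                    if (i % 2 == 0) observer.onProcessComplete(task);
                    else observer.onProcessCached(task);
                    observer.onFilePublish("results/" + task + ".txt", "work/" + task + ".txt");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observer;
    }

    public static void main(String[] args) {
        SynchronizedObserverSketch o = simulate(8, 1000);
        System.out.println(o.taskCount() + " tasks, " + o.outputCount() + " outputs");
    }
}
```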
I am writing a similar plugin and came across this problem. I was able to solve it by using `Collections.synchronized*()` objects, i.e.:

```groovy
private Set<TaskRun> tasks = Collections.synchronizedSet(new HashSet<TaskRun>())
private Map<Path,Path> workflowOutputs = Collections.synchronizedMap([:])
```

instead of:

```groovy
private Set<TaskRun> tasks = []
private Map<Path,Path> workflowOutputs = [:]
```
As far as my testing can tell, this has solved the issue. I'm curious, though: do you think any of the other solutions mentioned here are preferable for some reason, or are they all pretty much equivalent?
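A minimal Java sketch of the wrapper approach, again with hypothetical `String` keys and handler names. One caveat from the `Collections.synchronizedMap` documentation: each individual call is synchronized, but iterating over the map or any of its views requires holding the wrapper's own monitor manually. That matters if the output is rendered while publish events may still be arriving.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SynchronizedWrappersSketch {
    // Each wrapper synchronizes every individual call (add, put, size, ...) on the wrapper object.
    private final Set<String> tasks = Collections.synchronizedSet(new HashSet<>());
    private final Map<String, String> workflowOutputs = Collections.synchronizedMap(new HashMap<>());

    // Hypothetical handlers: no `synchronized` keyword needed because the collections are wrapped.
    public void onProcessComplete(String task) { tasks.add(task); }
    public void onFilePublish(String destination, String source) { workflowOutputs.put(destination, source); }

    public int taskCount() { return tasks.size(); }

    // Iteration is NOT covered by the wrapper: hold the map's monitor while iterating.
    public List<String> publishedDestinations() {
        synchronized (workflowOutputs) {
            List<String> result = new ArrayList<>();
            for (String destination : workflowOutputs.keySet()) {
                result.add(destination);
            }
            return result;
        }
    }

    // Starts `threads` platform threads that each report `eventsPerThread` tasks and outputs.
    public static SynchronizedWrappersSketch simulate(int threads, int eventsPerThread) {
        SynchronizedWrappersSketch observer = new SynchronizedWrappersSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < eventsPerThread; i++) {
                    String task = "task-" + id + "-" + i;
                    observer.onProcessComplete(task);
                    observer.onFilePublish("results/" + task, "work/" + task);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return observer;
    }

    public static void main(String[] args) {
        SynchronizedWrappersSketch o = simulate(4, 500);
        System.out.println(o.taskCount() + " tasks, " + o.publishedDestinations().size() + " outputs");
    }
}
```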
They should all work: you could synchronize the event handlers or the data structures, use a `ConcurrentHashMap`, or synchronize with a lock like I did in nf-boost.
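`ConcurrentHashMap` is probably the most performant option because it should impose less synchronization overhead: it does not funnel every call through a single lock, so reads generally don't block and concurrent writes only contend when they hit the same part of the table. That might make a difference for very large pipelines, but I can't say for sure without testing it.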
I will probably just use a lock for nf-prov; that's what we usually do, because it gives you better control over the synchronization. There also appears to be an issue with virtual threads and synchronized methods, which makes me wary of using `synchronized` for now.
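A minimal Java sketch of the explicit-lock approach, with hypothetical handler names; it is not the nf-prov or nf-boost code. The virtual-thread concern is probably the known pinning behavior: on JDK 21, a virtual thread that blocks while inside a `synchronized` block or method stays pinned to its carrier thread, whereas `java.util.concurrent` locks such as `ReentrantLock` do not cause this (JDK 24 removes most of this pinning via JEP 491). The `try`/`finally` pattern guarantees the lock is released even if a handler throws.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockedObserverSketch {
    // One explicit lock guards all shared state in this hypothetical observer.
    private final Lock lock = new ReentrantLock();
    private final Set<String> tasks = new HashSet<>();
    private final Map<String, String> workflowOutputs = new HashMap<>();

    public void onProcessComplete(String task) {
        lock.lock();
        try {
            tasks.add(task);
        } finally {
            lock.unlock(); // always released, even if add() throws
        }
    }

    public void onFilePublish(String destination, String source) {
        lock.lock();
        try {
            workflowOutputs.put(destination, source);
        } finally {
            lock.unlock();
        }
    }

    // Takes a consistent copy of the outputs, e.g. for rendering at the end of the run.
    public Map<String, String> snapshotOutputs() {
        lock.lock();
        try {
            return new HashMap<>(workflowOutputs);
        } finally {
            lock.unlock();
        }
    }

    public int taskCount() {
        lock.lock();
        try {
            return tasks.size();
        } finally {
            lock.unlock();
        }
    }

    // Starts `threads` platform threads that each report `eventsPerThread` tasks and outputs.
    public static LockedObserverSketch simulate(int threads, int eventsPerThread) {
        LockedObserverSketch observer = new LockedObserverSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < eventsPerThread; i++) {
                    String task = "task-" + id + "-" + i;
                    observer.onProcessComplete(task);
                    observer.onFilePublish("results/" + task, "work/" + task);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return observer;
    }

    public static void main(String[] args) {
        LockedObserverSketch o = simulate(4, 500);
        System.out.println(o.taskCount() + " tasks, " + o.snapshotOutputs().size() + " outputs");
    }
}
```

Compared with `synchronized`, an explicit lock also allows things like `tryLock()` with a timeout, at the cost of having to pair every `lock()` with an `unlock()` yourself.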
This should be fixed by c92b21f, which will be included in the next release.
Feel free to re-open if you see the issue again.
Bug report
I'm opening the issue here and not in the nf-prov repo, because I don't know whether the underlying problem is in the nextflow or the nf-prov codebase (see Additional context for more info).

Expected behavior and actual behavior
Using the nf-prov plugin, published files are sometimes missing from the `Map<Path,Path> workflowOutputs` mapping (which feeds into `bco.json`).

Steps to reproduce the problem
Use the nf-prov plugin and its `test.nf` workflow.

`nextflow.config`:

The problem does not occur every time; it happens about 10% of the time. I run the workflow in a loop and copy the `bco.json` file after every run. Faulty files are slightly smaller than correct ones.

Program output
Here is an excerpt of a problematic `bco.json` file (five output files are missing):

This is how it should look:
Environment
Additional context
The underlying problem is that published files are sometimes missing from the `workflowOutputs` mapping. This mapping is populated by `ProvObserver` in the nf-prov repo, but I think a race condition in the nextflow code could be the cause. Maybe `onFlowComplete()` is called before all published files have been handled by the code in `PublishDir.groovy`, and therefore before they have been added to the `workflowOutputs` mapping.