Open bentsherman opened 10 months ago
I built the plugin and ran:
nextflow run tests/test.nf
with the following (top level) nextflow.config:
plugins {
    id 'nf-prov'
}

params {
    outdir = 'results'
}

prov {
    enabled = true
    formats {
        wrroc {
            file = "${params.outdir}/ro-crate-metadata.json"
            overwrite = true
        }
    }
}
And got this ro-crate-metadata.json. For a first commit it's looking pretty good already! Runcrate reads the resulting RO-Crate (runcrate report results) and does not break. However, there are several issues. Here is a list of what I've found:

- The test.nf workflow is listed in the metadata, but it's not in the crate directory (results).
- A file is referenced as work/e1/80ed247039cd71794ba71091aedf2b/r1.foo.2.txt, while it should be simply r1.foo.2.txt, since that's the path relative to the crate dir.
- In addition to the CreateAction corresponding to the workflow run, there should be additional CreateActions corresponding to individual tool executions. In the case of the workflow that I ran, there is only one process (RNG). Assuming we can consider processes as the tools orchestrated by the workflow, there should be a SoftwareApplication to represent RNG, which would be referenced from the workflow's hasPart (currently empty). There should also be a HowToStep instance corresponding to the RNG step. There should be three additional CreateAction instances, since the tool is executed with r1, r2 and r3 as the values for the prefix.
- outdir is listed as a formal parameter, but the only parameter you can actually set when launching the workflow is constant.
- author1 is listed as an agent, but I guess it's actually the workflow author that's being read from the other nextflow.config (the one in the same directory as the workflow). The agent of an action should represent whoever executed the action. BTW, the @id should be #author1, since it's a contextual entity internal to the crate.

I know next to nothing about Nextflow, but my impression is that the outputs are copied to the results directory because of the line:
publishDir "results", mode: 'copy'
However, to export as RO-Crate, the relevant files (input, output, workflow, ...) should always be inside the crate's directory tree. This should not depend on the specific workflow, so the plugin needs to take care of this.
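To make the third point in the list above more concrete, here is a rough Python sketch of the extra entities one would expect for a single process (RNG) executed three times. It is not the plugin's code (nf-prov is not written in Python). The identifiers #RNG, #step-RNG and #task-RNG-<prefix> are made up for illustration, and the property names (hasPart, step, workExample, instrument) reflect my reading of the Provenance Run Crate profile, so they should be checked against the spec.

```python
import json

# Hypothetical identifiers, chosen only for this illustration.
WORKFLOW_ID = "test.nf"
TOOL_ID = "#RNG"
STEP_ID = "#step-RNG"


def build_tool_entities(prefixes):
    """Return the workflow, tool, step and one CreateAction per prefix."""
    # The process is modelled as a tool orchestrated by the workflow.
    tool = {"@id": TOOL_ID, "@type": "SoftwareApplication", "name": "RNG"}
    # The step links the workflow to the tool it runs.
    step = {
        "@id": STEP_ID,
        "@type": "HowToStep",
        "position": "0",
        "workExample": {"@id": TOOL_ID},
    }
    # The workflow references the tool via hasPart and the step via step.
    workflow = {
        "@id": WORKFLOW_ID,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow", "HowTo"],
        "hasPart": [{"@id": TOOL_ID}],
        "step": [{"@id": STEP_ID}],
    }
    # One CreateAction per tool execution, each pointing back at the tool.
    actions = [
        {
            "@id": f"#task-RNG-{prefix}",
            "@type": "CreateAction",
            "name": f"RNG ({prefix})",
            "instrument": {"@id": TOOL_ID},
        }
        for prefix in prefixes
    ]
    return [workflow, tool, step] + actions


entities = build_tool_entities(["r1", "r2", "r3"])
print(json.dumps(entities, indent=2))
```

The sketch leaves out inputs, outputs, agents and the CreateAction for the workflow run itself; it only shows how the tool, the step and the per-task actions would reference each other.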
Hi, let us know if you would like some help looking at this.
Thank you guys for your feedback. It's exactly what I needed to make sure I'm going in the right direction.
I thought I was going to get to this sooner, which is why I didn't respond at the time, but that never happened. Sorry for the radio silence.
I've been too busy with other priorities to put any time into this, so it will likely not move until I get some free time or someone else picks it up. If you know anyone who would like to work on it, I would be happy to work with them.
I believe there is also a parallel effort to implement the workflow run crate in the nf-core tooling; it might be worth checking in with them.
See also https://github.com/nf-core/tools/pull/2680
@fbartusch may be able to have a look at this
I worked on most of the issues @simleo pointed out. @bentsherman How can I add my changes to this pull request? Can you give me permission to commit my changes to the PR?
@fbartusch I suggest that you fork the repo, push a new branch with your changes, then you should be able to make a PR for it.
@bentsherman I created a PR: https://github.com/nextflow-io/nf-prov/pull/33 @simleo: I think I fixed most of the issues you mentioned here: https://github.com/nextflow-io/nf-prov/pull/19#issuecomment-1775410835. Can you check whether any information required for a valid Workflow Run RO-Crate is still missing?
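For the first two points of the earlier review (entities listed in the metadata but not present in the crate directory, or referenced by a path outside it), a quick self-check could look like the sketch below. It is only an illustration: find_missing_files is a hypothetical helper, not part of nf-prov or runcrate, and it assumes that local data entities carry File in their @type and use a plain, non-URL-encoded relative path as @id.

```python
import json
from pathlib import Path


def find_missing_files(crate_dir):
    """Return @id values of File entities that are not files inside crate_dir."""
    crate_dir = Path(crate_dir).resolve()
    metadata = json.loads((crate_dir / "ro-crate-metadata.json").read_text())
    missing = []
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "File" not in types:
            continue
        entity_id = entity["@id"]
        # Web-based data entities are not expected to exist on disk.
        if "://" in entity_id:
            continue
        target = (crate_dir / entity_id).resolve()
        # Flag anything that escapes the crate directory or does not exist.
        if crate_dir not in target.parents or not target.is_file():
            missing.append(entity_id)
    return missing
```

With the configuration used in this thread, calling find_missing_files("results") would be expected to return an empty list once the workflow and its outputs are all placed inside the crate.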
@fbartusch I've posted my comment on #33
Close #6
cc @stain @simleo
Happy to receive any feedback. It's far from complete, but I wanted to share what I have so far.