nextflow-io / nf-prov

Apache License 2.0
23 stars 11 forks source link

Improve WrrocRenderer #33

Open fbartusch opened 2 months ago

fbartusch commented 2 months ago

Add workflow file to crate. Correct path to output files in metadata file. Add CreateActions for tool executions. Add ControlActions. Add SoftwareApplications for processes. Add HowToSteps. Add agent information (orcid, name) to nextflow.config that is incorporated into metadata file.

simleo commented 2 months ago

I've installed the plugin with make install and used the following nextflow.config:

plugins {
    id 'nf-prov@1.1.0'
}

params {
    outdir = 'results'
}

prov {
    enabled = true
    formats {
        wrroc {
            file = "${params.outdir}/ro-crate-metadata.json"
            overwrite = true
            agent {
                name = "John Doe"
                orcid = "https://orcid.org/0000-0000-0000-0000"
            }
        }
    }
}

I had to pin the plugin's version because otherwise Nextflow downloads nf-prov-1.2.2 and uses that (which fails because it does not support wrroc as a format). Then I ran nextflow run tests/test.nf and got a results RO-Crate. Here are the problems I've found:

"result": [
    [
        {
            "@id": "r1.foo.1.txt"
        },
        {
            "@id": "r1.foo.2.txt"
        }
    ]
],

should be:

"result": [
    {
        "@id": "r1.foo.1.txt"
    },
    {
        "@id": "r1.foo.2.txt"
    }
],

I made these changes manually to move on with the review. At this point, Runcrate reads the crate (runcrate report results) without crashing. Here's the output:

action: #606ce08c-cac1-4777-8a1b-83f497a0c63d
  instrument: test.nf (['File', 'SoftwareSourceCode', 'ComputationalWorkflow', 'HowTo'])
  started: 2024-06-20T11:59:14.123266278+02:00
  ended: 2024-06-20T11:59:14.897242324+02:00
  outputs:
    r1.foo.1.txt
    r1.foo.2.txt
    r2.foo.2.txt
    r2.foo.1.txt
    r3.foo.1.txt
    r3.foo.2.txt

action: #71e33a99d3115bb57a7cdfc1b220fbbf
  step: test.nf#main/RNG
  instrument: #Script_923b8caac4a54f39@66e17eff (SoftwareApplication)
  outputs:
    r1.foo.1.txt
    r1.foo.2.txt

action: #a9b6314416b001fb728ff45fa29d6f5e
  step: test.nf#main/RNG
  instrument: #Script_923b8caac4a54f39@66e17eff (SoftwareApplication)
  outputs:
    r2.foo.1.txt
    r2.foo.2.txt

action: #45b19d72e22ba819cf02b88e4977bc86
  step: test.nf#main/RNG
  instrument: #Script_923b8caac4a54f39@66e17eff (SoftwareApplication)
  outputs:
    r3.foo.1.txt
    r3.foo.2.txt

I've also noticed a weird thing: if you run rm results -rf && nextflow run tests/test.nf (without changing the value of outdir in nextflow.config) repeatedly, sometimes (like 1 in 20 times) the ro-crate-metadata.json file is missing from the results dir.

fbartusch commented 1 month ago

Thanks for the comprehensive feedback @simleo . The next time I will check if runcrate can parse the document before asking for feedback ...

I fixed most of the issues, but this causes some headache:

sometimes (like 1 in 20 times) the ro-crate-metadata.json file is missing from the results dir.

I can reproduce the problem. I wrote a script that removes the results directory, reruns the pipeline and checks if ro-crate-metadata.json ist present. It seems that the workflowOutputs mapping is missing entries for some output files.

Next I will use the newest Nextflow version and maybe the newest nf-prov version, maybe this problem is addressed there. For me it doesn't seem that the problem originates from the WrrocRenderer itself.

simleo commented 4 weeks ago

I've run the workflow again with the plugin at ac05da2, using the same nextflow.config. The issues I've reported in my previous comment seem to be solved (the sometimes-missing metadata file problem should be solved after the integration of c92b21f, if I understand correctly). Great job! This is another round of review.

{
    "@id": "#outdir",
    "@type": "FormalParameter",
    "name": "outdir",
    ...
},
{
    "@id": "#constant",
    "@type": "FormalParameter",
    "name": "constant",
    ...
},
{
    "@id": "#b5236af9-95e9-42b8-9121-a6a57755ee1b",
    "@type": "CreateAction",
    "instrument": {
        "@id": "test.nf"
    },
    "object": [
        {"@id": "#outdir-pv"},
        {"@id": "#constant-pv"}
    ],
    "result": [
        {"@id": "r3.bar.2.txt"},
        {"@id": "r3.bar.1.txt"},
        {"@id": "r2.bar.1.txt"},
        {"@id": "r2.bar.2.txt"},
        {"@id": "r1.bar.1.txt"},
        {"@id": "r1.bar.2.txt"}
    ]
    ...
},
{
    "@id": "#outdir-pv",
    "@type": "PropertyValue",
    "exampleOfWork": {"@id": "#outdir"},
    "name": "outdir",
    "value": "crate"
},
{
    "@id": "#constant-pv",
    "@type": "PropertyValue",
    "exampleOfWork": {"@id": "#constant"},
    "name": "constant",
    "value": "bar"
}

When run with nextflow run tests/test.nf the outcome should be the same except for the "value" fields of the PropertyValue entities, which should be "results" and "foo" (the defaults). Note that the PropertyValue instances link to the corresponding FormalParameters via exampleOfWork.

fbartusch commented 3 weeks ago

First, thank you for the review @simleo.

bentsherman commented 3 weeks ago

session.config should contain the resolved configuration. it also contains the resolved params as session.config.params