haesleinhuepf / napari-workflows

BSD 3-Clause "New" or "Revised" License

Have the user define outputs (& maybe also inputs) #27

Open jluethi opened 2 years ago

jluethi commented 2 years ago

Started using the napari-workflows, they are awesome! πŸ‘πŸ»πŸš€ I came up with a feature wish that would make them even nicer, posting this here as a discussion starter.

I've been setting up a set of napari-workflows for different image processing and feature measurements to be run headless afterwards (see here for some examples: https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/31). To run a napari-workflow on large datasets, one needs to map inputs correctly and define which results should be saved as outputs.

Which entries are inputs could be inferred from the dask graph by checking which input is not the output of any other task. These inputs would then need to be set before the workflow can be run. It would be a great feature (e.g. for a next version of the workflow yaml file) to have an "input" flag or similar, so that it is already clear from the workflow object what the inputs are.

Outputs are a bit trickier. We could run the same inference to figure out which output is never an input of another step. But see my examples above or https://github.com/haesleinhuepf/napari-workflows/issues/11: Sometimes users want to save multiple outputs, e.g. save a label image and measurements of the label image. Or save an intermediate result as well. In that case, calling a, b = workflow.get(["output", "output_2"]) returns all requested outputs from a single run. But how do we know which things the user wants to save (and how would I specify this when creating a workflow)? Here again, it would be great if there were some kind of flag that the user can set in their workflow to define which outputs of the workflow should be saved.
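The inference described above can be sketched in plain Python. This is a conceptual model of a task graph (a mapping from output name to function plus input names), not the napari-workflows API, and all task and image names are made up for illustration:

```python
# Conceptual model of a workflow's task graph: each entry maps an output
# name to (function, *input names). All names here are invented.
tasks = {
    "denoised": ("gaussian_blur", "raw"),
    "labels": ("threshold_and_label", "denoised"),
    "measurements": ("regionprops", "labels", "raw"),
}

def inferred_inputs(tasks):
    """Names consumed by some task but produced by none -> workflow inputs."""
    consumed = {arg for _, *args in tasks.values() for arg in args}
    return sorted(consumed - set(tasks))

def inferred_outputs(tasks):
    """Names produced by some task but consumed by none -> workflow outputs."""
    consumed = {arg for _, *args in tasks.values() for arg in args}
    return sorted(set(tasks) - consumed)

print(inferred_inputs(tasks))   # ['raw']
print(inferred_outputs(tasks))  # ['measurements']
```

Note that "measurements" is the only inferred output here, which is exactly the problem: "labels" would not be detected even if the user also wants to save it.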

haesleinhuepf commented 2 years ago

Hi Joel @jluethi,

fantastic feedback! And a great reminder that I need to write more documentation.

It would be a great feature (e.g. for a next version of the workflow yaml file) to have an "input" flag or similar, so that it is already clear from the workflow object what the inputs are.

You can ask a workflow object for all undefined images using workflow.roots(). The graph is a directed acyclic graph (we go from inputs to outputs), so it's pretty much a tree, hence roots().

Outputs are a bit trickier. We could run the same inference to figure out which output is never an input of another step.

Correct! You can use workflow.leafs() to find the other end(s) of the tree.

Sometimes users want to save multiple outputs, e.g. save a label image and measurements of the label image. Or save an intermediate result as well.

leafs() works with multiple outputs. See this notebook, which I just wrote.
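Retrieving several outputs in one call, as with workflow.get(["output", "output_2"]) above, can be sketched with a tiny recursive evaluator over such a task graph. This is a hypothetical toy model, not the real implementation; the functions and names are invented:

```python
# Hypothetical dask-style task graph: name -> (function, *input names).
tasks = {
    "denoised": (lambda x: x + 1, "raw"),
    "output": (lambda x: x * 2, "denoised"),
    "output_2": (lambda x: x - 3, "denoised"),
}
inputs = {"raw": 10}  # inputs must be set before the graph can be evaluated

def get(name):
    """Recursively evaluate the task that produces `name`."""
    if name in inputs:
        return inputs[name]
    func, *args = tasks[name]
    return func(*(get(arg) for arg in args))

# Request several outputs at once, analogous to workflow.get([...]):
a, b = (get(name) for name in ["output", "output_2"])
print(a, b)  # 22 8
```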

If you want to have a specific intermediate result as output, you could define a nop() function that marks an output as a leaf.

def nop(image):
    # "in" is a reserved keyword in Python, so the parameter needs another name
    return image

If you add this function to the workflow in the notebook linked above, its result output2 should show up in the leafs():

workflow.set("output2", nop, "denoised")
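The effect of this nop trick can be illustrated on a toy task-graph model (hypothetical names and placeholder functions; not the real Workflow class):

```python
def nop(image):
    """Identity task: re-publishes an intermediate result under a new name."""
    return image

# Toy task graph (name -> (function, *input names)); all names invented.
tasks = {
    "denoised": (str.strip, "raw"),     # placeholder processing step
    "labels": (str.upper, "denoised"),  # placeholder segmentation step
}

def leafs(tasks):
    """Outputs that no other task consumes."""
    consumed = {arg for _, *args in tasks.values() for arg in args}
    return sorted(set(tasks) - consumed)

print(leafs(tasks))  # ['labels'] -- "denoised" is consumed, so it is no leaf

# Mark the intermediate result as an output by routing it through nop:
tasks["output2"] = (nop, "denoised")
print(leafs(tasks))  # ['labels', 'output2']
```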

Let me know if this works! And if you have a good idea for how to phrase this conceptually in the documentation, I'm happy to hear your thoughts as well!

Thanks again for the feedback! :-)

Best, Robert

jluethi commented 2 years ago

Hey Robert,

Sorry for the delayed feedback & thanks a lot for the explanations, that's quite fascinating! The infrastructure for it all seems to be ready in that case :)

August is a bit slow on our side due to holidays. But we're looking to integrate a first napari workflow into Fractal in late August and then add broader support for varying input/outputs by the end of September.

Once we have a few different workflows running, these questions of defining inputs & outputs will get more interesting, and I will start exploring the roots & leafs parts to see whether they'll help us with parsing workflows & matching them up to the zarr inputs & outputs :)

Also, thanks a lot for the notebook, always surprised at how simple the API is for handling different parts of napari-workflows! πŸ‘πŸ»