A list of potential requirements for workflows. This is just a collection of everything we can think of, to serve as a starting point, not a list of actual requirements. This must be curated:
`sc.transform_coords` supports this via `sc.show_graph`.
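For readers less familiar with this feature, a minimal sketch of declaring a coordinate-transformation graph and visualizing it; the `speed`/`distance`/`time` names are placeholders, not taken from an actual reduction workflow:

```python
import scipp as sc

# Hypothetical coordinate-transformation graph; names and formula are placeholders.
def speed(distance, time):
    return distance / time

graph = {'speed': speed}

# Render the graph (requires graphviz) without executing it.
sc.show_graph(graph)

# Apply it to a data array that has 'distance' and 'time' coordinates.
da = sc.DataArray(
    sc.ones(dims=['event'], shape=[3]),
    coords={
        'distance': sc.array(dims=['event'], values=[1.0, 2.0, 3.0], unit='m'),
        'time': sc.array(dims=['event'], values=[2.0, 2.0, 2.0], unit='s'),
    },
)
da = da.transform_coords('speed', graph=graph)
```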
Found out that orsopy can store some logs, and this is used in parts of the Amor reduction. In general, we need to see how to integrate this kind of functionality with other packages.
The logs that orsopy takes are specific to the expectations of the ORSO file format. At the moment this is essentially hardcoded in the Amor reduction workflow (i.e., if step x is run, y is added to the ORSO data object). I suspect there is a more elegant way of achieving this, though.
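To make the "hardcoded" pattern concrete, a rough sketch; the step name and metadata field are invented, and a plain dict stands in for the actual orsopy data object:

```python
# Hypothetical reduction step: if this step runs, a corresponding entry is
# appended to the ORSO metadata by hand, coupling the step tightly to the
# file format.
def normalize_by_monitor(data, monitor, orso_metadata: dict):
    result = data / monitor
    orso_metadata.setdefault('corrections', []).append('normalisation by monitor')
    return result
```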
Incomplete, but this was the result of yesterday's discussion (color-coded diagram omitted).
Another practical challenge is how to define input parameters and how to channel them to the correct workflow step. Consider, for example, Mantid's DgsReduction. It has tens of input parameters that are required at various steps of the reduction: (truncated)
Maybe a useful way of considering this is by looking at Blender's node editor:
Our workflows are similar to those pipelines. Many parameters are needed at various stages. Many of them have default values; others are optional. In the graphical node editor it is very obvious to the user what is used where, and they can provide the values directly. Our current approach (Mantid or the existing workflows in this repository) attempts to "flatten" this representation into a single big list of input parameters, which has to rely on naming, descriptions, and comments to convey meaning. Is there a better way? Should we at the very least preserve structure in the parameter map? But even that does not feel intuitive.
A naive way of encoding params that includes the location where they are used is a dict from workflow step name to parameter. This would work for simple workflows but not for complex ones where some steps appear multiple times, e.g. there are multiple monitor-normalisations.
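Roughly what this naive encoding could look like (step and parameter names are invented); the clash for repeated steps is immediately visible:

```python
# Maps workflow-step name -> parameters for that step.
params = {
    'convert_to_wavelength': {'n_bins': 100},
    'normalize_by_monitor': {'threshold': 0.1},
    # Problem: if monitor normalisation happens twice (e.g. for sample and
    # direct-beam runs), a single key cannot distinguish the two occurrences:
    # 'normalize_by_monitor': {'threshold': 0.3},  # would overwrite the entry above
}
```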
I think the only really clear way is to use a structure that directly mirrors the workflow, e.g. a graph, or simply a Jupyter notebook where params are passed to the functions directly. But the only way to make this really apparent and easy to understand would be a graphical tool as in Blender. Doing this in code (Python, YAML, ...) will always have to flatten the structure to some extent because code is inherently linear.
> I think the only really clear way is to use a structure that directly mirrors the workflow.
Yes, that is my current thinking as well. We'd have some sort of nested structure of dataclasses (or similar) with parameters.
This still has a problem. Such a structure is tree-like. So if a parameter is shared between multiple steps (e.g. the number of wavelength bins), that parameter needs to be duplicated. And when changing it, the user has to change it in all places.
If we want to use something like dataclasses, this could be solved by having a shared mutable structure in the dataclasses. But that can be messy.
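A sketch of the nested-dataclass idea (all names invented), which also illustrates the duplication problem for a shared parameter such as the number of wavelength bins:

```python
from dataclasses import dataclass

@dataclass
class ConversionParams:
    n_wavelength_bins: int

@dataclass
class MonitorNormalizationParams:
    n_wavelength_bins: int
    threshold: float

@dataclass
class IofQParams:
    # The nesting mirrors the workflow structure ...
    convert: ConversionParams
    normalize_sample: MonitorNormalizationParams
    normalize_direct_beam: MonitorNormalizationParams

# ... but the shared wavelength binning is repeated in three places and has
# to be kept in sync by the user.
params = IofQParams(
    convert=ConversionParams(n_wavelength_bins=100),
    normalize_sample=MonitorNormalizationParams(n_wavelength_bins=100, threshold=0.1),
    normalize_direct_beam=MonitorNormalizationParams(n_wavelength_bins=100, threshold=0.1),
)
```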
What I was more referring to was an actual graph that allows for (undirected) cycles. E.g. the number of wavelength bins could itself be a vertex which all interested steps depend on. (Like the two 'Value' boxes in your Blender example.)
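One minimal way to picture this (vertex names invented): a mapping from each vertex to the vertices it depends on, with the shared parameter as a vertex of its own, so it only needs to be set once.

```python
# Each key is a vertex (a parameter, an input, or a step); the value lists its inputs.
workflow_graph = {
    'n_wavelength_bins': [],            # shared parameter vertex
    'sample_monitor': [],               # input data vertices
    'direct_beam_monitor': [],
    'normalize_sample_monitor': ['n_wavelength_bins', 'sample_monitor'],
    'normalize_direct_beam_monitor': ['n_wavelength_bins', 'direct_beam_monitor'],
}
```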
> What I was more referring to was an actual graph that allows for (undirected) cycles. E.g. the number of wavelength bins could itself be a vertex which all interested steps depend on. (Like the two 'Value' boxes in your Blender example.)
Isn't that equivalent to the flat list that, e.g., the Mantid workflows use, with all the problems that come from not having structure?
How so?
> How so?
Can you give a code example of how a user would set parameters in this graph?
Like I said, code is linear. So no matter what structure we use, it will always be flat (except for a potential tree-structure like YAML / nested dicts).
```python
I_of_q = make_workflow()
# find_wavelength_bin_param needs to be implemented by searching through the graph
n_bins = find_wavelength_bin_param(I_of_q)
n_bins.value = 100
result = I_of_q.compute()
```
So as I said above: your "graph" suggestion is the "same" as the flat structure used by Mantid workflows. That is why I, too, suggest using nested structures of parameters (dataclasses, or, as you mention, plain nested dicts) to avoid that.
I disagree. The graph is not flat but nested, and it may even contain cycles. The access is flat because it is done in Python.
But we should discuss this in person instead of with hand-wavy arguments here.
> I disagree. The graph is not flat but nested, and it may even contain cycles. The access is flat because it is done in Python.
Users usually do not see the graph when setting parameters. From this point of view, it is irrelevant what structure is used underneath.
We are now at a point where Sciline has all the basics. We can thus go back to the requirements table and see which boxes we can tick off. For what is left, evaluate whether it should extend Sciline or whether it will go elsewhere.
Write a design document, leading to the implementation of Sciline.
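For reference, a small sketch of how such a graph is expressed in Sciline at the time of writing; the domain types and bodies are invented placeholders, but the pattern (parameters and intermediate results are both graph nodes, identified by type) is the one Sciline uses:

```python
from typing import NewType

import sciline

Filename = NewType('Filename', str)
NWavelengthBins = NewType('NWavelengthBins', int)
RawData = NewType('RawData', list)
IofQ = NewType('IofQ', list)

def load(filename: Filename) -> RawData:
    return RawData([1.0, 2.0, 3.0])  # placeholder for actual file loading

def reduce(data: RawData, n_bins: NWavelengthBins) -> IofQ:
    return IofQ(data[:n_bins])  # placeholder for the actual reduction

pipeline = sciline.Pipeline(
    [load, reduce],
    params={Filename: Filename('sample.nxs'), NWavelengthBins: NWavelengthBins(2)},
)
result = pipeline.compute(IofQ)
```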
Overview
There is a significant number of cross-technique and cross-workflow requirements. We need to avoid developing too many flavors of the same thing, as it will complicate understanding, testing, and maintaining the code. A template or design to follow would provide guidance for the development of new workflows and would avoid the overhead of having to make the same decisions over and over again.
Examples:
Notes
`scipp.DataGroup` will be available soon. It may play a central role in this.
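For context, a brief sketch (assuming the dict-like, entry-broadcasting interface DataGroup was announced with) of how grouped workflow data could be carried and transformed as a single object:

```python
import scipp as sc

run = sc.DataGroup({
    'sample': sc.arange('tof', 5, unit='counts'),
    'monitor': sc.arange('tof', 5, unit='counts'),
})

# Operations apply to every entry, so a workflow step can treat the group as
# one value instead of passing sample and monitor around separately.
scaled = run * 2.0
print(list(scaled.keys()))  # ['sample', 'monitor']
```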