Kick-off meeting

Desired outcomes

Guide for implementing workflows, specifying, e.g., if you want to do X, use approach Y.

Must be able to return event data (required for polarization analysis)
Multiple banks, multiple files, chunking (file-based + stream-based)
Split handling (loading) of monitors (events), detectors (events), and remainder
Structure for masking any dim or transformed dim, in various steps
Naming conventions
Package and module structure (where to place types? what goes where?)
How to extract metadata (avoiding keeping large data alive)
Write unit tests for providers, not (exclusively) for entire workflows
Require passing mypy for workflows (would include making it part of CI)?
How to handle optional steps
How to handle optional inputs
Should we have default params set in workflows?
How to save output files
How to handle provenance
Loading from SciCat vs. local files
Performance guidelines (how to avoid pitfalls around event data, or large temporary dims)
How to define parameters, such that we can, e.g., auto-generate widgets for user input (names, descriptions, limits, default values, ...)
Docstrings: Include math, references, ...

Docstrings: Include math, references, ...
Must be able to return event data (required for polarization analysis)
Write unit tests for providers, not (exclusively) for entire workflows
Naming conventions (and type conventions, e.g., for filenames?)
Package and module structure (where to place types? what goes where?)
- Requires minor discussion
Loading from SciCat vs. local files (e.g., define run ID, choose provider that either converts to local path, or uses service to get file and return path)
Split handling (loading) of monitors (events), detectors (events), and remainder
How to extract metadata (avoiding keeping large data alive)
- Concern about large data resolved by loading event data (monitors and detectors) separately from the rest
Do not write files (or to services) in providers.
Performance guidelines (how to avoid/detect pitfalls around event data, or large temporary dims)
- Every workflow should be tested with large data and checked for memory consumption and performance bottlenecks
Add logical dims when loading NeXus files.
Should we have default params set in workflows?
- Avoid unless good reason.
- Can have widgets that generate dict of params and values, widgets can have defaults
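
A minimal sketch of two items above: resolving a run ID to a file path through interchangeable providers, and unit-testing providers directly instead of only through a full workflow. It is plain Python without Sciline. All names (RunID, DataDir, Filename, local_filename, make_service_filename) and the run_NNNNNN.nxs naming scheme are hypothetical and were not decided in the meeting. The fetch callable stands in for a real SciCat client, whose API is not assumed here.

```python
from pathlib import Path
from typing import Callable, NewType

# Hypothetical domain types; names are illustrative only.
RunID = NewType("RunID", int)
DataDir = NewType("DataDir", Path)
Filename = NewType("Filename", Path)


def local_filename(run_id: RunID, data_dir: DataDir) -> Filename:
    """Provider variant 1: map a run ID to a path in a local directory."""
    # The file naming scheme is an assumption made for this sketch.
    return Filename(data_dir / f"run_{run_id:06d}.nxs")


def make_service_filename(fetch: Callable[[int], str]) -> Callable[[RunID], Filename]:
    """Provider variant 2: build a provider that asks a service for the file.

    `fetch` is a placeholder for a download client. It takes a run ID and
    returns the local path of the downloaded file as a string.
    """

    def service_filename(run_id: RunID) -> Filename:
        return Filename(Path(fetch(run_id)))

    return service_filename


# Unit tests that exercise each provider on its own, with no workflow involved.
def test_local_filename() -> None:
    result = local_filename(RunID(42), DataDir(Path("/data")))
    assert result == Path("/data/run_000042.nxs")


def test_service_filename() -> None:
    # A fake service keeps the test independent of network access.
    def fake_fetch(run_id: int) -> str:
        return f"/cache/{run_id}.nxs"

    provider = make_service_filename(fake_fetch)
    assert provider(RunID(7)) == Path("/cache/7.nxs")
```

Both variants return the same Filename type, so a workflow could swap one for the other without touching downstream providers. Neither variant writes files itself; the download is delegated to the injected callable.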

How to define parameters, such that we can, e.g., auto-generate widgets for user input (names, descriptions, limits, default values, ...)
- Range checks / validators
- If part of the pipeline, then UX and writing providers are more cumbersome
- Default values?
Requires experimentation with how Sciline handles param tables, and transformations of task graphs
- Multiple banks, multiple files, chunking (file-based + stream-based)
- How to handle optional steps
- Structure for masking any dim or transformed dim, in various steps
- Could be handled as a task-graph transform?
How to handle optional inputs
- Can we find a way to minimize the occasions where we need this?
- Can we avoid mutually exclusive parameters?
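
One possible shape for the parameter definitions discussed above (name, description, limits, default value, range check), sketched as a plain dataclass kept outside the pipeline. ParameterSpec, its fields, and the wavelength_min example are hypothetical and only illustrate the idea. This is not an existing Sciline feature or a decision from the meeting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParameterSpec:
    """Hypothetical description of one user-facing workflow parameter."""

    name: str
    description: str
    lower: Optional[float] = None  # inclusive lower limit, if any
    upper: Optional[float] = None  # inclusive upper limit, if any
    default: Optional[float] = None  # a widget could pre-fill this value

    def validate(self, value: float) -> float:
        """Range check: return the value unchanged or raise ValueError."""
        if self.lower is not None and value < self.lower:
            raise ValueError(f"{self.name}={value} is below the limit {self.lower}")
        if self.upper is not None and value > self.upper:
            raise ValueError(f"{self.name}={value} is above the limit {self.upper}")
        return value


# Example spec; the parameter name and limits are made up for illustration.
wavelength_min = ParameterSpec(
    name="wavelength_min",
    description="Lower wavelength cutoff",
    lower=0.0,
    upper=20.0,
    default=1.0,
)

# A widget layer could collect validated values into a dict of params.
params = {wavelength_min.name: wavelength_min.validate(1.5)}
```

Keeping validation in the spec means the providers themselves stay free of range checks, which is one way to address the concern that putting this in the pipeline makes providers more cumbersome to write.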

Create draft for known items
- Justify each guideline
- Later: Consider adding bad examples and how to fix, if applicable
Experiment and gather more info for others