korpling / annatto

Converts linguistic data formats based on the graphANNIS data model as intermediate representation and can apply consistency tests.
Apache License 2.0

TOML representation of workflow configuration #102

Closed MartinKl closed 1 year ago

MartinKl commented 1 year ago

Right now, all attributes of a module are listed in the same map as the module's entry in the workflow. This is an easy and compact way of representing a workflow:

[workflow]

[[import]]
path = "tests/data/import/textgrid/singleSpeaker/"
format = "textgrid"
audio_extension = "wav"

[[export]]
path = "tests/test_out"
format = "graphml"

Nevertheless, this format is also misleading. The representation does not make it clear that some of the keys are attributes of the module struct, and that their availability depends on the module, i.e. on what has been chosen as format for import/export or as action for graph_op. It does not strictly have to, because people should read the docs, but it is still an obstacle.

An alternative would be to single the attributes out into a config table, which would be more transparent:

[workflow]

[[import]]
path = "tests/data/import/textgrid/singleSpeaker/"
format = "textgrid"

[import.config]
audio_extension = "wav"

[[export]]
path = "tests/test_out"
format = "graphml"

[export.config]

This structure is more transparent, but also a bit more verbose. Also, I have not yet found a way (I'll keep checking) to omit empty config tables; they seem to be mandatory for now.
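To make the proposed split concrete, here is a minimal std-only sketch of moving from the flat layout to the config-table layout. It assumes that `path` and `format` are the only step-level keys, as in the examples in this issue; `STEP_KEYS` and `split_flat` are hypothetical names and not part of annatto.

```rust
/// Keys that stay on the workflow step itself in the proposed layout.
/// Assumption: taken from the examples in this issue; the real set may differ.
const STEP_KEYS: [&str; 2] = ["path", "format"];

/// Splits the entries of a flat module table into step-level entries and
/// module attributes that would move into the `config` sub-table.
fn split_flat<'a>(
    entries: &[(&'a str, &'a str)],
) -> (Vec<(&'a str, &'a str)>, Vec<(&'a str, &'a str)>) {
    entries
        .iter()
        .copied()
        .partition(|(key, _)| STEP_KEYS.contains(key))
}

fn main() {
    // The flat `[[import]]` table from the first example.
    let flat = [
        ("path", "tests/data/import/textgrid/singleSpeaker/"),
        ("format", "textgrid"),
        ("audio_extension", "wav"),
    ];
    let (step, config) = split_flat(&flat);
    println!("step-level keys: {:?}", step);
    println!("keys for [import.config]: {:?}", config);
}
```

For an export step that only has `path` and `format`, the second list comes back empty, which corresponds to the empty `[export.config]` table above.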

Any thoughts @thomaskrause ?

MartinKl commented 1 year ago

The problem of empty config tables cannot be solved at the moment. There seems to be a pending pull request from October: https://github.com/serde-rs/serde/pull/2295

Once this is merged, there'd be a solution for all modules that implement Default[^1]. We would then have to annotate the enum variants that contain modules implementing Default with #[serde(default)], e.g.:

pub enum WriteAs {
    // The purpose of `serde(default)` here is that an empty `[export.config]` table can be omitted.
    GraphML(#[serde(default)] GraphMLExporter),
}

[^1]: Reminder: Not all modules implement Default anymore, in order to force users to configure attributes that cannot be filled by default, such as path variables to external configurations.
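The behaviour hoped for here can be sketched without serde: if the config table is missing, fall back to `Default::default()`; otherwise read the attributes from the table. This is only a hand-written stand-in for what a derived deserializer would do. `TextGridConfig` and `resolve_config` are hypothetical names, and `audio_extension` is borrowed from the import example in this issue.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a module's attribute struct.
/// It implements `Default`, so its config table could be left out.
#[derive(Debug, Default, PartialEq)]
struct TextGridConfig {
    audio_extension: Option<String>,
}

/// Hand-written stand-in for the deserializer: an absent table yields the
/// default configuration, a present table is read key by key.
fn resolve_config(table: Option<&HashMap<String, String>>) -> TextGridConfig {
    match table {
        // No `[import.config]` table given: use the defaults.
        None => TextGridConfig::default(),
        // Table given: take the attribute if it is set.
        Some(map) => TextGridConfig {
            audio_extension: map.get("audio_extension").cloned(),
        },
    }
}

fn main() {
    let mut table = HashMap::new();
    table.insert("audio_extension".to_string(), "wav".to_string());

    println!("with table:    {:?}", resolve_config(Some(&table)));
    println!("without table: {:?}", resolve_config(None));
}
```

A module that does not implement `Default` has no value to fall back to in the `None` branch, so its config table would stay mandatory.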

MartinKl commented 1 year ago

A third alternative would be the following:

[workflow]

[[import]]
path = "tests/data/import/textgrid/singleSpeaker/"

[import.config]
format = "textgrid"
audio_extension = "wav"

[[graph_op]]

[graph_op.config]
action = "check"
query_file_path = "some/path/to/some/file"

[[export]]
path = "tests/test_out"

[export.config]
format = "graphml"

I don't particularly like it, since it would leave graph_op with no attributes of its own (at least in practice; in theory it is possible to override workflow_directory, but that might be removed in the future anyway) and would make the representation more verbose by default, even for very simple workflows and modules with no attributes. Also, it would not solve the original problem of having format next to the format-dependent attributes.

Felt like mentioning it anyway.