Recognize permutations of inputs/outputs

nsbgn commented 1 year ago

It happens sometimes that multiple signatures are created that are the same, just with the order of the inputs shuffled. The same happens for supertools, and even inside the actions of supertools. Consider:

supertool:SelectLayerByLocationDistTessObject a :Supertool ;
    :action
        [ :apply arc:export-features.htm ;
            :inputs ( _:d3 ) ;
            :outputs ( _:d2 ) ],
        [ :apply arc:select-layer-by-location.htm ;
            :inputs ( _:d1 _:d0 ) ;
            :outputs ( _:d3 ) ] ;
    :inputs ( _:d0 _:d1 ) ;
    :outputs ( _:d2 ) .

supertool:SelectLayerByLocationTessObject a :Supertool ;
    :action
        [ :apply arc:select-layer-by-location.htm ;
            :inputs ( _:d0 _:d1 ) ;
            :outputs ( _:d3 ) ],
        [ :apply arc:export-features.htm ;
            :inputs ( _:d3 ) ;
            :outputs ( _:d2 ) ] ;
    :inputs ( _:d0 _:d1 ) ;
    :outputs ( _:d2 ) .

While the annotator could be made responsible for being consistent with order, the point of drawing from manual annotations was to avoid such mistakes --- so it should really be automatically recognized.

Should we take into account input order for concrete tools? We could drop the list structure there. That would instantly remove extraneous supertools, but not extraneous signatures.
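As a rough illustration of what dropping the list structure on actions would buy us, here is a minimal Python sketch. It assumes a supertool's actions are given as plain `(tool, inputs, outputs)` tuples, which is a hypothetical representation and not the one the repository uses, and it assumes both supertools use the same blank node names, as in the example above. In general the comparison would need a graph isomorphism check.

```python
from collections import Counter


def canonical(actions):
    """Build an order-insensitive key for a supertool's actions.

    `actions` is an iterable of (tool, inputs, outputs) tuples. Inputs and
    outputs are sorted, and the actions are counted as a multiset, so neither
    the order of the actions nor the order within an action matters.
    """
    counted = Counter(
        (tool, tuple(sorted(inputs)), tuple(sorted(outputs)))
        for tool, inputs, outputs in actions
    )
    return frozenset(counted.items())


# The two supertools from the example, written as hypothetical tuples.
dist_tess = [
    ("arc:export-features.htm", ["d3"], ["d2"]),
    ("arc:select-layer-by-location.htm", ["d1", "d0"], ["d3"]),
]
tess = [
    ("arc:select-layer-by-location.htm", ["d0", "d1"], ["d3"]),
    ("arc:export-features.htm", ["d3"], ["d2"]),
]

same = canonical(dist_tess) == canonical(tess)
```

Note that this only removes duplicates among supertools. Signatures still need their order, so it does nothing for extraneous signatures.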

nsbgn commented 1 year ago

The ordering on action inputs/outputs can probably be dropped. However, be careful: a supertool with :inputs ( _:d1 _:d2 ) will become isomorphic to a supertool with :inputs ( _:d2 _:d1 ), so this might still lead to mixups.

nsbgn commented 1 year ago

We need to think harder about where order is relevant and where it is not. The situation right now is as follows:

Signatures: These have an ordered list in tool:inputs with their CCD signature.
Concrete tools: These don't have information about their inputs at all.
Workflows: These only have unordered wf:source predicates.
Workflow actions: These have ordered wf:inputN predicates.
Supertools: These have an ordered list in tool:inputs.
Supertool actions: These have an ordered list in tool:inputs.

The ordering of inputs is used in these ways:

When an abstract workflow action applies a supertool, we'd like to be able to reconstruct the concrete workflow from the contents of the supertool. For this, we must know which data inside the workflow fits into what slots of the supertool. This is not essential, because we can instead just include the entire concrete workflow with every abstract workflow.
When associating a workflow action with a signature, it may be helpful to know which of the signature's CCD types correspond to which inputs/outputs of the workflow action to make it easier to disambiguate signatures. On the other hand, not having an ordering makes it easier to disregard unimportant differences.
The one hard requirement is that we know which inputs in workflow actions correspond to which parts of the CCT expression of a signature. We need this to be able to construct transformation graphs.

Therefore, the only place where the numbering is really essential is in workflow actions (where the action can be associated with a signature). Using labels instead of lists or numbered predicates would afford us some flexibility:

Labels can be more descriptive than numbers, reducing the chance that inputs are accidentally swapped.
If we later add something, the error messages will be clear --- rather than requiring you to align lists everywhere.
Sometimes, inputs are left out --- because they are just parameters or because they are otherwise inessential. A concrete tool in one context may have a different number of inputs than the same tool in another context. Are we consistent in what inputs are left out in the concrete workflows? If not, using labels means that we can label the inputs that are relevant for the signature and leave the rest unlabelled.
Since a supertool is just a schematic workflow, we'd like to avoid using different schemas to describe them --- but if we use ordering, that means we're either stuck recording order where it doesn't matter, or dropping the symmetry.
In general, labels make it clear where order matters and where it does not. This way, we're never confused whether the N in wf:inputN was a conscious choice or not.
It becomes easier to generate error messages about input/output artefacts when they have been given an :id, compared to when they are just the index number in some list.
It also becomes easier to recognize when a supertool is just a variation of another supertool with the inputs flipped, and to reorder the labels in that case.

The downside is that it's a little more verbose. For illustration, consider:

Was:

wf:_1 a wf:Workflow;
    wf:source _:d1, _:d2;
    wf:target _:d3;
    wf:edge [
        wf:applicationOf signature:_1;
        wf:input1 _:d1;
        wf:input2 _:d2;
        wf:output _:d3
    ].

supertool:_1 a :Supertool;
    :inputs ( _:d1 _:d2 );
    :outputs ( _:d3 );
    :action [
        :apply supertool:_1;
        :inputs ( _:d1 _:d2 );
        :outputs ( _:d3 )
    ].

signature:_1 a :Signature;
    :inputs ( [ a ccd:Type ] [ a ccd:Type ] );
    :outputs ( [ a ccd:Type ] );
    cct:expression "f 1 2";
    :implementation supertool:_1.

Becomes:

wf:_1 a :Workflow;
    :input _:d1, _:d2;
    :output _:d3;
    :action [
        :apply signature:_1;
        :input [ :id "1"; :as _:d1 ];
        :input [ :id "2"; :as _:d2 ];
        :output _:d3
    ].

supertool:_1 a :Supertool;
    :input [ :id "1"; :as _:d1 ];
    :input [ :id "2"; :as _:d2 ];
    :output _:d3;
    :action [
        :apply supertool:_1;
        :input _:d1, _:d2;
        :output _:d3
    ].

signature:_1 a :Signature;
    :input [ :id "1"; a ccd:Type ];
    :input [ :id "2"; a ccd:Type ];
    :output [ a ccd:Type ];
    cct:expression "f 1 2";
    :implementation supertool:_1.

This would partially reinstate the changes reverted in https://github.com/quangis/quangis-workflow/commit/96bb7fe730d8a4a281bb9184057c910dd6db73a8.
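To make the point about error messages concrete, here is a minimal Python sketch of how labelled inputs on a workflow action might be bound to the ids that a signature declares. The function name `bind_inputs` and the plain-Python data shapes are assumptions for illustration only; they are not part of the existing code base.

```python
def bind_inputs(signature_ids, labels):
    """Map each signature input id to the artefact that carries that label.

    signature_ids: the ids declared by the signature, e.g. {"1", "2"}.
    labels: (id, artefact) pairs found on the workflow action, in any order.
    Raises ValueError with a specific message if the labels do not line up.
    """
    bound = {}
    for ident, artefact in labels:
        if ident not in signature_ids:
            raise ValueError(f"label {ident!r} is not an input of the signature")
        if ident in bound:
            raise ValueError(f"label {ident!r} is used more than once")
        bound[ident] = artefact

    missing = sorted(set(signature_ids) - set(bound))
    if missing:
        raise ValueError(f"no artefact labelled {', '.join(missing)}")
    return bound


# The order in which the labels are listed does not matter.
bound = bind_inputs({"1", "2"}, [("2", "_:d2"), ("1", "_:d1")])
```

With the binding in hand, the placeholders 1 and 2 in a CCT expression such as "f 1 2" can be tied to specific artefacts, and unlabelled inputs are simply ignored.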

An alternative would be to use :inputs when order is relevant and :input when it's not. :inputs ( x ... ) would of course imply :input x. This has some of the benefits of using :ids, but not all.

wf:_1 a :Workflow;
    :input _:d1, _:d2;
    :output _:d3;
    :action [
        :apply signature:_1;
        :inputs ( _:d1 _:d2 );
        :output _:d3
    ].

supertool:_1 a :Supertool;
    :inputs ( _:d1 _:d2 );
    :output _:d3;
    :action [
        :apply supertool:_1;
        :input _:d1, _:d2;
        :output _:d3
    ].

signature:_1 a :Signature;
    :inputs ( [ a ccd:Type ] [ a ccd:Type ] );
    :output [ a ccd:Type ];
    cct:expression "f 1 2";
    :implementation supertool:_1.
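The rule that :inputs ( x ... ) implies :input x could be sketched as follows in Python. The dictionary representation of a node is hypothetical; in an RDF setting this would more likely be an inference rule or a query.

```python
def unordered_inputs(node):
    """Return all inputs of a node, whether or not their order was recorded.

    `node` is a hypothetical dict with an optional ":input" collection
    (unordered) and an optional ":inputs" list (ordered).
    """
    return set(node.get(":input", [])) | set(node.get(":inputs", []))


action = {":apply": "signature:_1", ":inputs": ["_:d1", "_:d2"]}
workflow = {":input": ["_:d2", "_:d1"]}

# Every input of the action is also known as an unordered workflow input.
consistent = unordered_inputs(action) <= unordered_inputs(workflow)
```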

nsbgn commented 1 year ago

A distinction between ConcreteArtefacts (for Workflows) and SchematicArtefacts (for Supertools and Signatures) makes this easier to work with. Combine this with an explicit Label object for ConcreteActions.

wf:_1 a :Workflow;
    :source _:d0, _:d1;
    :action [
        :apply signature:_1;
        :input _:d0;
        :output _:d1;
        :label [ :id "1"; :for _:d0 ]
    ];
    :action [
        :apply signature:_2;
        :input _:d1, _:d2;
        :output _:d3;
        :label [ :id "1"; :for _:d1 ],
            [ :id "2"; :for _:d2 ]
    ].

supertool:_1 a :Supertool;
    :action [
        :apply tool:_1;
        :input [ :id "1" ];
        :output _:d3
    ], [
        :apply tool:_2;
        :input _:d3, [ :id "2" ];
        :output [ rdfs:label "final output" ]
    ].

signature:_1 a :Signature;
    :input [ :id "1"; a ccd:Type ];
    :input [ :id "2"; a ccd:Type ];
    :output [ a ccd:Type ];
    cct:expression "f 1 2";
    :implementation supertool:_1.

nsbgn commented 1 year ago

Note also that the original supertools labelled their action inputs, which is unnecessary, but did not label their own inputs, which is necessary. This may lead to supertools being identified incorrectly later on.

nsbgn commented 1 year ago

As of #11, permuted subsuming CCD signatures are detected after the fact, but not yet while new tools are being added, which is what this issue is about.

nsbgn commented 1 year ago

We don't just need to know whether there is a permutation that matches, but also which one matches.
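A minimal Python sketch of a check that returns the matching permutation itself, rather than only a yes/no answer. The function name and the type names in the usage example are placeholders; the `matches` parameter stands in for whatever comparison is appropriate, for instance a CCD subsumption check instead of plain equality.

```python
from itertools import permutations


def matching_permutation(candidate, existing, matches=lambda a, b: a == b):
    """Find a permutation that aligns `candidate` with `existing`.

    Returns a tuple p such that matches(candidate[p[i]], existing[i]) holds
    for every position i, or None if no such permutation exists.
    """
    if len(candidate) != len(existing):
        return None
    for perm in permutations(range(len(candidate))):
        if all(matches(candidate[j], existing[i]) for i, j in enumerate(perm)):
            return perm
    return None


# Placeholder type names: the new inputs are the existing ones, flipped.
perm = matching_permutation(["ObjectVector", "FieldRaster"],
                            ["FieldRaster", "ObjectVector"])
```

This tries every permutation, so it is factorial in the number of inputs. That should be acceptable for the small arities that tools have, but it is a brute-force sketch rather than a tuned implementation.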

quangis / quangis-workflow

Recognize permutations of inputs/outputs #8