kg-construct / rml-core

RML-Core: Main features for RDF generation with RML
https://w3id.org/rml/core/spec
Creative Commons Attribution 4.0 International

Defining window operations in RML #85

Open s-minoo opened 1 year ago

s-minoo commented 1 year ago

Issue

Currently, there is no way to define windowing semantics in RML. Windowing is crucial when evaluating joins between different live streaming data sources.

Furthermore, windowing could also provide the buffering needed by aggregation functions over streaming data sources, for example when calculating the average of the values received over the last 5 minutes.
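To make the aggregation use case concrete, here is a minimal Python sketch of a time-based buffer that keeps a running average over the last 5 minutes. The class name `SlidingTimeAverage`, its parameters, and the sample records are illustrative assumptions and are not part of RML or of any existing engine; timestamps are assumed to be in seconds and to arrive in non-decreasing order.

```python
from collections import deque


class SlidingTimeAverage:
    """Average of the values seen within the last `span_seconds` seconds.

    Hypothetical helper for illustration only; not part of any RML engine.
    """

    def __init__(self, span_seconds):
        self.span_seconds = span_seconds
        self.buffer = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        # Buffer the new record, then drop records that fell out of the span.
        self.buffer.append((timestamp, value))
        self.total += value
        while self.buffer[0][0] <= timestamp - self.span_seconds:
            _, old_value = self.buffer.popleft()
            self.total -= old_value
        return self.total / len(self.buffer)


window = SlidingTimeAverage(span_seconds=5 * 60)
samples = [(0, 10.0), (120, 20.0), (240, 30.0), (400, 40.0)]
averages = [window.add(t, v) for t, v in samples]
print(averages)  # [10.0, 15.0, 20.0, 30.0]
```

The record at t=0 is dropped when the record at t=400 arrives, because it is then more than 300 seconds old. Without some window definition, a mapping engine would have no way to know how much of the stream to keep buffered.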

Requirements

According to Gedik B., a window's behaviour is defined by its type and its policies.

There are two main types of windows: tumbling and sliding. An illustration of how these windows work can be found here. Note: a session window is a special case of a tumbling window, in which the window is only dropped when an inactivity threshold is exceeded.
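The difference between the two window types can be sketched in a few lines of Python. This is only an illustration over a finite list using count-based windows; the function names `tumbling` and `sliding` and the `step` parameter are assumptions made for this sketch, and session windows are not covered.

```python
def tumbling(records, size):
    """Yield non-overlapping windows of `size` records each."""
    for start in range(0, len(records) - size + 1, size):
        yield records[start:start + size]


def sliding(records, size, step=1):
    """Yield overlapping windows of `size` records, advancing by `step`."""
    for start in range(0, len(records) - size + 1, step):
        yield records[start:start + size]


records = [1, 2, 3, 4, 5, 6]
tumbling_windows = list(tumbling(records, size=3))
sliding_windows = list(sliding(records, size=3))

print(tumbling_windows)  # [[1, 2, 3], [4, 5, 6]]
print(sliding_windows)   # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

In this sketch a tumbling window behaves like a sliding window whose step equals its size, so every record belongs to exactly one window, whereas a record can appear in several sliding windows.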

The policies control when the window evicts the tuples it holds (eviction policy) and when it triggers the processing of those tuples using the operator logic defined for the window (trigger policy).
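A rough Python sketch of how the two policies could interact for the count-based case is given below. It assumes one possible reading, in which the trigger fires on every n-th tuple and eviction empties the whole window at once; the exact semantics are those described by Gedik B., and the class `CountWindow` and its parameters are hypothetical names used only for this illustration.

```python
class CountWindow:
    """Buffer with a count-based trigger policy and a count-based eviction policy.

    Illustrative assumption: eviction clears the whole buffer in one go.
    """

    def __init__(self, trigger_count, evict_count, operator):
        self.trigger_count = trigger_count
        self.evict_count = evict_count
        self.operator = operator  # operator logic applied to the buffered tuples
        self.buffer = []
        self.seen = 0             # tuples received since the last eviction
        self.results = []

    def insert(self, item):
        self.buffer.append(item)
        self.seen += 1
        # Trigger policy: run the operator on every `trigger_count`-th tuple.
        if self.seen % self.trigger_count == 0:
            self.results.append(self.operator(list(self.buffer)))
        # Eviction policy: empty the window once `evict_count` tuples were seen.
        if self.seen >= self.evict_count:
            self.buffer.clear()
            self.seen = 0


window = CountWindow(trigger_count=2, evict_count=4, operator=sum)
for value in range(1, 7):
    window.insert(value)

print(window.results)  # [3, 10, 11]
print(window.buffer)   # [5, 6]
```

The operator (here `sum`) runs after the 2nd, 4th, and 6th tuple, and the buffer is emptied after the 4th, so the third result only covers the tuples 5 and 6. Keeping the two policies as separate parameters mirrors the idea that they are configured independently.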

Policies are further divided into 4 categories namely: 1) Count-based

Thus, we need a vocabulary to define and configure windows by describing:

1) Window type
2) Eviction policy
3) Trigger policy

The precise semantics and the valid combinations of the policies are further explained by Gedik B.

Example

Given the following RML with a join condition:

<#TM1> 
    rml:logicalSource <#STREAM1> ;
    rml:subjectMap <#SM1> ;
    rml:predicateObjectMap [
        rml:predicateMap <#PM1> ;
        rml:objectMap [ 
            rml:parentTriplesMap <#TM2>;
            rr:joinCondition [
                rr:child "id";
                rr:parent "p_id"; 
            ];

        ];

    ]. 

<#TM2> 
    rml:logicalSource <#STREAM2> ;
    rml:subjectMap <#SM2> ;
    rml:predicateObjectMap [
        rml:predicateMap <#PM2> ;
        rml:objectMap <#OM2> ] .

Windows could be defined in the object map:

<#TM1> 
    rml:logicalSource <#STREAM1> ;
    rml:subjectMap <#SM1> ;
    rml:predicateObjectMap [
        rml:predicateMap <#PM1> ;
        rml:objectMap [
            # Define the window to be used for joining
            rml:window [ 
                # Define window types 
                rml:windowType rml:Tumbling; 

                # Define the trigger policy for the window 
                # Every 5th record will execute the join
                rml:trigger [ a rml:CountPolicy;
                    rml:countValue  5;

                ]; 

                # Define the eviction policy for the window
                # Clean up window after processing the 15th record
                rml:evict [ a rml:CountPolicy;
                    rml:countValue  15;
                ];

            ];
            rml:parentTriplesMap <#TM2>;
            rr:joinCondition [
                rr:child "id";
                rr:parent "p_id"; 
            ];
        ];
    ]. 

<#TM2> 
    rml:logicalSource <#STREAM2> ;
    rml:subjectMap <#SM2> ;
    rml:predicateObjectMap [
        rml:predicateMap <#PM2> ;
        rml:objectMap <#OM2> ] .

dachafra commented 1 year ago

@s-minoo is this a specific request for join conditions? If so, please confirm so that I can move it to the proper repository.

s-minoo commented 1 year ago

It is indeed a specific request for joins, so I think it is more relevant to the rml-join repo.

dachafra commented 1 year ago

Transferring it to its corresponding repository, then.

elsdvlee commented 9 months ago

The rml-jc repo will be closed. Moving this unsolved issue back to rml-core.

dachafra commented 3 months ago

I would suggest leaving this issue for the working group.