kg-construct / rml-lv

Specification repository for logical views in RML.
https://kg-construct.github.io/rml-lv/dev.html

Access to Web APIs requiring parameters #24

Open tirrolo opened 7 months ago

tirrolo commented 7 months ago

Context

Logical views (LV) let us express constraints inherited from relational database theory, at least in the form of annotations. Such constraints are useful in scenarios where access to data is restricted by binding patterns [1], Web APIs being a notable example. We were therefore wondering whether it would be worth adding an annotation signaling that certain fields must be provided as arguments to the underlying API. In other words, fields could allow for the definition of "parametric" mappings. Pagination is a particular case of this.

[1] Michael Benedikt, Julien Leblay, Balder ten Cate, Efthymia Tsamoura: Generating Plans from Proofs: The Interpolation-based Approach to Query Reformulation. Synthesis Lectures on Data Management, Morgan & Claypool Publishers 2016, ISBN 978-3-031-00728-6

Example scenario

Example mapping:

Headers:

@prefix rml: <http://w3id.org/rml/> .
@prefix td: <https://www.w3.org/2019/wot/td#> .
@prefix htv: <http://www.w3.org/2011/http#> .
@prefix hctl: <https://www.w3.org/2019/wot/hypermedia#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix hydra: <http://www.w3.org/ns/hydra/core#> .
@prefix ex: <http://www.example.com/> .
@base <http://example.com/ns#> .

First source (non-parametric):

# RML logical source and logical view for a CSV file listing academic staff:
#
#  ID;NAME;SURNAME;POSITION;EMAIL
#  113541;Alice;Doe;teaching staff;alice.doe@rmluniversity.edu
#  ...

<#CSVLogicalSource> a rml:LogicalSource;
  rml:source [ a rml:Source, csvw:Table;
    csvw:url "file:///path/to/list_of_professors.csv";
    csvw:dialect [ a csvw:Dialect;
      csvw:delimiter ";";
      csvw:encoding "UTF-8";
      csvw:header "1"^^xsd:boolean
    ]
  ];
  rml:referenceFormulation rml:CSV.

<#CSVLogicalView> a rml:LogicalView;
  rml:onLogicalSource <#CSVLogicalSource>;
  rml:field [
    rml:fieldName "id";
    rml:reference "ID"
  ];
  rml:structuralAnnotation [
    a rml:PrimaryKeyAnnotation;  # 'id' uniquely identifies each record of <#CSVLogicalView>
    rml:onFields ("id")
  ].

Second source (parametric): RML logical source and logical view for an API that looks up the courses taught by a given lecturer in the university DB:

# example request: https://api.rmluniversity.edu/courses?lecturer=113541
# example response:
# {
#   "courses": [{
#     "code": "CS1234",
#     "name": "Introduction to Databases",
#     "lecturer_id": 113541
#   }, {
#     "code": "CS1237",
#     "name": "Conceptual Modeling",
#     "lecturer_id": 113541
#   }]
# }
#

<#APILogicalSource> a rml:LogicalSource;
  rml:source [ a rml:Source, td:Thing;
    td:hasPropertyAffordance [
      td:hasUriTemplateSchema "https://api.rmluniversity.edu/courses?lecturer={lecturer_id}";  # need parameter lecturer_id, should state this formally!
      td:hasForm [ a hctl:Form;  # hctl:Form = hydra:Operation, hence here we can also put <#APIHydraSpecGetCourseOperation> (see later)
        hctl:forContentType "application/json";
        htv:methodName "GET";
        htv:headers ([
          htv:fieldName "Accept";
          htv:fieldValue "application/json"
        ])
      ]
    ]
  ];
  rml:referenceFormulation rml:JSONPath;
  rml:iterator "$.courses[*]".

The logical source above can produce results only if values for lecturer_id are provided. But how can the RML processor know where to find these values? Our proposal, similar in spirit to [1], is to exploit inclusion dependencies stated as structural annotations within logical views. See the logical view below:

<#APILogicalView> a rml:LogicalView;
  rml:onLogicalSource <#APILogicalSource>;
  rml:field [
    rml:fieldName "code";
    rml:reference "$.code"
  ];
  rml:field [
    rml:fieldName "name";
    rml:reference "$.name"
  ];
  rml:field [
    rml:fieldName "lecturer_id";
    rml:reference "$.lecturer_id"
  ];
  rml:structuralAnnotation [ 
    a rml:ForeignKeyAnnotation;  # This states that all 'lecturer_id' values here occur in field 'id' of <#CSVLogicalView>
    rml:onFields ("lecturer_id");
    rml:targetView <#CSVLogicalView>;
    rml:targetFields ("id")
  ].

Note the rml:ForeignKeyAnnotation, which states that every value of lecturer_id also occurs as an id in the CSV view. The RML processor can therefore devise a plan to populate the graph: extract all id values from the CSV, then feed them to the Web API as values for lecturer_id.
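The plan described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a description of how any RML processor works: the function names (read_key_values, expand_template, run_plan) and the injectable fetch_json callable are hypothetical, and the template expansion is a naive substitution rather than a full URI Template implementation. The CSV layout and the URI template are the ones from the example.

```python
import csv
import io
from urllib.parse import quote


def read_key_values(csv_text, column, delimiter=";"):
    """Return the distinct values of `column`, in order of first appearance."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    seen = {}
    for row in reader:
        seen.setdefault(row[column], None)
    return list(seen)


def expand_template(template, bindings):
    """Naive {name} substitution (assumption: no RFC 6570 operators are used)."""
    url = template
    for name, value in bindings.items():
        url = url.replace("{" + name + "}", quote(str(value), safe=""))
    return url


def run_plan(csv_text, template, fetch_json):
    """Feed every key of the target view to the API and collect the records.

    `fetch_json` is a hypothetical callable (URL -> parsed JSON), so the
    HTTP layer stays out of this sketch.
    """
    records = []
    for lecturer_id in read_key_values(csv_text, "ID"):
        url = expand_template(template, {"lecturer_id": lecturer_id})
        # Counterpart of the iterator "$.courses[*]" applied to the response.
        records.extend(fetch_json(url).get("courses", []))
    return records
```

Passing the HTTP call in as a parameter keeps the sketch independent of any particular client library; the foreign key annotation is what justifies iterating over the CSV ids in the first place.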

We complete the example with expression maps using the logical views above.

#
# RML mappings instantiating courses with their name and lecturer.
#

<#Course> a rml:TriplesMap;
  rml:logicalSource <#APILogicalView>;
  rml:subjectMap [
    rml:template "http://kg.rmluniversity.edu/course/{code}";
    rml:class ex:Course
  ];
  rml:predicateObjectMap [
    rml:predicate ex:name;
    rml:objectMap [
      rml:reference "name";
      rml:datatype xsd:string
    ]
  ];
  rml:predicateObjectMap [
    rml:predicate ex:lecturer;
    rml:objectMap [
      rml:parentTriplesMap <#CourseLecturer>
    ]
  ].

<#CourseLecturer> a rml:TriplesMap;
  rml:logicalSource <#APILogicalView>;
  rml:subjectMap [
    rml:template "http://kg.rmluniversity.edu/professor/{lecturer_id}";
    rml:class ex:Lecturer
  ].
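To make the intended output concrete, the following Python sketch applies the two triples maps above to a single record of the example API response. It is a hand-written approximation, not an RML engine: the helper names (fill, map_course) are hypothetical, IRI-safe encoding of template values is omitted, and it assumes that a referencing object map without join conditions over the same logical view yields the parent subject generated from the same record.

```python
import re

EX = "http://www.example.com/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def fill(template, record):
    """Replace each {field} in the template with the record's value."""
    return re.sub(r"\{(\w+)\}", lambda m: str(record[m.group(1)]), template)


def map_course(record):
    """Return the triples that <#Course> and <#CourseLecturer> would yield."""
    course = fill("http://kg.rmluniversity.edu/course/{code}", record)
    lecturer = fill("http://kg.rmluniversity.edu/professor/{lecturer_id}", record)
    return [
        (course, RDF_TYPE, EX + "Course"),
        # Literals are modelled as (lexical form, datatype IRI) pairs.
        (course, EX + "name", (record["name"], XSD_STRING)),
        (course, EX + "lecturer", lecturer),
        (lecturer, RDF_TYPE, EX + "Lecturer"),
    ]


record = {"code": "CS1234", "name": "Introduction to Databases", "lecturer_id": 113541}
for triple in map_course(record):
    print(triple)
```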

Variant: Using Hydra

In the example above, we used curly braces to denote an API parameter (following the mechanism provided by the td:hasUriTemplateSchema property). This could probably be stated more explicitly, for instance by using Hydra:

#
# EXTRA: possible (partial) definition of API operation and IRI template using Hydra
#

<#APIHydraSpecCourseIriTemplate> a hydra:IriTemplate;
  hydra:template "https://api.rmluniversity.edu/courses?lecturer={lecturer_id}";
  hydra:mapping [ 
    a hydra:IriTemplateMapping;
    hydra:variableRepresentation hydra:BasicRepresentation;
    hydra:variable "lecturer_id";
    hydra:property "lecturer_id"; # here we want to formally map variable {lecturer_id} to field "lecturer_id" and/or reference "$.lecturer_id"
    hydra:required true;
  ];
  hydra:operation <#APIHydraSpecGetCourseOperation>.

<#APIHydraSpecGetCourseOperation> a hydra:Operation;
  hydra:method "GET";
  hydra:returns hydra:Collection.
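Once each template variable is formally mapped to a field, a processor could work out which fields of the view are inputs, that is, which must be bound before the API can be called. The Python sketch below illustrates that check under stated assumptions: the function names and the plain dictionary standing in for the hydra:mapping entries are hypothetical, and variables are recognised with a simple regular expression rather than a full URI Template parser.

```python
import re


def template_variables(template):
    """Return the {variable} names of a template, in order of appearance."""
    return re.findall(r"\{(\w+)\}", template)


def input_fields(template, variable_to_field, view_fields):
    """Return the view fields that must be bound before calling the API.

    `variable_to_field` stands in for the variable/property pairs of the
    IRI template mappings; `view_fields` is the set of field names
    declared by the logical view.
    """
    inputs = []
    for variable in template_variables(template):
        field = variable_to_field.get(variable)
        if field is None or field not in view_fields:
            raise ValueError(
                f"template variable {variable!r} is not mapped to a view field"
            )
        inputs.append(field)
    return inputs


template = "https://api.rmluniversity.edu/courses?lecturer={lecturer_id}"
fields = {"code", "name", "lecturer_id"}
print(input_fields(template, {"lecturer_id": "lecturer_id"}, fields))
```

For each input field found this way, the processor would then look for a foreign key annotation on that field to learn where its values can be obtained.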