polifonia-project / sonar2021_demo

This repository documents the Polifonia demo that will be presented at SONAR2021
https://polifonia-project.github.io/sonar2021_demo/

Data transformation: Raw JSON data --> Polifonia KG Mapping Rules #35

Closed delfimpandiani closed 2 years ago

delfimpandiani commented 2 years ago

Use [OUTPUT1] (Example Places KG) to automatically create Places KG.

Definition of the RML mapping rules. Use the Example to define and test mapping rules for all classes and properties (check whether the mapping rules produce the correct output). Labels must be included for all classes and properties. The output of this step, [OUTPUT2], should be an automatically created KG that includes all the raw data connecting recordings to places, following the Polifonia Ontology modules.

INPUT: https://github.com/polifonia-project/sonar2021_demo/blob/datasets/places/places.json (JSON from Andrea)

INPUT: OUTPUT1 (RDF from Fiorela)

OUTPUT2: Polifonia KG (Turtle from Delfina)

delfimpandiani commented 2 years ago

example_to_match.ttl

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix mp: <https://w3id.org/polifonia/ON/musical-performance/> .
@prefix core: <https://w3id.org/polifonia/ON/core/> .

<https://w3id.org/polifonia/resource/Recording/01> a mp:Recording ;
     mp:hasRecordingPerformer <https://w3id.org/polifonia/resource/Agent/the_beatles> ;
     core:hasTitle <https://w3id.org/polifonia/resource/Title/i_saw_her_standing_there> ;
     mp:hasSession <https://w3id.org/polifonia/resource/Session/session_01_1> , <https://w3id.org/polifonia/resource/Session/session_01_2> , <https://w3id.org/polifonia/resource/Session/session_01_3> ;
     mp:hasYoutubeID "http://youtube.com/mwBdWVTR-o8" .

<https://w3id.org/polifonia/resource/Agent/the_beatles> a core:Agent ;
     rdfs:label "The Beatles" ;
     core:hasBirthPlace <https://w3id.org/polifonia/resource/Place/great_britain> ;
     mp:hasMusicalActivityBeginPlace <https://w3id.org/polifonia/resource/Place/liverpool> .

<https://w3id.org/polifonia/resource/Title/i_saw_her_standing_there> a core:Title ;
     rdfs:label "I Saw Her Standing There" .

<https://w3id.org/polifonia/resource/Session/session_01_1> a mp:Session ;
     core:hasType <https://w3id.org/polifonia/resource/SessionType/edited_at> ;
     core:startTime "1963-02-25" ;
     core:endTime "1963-02-25" ;
     core:hasPlace <https://w3id.org/polifonia/resource/PhysicalPlace/physicalplace_1> .

<https://w3id.org/polifonia/resource/Session/session_01_2> a mp:Session ;
     core:hasType <https://w3id.org/polifonia/resource/SessionType/mixed_at> ;
     core:startTime "1963-02-25" ;
     core:endTime "1963-02-25" ;
     core:hasPlace <https://w3id.org/polifonia/resource/PhysicalPlace/physicalplace_1> .

<https://w3id.org/polifonia/resource/Session/session_01_3> a mp:Session ;
     core:hasType <https://w3id.org/polifonia/resource/SessionType/recorded_at> ;
     core:startTime "1963-02-11" ;
     core:endTime "1963-02-11" ;
     core:hasPlace <https://w3id.org/polifonia/resource/PhysicalPlace/physicalplace_1> .

<https://w3id.org/polifonia/resource/SessionType/edited_at> a mp:SessionType ;
     rdfs:label "edited at" .

<https://w3id.org/polifonia/resource/SessionType/mixed_at> a mp:SessionType ;
     rdfs:label "mixed at" .

<https://w3id.org/polifonia/resource/SessionType/recorded_at> a mp:SessionType ;
     rdfs:label "recorded at" .

<https://w3id.org/polifonia/resource/PhysicalPlace/physicalplace_1> a core:PhysicalPlace ;
     core:hasAddress "3 Abbey Road, St John\u2019s Wood, London"^^xsd:string ;
     core:hasCoordinate <https://w3id.org/polifonia/resource/Coordinate/coordinate_1> .

<https://w3id.org/polifonia/resource/Place/great_britain> a core:Place ;
     rdfs:label "GB" .

<https://w3id.org/polifonia/resource/Place/liverpool> a core:Place ;
     rdfs:label "Liverpool" .

<https://w3id.org/polifonia/resource/Coordinate/coordinate_1> a core:Coordinate;
     core:lat "51.53192" ;
     core:long "-0.17835" .
delfimpandiani commented 2 years ago

polifonia_kg_mapping_rules.ttl


@prefix : <https://w3id.org/polifonia/ON/core> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix xs: <http://www.w3.org/2001/XMLSchema#> .
@prefix cpv: <https://w3id.org/italia/onto/CPV/> .
@prefix mp: <https://w3id.org/polifonia/ON/musical-performance/> .
@prefix core: <https://w3id.org/polifonia/ON/core/> .

:RecordingTriplesMap 
    rml:logicalSource :RecordingSource ;
    rr:subjectMap [
        rr:template "https://w3id.org/polifonia/resource/Recording/recording_{track_id}";
        rr:class mp:Recording
    ] ;

    rr:predicateObjectMap [
        rr:predicate mp:hasRecordingPerformer;
        rr:objectMap [
            rr:template "https://w3id.org/polifonia/resource/Agent/{artist}" 
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate core:hasTitle;
        rr:objectMap [
            rr:template "https://w3id.org/polifonia/resource/Title/{title}" 
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate mp:hasSession;
        rr:objectMap [
            rr:template "https://w3id.org/polifonia/resource/Session/session_{track_id}_{recording_places[*].type-id}" 
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate mp:hasYoutubeID;
        rr:objectMap [
            rr:template "http://youtube.com/{youtube_id}"
        ]
    ] .

:AgentTriplesMap 
    rml:logicalSource :AgentSource ;
    rr:subjectMap [
        rr:template "https://w3id.org/polifonia/resource/Agent/{artist}";
        rr:class core:Agent
    ] ;

    rr:predicateObjectMap [
        rr:predicate rdfs:label;
        rr:objectMap [
            rml:reference "artist";
            rr:datatype xs:string;
        ]
    ] .

:TitleTriplesMap 
    rml:logicalSource :TitleSource ;
    rr:subjectMap [
        rr:template "https://w3id.org/polifonia/resource/Title/{title}";
        rr:class core:Title
    ] ;

    rr:predicateObjectMap [
        rr:predicate rdfs:label;
        rr:objectMap [
            rml:reference "title";
            rr:datatype xs:string;
        ]
    ] .

:SessionTriplesMap 
    rml:logicalSource :SessionSource ;
    rr:subjectMap [
        rr:template "https://w3id.org/polifonia/resource/Session/{ref_id}_{type}";
        rr:class mp:Session
    ] ;

    rr:predicateObjectMap [
        rr:predicate core:hasType;
        rr:objectMap [
            rr:template "https://w3id.org/polifonia/resource/SessionType/{type-id}" 
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate core:startTime;
        rr:objectMap [
            rml:reference "begin"
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate core:endTime;
        rr:objectMap [
            rml:reference "end"
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate core:hasPlace;
        rr:objectMap [
            rr:template "https://w3id.org/polifonia/resource/PhysicalPlace/physical_place_{place.id}" 
        ]
    ] .

:SessionTypeTriplesMap 
    rml:logicalSource :SessionTypeSource ;
    rr:subjectMap [
        rr:template "https://w3id.org/polifonia/resource/SessionType/{type-id}";
        rr:class mp:SessionType
    ] ;

    rr:predicateObjectMap [
        rr:predicate rdfs:label;
        rr:objectMap [
            rml:reference "type";
            rr:datatype xs:string;
        ]
    ] .

:PhysicalPlacesTriplesMap 
    rml:logicalSource :PhysicalPlaceSource ;
    rr:subjectMap [
        rr:template "https://w3id.org/polifonia/resource/PhysicalPlace/physical_place_{id}";
        rr:class core:PhysicalPlace
    ] ;

    rr:predicateObjectMap [
        rr:predicate rdfs:label;
        rr:objectMap [
            rml:reference "name";
            rr:datatype xs:string;
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate core:hasAddress;
        rr:objectMap [
            rml:reference "address";
            rr:datatype xs:string;
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate core:hasCoordinate;
        rr:objectMap [
            rr:template "https://w3id.org/polifonia/resource/Coordinate/coordinate_{id}" 
        ]
    ] .

:CoordinateTriplesMap 
    rml:logicalSource :CoordinateSource ;
    rr:subjectMap [
        rr:template "https://w3id.org/polifonia/resource/Coordinate/coordinate__{id}";
        rr:class core:Coordinate
    ] ;

    rr:predicateObjectMap [
        rr:predicate core:lat;
        rr:objectMap [
            rml:reference "coordinates.latitude";
            rr:datatype xs:decimal;
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate core:long;
        rr:objectMap [
            rml:reference "coordinates.longitude";
            rr:datatype xs:decimal;
        ]
    ] .

:RecordingSource rml:source "./sources/places_input.json";
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.tracks[*]" .

:AgentSource rml:source "./sources/places_input2.json";
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.tracks[*]" .

:TitleSource rml:source "./sources/places_input3.json";
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.tracks[*]" .

:SessionSource rml:source "./sources/places_input4.json";
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.tracks[*].recording_places[*]" .

:SessionTypeSource rml:source "./sources/places_input5.json";
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.tracks[*].recording_places[*]" .

:PhysicalPlaceSource rml:source "./sources/places_input6.json";
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.tracks[*].recording_places[*].place" .

:CoordinateSource rml:source "./sources/places_input7.json";
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.tracks[*].recording_places[*].place" .
delfimpandiani commented 2 years ago

Sonar Data Mapping

Data Transformation Raw JSON -> Polifonia RDF

There is a related GitHub repo containing the files for this task. That repository contains PyRML (a Python-based engine for processing RML files, developed by A. Nuzzolese) and all my progress so far. It includes:

Quite some progress has been made. However, there is still work to do, including:

valecarriero commented 2 years ago

<https://w3id.org/polifonia/resource/SessionType/069c1cf0_a9b2_448e_8486_1eced48b48f9> a mp:SessionType ; rdfs:label "edited at"^^xsd:string .

<https://w3id.org/polifonia/resource/SessionType/11d74801_1493_4a5d_bc0f_4ddc537acddb> a mp:SessionType ; rdfs:label "mixed at"^^xsd:string .

<https://w3id.org/polifonia/resource/SessionType/6bc3827d_bc20_4621_ae14_9c3707ad140a> a mp:SessionType ; rdfs:label "produced at"^^xsd:string .

<https://w3id.org/polifonia/resource/SessionType/ad462279_14b0_4180_9b58_571d0eef7c51> a mp:SessionType ; rdfs:label "recorded at"^^xsd:string .

<https://w3id.org/polifonia/resource/SessionType/f845a95e_b2b5_4a94_9645_fc8b031ab0bd> a mp:SessionType ; rdfs:label "engineered at"^^xsd:string .

--> I can create 5 Named Individuals in the ontology that correspond to these session types, so the mapping could be

    • if "edited at" --> mp:EditingSession
    • if "mixed at" --> mp:MixdownSession
    • if "produced at" --> mp:ProductionSession
    • if "recorded at" --> mp:RecordingSession
    • if "engineered at" --> mp:EngineeringSession

I can't guarantee these will be the final names, but for now I think this would be okay. Let me know if this is feasible, for sure it's not the most urgent thing to do.
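
The label-to-individual mapping proposed above could be sketched as a plain lookup table. This is only an illustration: the function name `session_type_iri` is hypothetical, and the local names are the provisional ones from this comment, which may still change.

```python
# Namespace of the musical-performance module, as used in the Turtle above.
MP = "https://w3id.org/polifonia/ON/musical-performance/"

# Provisional local names, copied from the proposal in this comment.
SESSION_TYPE_BY_LABEL = {
    "edited at": "EditingSession",
    "mixed at": "MixdownSession",
    "produced at": "ProductionSession",
    "recorded at": "RecordingSession",
    "engineered at": "EngineeringSession",
}


def session_type_iri(label: str) -> str:
    """Return the IRI of the named individual for a raw session label."""
    # Normalise whitespace and case so " Edited at" still matches.
    local_name = SESSION_TYPE_BY_LABEL[label.strip().lower()]
    return MP + local_name
```

An unknown label raises `KeyError`, which seems preferable to silently minting an IRI for a session type the ontology does not define.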

delfimpandiani commented 2 years ago

@valecarriero that sounds like a good idea, would it then be something like this?:

<https://w3id.org/polifonia/resource/Session/00001_edited_at> a mp:Session ;
    core:endTime "1963-02-25" ;
    core:hasPlace "https://w3id.org/polifonia/resource/PhysicalPlace/physical_place_c56fdea4-e81e-439a-a183-a52eb1141409" ;
    core:hasType <https://w3id.org/polifonia/ON/musical-performance/EditingSession> ;
    core:startTime "1963-02-25" .

<https://w3id.org/polifonia/ON/musical-performance/EditingSession> a mp:SessionType ;
rdfs:label "edited at"^^xsd:string .
valecarriero commented 2 years ago

yes, but you don't need the last two triples, because they will be already in the ontology!

valecarriero commented 2 years ago
<https://w3id.org/polifonia/resource/Session/00718_recorded_at> a mp:Session ;
    core:endTime "1958-10-30" ;
    core:hasPlace "https://w3id.org/polifonia/resource/PhysicalPlace/physical_place_c7227463-931e-406a-8138-fa7e90fdbab8" ;
    core:hasType "https://w3id.org/polifonia/resource/SessionType/ad462279-14b0-4180-9b58-571d0eef7c51" ;
    core:startTime "1958-10-30" .

The mapping of the start and end time is not correct. Based on the model, we should have this:

<https://w3id.org/polifonia/resource/Session/00718_recorded_at> a mp:Session ;
    core:hasTimeInterval "https://w3id.org/polifonia/resource/TimeInterval/dsjijfiodjfiods" ;
    core:hasPlace "https://w3id.org/polifonia/resource/PhysicalPlace/physical_place_c7227463-931e-406a-8138-fa7e90fdbab8" ;
    core:hasType <https://w3id.org/polifonia/ON/musical-performance/EditingSession> .

<https://w3id.org/polifonia/resource/TimeInterval/dsjijfiodjfiods> core:startTime "1958-10-30" ;
    core:endTime "1958-10-30" .

Let me know if it's clear
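
To make the intended shape concrete, here is a small sketch that builds the two resources by string formatting. It is not how the RML rules produce them; the function name and the `ti_` prefix for the interval IRI are assumptions, and the `core:` prefix is assumed to be declared elsewhere in the file.

```python
BASE = "https://w3id.org/polifonia/resource/"


def time_interval_turtle(session_id: str, begin: str, end: str) -> str:
    """Return Turtle linking a session to a separate TimeInterval resource."""
    session = f"<{BASE}Session/{session_id}>"
    # Hypothetical naming scheme: one interval per session, keyed by the session id.
    interval = f"<{BASE}TimeInterval/ti_{session_id}>"
    return (
        f"{session} core:hasTimeInterval {interval} .\n"
        f"{interval} a core:TimeInterval ;\n"
        f"    core:startTime \"{begin}\" ;\n"
        f"    core:endTime \"{end}\" .\n"
    )


print(time_interval_turtle("00718_recorded_at", "1958-10-30", "1958-10-30"))
```

The start and end times end up on the interval, not on the session, which is the point of the correction above.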

delfimpandiani commented 2 years ago

A part of the data has been converted from the raw JSON into RDF (.ttl) [https://github.com/polifonia-project/sonar2021_demo/blob/datasets/sonar_mapping_delfina/polifonia_KG.ttl] using PyRML. There are still some issues with some of the IRIs, some of the properties, etc. After the WP2 meeting, we, the OD Task Force, had another meeting to discuss this specific issue and how to make it work.

We decided that most of the issues would be resolved by preprocessing the raw data.

From Monday, @enridaga will give us (the OD team) a thorough tutorial of SPARQLAnything starting from real raw Polifonia data, and start a study group for us to become fluent in it. SPARQLAnything will become the official tool for the Polifonia KG construction.

valecarriero commented 2 years ago

core:PhysicalPlace --> core:PhysicalSite

valecarriero commented 2 years ago
<https://w3id.org/polifonia/resource/PhysicalPlace/physical_place_044df534_4170_45eb_8b42_805102bb2939> a core:PhysicalPlace ;
    rdfs:label "Olympic Studios"^^xsd:string ;
    core:hasAddress "117 Church Rd, London SW13 9HL, United Kingdom"^^xsd:string ;
    core:hasCoordinate "https://w3id.org/polifonia/resource/Coordinate/coordinate_044df534-4170-45eb-8b42-805102bb2939" .

should become

<https://w3id.org/polifonia/resource/PhysicalSite/physical_site_044df534_4170_45eb_8b42_805102bb2939> a core:PhysicalSite ;
    rdfs:label "Olympic Studios"^^xsd:string ;
    core:hasAddress <https://w3id.org/polifonia/resource/Address/fdsjfdsifjods> ;
    core:hasGeometry <https://w3id.org/polifonia/resource/Geometry/geometry_044df534-4170-45eb-8b42-805102bb2939> .

<https://w3id.org/polifonia/resource/Address/fdsjfdsifjods> rdf:type core:Address ;
    rdfs:label "117 Church Rd, London SW13 9HL, United Kingdom"^^xsd:string ;
    core:fullAddress "117 Church Rd, London SW13 9HL, United Kingdom"^^xsd:string .

<https://w3id.org/polifonia/resource/Geometry/geometry_044df534-4170-45eb-8b42-805102bb2939> core:lat "123" ;
    core:long "456" .

In the future, we will need to split the address into its components, but for now I think it's okay.

valecarriero commented 2 years ago

I think there is something wrong with the generation of the URIs of the coordinates, because I can't find e.g. https://w3id.org/polifonia/resource/Coordinate/coordinate_044df534-4170-45eb-8b42-805102bb2939 related to its long and lat

<https://w3id.org/polifonia/resource/PhysicalPlace/physical_place_044df534_4170_45eb_8b42_805102bb2939> a core:PhysicalPlace ;
    rdfs:label "Olympic Studios"^^xsd:string ;
    core:hasAddress "117 Church Rd, London SW13 9HL, United Kingdom"^^xsd:string ;
    core:hasCoordinate "https://w3id.org/polifonia/resource/Coordinate/coordinate_044df534-4170-45eb-8b42-805102bb2939" .
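
Note that in the snippet above the place IRI uses underscores in the UUID while the coordinate reference uses hyphens. One way to spot references like this, which point at a resource that is never described, is sketched below. It is a rough regex scan rather than a real Turtle parser, and it assumes every described resource starts a line with its IRI in angle brackets; the function name is hypothetical.

```python
import re

# Any resource URL of the KG, whether written as an IRI or inside a string literal.
RESOURCE = r"https://w3id\.org/polifonia/resource/[^\s\"<>]+"


def dangling_references(turtle_text: str) -> set:
    """Return resource URLs that are mentioned but never appear as a subject."""
    # Subjects: resource IRIs in angle brackets at the start of a line.
    subjects = set(re.findall(rf"^<({RESOURCE})>", turtle_text, flags=re.MULTILINE))
    # Mentions: every resource URL anywhere in the text.
    mentions = set(re.findall(RESOURCE, turtle_text))
    return mentions - subjects
```

Running this over the generated file would list every coordinate (or place, or session type) URL that has no matching description.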
jonnybluesman commented 2 years ago
In the future, we will need to split the address into its components, but for now I think it's okay.

Not sure if it would be okay for now, because if we want to find connections based on the place, then "118 Church Rd, London, ..." would be different from "117 Church Rd, London, ...", and even the same for 2 addresses in the same city. Does it make sense?

valecarriero commented 2 years ago
It totally makes sense, but we need to find a way to split the address string

jonnybluesman commented 2 years ago
Okay then we can post-process the JSONs and split the full address based on the commas. The order of the components should be the same -- at least for data from MB (which is our focus now, as far as I understood).
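
A minimal sketch of the comma-based split, assuming the components always come in the same order; which position holds the street, the city, or the country is not confirmed here and would need checking against the MB data. The function name is hypothetical.

```python
def split_address(full_address: str) -> list:
    """Split a comma-separated address into trimmed, non-empty components."""
    return [part.strip() for part in full_address.split(",") if part.strip()]


a = split_address("117 Church Rd, London SW13 9HL, United Kingdom")
b = split_address("118 Church Rd, London SW13 9HL, United Kingdom")

# The street component differs, but the remaining components can now be compared.
print(a[0] != b[0], a[1:] == b[1:])
```

With the components separated, two sites on the same street or in the same city can be connected even though their full address strings differ.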

delfimpandiani commented 2 years ago
<https://w3id.org/polifonia/resource/Session/00718_recorded_at> a mp:Session ;
   core:endTime "1958-10-30" ;
   core:hasPlace "https://w3id.org/polifonia/resource/PhysicalPlace/physical_place_c7227463-931e-406a-8138-fa7e90fdbab8" ;
   core:hasType "https://w3id.org/polifonia/resource/SessionType/ad462279-14b0-4180-9b58-571d0eef7c51" ;
   core:startTime "1958-10-30" .

The mapping of the start and end time is not correct. Based on the model, we should have this:

<https://w3id.org/polifonia/resource/Session/00718_recorded_at> a mp:Session ;
    core:hasTimeInterval "https://w3id.org/polifonia/resource/TimeInterval/dsjijfiodjfiods" ;
    core:hasPlace "https://w3id.org/polifonia/resource/PhysicalPlace/physical_place_c7227463-931e-406a-8138-fa7e90fdbab8" ;
    core:hasType <https://w3id.org/polifonia/ON/musical-performance/EditingSession> .

<https://w3id.org/polifonia/resource/TimeInterval/dsjijfiodjfiods> core:startTime "1958-10-30" ;
    core:endTime "1958-10-30" .

Let me know if it's clear

got it, I fixed it.

delfimpandiani commented 2 years ago

Ok, some updates relevant for @jonnybluesman, @andreamust and @valecarriero.

I edited the mapping rules and tested them against an example to make sure everything works well. It does. For this to work, a couple more things need to be preprocessed in the original JSON. Basically, the original JSON needs to look exactly like this:

{
    "tracks": [
        {
            "track_id": "00001",
            "artist1": "Louis Armstrong",
            "artist1_for_iri": "louis_armstrong",
            "artist2": "Ella Fitzgerald",
            "artist2_for_iri": "ella_fitzgerald",
            "title": "I Saw Her Standing There",
            "title_for_iri": "i_saw_her_standing_there",
            "recording_places": [
                {
                    "session_id": "00001_1",
                    "type": "edited at",
                    "session_type" : "EditingSession",
                    "type-id": "069c1cf0_a9b2_448e_8486_1eced48b48f9",
                    "direction": "backward",
                    "begin": "1963-02-25",
                    "end": "1963-02-25",
                    "ended": "true",
                    "place": {
                        "id": "c56fdea4_e81e_439a_a183_a52eb1141409",
                        "type": "Studio",
                        "name": "Abbey Road Studios: Studio 1",
                        "address": "3 Abbey Road, St John\u2019s Wood, London",
                        "coordinates": {
                            "latitude": "51.53192",
                            "longitude": "-0.17835"
                        }
                    },
                    "ref_id": "00001"
                },
                {
                    "session_id": "00001_2",
                    "type": "mixed at",
                    "session_type" : "MixingSession",
                    "type-id": "11d74801_1493_4a5d_bc0f_4ddc537acddb",
                    "direction": "backward",
                    "begin": "1963-02-25",
                    "end": "1963-02-25",
                    "ended": "true",
                    "place": {
                        "id": "c56fdea4_e81e_439a_a183_a52eb1141409",
                        "type": "Studio",
                        "name": "Abbey Road Studios: Studio 1",
                        "address": "3 Abbey Road, St John\u2019s Wood, London",
                        "coordinates": {
                            "latitude": "51.53192",
                            "longitude": "-0.17835"
                        }
                    },
                    "ref_id": "00001"
                },
                {
                    "session_id": "00001_3",
                    "type": "recorded at",
                    "session_type" : "RecordingSession",
                    "type-id": "ad462279_14b0_4180_9b58_571d0eef7c51",
                    "direction": "backward",
                    "begin": "1963-02-11",
                    "end": "1963-02-11",
                    "ended": "true",
                    "place": {
                        "id": "6f12a5d2_52e5_4dec_9fed_494b1f65bb94",
                        "type": "Studio",
                        "name": "Abbey Road Studios: Studio 2",
                        "address": "3 Abbey Road, St John\u2019s Wood, London",
                        "coordinates": {
                            "latitude": "51.53192",
                            "longitude": "-0.17835"
                        }
                    },
                    "ref_id": "00001"
                }
            ],
            "artist_country": "GB",
            "artist_start": "Liverpool",
            "youtube_id": "mwBdWVTR-o8"
        }
    ]
}
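
The `artist1_for_iri` and `title_for_iri` values in the example look like lowercase, underscore-separated versions of the display strings. A sketch of how such values might be derived is below; the exact rule is an assumption inferred from the example, and the accent stripping in particular is a guess. The function name is hypothetical.

```python
import re
import unicodedata


def slug_for_iri(text: str) -> str:
    """Turn a display string into a lowercase, underscore-separated IRI fragment."""
    # Assumption: accented characters are reduced to their ASCII base letters.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single underscore.
    return re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")


print(slug_for_iri("Louis Armstrong"))
print(slug_for_iri("I Saw Her Standing There"))
```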

Note that, apart from the edits we discussed today, two major changes were made:

  1. for each of the "recording_places", add a key "session_type" with value:

    • if "edited at" --> mp:EditingSession
    • if "mixed at" --> mp:MixdownSession
    • if "produced at" --> mp:ProductionSession
    • if "recorded at" --> mp:RecordingSession
    • if "engineered at" --> mp:EngineeringSession
  2. for each "recording_places" entry's "type-id" and for each place's "id": replace the dash (-) with an underscore (_)

    • ^ This is important!!!!
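
The two changes above could be applied to one track roughly as follows. This is a sketch under assumptions: `preprocess_track` is a hypothetical helper, and the label table is passed in by the caller because the final session type names are not settled.

```python
import copy


def preprocess_track(track: dict, session_types: dict) -> dict:
    """Return a copy of one track with both preprocessing changes applied."""
    result = copy.deepcopy(track)  # leave the raw input untouched
    for session in result.get("recording_places", []):
        # Change 1: add "session_type", looked up from the raw "type" label.
        session["session_type"] = session_types[session["type"]]
        # Change 2: replace dashes with underscores in both identifiers.
        session["type-id"] = session["type-id"].replace("-", "_")
        session["place"]["id"] = session["place"]["id"].replace("-", "_")
    return result


# Usage with a one-session track, trimmed to the keys this sketch touches.
raw = {
    "track_id": "00001",
    "recording_places": [
        {
            "type": "edited at",
            "type-id": "069c1cf0-a9b2-448e-8486-1eced48b48f9",
            "place": {"id": "c56fdea4-e81e-439a-a183-a52eb1141409"},
        }
    ],
}
clean = preprocess_track(raw, {"edited at": "EditingSession"})
print(clean["recording_places"][0])
```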

From this example file, the mapping rules now lead to the following Turtle. It looks completely correct:

@prefix core: <https://w3id.org/polifonia/ON/core/> .
@prefix mp: <https://w3id.org/polifonia/ON/musical-performance/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

mp:editingsession a mp:SessionType ;
    rdfs:label "edited at"^^xsd:string .

mp:mixingsession a mp:SessionType ;
    rdfs:label "mixed at"^^xsd:string .

mp:recordingsession a mp:SessionType ;
    rdfs:label "recorded at"^^xsd:string .

<https://w3id.org/polifonia/resource/Address/address_6f12a5d2_52e5_4dec_9fed_494b1f65bb94> a core:Address ;
    rdfs:haslabel "3 Abbey Road, St John’s Wood, London"^^xsd:string ;
    core:fullAddress "3 Abbey Road, St John’s Wood, London"^^xsd:string .

<https://w3id.org/polifonia/resource/Address/address_c56fdea4_e81e_439a_a183_a52eb1141409> a core:Address ;
    rdfs:haslabel "3 Abbey Road, St John’s Wood, London"^^xsd:string ;
    core:fullAddress "3 Abbey Road, St John’s Wood, London"^^xsd:string .

<https://w3id.org/polifonia/resource/Agent/ella_fitzgerald> a core:Agent ;
    rdfs:label "Ella Fitzgerald"^^xsd:string .

<https://w3id.org/polifonia/resource/Agent/louis_armstrong> a core:Agent ;
    rdfs:label "Louis Armstrong"^^xsd:string .

<https://w3id.org/polifonia/resource/Geometry/geometry_6f12a5d2_52e5_4dec_9fed_494b1f65bb94> a core:Geometry ;
    core:lat 51.53192 ;
    core:long -0.17835 .

<https://w3id.org/polifonia/resource/Geometry/geometry_c56fdea4_e81e_439a_a183_a52eb1141409> a core:Geometry ;
    core:lat 51.53192 ;
    core:long -0.17835 .

<https://w3id.org/polifonia/resource/PhysicalSite/physical_site_6f12a5d2_52e5_4dec_9fed_494b1f65bb94> a core:PhysicalSite ;
    rdfs:label "Abbey Road Studios: Studio 2"^^xsd:string ;
    core:hasAddress "https://w3id.org/polifonia/resource/Address/address_6f12a5d2_52e5_4dec_9fed_494b1f65bb94" ;
    core:hasGeometry "https://w3id.org/polifonia/resource/Geometry/geometry_6f12a5d2_52e5_4dec_9fed_494b1f65bb94" .

<https://w3id.org/polifonia/resource/PhysicalSite/physical_site_c56fdea4_e81e_439a_a183_a52eb1141409> a core:PhysicalSite ;
    rdfs:label "Abbey Road Studios: Studio 1"^^xsd:string ;
    core:hasAddress "https://w3id.org/polifonia/resource/Address/address_c56fdea4_e81e_439a_a183_a52eb1141409" ;
    core:hasGeometry "https://w3id.org/polifonia/resource/Geometry/geometry_c56fdea4_e81e_439a_a183_a52eb1141409" .

<https://w3id.org/polifonia/resource/Recording/recording_00001> a mp:Recording ;
    core:hasTitle "https://w3id.org/polifonia/resource/Title/i_saw_her_standing_there" ;
    mp:hasRecordingPerformer "https://w3id.org/polifonia/resource/Agent/ella_fitzgerald",
        "https://w3id.org/polifonia/resource/Agent/louis_armstrong" ;
    mp:hasYoutubeID "mwBdWVTR-o8"^^xsd:string .

<https://w3id.org/polifonia/resource/Session/00001_1> a mp:Session ;
    core:hasPlace "https://w3id.org/polifonia/resource/PhysicalSite/physical_site_c56fdea4_e81e_439a_a183_a52eb1141409" ;
    core:hasTimeInterval "https://w3id.org/polifonia/resource/TimeInterval/ti_00001_1" ;
    core:hasType "https://w3id.org/polifonia/ON/musical-performance/EditingSession" ;
    mp:isSessionOfRecording "https://w3id.org/polifonia/resource/Recording/recording_00001" .

<https://w3id.org/polifonia/resource/Session/00001_2> a mp:Session ;
    core:hasPlace "https://w3id.org/polifonia/resource/PhysicalSite/physical_site_c56fdea4_e81e_439a_a183_a52eb1141409" ;
    core:hasTimeInterval "https://w3id.org/polifonia/resource/TimeInterval/ti_00001_2" ;
    core:hasType "https://w3id.org/polifonia/ON/musical-performance/MixingSession" ;
    mp:isSessionOfRecording "https://w3id.org/polifonia/resource/Recording/recording_00001" .

<https://w3id.org/polifonia/resource/Session/00001_3> a mp:Session ;
    core:hasPlace "https://w3id.org/polifonia/resource/PhysicalSite/physical_site_6f12a5d2_52e5_4dec_9fed_494b1f65bb94" ;
    core:hasTimeInterval "https://w3id.org/polifonia/resource/TimeInterval/ti_00001_3" ;
    core:hasType "https://w3id.org/polifonia/ON/musical-performance/RecordingSession" ;
    mp:isSessionOfRecording "https://w3id.org/polifonia/resource/Recording/recording_00001" .

<https://w3id.org/polifonia/resource/TimeInterval/ti_00001_1> a core:TimeInterval ;
    core:endTime "1963-02-25" ;
    core:startTime "1963-02-25" .

<https://w3id.org/polifonia/resource/TimeInterval/ti_00001_2> a core:TimeInterval ;
    core:endTime "1963-02-25" ;
    core:startTime "1963-02-25" .

<https://w3id.org/polifonia/resource/TimeInterval/ti_00001_3> a core:TimeInterval ;
    core:endTime "1963-02-11" ;
    core:startTime "1963-02-11" .

<https://w3id.org/polifonia/resource/Title/i_saw_her_standing_there> a core:Title ;
    rdfs:label "I Saw Her Standing There"^^xsd:string .

@jonnybluesman can you create a file with a few examples following this input example's exact structure and syntax? @andreamust can you preprocess the whole raw JSON to look exactly like the example in this comment? @valecarriero can you please check the turtle RDF output?

Once @jonnybluesman gives me the three examples, I will create the RDF for them and pass it to the interface team. I will disconnect from this issue until tomorrow.

delfimpandiani commented 2 years ago

The relevant data (sonar_mapping_rules) has been updated.

delfimpandiani commented 2 years ago

Good news for @ccolonna @JaseMK and @phivk!

We have a first correct (albeit very small) Polifonia KG (in Turtle format, and in JSON-LD format): it contains 3 tracks with sessions in London and by 3 different artists. Thank you @jonnybluesman!

Even though it is small, you should be able to do all your work (Polifonia KG --> Interface Input Data) based on this KG, as this is what the data will look like for now.

@andreamust and I are working on populating the KG with the rest of the data, and it will probably be done within the next 24 hours.

ccolonna commented 2 years ago

Many thanks @delfimpandiani , is there an ontology diagram somewhere?

delfimpandiani commented 2 years ago

Hi @ccolonna, you can find the Places ontology module here!

jonnybluesman commented 2 years ago

However, @ccolonna, bear in mind that for this Turtle example we are using the full address as a single string (a simplification for the time being), whereas our ontology distinguishes among all the different components of the address.

ccolonna commented 2 years ago

Thanks @delfimpandiani and @jonnybluesman.

Is this OK?

<https://w3id.org/polifonia/resource/Recording/recording_00509> a mp:Recording ;
    core:hasTitle "https://w3id.org/polifonia/resource/Title/gerald_moore" ;
    mp:hasRecordingPerformer "https://w3id.org/polifonia/resource/Agent/dietrich_fischer_dieskau" .

Shouldn't it be:

<https://w3id.org/polifonia/resource/Recording/recording_00509> a mp:Recording ;
    core:hasTitle <https://w3id.org/polifonia/resource/Title/gerald_moore> ;
    mp:hasRecordingPerformer <https://w3id.org/polifonia/resource/Agent/dietrich_fischer_dieskau> .

I tried this query and it returned [] :

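PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mp: <https://w3id.org/polifonia/ON/musical-performance/>
PREFIX core: <https://w3id.org/polifonia/ON/core/>

SELECT
    ?uri
    ?titleURI
    ?titleLabel
    ?artistURI
WHERE   {

    ?uri    a mp:Recording ;
            core:hasTitle ?titleURI ;
            mp:hasRecordingPerformer ?artistURI .

    # ?titleURI must be an IRI here: a string literal cannot be the subject of a triple
    ?titleURI   a core:Title ;
                rdfs:label ?titleLabel .

}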

while this returns correct results:

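PREFIX mp: <https://w3id.org/polifonia/ON/musical-performance/>
PREFIX core: <https://w3id.org/polifonia/ON/core/>

SELECT
    ?uri
    ?titleURI
    ?artistURI
WHERE   {

    ?uri    a mp:Recording ;
            core:hasTitle ?titleURI ;
            mp:hasRecordingPerformer ?artistURI .
}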
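
The empty result is consistent with the title and performer values being string literals: a literal can never match as the subject of the second pattern. As a stopgap, here is a minimal Python sketch that patches the Turtle as text. It assumes that every plain quoted value starting with https://w3id.org/polifonia/ is meant to be an IRI; the names NAMESPACE, IRI_LITERAL and literals_to_iris are made up for illustration and are not part of the mapping pipeline.

```python
import re

# Assumption: any plain literal whose value starts with this namespace
# is really a resource IRI that was serialised as a string by mistake.
NAMESPACE = "https://w3id.org/polifonia/"

# A quoted value in that namespace, not followed by a datatype (^^) or a language tag (@).
IRI_LITERAL = re.compile(r'"(' + re.escape(NAMESPACE) + r'[^"\s<>]+)"(?!\^\^|@)')


def literals_to_iris(turtle_text):
    """Rewrite "https://w3id.org/polifonia/..." literals as <...> IRI references."""
    return IRI_LITERAL.sub(r"<\1>", turtle_text)


before = 'core:hasTitle "https://w3id.org/polifonia/resource/Title/gerald_moore" ;'
print(literals_to_iris(before))
```

Typed literals such as the rdfs:label and mp:hasYoutubeID values are left untouched, because they do not start with the namespace. Fixing the mapping rules themselves would of course be the cleaner solution.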
andreamust commented 2 years ago

@delfimpandiani I updated the original JSON file with the changes that you requested. You can find the file in the places folder. Could you please check if everything's ok?

delfimpandiani commented 2 years ago

Looks great except for the place id: I need it to have underscores instead of dashes. For example, instead of c56fdea4-e81e-439a-a183-a52eb1141409 it should be c56fdea4_e81e_439a_a183_a52eb1141409
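
For reference, the change amounts to something like the sketch below when preprocessing the JSON in Python. The function name place_id_for_iri is hypothetical, and the sketch assumes the id is a plain string.

```python
def place_id_for_iri(place_id):
    """Return the place id with dashes replaced by underscores."""
    return place_id.replace("-", "_")


print(place_id_for_iri("c56fdea4-e81e-439a-a183-a52eb1141409"))
```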

delfimpandiani commented 2 years ago

Hi @ccolonna, I am not sure, but given the time constraint and the fact that I can't work on this until Tuesday, please change the file as you see fit! Then later this week we can check in to see how to integrate the changes.

andreamust commented 2 years ago

Just updated, now it should be ok!

valecarriero commented 2 years ago

to be removed

mp:mixingsession a mp:SessionType ; rdfs:label "mixed at"^^xsd:string . 
mp:recordingsession a mp:SessionType ; rdfs:label "recorded at"^^xsd:string .

rule for the address generation

To generate the code used in the URI of the address, use only the value of core:fullAddress, without also using the Physical Site label (a physical site is linked to an address, so its label should not be part of the address URI). The geometries have the same problem: I would use only the values of core:lat and core:long to generate their URIs.
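
A minimal Python sketch of this rule, purely as an illustration: the key is a short hash of the property values only, so the Physical Site label never enters the URI. The hashing scheme and the names BASE, value_key, address_iri and geometry_iri are assumptions, not what the current mapping rules do.

```python
import hashlib

BASE = "https://w3id.org/polifonia/resource/"


def value_key(*values):
    """Short, stable key derived only from the given property values."""
    joined = "|".join(str(v).strip() for v in values)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]


def address_iri(full_address):
    """Address URI built from core:fullAddress alone."""
    return f"{BASE}Address/address_{value_key(full_address)}"


def geometry_iri(lat, long):
    """Geometry URI built from core:lat and core:long alone."""
    return f"{BASE}Geometry/geometry_{value_key(lat, long)}"


print(address_iri("3 Abbey Road, St John’s Wood, London"))
print(geometry_iri("51.53192", "-0.17835"))
```

With this scheme, two physical sites that share the same coordinates (as Studio 1 and Studio 2 do in the example) end up pointing to the same geometry resource.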

wrong generation of URI: missing part

<https://w3id.org/polifonia/resource/Agent/> a core:Agent ;
    rdfs:label "Queen"^^xsd:string .
<https://w3id.org/polifonia/resource/Title/> a core:Title ;
    rdfs:label "We Will Rock You"^^xsd:string .
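
I do not know why the local part is missing here. One way to catch it early, sketched in Python under the assumption that the local part is derived from the label, is to refuse to build a URI when the derived key is empty. The names slug_for_iri and resource_iri are hypothetical.

```python
import re

BASE = "https://w3id.org/polifonia/resource/"


def slug_for_iri(label):
    """Lower-case the label and join its word runs with underscores."""
    return re.sub(r"\W+", "_", label.lower()).strip("_")


def resource_iri(kind, label):
    """Build BASE + kind + "/" + slug, refusing to emit a URI with an empty local part."""
    slug = slug_for_iri(label)
    if not slug:
        raise ValueError(f"cannot build a {kind} URI from label {label!r}")
    return f"{BASE}{kind}/{slug}"


print(resource_iri("Agent", "Queen"))
print(resource_iri("Title", "We Will Rock You"))
```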

wrong generation of URI: different entities collapsed

As you can see from the endTime and startTime in the examples, generating this kind of URI causes different time intervals to share the same URI. I think that the values of endTime and startTime should be used to generate the URI and solve this problem.
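
A minimal Python sketch of that suggestion. The URI pattern and the function name time_interval_iri are assumptions, as is the pairing of start and end values by the order in which they are listed in the collapsed resource.

```python
BASE = "https://w3id.org/polifonia/resource/"


def time_interval_iri(start, end):
    """Build the TimeInterval URI from its own start and end values."""
    local = f"ti_{start}_{end}".replace("-", "_")
    return f"{BASE}TimeInterval/{local}"


# The three intervals that currently collapse into ti_00001_3 (pairing assumed)
intervals = [
    ("1951-10-03", "1951-10-07"),
    ("1963-02-11", "1963-02-11"),
    ("1977-08", "1977-09"),
]
for start, end in intervals:
    print(time_interval_iri(start, end))
```

Each pair now gets its own URI, and two sessions with identical dates share one interval resource.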

<https://w3id.org/polifonia/resource/TimeInterval/ti_00001_3> a core:TimeInterval ;
    core:endTime "1951-10-07",
        "1963-02-11",
        "1977-09" ;
    core:startTime "1951-10-03",
        "1963-02-11",
        "1977-08" .
delfimpandiani commented 2 years ago

Thank you @andreamust

delfimpandiani commented 2 years ago

Hi @valecarriero, thank you for the notes!

So far, I have dealt with this:

to be removed

mp:mixingsession a mp:SessionType ; rdfs:label "mixed at"^^xsd:string . 
mp:recordingsession a mp:SessionType ; rdfs:label "recorded at"^^xsd:string .

I have not yet addressed the other points: I do not have enough time until at least Tuesday, and these issues are not easy to deal with in RML.

Anyway, since we agreed that the OD Task Force would get trained on SPARQL Anything and switch the data transformation pipeline from RML mapping to it, we should be able to work on the data directly with SPARQL Anything starting this upcoming week. The rest of the issues you mentioned should then no longer arise.

delfimpandiani commented 2 years ago

@ccolonna the Polifonia KG (in Turtle) has now been updated to include hundreds of examples (instead of just the three it had before). There are still issues, as you mentioned and as commented by Vale C. in a previous comment. Feel free to deal with any of them directly (using any method you want to edit the .ttl directly), if this is needed to feed the interface by tomorrow.

ccolonna commented 2 years ago

Thanks @delfimpandiani. I'm not sure I can modify the file automatically. In any case, I can change a single resource and test the transformation procedure; it will then work the same way once the error is fixed. It's not urgent for now :)

I saw there are no more artists but I imagine there's a reason as well.

ccolonna commented 2 years ago

Hi everybody,

looking at this example and similar ones where there is more than one artist (artist_1, artist_2, ...):

Looking at the great ontology and @delfimpandiani's really helpful mapping rules, I understand that we should end up with data shaped roughly like this:


:some_recording mp:hasRecordingPerformer :artist_1 ,
                                         :artist_2 .

:artist_1 core:hasBirthPlace :country_uri ;
              core:hasCareer :artist_1_career_uri .

:country_uri core:hasAddress :country_address_uri .

:country_address_uri rdfs:label/core:hasCountryCode? :artist_country .

:artist_1_career_uri core:hasBeginPlace :place_uri .
:place_uri rdfs:label :artist_start .

Assuming artist_1, artist_2, artist_start and artist_country map onto the corresponding properties in the JSON below, is the above semantically correct?

Another small point I'm a bit confused about is the relation between artist_1, artist_2, artist_3 and artist_country, artist_start. Even when there are several artists, there is always a single artist_country/artist_start per recording. Does that mean these artists come from the same country?

        {
            "track_id": "00695",
            "artist_1": "Stan Getz",
            "artist_2": "Jo\u00e3o Gilberto",
            "artist_3": "Antonio Carlos Jobim",
            "artist_for_iri_1": "stan_getz",
            "artist_for_iri_2": "jo\u00e3o_gilberto",
            "artist_for_iri_3": "antonio_carlos_jobim",
            "title": "The Girl From Ipanema",
            "title_for_iri": "the_girl_from_ipanema",
            "recording_places": [
                {
                    "ref_id": "00695",
                    "session_id": "00695_1",
                    "type": "recorded at",
                    "type_id": "recorded at",
                    "session_type": "mp:RecordingSession",
                    "begin": "1963-03-18",
                    "end": "1963-03-19",
                    "ended": "true",
                    "place": {
                        "id": "5b2dde24_1bec_470b_9450_339e33216fe4",
                        "type": "Studio",
                        "name": "A&R Recording Studio",
                        "disambiguation": "Third studio, 322 West 48th Street",
                        "address": "322 West 48 Street, New York, NY 10036-1308",
                        "coordinates": {
                            "latitude": "40.761498",
                            "longitude": "-73.988169"
                        }
                    }
                }
            ],
            "artist_country": "BR",
            "artist_start": "Tijuca"
        }
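
To make the first part concrete, here is a minimal Python sketch that collects the performer URIs of one track record. It assumes the numbered artist_for_iri_N keys are contiguous and start at 1; the function name performer_iris is hypothetical.

```python
BASE = "https://w3id.org/polifonia/resource/"


def performer_iris(track):
    """Collect Agent URIs from artist_for_iri_1, artist_for_iri_2, ... in order."""
    iris = []
    n = 1
    while True:
        key = f"artist_for_iri_{n}"
        if key not in track:
            break
        iris.append(f"{BASE}Agent/{track[key]}")
        n += 1
    return iris


# Subset of the record above
track = {
    "track_id": "00695",
    "artist_for_iri_1": "stan_getz",
    "artist_for_iri_2": "jo\u00e3o_gilberto",
    "artist_for_iri_3": "antonio_carlos_jobim",
}
for iri in performer_iris(track):
    print(iri)
```

Since artist_country and artist_start are not numbered, the record alone does not say which of these agents they describe.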
delfimpandiani commented 2 years ago

Hi @ccolonna, I think the best people to answer your questions are:

  1. about the model and the accuracy of the triples you provide - @valecarriero
  2. about the semantics behind the artist start data - @andreamust
ccolonna commented 2 years ago

Ok,

many thanks @delfimpandiani !

ccolonna commented 2 years ago

I'm gonna close this, as we now have a system set up for KG production and a SPARQL endpoint.