tibonto / aeon

The Academic Event Ontology (AEON) can be used to represent information regarding academic events.
https://tibonto.github.io/aeon/
Creative Commons Attribution 4.0 International

Add mapping to external sources in annotation Property #74

Closed: StroemPhi closed this issue 3 years ago

StroemPhi commented 3 years ago

Information about which AEON property maps to which property on an external platform needs to be stored in an annotation property in AEON, using JSON syntax.

Example where aeon:AEON_0000026 (rdfs:label "maps to") holds the mapping info as a JSON dict:

###  https://github.com/tibonto/aeon#part_of_series
aeon:part_of_series rdf:type owl:ObjectProperty ;
                    rdfs:subPropertyOf aeon:part_of ;
                    rdfs:domain aeon:AEON_0000001 ;
                    rdfs:range aeon:AEON_0000002 ;
                    aeon:AEON_0000026 "{\"wikidata\": {\"uri\": \"https://www.wikidata.org/wiki/Property:P179\", \"label\": \"part_of_the_seriesLabel\"}, \"openresearch\": {\"uri\": \"https://www.openresearch.org/wiki/Property:Event_in_series\", \"label\": \"Event_in_series\"}}"^^xsd:string ;
                    aeon:SMW_datatype "Page" ;
                    aeon:SMW_import_info "[[Category:AEON]] [[Category:Imported vocabulary]]" .
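Because the whole mapping is stored as a single xsd:string literal, any consumer has to decode it with a JSON parser, and each platform entry is expected to carry a `uri` and a `label`. A minimal sketch, assuming the literal has already been extracted as a Python string (here copied from the part_of_series example above, with the Turtle escapes resolved); `parse_mapping` is a hypothetical helper name, not part of AEON:

```python
import json
from typing import Dict

# Literal value of aeon:AEON_0000026 ("maps to") on aeon:part_of_series,
# copied from the Turtle example with the escaped quotes resolved.
MAPS_TO_LITERAL = (
    '{"wikidata": {"uri": "https://www.wikidata.org/wiki/Property:P179", '
    '"label": "part_of_the_seriesLabel"}, '
    '"openresearch": {"uri": "https://www.openresearch.org/wiki/Property:Event_in_series", '
    '"label": "Event_in_series"}}'
)


def parse_mapping(literal: str) -> Dict[str, Dict[str, str]]:
    """Decode a 'maps to' literal into {platform: {"uri": ..., "label": ...}}.

    Hypothetical helper: raises ValueError (json.JSONDecodeError is a subclass)
    for malformed JSON or for an entry that lacks 'uri' or 'label'.
    """
    mapping = json.loads(literal)
    for platform, entry in mapping.items():
        missing = {"uri", "label"} - set(entry)
        if missing:
            raise ValueError(f"{platform} entry is missing {sorted(missing)}")
    return mapping


mapping = parse_mapping(MAPS_TO_LITERAL)
for platform, entry in sorted(mapping.items()):
    print(f"{platform}: {entry['label']} <{entry['uri']}>")
```

Validating the keys up front means a typo in the ontology surfaces when the mapping is read, rather than later in an import script.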

This way we can later create a YML file for each external platform, which we can then use to import data from those platforms into our Semantic MediaWiki instance.

Example of how to parse the ontology to retrieve the mapping info and store it as YML:

import yaml
import json
from typing import Dict
import rdflib

# function to make YML out of Python dict
def dict2yaml(path: str, data: Dict):
    with open(path, 'w') as yaml_f:
        yaml.safe_dump(data=data, stream=yaml_f)

# open a graph
graph = rdflib.Graph()
# load AEON into the graph
graph.parse('https://raw.githubusercontent.com/tibonto/aeon/issue-74_add_mapping/aeon.ttl', format="ttl")  
# query for the mapping JSON stored in aeon:AEON_0000026 ("maps to")
qres = graph.query(
    """
    PREFIX aeon: <https://github.com/tibonto/aeon#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>

    SELECT DISTINCT ?aeon_property ?maps_to ?aeon_property_domain ?rdfs_label

    WHERE {
            ?aeon_property aeon:AEON_0000026 ?maps_to.
            {?aeon_property rdf:type owl:ObjectProperty.} UNION             
            {?aeon_property rdf:type owl:DatatypeProperty.}
            OPTIONAL {?aeon_property rdfs:domain ?aeon_property_domain.}
            OPTIONAL {?aeon_property rdfs:label ?rdfs_label.}
           }

    """)

print("the query returned %s results" % len(qres))

mapping_dict = {}
for printout in qres:
    printout_dict = printout.asdict()
    #print (printout_dict)

    # get the annotated aeon property
    aeon_property = str(printout_dict.get('aeon_property'))
    # cut off base URI
    aeon_property = aeon_property.replace("https://github.com/tibonto/aeon#","")

    # get the mappings from annotation property maps_to
    maps_to = json.loads(str(printout_dict.get('maps_to')))

    # get the rdfs:domain of the property (None if it has none)
    domain = printout_dict.get('aeon_property_domain')
    domain = str(domain) if domain is not None else None

    mapping_dict[aeon_property] = {'maps_to': maps_to}
    mapping_dict[aeon_property]['domain'] = domain

# code to parse all mapped platforms goes here; this still needs to be fleshed out!

# write the YML once, after the loop has collected all properties
dict2yaml('confident_mapping.yml', mapping_dict)
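The placeholder comment at the end of the script leaves the per-platform step open. One way to flesh it out is to invert `mapping_dict` so that each platform gets its own sub-dictionary, which could then be written to its own YML file with `dict2yaml`. The sketch below is a pure function, so it runs without rdflib or PyYAML; the name `split_by_platform` and the per-platform file name in the comment are assumptions, and the sample entries are copied from the YML output in the next comment.

```python
from typing import Any, Dict

# Shape produced by the script: {aeon_property: {"maps_to": {...}, "domain": ...}}
MappingDict = Dict[str, Dict[str, Any]]


def split_by_platform(mapping_dict: MappingDict) -> Dict[str, MappingDict]:
    """Invert {aeon_property: {"maps_to": {platform: entry}, "domain": d}} into
    {platform: {aeon_property: {"uri": ..., "label": ..., "domain": d}}}."""
    per_platform: Dict[str, MappingDict] = {}
    for aeon_property, info in mapping_dict.items():
        for platform, entry in info["maps_to"].items():
            per_platform.setdefault(platform, {})[aeon_property] = {
                "uri": entry.get("uri"),
                "label": entry.get("label"),
                "domain": info.get("domain"),
            }
    return per_platform


example = {
    "end_date": {
        "domain": "https://github.com/tibonto/aeon#AEON_0000001",
        "maps_to": {
            "openresearch": {"label": "End_date",
                             "uri": "https://www.openresearch.org/wiki/Property:End_date"},
            "wikidata": {"label": "end_time",
                         "uri": "https://www.wikidata.org/wiki/Property:P582"},
        },
    },
    "process_alternative_name": {
        "domain": "http://purl.obolibrary.org/obo/BFO_0000015",
        "maps_to": {
            "gnd": {"label": "Variant name for the conference or event",
                    "uri": "https://d-nb.info/standards/elementset/gnd#variantNameForTheConferenceOrEvent"},
        },
    },
}

for platform, properties in sorted(split_by_platform(example).items()):
    # With PyYAML installed, each could be written with the script's helper, e.g.
    # dict2yaml(f"{platform}_mapping.yml", properties)  (hypothetical file name)
    print(platform, sorted(properties))
```

Keeping the domain next to each entry means a per-platform file is self-contained: an import script for one platform does not need to look anything up in the combined file.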
StroemPhi commented 3 years ago

YML produced using the above code.


duration:
  domain: https://github.com/tibonto/aeon#AEON_0000001
  maps_to:
    gnd:
      label: Date of conference or event
      uri: https://d-nb.info/standards/elementset/gnd#dateOfConferenceOrEvent
    wikidata:
      label: duration
      uri: https://www.wikidata.org/wiki/Property:P2047
end_date:
  domain: https://github.com/tibonto/aeon#AEON_0000001
  maps_to:
    openresearch:
      label: End_date
      uri: https://www.openresearch.org/wiki/Property:End_date
    wikidata:
      label: end_time
      uri: https://www.wikidata.org/wiki/Property:P582
is_about:
  domain: http://purl.obolibrary.org/obo/BFO_0000015
  maps_to:
    gnd:
      label: Topic that is related to a corporate body, conference, person, family,
        subject heading or work.
      uri: https://d-nb.info/standards/elementset/gnd#topic
    openresearch:
      label: Field
      uri: https://www.openresearch.org/wiki/Property:Field
    wikidata:
      label: main_subjectLabel
      uri: https://www.wikidata.org/wiki/Property:P921
occurs_in_city:
  domain: https://github.com/tibonto/aeon#AEON_0000001
  maps_to:
    gnd:
      label: Place of conference or event
      uri: https://d-nb.info/standards/elementset/gnd#placeOfConferenceOrEvent
    openresearch:
      label: Has_location_city
      uri: https://www.openresearch.org/wiki/Property:Has_location_city
    wikidata:
      label: locationLabel
      uri: https://www.wikidata.org/wiki/Property:P276
occurs_in_country:
  domain: https://github.com/tibonto/aeon#AEON_0000001
  maps_to:
    openresearch:
      label: Has_location_country
      uri: https://www.openresearch.org/wiki/Property:Has_location_country
    wikidata:
      label: countryLabel
      uri: https://www.wikidata.org/wiki/Property:P17
occurs_in_state:
  domain: https://github.com/tibonto/aeon#AEON_0000001
  maps_to:
    openresearch:
      label: Has_location_state
      uri: https://www.openresearch.org/wiki/Property:Has_location_state
    wikidata:
      label: located_in_the_administrative_territorial_entityLabel
      uri: https://www.wikidata.org/wiki/Property:P131
part_of_series:
  domain: https://github.com/tibonto/aeon#AEON_0000001
  maps_to:
    openresearch:
      label: Event_in_series
      uri: https://www.openresearch.org/wiki/Property:Event_in_series
    wikidata:
      label: part_of_the_seriesLabel
      uri: https://www.wikidata.org/wiki/Property:P179
process_acronym:
  domain: http://purl.obolibrary.org/obo/BFO_0000015
  maps_to:
    gnd:
      label: Abbreviated name for the conference or event
      uri: https://d-nb.info/standards/elementset/gnd#abbreviatedNameForTheConferenceOrEvent
    openresearch:
      label: Acronym
      uri: https://www.openresearch.org/wiki/Property:Acronym
    wikidata:
      label: short_nameLabel
      uri: https://www.wikidata.org/wiki/Property:P1813
process_alternative_name:
  domain: http://purl.obolibrary.org/obo/BFO_0000015
  maps_to:
    gnd:
      label: Variant name for the conference or event
      uri: https://d-nb.info/standards/elementset/gnd#variantNameForTheConferenceOrEvent
process_name:
  domain: http://purl.obolibrary.org/obo/BFO_0000015
  maps_to:
    gnd:
      label: Preferred name for the conference or event
      uri: https://d-nb.info/standards/elementset/gnd#preferredNameForTheConferenceOrEvent
    openresearch:
      label: Title
      uri: https://www.openresearch.org/wiki/Property:Title
    wikidata:
      label: itemLabel
      uri: null
process_website:
  domain: http://purl.obolibrary.org/obo/BFO_0000015
  maps_to:
    openresearch:
      label: Homepage
      uri: https://www.openresearch.org/wiki/Property:Homepage
    wikidata:
      label: official_website
      uri: https://www.wikidata.org/wiki/Property:P856
start_date:
  domain: https://github.com/tibonto/aeon#AEON_0000001
  maps_to:
    openresearch:
      label: Start_date
      uri: https://www.openresearch.org/wiki/Property:Start_date
    wikidata:
      label: start_time
      uri: https://www.wikidata.org/wiki/Property:P580
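To see at a glance which AEON properties a given platform covers (and which it does not), the YML can be loaded back and tallied per platform. A minimal sketch, assuming the file has already been loaded into a dict (for example with `yaml.safe_load`); the excerpt reproduces three entries from the output above with the domains omitted, and `coverage` is a hypothetical helper name.

```python
from typing import Dict, List

# Excerpt of the YML above, as it would look after yaml.safe_load (domains omitted).
MAPPING = {
    "occurs_in_country": {"maps_to": {
        "openresearch": {"label": "Has_location_country",
                         "uri": "https://www.openresearch.org/wiki/Property:Has_location_country"},
        "wikidata": {"label": "countryLabel",
                     "uri": "https://www.wikidata.org/wiki/Property:P17"}}},
    "process_alternative_name": {"maps_to": {
        "gnd": {"label": "Variant name for the conference or event",
                "uri": "https://d-nb.info/standards/elementset/gnd#variantNameForTheConferenceOrEvent"}}},
    "process_name": {"maps_to": {
        "gnd": {"label": "Preferred name for the conference or event",
                "uri": "https://d-nb.info/standards/elementset/gnd#preferredNameForTheConferenceOrEvent"},
        "openresearch": {"label": "Title",
                         "uri": "https://www.openresearch.org/wiki/Property:Title"},
        "wikidata": {"label": "itemLabel", "uri": None}}},
}


def coverage(mapping: Dict[str, dict], platform: str) -> Dict[str, List[str]]:
    """Split the AEON properties into those mapped / not mapped for `platform`."""
    mapped = sorted(p for p, info in mapping.items() if platform in info["maps_to"])
    unmapped = sorted(p for p in mapping if p not in mapped)
    return {"mapped": mapped, "unmapped": unmapped}


for platform in ("gnd", "openresearch", "wikidata"):
    print(platform, coverage(MAPPING, platform))
```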
StroemPhi commented 3 years ago

@andrecastro0o can you give me your feedback on whether the keys of this YML are OK for you?

Although, thinking about it, you can structure the YML in your script any way you like. So let me rephrase my question: are the keys in the JSON dict sufficient in your eyes?

andrecastro0o commented 3 years ago

@StroemPhi the Wikidata properties seem good. There is one awkward one, but it is correct, because the itemLabel variable is not extracted from a property's value but from the label of the subject's Wikidata Q-number.

process_name:
....
    wikidata:
      label: itemLabel
      uri: null
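Since most Wikidata entries point at a property page while itemLabel has `uri: null`, an import script has to treat the two cases differently (in Wikidata SPARQL, `...Label` variables such as `?itemLabel` are typically filled in by the `wikibase:label` service rather than read from a property). A minimal sketch, assuming the import script only needs the bare property ID (e.g. `P179`) to build its query and treats a null `uri` as "taken from the subject item itself"; `wikidata_pid` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches Wikidata property page URIs such as https://www.wikidata.org/wiki/Property:P179
_PROPERTY_URI = re.compile(r"^https://www\.wikidata\.org/wiki/Property:(P\d+)$")


def wikidata_pid(entry: dict) -> Optional[str]:
    """Return the property ID (e.g. 'P179') of a wikidata mapping entry,
    or None when the value is not read from a property (uri is null)."""
    uri = entry.get("uri")
    if uri is None:
        return None
    match = _PROPERTY_URI.match(uri)
    if match is None:
        raise ValueError(f"unexpected Wikidata URI: {uri}")
    return match.group(1)


print(wikidata_pid({"label": "part_of_the_seriesLabel",
                    "uri": "https://www.wikidata.org/wiki/Property:P179"}))  # P179
print(wikidata_pid({"label": "itemLabel", "uri": None}))  # None
```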

These are all the same ones we had under the aeon:WikidataLabel and aeon:WikidataURI properties, right? I would like to test the resulting YAML in the Wikidata event import scripts, but will only be able to do so after Friday. But it looks good to me.

Ahh, one more thing: aeon:WikidataLabel & aeon:WikidataURI are still in the ttl (https://github.com/tibonto/aeon/blob/43cde57978a442bdd8659d87d16d6f833d758445/aeon.ttl#L1240). Perhaps a good way to check is to search for them and see whether all properties that have aeon:WikidataLabel & aeon:WikidataURI now also have aeon:AEON_0000026. If so, delete the Wikidata props.
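The suggested check boils down to a set difference: properties that still carry the old Wikidata annotation properties, minus those that already carry aeon:AEON_0000026. A minimal sketch, assuming the annotation properties used on each AEON property have already been collected into a dict (for instance with an rdflib query like the one in the issue description); the sample below is hypothetical and not a snapshot of aeon.ttl, and `not_yet_migrated` is a hypothetical helper name.

```python
from typing import Dict, Set

OLD_ANNOTATIONS = {"aeon:WikidataLabel", "aeon:WikidataURI"}
NEW_ANNOTATION = "aeon:AEON_0000026"  # rdfs:label "maps to"


def not_yet_migrated(annotations: Dict[str, Set[str]]) -> Set[str]:
    """Properties that still use an old Wikidata annotation but lack 'maps to'."""
    return {
        prop
        for prop, used in annotations.items()
        if used & OLD_ANNOTATIONS and NEW_ANNOTATION not in used
    }


# Hypothetical sample: AEON property -> annotation properties it carries.
sample = {
    "aeon:part_of_series": {"aeon:WikidataLabel", "aeon:WikidataURI", "aeon:AEON_0000026"},
    "aeon:has_WDQID": {"aeon:WikidataLabel", "aeon:WikidataURI"},
    "aeon:process_name": {"aeon:AEON_0000026"},
}

pending = not_yet_migrated(sample)
print(sorted(pending))  # ['aeon:has_WDQID']
# Only when `pending` is empty would it be safe to drop the old annotation properties.
```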

StroemPhi commented 3 years ago

@andrecastro0o wrt your first point: yes, this seemed awkward to me yesterday as well (same with has_WDQID --> label: itemID), but I took it from aeon:WikidataLabel and aeon:WikidataURI, as you rightly assumed. I guess we should discuss this in more detail later.

Wrt the second point: thanks for the hint. I wanted to do this as well, to check whether I've got everything while practicing my SPARQL and rdflib skillz at the same time.

StroemPhi commented 3 years ago

@andrecastro0o I've also added the remaining Wikidata mappings that were previously described only with aeon:WikidataURI & aeon:WikidataLabel. I will purge the latter two annotation properties from AEON once you give me the go-ahead after your testing.

StroemPhi commented 3 years ago

I'll do Crossref next. Hopefully tonight.

StroemPhi commented 3 years ago

@andrecastro0o wrt the Crossref mapping, I used the JSON keys in line with Svantje's script (https://github.com/TIBHannover/confIDent-dataScraping/blob/master/crossref.py).

The Crossref JSON key used in the mapping of aeon:process_alternative_name points to the proceedings title returned by Crossref, because the title provided in event['name'] is often too short, probably due to faulty automatic truncation (see also 97b57c5).
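The preference described here (proceedings title first, event name only as a fallback) can be expressed as a small selection function. A minimal sketch, assuming the proceedings title is read from the record's `title` list and the event name from `event['name']`; which key Svantje's script actually uses is not stated in this thread, and `alternative_name` is a hypothetical helper.

```python
from typing import Optional


def alternative_name(record: dict) -> Optional[str]:
    """Pick a value for aeon:process_alternative_name from one Crossref record.

    Assumption: the proceedings title is in record["title"] (a list of strings)
    and is preferred over record["event"]["name"], which is often truncated.
    The event name is used only if no non-empty title is present.
    """
    titles = record.get("title") or []
    if titles and titles[0].strip():
        return titles[0].strip()
    event_name = (record.get("event") or {}).get("name")
    return event_name.strip() if event_name else None


# Hypothetical record, for illustration only.
record = {"title": ["Proceedings of the Example Workshop"],
          "event": {"name": "Example Worksh"}}
print(alternative_name(record))  # Proceedings of the Example Workshop
```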

andrecastro0o commented 3 years ago

Good work @StroemPhi !! :)