iptc / sport-schema

The next generation of sports data, based on IPTC’s SportsML and semantic web principles

Think about how to work with Wikidata IDs #110

Open bquinn opened 2 years ago

bquinn commented 2 years ago

Can we use them instead of / as well as proprietary IDs for athletes, teams, sites, competitions?

bquinn commented 1 year ago

Elaborating on this: many of our entities already have IDs on Wikidata. For example, rather than use example.org/A1234 for Chris Hoy, maybe we just use his Wikidata entity ID, http://www.wikidata.org/entity/Q310383, instead?

@pauliharman mentioned that in SNaP they created an Identifier object that links from any of their objects to identifiers in other systems
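One way to picture the Identifier approach is a small node that records the external system and the ID within it. The sketch below is hypothetical: the `ex:` namespace and the `ex:Identifier`, `ex:hasIdentifier`, `ex:scheme` and `ex:value` terms are invented for illustration and are not taken from SNaP or from the sport schema.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Our own IRI stays the subject of every statement we make.
ex:A1234
        rdfs:label        "Chris Hoy" ;
        ex:hasIdentifier  [
            rdf:type   ex:Identifier ;   # hypothetical class
            ex:scheme  "wikidata" ;      # which external system the ID belongs to
            ex:value   "Q310383"         # the ID within that system
        ] .
```

Because the external ID is held as a plain literal, nothing here asserts that the two resources are identical, so a reasoner has nothing to merge.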

I (@bquinn) mentioned rdf:sameAs and @pauliharman mentioned that he had been steered away from that by @silveroliver and Paul Wilton in the past.

Another thought: up to now we have avoided creating a parent "Thing"-type object that everything inherits from. But maybe we need to add one to handle this type of thing? Maybe we need a construct like that to handle media links as well?

pauljkelly commented 1 year ago

@bquinn Please update with notes from June 6, 2023 group meeting.

freeballoon commented 1 year ago

> I (@bquinn) mentioned rdf:sameAs and @pauliharman mentioned that he had been steered away from that by @silveroliver and Paul Wilton in the past.

BBC continue to decorate concepts with core:sameAs relationships to DBpedia (though currently don't exercise them when publishing to the audience), hence https://www.bbc.co.uk/things/ed138786-46c0-430f-9493-214d3c02c429 is derived from this:

<http://www.bbc.co.uk/things/ed138786-46c0-430f-9493-214d3c02c429#id> a sport:Person,
        tagging:TagConcept ;
    sal:nationality <http://www.bbc.co.uk/things/5f9de3a3-ce03-485f-ae82-e80d1b4efaf8#id> ;
    sal:videCode "HAM"^^xsd:string ;
    core:dateOfBirth "1991-05-07"^^xsd:date ;
    core:disambiguationHint "Motor Racing Driver"@en-gb ;
    core:preferredLabel "Lewis Hamilton"@en-gb ;
    core:sameAs <http://dbpedia.org/resource/Lewis_Hamilton> ;
    core:shortLabel "Hamilton"@en-gb ;
    sport:discipline <http://www.bbc.co.uk/things/13c4b240-b966-410e-a23a-f06dfa0b444b#id> ;
    sport:hasCompetedFor <http://www.bbc.co.uk/things/d5e292af-83a4-4773-8361-385bb6920093#id> ;
    foaf:familyName "Hamilton"@en-gb ;
    foaf:firstName "Lewis"@en-gb .

Be interesting to learn about the motivations to steer away from core:sameAs.

pauliharman commented 1 year ago

> Be interesting to learn about the motivations to steer away from core:sameAs.

My recollection - and bear in mind this is 10+ years ago now, with all of the limitations etc of tools at the time - was that using a lot of sameAs can result in inference collapse/collisions if some of the things you are sameAs-ing are not in fact identical concepts. The problem can be exacerbated if your data includes third-party or other linked data sources you can't control, e.g. someone puts a clumsy sameAs into Wikidata and now your search engine is broken.

It may have been an artefact of how OWLIM did forward chaining... I can't quite remember the context, but it was related to how 'hard' it treated some of the low-level OWL concepts like sameAs.
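The collapse described above can be seen with a handful of triples. This is only a sketch: the `ex:` IRIs and the "clumsy" third-party link are made up, and the comments describe what OWL semantics permits a reasoner to infer, not the behaviour of any particular product such as OWLIM.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix wd:   <http://www.wikidata.org/entity/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Our own link, which is correct.
ex:A1234  owl:sameAs  wd:Q310383 ;
          rdfs:label  "Chris Hoy" .

# A mistaken link from a source we do not control (invented for this example).
wd:Q310383          owl:sameAs  ex:SomeOtherPerson .
ex:SomeOtherPerson  rdfs:label  "Someone Else" .

# owl:sameAs is symmetric and transitive, and identical individuals share
# every statement, so a reasoner may now also infer:
#   ex:A1234            owl:sameAs  ex:SomeOtherPerson .
#   ex:A1234            rdfs:label  "Someone Else" .
#   ex:SomeOtherPerson  rdfs:label  "Chris Hoy" .
```

One bad link is enough to merge two unrelated people, and every further sameAs attached to either of them is pulled into the same cluster.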

bquinn commented 1 year ago

@freeballoon note that @pauliharman was talking about the official rdf:sameAs (by which I think he meant owl:sameAs), whereas the BBC examples use core:sameAs. It is defined as

> Indicates that something is the same as something else, but in a way that is slightly weaker than owl:sameAs. It's purpose is to connect separate identities of the same thing, whilst keeping separation between the original statements of each.

So maybe we need our own concept of "sameAs" in the same way that the BBC core ontology has created its own "sameAs". But maybe to contain confusion we should change the name - "similarTo"??
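If the group did define its own property, the declaration could be as small as the following. Everything here is an assumption for illustration: the `ex:` namespace stands in for wherever the term would live, and `similarTo` is only the name floated above. The property is deliberately not declared symmetric or transitive, and is not a subproperty of owl:sameAs, so a reasoner has no grounds to merge the two resources.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# A link between two identities of the same real-world thing that carries
# no OWL identity semantics.
ex:similarTo
        rdf:type      owl:ObjectProperty ;
        rdfs:label    "similar to"@en ;
        rdfs:comment  "Links a resource to a separately maintained identity for the same real-world thing, without merging their statements."@en .
```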

bquinn commented 1 year ago

> @bquinn Please update with notes from June 6, 2023 group meeting.

We discussed IDs and whether we should use Wikidata IDs as the IDs of our entities.

For example, currently we have:

<http://example.com/Athlete/p.98980>
        rdf:type    sport:Athlete ;
        rdfs:label  "Emiliano Martínez" .

This player already has a Wikidata ID, https://www.wikidata.org/wiki/Q3275904, so should we instead say the following?

<http://www.wikidata.org/entity/Q3275904>
        rdf:type    sport:Athlete ;
        rdfs:label  "Emiliano Martínez" .

Points for:

Points against:

We concluded that it was probably best NOT to use the Wikidata IDs as our subject IDs, but to map to them somehow, which is where the conversation above about sameAs came from.

pauliharman commented 1 year ago

> @freeballoon note that @pauliharman was talking about the official rdf:sameAs (by which I think he meant owl:sameAs), whereas the BBC examples use core:sameAs. It is defined as
>
> > Indicates that something is the same as something else, but in a way that is slightly weaker than owl:sameAs. It's purpose is to connect separate identities of the same thing, whilst keeping separation between the original statements of each.
>
> So maybe we need our own concept of "sameAs" in the same way that the BBC core ontology has created its own "sameAs". But maybe to contain confusion we should change the name - "similarTo"??

Oops sorry. Maybe skos:closeMatch ?

bquinn commented 1 year ago

> Oops sorry. Maybe skos:closeMatch ?

Yes, that might work! It's tricky because we really are referring to the same person in real life, but maybe not the same entity in terms of the triples defined... 🤔
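A minimal sketch of that mapping, reusing the athlete from the meeting notes: our own IRI remains the subject, and the Wikidata entity is linked with skos:closeMatch. The `sport:` namespace IRI below is an assumption, since the thread never spells it out.

```turtle
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix wd:    <http://www.wikidata.org/entity/> .
@prefix sport: <https://sportschema.org/ontologies/main#> .   # assumed namespace IRI

# Our statements stay on our own subject; Wikidata is only pointed at.
<http://example.com/Athlete/p.98980>
        rdf:type         sport:Athlete ;
        rdfs:label       "Emiliano Martínez" ;
        skos:closeMatch  wd:Q3275904 .
```

One thing to weigh: SKOS declares its mapping properties as subproperties of skos:semanticRelation, whose domain and range are skos:Concept, so under RDFS inference this triple would also type both the athlete and the Wikidata entity as SKOS concepts. On the other hand, skos:closeMatch is not transitive, which limits the chained merging that owl:sameAs allows.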