schemaorg / schemaorg

Schema.org - schemas and supporting software
https://schema.org/
Apache License 2.0

Schema.org should have mappings to Wikidata terms where possible #280

Open danbri opened 9 years ago

danbri commented 9 years ago

From Lydia Pintscher in https://twitter.com/nightrose/status/558549091844886528

@danbri any issue to track progress on http://schema.org  mapping to Wikidata? 
Maybe even get people to help out?

Update 2016-01-26 - since the original post there have been some improvements at both Wikidata and Schema.org:

danbri commented 9 years ago

Notes from IRC,

lydiapintscher commented 9 years ago

Here is how mapping can be done on the Wikidata side for example: https://www.wikidata.org/wiki/Property:P31

The JSON dumps are the best dumps.

danbri commented 9 years ago
innovimax commented 9 years ago

+1

elf-pavlik commented 9 years ago

Happy to help here a little! I had the chance to meet a few people from the Wikidata crew during 31C3 and remember that serving Turtle also needs some fixing... but it already uses schema.org quite a lot!

$ curl http://www.wikidata.org/entity/Q80 -iL -H "Accept: text/turtle"
danbri commented 9 years ago

I went looking for the code that generates this. For those without turtle, an excerpt from running

curl http://www.wikidata.org/entity/Q42 -iL -H "Accept: text/turtle"

(full response is at https://gist.github.com/danbri/66616096d42e595376f6 )

[update]Hmm actually you can get it all in the browser without using content negotiation, just via suffixes:
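
For anyone scripting this, here is a minimal Python sketch of both access routes: the content-negotiated request from the curl call, and a suffix URL. It assumes the `Special:EntityData/<id>.<ext>` form (for example `.ttl` or `.json`); the helper names are made up for illustration.

```python
from urllib.request import Request

# Base of the per-entity export URLs (assumption: suffix selects the format).
ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/"


def entity_url(entity_id: str, suffix: str = "ttl") -> str:
    """Build a suffix URL that needs no Accept header, e.g. Q42.ttl."""
    return f"{ENTITY_DATA}{entity_id}.{suffix}"


def negotiated_request(entity_id: str, mime: str = "text/turtle") -> Request:
    """Build the equivalent of the curl call: the format is chosen via Accept."""
    return Request(
        f"http://www.wikidata.org/entity/{entity_id}",
        headers={"Accept": mime},
    )
```

Neither helper touches the network; pass the result to `urllib.request.urlopen` to actually fetch the data.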

( edit! I have moved a big chunk of text to https://gist.github.com/danbri/181ff7763f479c397e10 - apologies to those who got accidental notifications due to the '@' symbol.)

This is great but also unfortunately "the easy part" in that these are fixed built-in properties that each Wikidata entry will always carry.

Looking around for relevant source code,

It would be interesting to see how addEntityMetaData might be amended to exploit equivalentProperty information in Wikidata, as @lydiapintscher mentioned re https://www.wikidata.org/wiki/Property:P31

ppKrauss commented 9 years ago

I agree, "Schema.org should have mappings to Wikidata terms where possible". How can we vote? Or how can we collaborate and/or check work in progress? Is there a link about the work on this issue?

elf-pavlik commented 9 years ago

@danbri please remember to fence code snippets with three backticks which can also include clue for syntax highlighting

```ttl
  code goes here @bg @dr @mr
  @prefix data: http://www.wikidata.org/wiki/Special:EntityData/ .
  @prefix schema: http://schema.org/ .
  no mentions using @foo
```

also see code tab in Examples of github markdown https://guides.github.com/features/mastering-markdown/#examples
elf-pavlik commented 9 years ago

@ppKrauss I think people would appreciate more machine readable mappings using owl:equivalentProperty etc. e.g. https://github.com/schemaorg/schemaorg/blob/d370e33a97654746e696973c7966b84b501a59dc/data/schema.rdfa#L5706

IMO we could consider everything from subset of OWL used by RDFa Vocabulary Entailment http://www.w3.org/TR/rdfa-syntax/#s_vocab_expansion

ppKrauss commented 9 years ago

@elf-pavlik thanks (!), so the issue now is only to add something like <link property="owl:equivalentProperty" href="http://WikiDataURL"/> in each rdf:Property and each rdfs:Class ... is it?

New suggestion: we could collaborate through an online interface or (initially) through a spreadsheet (e.g. Excel) on GitHub, with the columns wikidataID and Property, or wikidataID and Class.

lydiapintscher commented 9 years ago

Why not add it directly in Wikidata?

ppKrauss commented 9 years ago

@lydiapintscher, perhaps I am not understanding your point, sorry... The objective of this issue is to map Schema.org's definitions onto Wikidata.org's concept definitions, not the inverse.

lydiapintscher commented 9 years ago

Both should happen, no? ;-)

ppKrauss commented 9 years ago

@lydiapintscher, I think it is a matter of scope. You can imagine Wikidata as an (external and closed) dictionary, like Webster, not as an open project like Wikipedia.

lydiapintscher commented 9 years ago

Wikidata is just as open as Wikipedia.

nemobis commented 9 years ago

Peter, 22/02/2015 18:39:

wikipedia.org concept definitions

Does such a thing exist?

elf-pavlik commented 9 years ago

@lydiapintscher once schema.org URIs have mappings to wikidata URIs added, do you see a way to add them to wikidata programmatically? IMO it doesn't make sense to do it manually via the web UI... maybe the wikidata team could just import them from schema.rdfa?

BTW I'll stay most of march ~Berlin and could meet IRL with you and anyone else from wikidata interested in this issue... Whenever in Berlin I go anyways to #OKLab / CodeForBerlin on every monday evening at Wikimedia HQ :smile: (we can discuss details over pm - just see my gh profile)

ppKrauss commented 9 years ago

I am trying (with bad English) to consolidate this issue into a draft of the proposal; can you help?

A next step will be to create a Readme.md so that everybody can edit this text, perhaps with the #352 mechanism, and (phase1) to implement some examples "by hand" in schema.rdfa.


Foundations collected from comments posted in this discussion:

  1. @danbri and Lydia Pintscher summary, "schema.org mapping to Wikidata".
  2. Technical suggestion to "schema.org property marked as equiv to another: schema:description ", @danbri.
  3. @danbri and @elf-pavlik looking for some automation ... or "how addEntityMetaData might be amended to exploit equivalentProperty information in Wikidata".
  4. ...
  5. @elf-pavlik suggestion to add the tag <link property="owl:equivalentProperty" href="http://WikiDataURL"/> to each rdfs:Class and each rdf:Property resource definition.
    The equivalentProperty is the same as shown in the Property:P31 example of @lydiapintscher.
  6. Proposal of @ppkrauss to start at Schema.org with human work and no automation (for testing and getting started).
  7. Suggestion of @lydiapintscher to think also about Wikidata mapping to Schema.org...

PROPOSAL OF THE ISSUE #280

Proposal to enhance the schema.rdfa definition descriptors (rdfs:comment) and semantics by mapping each vocabulary item to a Wikidata item.

A sibling project at Wikidata will be the Wikidata.org-to-Schema.org mapping.

PART 1 - SchemaOrg mapping to Wikidata

Actions: add <link property="{$OWL}" href="{$WikiDataURL}"/> with the correct $WikiDataURL.

Actions in the testing phase: do some by hand, with no automation. Example: start with the classes Person and Organization and their properties.
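
As a rough illustration of the action above, this Python sketch renders the `<link>` tag for one hand-made row of the spreadsheet. The predicate choice follows the class/property distinction discussed in this thread. The function name, the row layout and the use of `http://www.wikidata.org/entity/` as the href base are assumptions, not something the proposal fixes.

```python
from html import escape

# Which OWL predicate fits which kind of schema.org term:
# rdfs:Class terms take owl:equivalentClass, rdf:Property terms owl:equivalentProperty.
PREDICATE = {
    "class": "owl:equivalentClass",
    "property": "owl:equivalentProperty",
}


def link_tag(kind: str, wikidata_id: str) -> str:
    """Render the RDFa <link> element for one mapped term."""
    href = f"http://www.wikidata.org/entity/{wikidata_id}"  # assumed URL form
    return f'<link property="{PREDICATE[kind]}" href="{escape(href, quote=True)}"/>'


# Illustrative rows (schema.org term, kind, Wikidata ID); both pairs are
# mentioned in this thread but are not an agreed mapping.
rows = [("Organization", "class", "Q43229"), ("image", "property", "P18")]
tags = {term: link_tag(kind, wid) for term, kind, wid in rows}
```

An unknown `kind` raises `KeyError`, which seems preferable to silently emitting the wrong predicate.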

Examples


PART 2 - Wikidata mapping to SchemaOrg

... under construction... see similar mappings at schema.rdfs.org/mappings.html... Wikidata also has a lot of initiatives mapping Wikidata to external vocabularies (e.g. there is a map from Wikidata to the BNCF Thesaurus)...

ppKrauss commented 9 years ago

@lydiapintscher, sorry again... I did not see that there is also a proposal for a "sibling project at Wikidata" (!)... Can you please check whether my "draft of this proposal" text is now on the right track? I am trying to "translate" and consolidate all comments into one document... so that everyone starts with the same scope, objective, etc.

ppKrauss commented 9 years ago

@danbri, @elf-pavlik, and others, I do not understand whether there is a "formal procedure for creating proposals" here...

Can you please check whether my "draft of this proposal" text is now on the right track? I need your help to "translate" and consolidate it.


About automation, I still do not understand well: do you want to automate? My opinion: I think we can start with non-automated procedures, which will be useful for checking automated ones that may be introduced later... or for checking the "size" of the non-automated task (~1000 items!). I think that a reliable mapping needs human control.

elf-pavlik commented 9 years ago

@ppKrauss thanks for trying to summarize this thread into a proposal!

http://schema.org/Organization is owl:equivalentProperty to Q43229

please don't confuse owl:equivalentClass with owl:equivalentProperty

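if you look at schema.rdf we need accordingly

  • typeof="rdfs:Class" needs owl:equivalentClass or rdfs:subClassOf
  • typeof="rdf:Property" needs owl:equivalentProperty or rdfs:subPropertyOf
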
for the automation, once we map one way schema.org -> wikidata (however we manage to do it) then we can automate importing most of that mapping into wikidata so no one needs to click and copy&paste...

Last but not least, schema.org only recently started using github and also seems to be going through various other processes, so I would encourage you to stay patient and give people time to reply :smile:

danbri commented 9 years ago

Thanks all. Indeed I'm on a trip and can't currently give this the attention it deserves, but I would try to nudge the focus towards actual mappings and away from the specific implementation details at schema.org. We will be making some changes in the site tooling to support mechanisms for extension that may be relevant here.

How about we just jump into the details and start a spreadsheet with a table of schema.org types and properties? Eg on google docs...?

On Mon, 23 Feb 2015 09:06 ☮ elf Pavlik ☮ notifications@github.com wrote:

@ppKrauss https://github.com/ppKrauss thanks for trying to summarize this thread into a proposal!

http://schema.org/Organization is owl:equivalentProperty to Q43229

please don't confuse owl:equivalentClass with owl:equivalentProperty

if you look at schema.rdf https://github.com/schemaorg/schemaorg/blob/sdo-gozer/data/schema.rdfa we need accordingly

  • typeof="rdfs:Class" needs owl:equivalentClass or rdfs:subClassOf
  • typeof="rdf:Property" needs owl:equivalentProperty or rdfs:subPropertyOf

for the automation, once we map one way schema.org -> wikidata (however we manage to do it) then we can automate importing most of that mapping into wikidata so no one needs to click and copy&paste...

Last but not least, schema.org only recently started using github and also seems to be going through various other processes, so I would encourage you to stay patient and give people time to reply :smile:


ppKrauss commented 9 years ago

@elf-pavlik thanks (!), I edited it with your correction (and am now also copying it to my issue280 "ahead of work" :-)


@danbri OK, I sent it to this Google Doc and updated my #352 with the tool that generates the spreadsheet.


@elf-pavlik and @danbri, no urgency (!). As a novice here, I am experimenting with and testing the collaboration possibilities, and studying SchemaOrg as a project... Now that I have a better "schema.org big picture", I see good work (!) by the moderators and a vibrant community. My only help/clue about "better GitHub use" is at #352, and it is perhaps still a little messy.

Returning to the spreadsheet, there are ~1500 items (!)... A good starting point is the classes Person and Organization, since the "vCard semantics" are the most used on the Web,

http://webdatacommons.org/structureddata/index.html#toc2

so I am starting to work with them (Person and Organization)... Is that a good starting point?

danbri commented 9 years ago

Thanks. Yes, starting with the most general / common types makes sense.

Where I got stuck: I could not figure out a good programmatic way to access Wikidata's schema information in all its richness.

Maybe there is a way to take the JSON dumps, load them into some fast-access NoSQL-ish database, so that things can be searched/matched/retrieved easily?
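
A minimal Python sketch of that idea, under two loudly stated assumptions: the dump is laid out as one JSON array with one entity per line, and SQLite stands in here for the "NoSQL-ish" store simply because it ships with Python. The table layout and function names are invented for illustration.

```python
import json
import sqlite3
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity per dump line, skipping the array brackets."""
    for line in lines:
        line = line.strip().rstrip(",")  # each entity line ends with a comma
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def build_index(lines: Iterable[str], db_path: str = ":memory:") -> sqlite3.Connection:
    """Store id, entity type and English label for quick lookup."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS terms (id TEXT PRIMARY KEY, type TEXT, label TEXT)"
    )
    for entity in iter_entities(lines):
        label = entity.get("labels", {}).get("en", {}).get("value")
        db.execute(
            "INSERT OR REPLACE INTO terms VALUES (?, ?, ?)",
            (entity["id"], entity.get("type"), label),
        )
    db.commit()
    return db
```

Because `iter_entities` accepts any iterable of lines, it can be fed from an open (possibly decompressed) file object without loading the whole dump into memory.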

nearby: https://gist.github.com/chrpr/23926c4650ce4363c51b dumps DBpedia's vocab (not Wikidata, but worth a look for comparison)

jimkont commented 9 years ago

Wikidata provides RDF dumps here: http://tools.wmflabs.org/wikidata-exports/rdf/exports/20150126/

It is easy to get the classes from the wikidata-taxonomy dump, but it needs to be joined with the wikidata-terms dump to get the labels. For properties you can use the wikidata-properties dump.

If you want something more fine-grained you can try the WKDT toolkit https://github.com/Wikidata/Wikidata-Toolkit

Or create a DBpedia extractor, we have experimental support for wikidata in this branch: https://github.com/alismayilov/extraction-framework/tree/wikidataAllCommits

RDF dumps can be loaded directly into a SPARQL endpoint, or easily manipulated in CLI/code and loaded into any store.

ppKrauss commented 9 years ago

OK, phase1 completed! In this phase we can only use "by hand" procedures... My basic work was,

... for more details (while the corresponding fork is pending) here.


I finished my first test of the "by hand" report/edit/rewrite process... and some new (minor) problems became evident, a kind of demand for normalization:

  1. HTML source code normalization problems: reported as #360 and #359.
  2. "<link> vs <span><a>" also seems to be a normalization problem. My suggestion is to show all the links transparently to the crowd, so, format each link with the span template.

About item 2, countings:

Question (perhaps for @elf-pavlik, but no urgency!): can I adopt the span templates instead of the simple link tag, and also convert all the remaining <link ..../> to span?

ppKrauss commented 9 years ago

Starting phase2: let's discuss and check the automation possibilities!

( while anybody can enhance the volume of Wikidata links at the GoogleDoc with spreadsheet of the phase1).


The first step here is to discuss the current state, which is summarized by the "schema_org_rdfa profile" (see #361).

gozer release profile (countings):


AUTOMATION OPPORTUNITIES:

  1. Propagating as semantic subset: this is valid for specific items, for example "addressLocality is a semantic subset of PostalAddress", where we can propagate the WikidataID (e.g. as rdfs:subPropertyOf), but not for broader items such as Thing. There are 663 (!), so we can expect some automation here... The first step is to indicate (we can add a column in the spreadsheet) which items are "broader" (and so cannot be used as semantic super-classes for a WikidataID).

    1.1. Inheriting semantics: every Property inherits the semantics of its parent Class, so it is also a kind of "semantic subset" (and we again need to exclude the "broader cases")... Are there other indirect situations in the graph? We must mark all the cases chosen for exclusion so that they can (later) be excluded from the spreadsheet.

  2. Getting the WikidataID from an external-equivalent item: I do not see many; there are only ~70 links relating semantic definitions to external vocabularies, see nLinks and the counts for 'owl:equivalentClass', 'rdfs:subPropertyOf' and 'owl:equivalentProperty'. Perhaps 'dc:source', but it adds only 24 more.
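
One possible reading of opportunity 1, sketched in Python: a term without a hand-made WikidataID inherits a weaker link from its nearest mapped ancestor, unless that ancestor is flagged as too broad. The data structures, the `predicate` parameter and the placeholder ID for Thing are all assumptions made for this example, and the parent graph is assumed to be a tree without cycles.

```python
from typing import Dict, Optional, Set, Tuple


def propagate(
    parents: Dict[str, Optional[str]],
    wikidata: Dict[str, str],
    broad: Set[str],
    predicate: str = "rdfs:subClassOf",
) -> Dict[str, Tuple[str, str]]:
    """Suggest (predicate, WikidataID) for unmapped terms from mapped ancestors."""
    suggestions = {}
    for term in parents:
        if term in wikidata:
            continue  # already mapped by hand
        ancestor = parents.get(term)
        while ancestor is not None:
            if ancestor in wikidata and ancestor not in broad:
                suggestions[term] = (predicate, wikidata[ancestor])
                break
            ancestor = parents.get(ancestor)
    return suggestions


# Tiny illustrative hierarchy; "Q-THING-PLACEHOLDER" is not a real identifier.
parents = {"Thing": None, "Organization": "Thing", "Corporation": "Organization", "Person": "Thing"}
wikidata = {"Thing": "Q-THING-PLACEHOLDER", "Organization": "Q43229"}
suggestions = propagate(parents, wikidata, broad={"Thing"})
```

Here Corporation picks up a suggestion from Organization, while Person gets none because its only ancestor, Thing, is marked as broad. The output is only a list of candidates for human review, in line with the "human control" point made earlier.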
boanuge commented 9 years ago

The mapping from Schema.org types to Wikidata conceptual items seems very interesting. How is it going? I can see there has been no comment for a while. If possible, I would like to join the effort of mapping between these two. :)

ps. I found it hard to get the meaning of Wikidata class concepts (schema-level, not instance-level) as they use Qxxxxxx (not intuitive) terms for their conceptual items. Is there any tip to figure out what Qxxxxxxes mean in usual words?

ppKrauss commented 9 years ago

Hello @boanuge, welcome to this initiative! It is not abandoned... Would you like to collaborate here, with the Google Doc spreadsheet from phase1?

Perhaps we (you and I at this moment) need to show "more and good results" to restart this proposal... So you can also help here with an extension of the spreadsheet... Then, later, when we have a "critical mass" of results, we will return here.

About your PS: no, Qxxxxxx is a Wikidata project decision; an opaque identifier has some advantages.

JanZerebecki commented 9 years ago

For a human the label and description should define the meaning in words so far as it needs to be disambiguated from other concepts. Just look the Qxxxxxx up on wikidata.org or use its API to get the label and description in your favorite language.
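
To look up many identifiers at once rather than one by one, a script can call the API's `wbgetentities` action. The Python sketch below builds such a request and flattens the response; the helper names and the User-Agent string are placeholders, and the per-request ID limit is not handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"


def lookup_url(ids, lang="en"):
    """URL asking for labels and descriptions of several entities."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),
        "props": "labels|descriptions",
        "languages": lang,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def summarize(response, lang="en"):
    """Map each entity ID to a (label, description) pair; None when missing."""
    out = {}
    for entity_id, entity in response.get("entities", {}).items():
        label = entity.get("labels", {}).get(lang, {}).get("value")
        description = entity.get("descriptions", {}).get(lang, {}).get("value")
        out[entity_id] = (label, description)
    return out


def fetch(ids, lang="en"):
    """Network call; the User-Agent value here is only a placeholder."""
    request = Request(lookup_url(ids, lang), headers={"User-Agent": "mapping-sketch/0.1"})
    with urlopen(request) as resp:
        return summarize(json.load(resp), lang)
```

For example, `fetch(["Q43229", "P50"])` would return the English label and description for each ID, which is roughly the "one page view" asked about above.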

boanuge commented 9 years ago

@ppKrauss Thank you very much. I will see what I can do. :) @JanZerebecki Thank you for the comment. I was hoping for a nice one-page view of each Qxxxxxx term with its label and description, like schema.rdfa, instead of looking them up one by one (there are too many Qxxxxxx to go through :). Any comments about how Wikidata generates its items are appreciated.

JanZerebecki commented 9 years ago

There are too many items (item = Qxxxxxx) to list them all (currently more than 13 million). These are edited manually and in automated ways, see https://www.wikidata.org/wiki/Wikidata:Introduction for more information. Note that there are also properties. There is a list of all properties: https://www.wikidata.org/wiki/Wikidata:List_of_properties/all .

Example: https://www.wikidata.org/wiki/Q25169#P50 tells us: "The Hitchhiker's Guide to the Galaxy" (item Q25169) its author (property P50) is Douglas Adams (item Q42). https://www.wikidata.org/wiki/Property:P50#P1629 tells us that the property author (P50) is for the subject (P1629) item author (Q482980).

Maybe it is more useful to map to Wikidata properties instead of Wikidata items. https://schema.org/author would map to https://www.wikidata.org/wiki/Property:P50 .

Note that people already use Wikidata.org itself to do this mapping, like is done on https://www.wikidata.org/wiki/Property:P18#P1628 which says the Wikidata property image (P18) is equivalent to http://schema.org/image . These could be exported and added to schema.org which would ensure that the mapping is actually symmetric.

danbri commented 9 years ago

Yes the idea is purely to map the descriptive vocabulary (hundreds or low thousands of mainly types/properties), not millions of items.

thadguidry commented 9 years ago

@danbri then update the issue title.. instead of wikidata terms ... wikidata properties

ppKrauss commented 9 years ago

@JanZerebecki, as stated by Dan, "the idea is purely to map the descriptive vocabulary": it is a SchemaOrg-to-Wikidata map, and SchemaOrg has at most ~1500 items, see the counts above...
The main objective is to complement the poor/imprecise descriptions (rdfs:comment) of SchemaOrg.

(also @thadguidry) About properties like P50, in my opinion, they are like "internal database descriptors" of Wikidata, while Qxxxxxx are the entries for Wikipedia concepts. So the "author" concept is not P50, it is Q482980... The Qxxxxxx concepts are more stable and complete.

PS: the properties can generate cyclic references for SchemaOrg.

westurner commented 9 years ago

https://en.wikipedia.org/wiki/Ontology_alignment

So there are entity (class, property) resolutions (and disambiguation trees (w/ information gain))?

twamarc commented 9 years ago

Mapping to wiki is a plus, among others. The health extension is now experimenting with mapping concepts to defined concepts in healthcare standards and terminologies like SNOMED CT, but also RxNorm, LOINC, and ICD.

westurner commented 9 years ago

On Sep 3, 2015 1:38 PM, "Marc" notifications@github.com wrote:

Mapping to wiki is a plus, among others. The health extension is now experimenting with mapping concepts to defined concepts in healthcare standards and terminologies like SNOMED CT, but also RxNorm, LOINC, and ICD.

http://schema.org/code

http://schema.org/MedicalCode


danbri commented 8 years ago

Quick update to make sure everyone is aware that Wikidata has a SPARQL endpoint now; linked from https://www.wikidata.org/wiki/Wikidata:Data_access#SPARQL_endpoints

danbri commented 8 years ago

Rather related: http://addshore.com/2015/12/wikidata-references-from-microdata/ from @addshore

danbri commented 8 years ago

I've been looking into what Wikidata could look like as an external schema.org extension. Perhaps something like this (don't worry about the big header, eventually it would be hidden behind a simple URL). It would be good if the corresponding triples were as close as possible to those in the Wikidata SPARQL endpoint.

<script type="application/ld+json">
{
"@context": {
  "@vocab": "http://schema.org/",
   "wd_lnbIdentifier": {"@id": "https://www.wikidata.org/entity/P1368" },
   "wd_countryOfCitizenship": {"@id": "https://www.wikidata.org/entity/P27" , "@type": "@id"},
   "wd_religion": {"@id": "https://www.wikidata.org/entity/P140", "@type": "@id"},
   "wd_nativeLanguage": {"@id": "https://www.wikidata.org/entity/P103", "@type": "@id"}
 },
  "@type": "Person",
  "@id": "https://www.wikidata.org/entity/Q42",
  "name": "Douglas Adams",
  "wd_lnbIdentifier": "000057405",
  "wd_countryOfCitizenship":
    {
      "@type": "Country",
      "@id": "https://www.wikidata.org/entity/Q145",
      "name": "United Kingdom"
    },
  "wd_religion": {
    "@id": "https://www.wikidata.org/entity/Q7066",
    "name": "atheism"
  },
  "wd_nativeLanguage": {
     "@type": "Language",
     "@id": "https://www.wikidata.org/entity/Q7979",
     "name": "British English"
  }
 }

</script>
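
The "big header" in the example above is regular enough to be generated. As a hedged Python sketch, the function below builds the same kind of `@context` from a small table of Wikidata properties. The `wd_` term names and the `https://www.wikidata.org/entity/` prefix are copied from the example; the function name and input format are invented.

```python
import json


def wd_context(props):
    """props maps a term name to (Wikidata property ID, whether values are items)."""
    context = {"@vocab": "http://schema.org/"}
    for term, (property_id, is_item) in props.items():
        entry = {"@id": f"https://www.wikidata.org/entity/{property_id}"}
        if is_item:
            entry["@type"] = "@id"  # values are IRIs, not plain literals
        context[term] = entry
    return context


context = wd_context({
    "wd_lnbIdentifier": ("P1368", False),
    "wd_countryOfCitizenship": ("P27", True),
    "wd_religion": ("P140", True),
    "wd_nativeLanguage": ("P103", True),
})
header = json.dumps({"@context": context}, indent=2)
```

Publishing the generated context at one URL would let descriptions reference it by a single string, which is the "hidden behind a simple URL" idea.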
danbri commented 8 years ago

@vrandezo and I have been exploring this some more.

For now, just a SPARQL query to try at query.wikidata.org

PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?property ?ptype ?label ?extsuper ?extsub ?extequiv
WHERE {?property a wikibase:Property; rdfs:label ?label; wikibase:propertyType ?ptype .
OPTIONAL { ?property wdt:P2235 ?extsuper . }
OPTIONAL { ?property wdt:P2236 ?extsub . }
OPTIONAL { ?property wdt:P1628 ?extequiv . }
FILTER( REGEX(STR(?extequiv), "schema.org") ||
  REGEX(STR(?extsub), "schema.org") ||
  REGEX(STR(?extsuper), "schema.org") )
FILTER(LANG(?label) = "en")}

... this shows that Wikidata itself can be used as a registry of mappings to/from schema.org terms :)
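
To pull that registry into a script, the query can be sent to the SPARQL endpoint and the JSON results flattened into rows. This Python sketch assumes the endpoint URL `https://query.wikidata.org/sparql`, the standard SPARQL JSON results layout, and the variable names used in the query above; the function names and User-Agent value are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://query.wikidata.org/sparql"  # assumed endpoint URL


def sparql_request(query):
    """Build a GET request that asks for SPARQL JSON results."""
    url = f"{ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": "mapping-sketch/0.1",  # placeholder value
    }
    return Request(url, headers=headers)


def mapping_rows(results):
    """Flatten bindings into (wikidata property, relation, schema.org term)."""
    relations = (("extequiv", "equivalent"), ("extsub", "sub"), ("extsuper", "super"))
    rows = []
    for binding in results["results"]["bindings"]:
        prop = binding["property"]["value"]
        for var, relation in relations:
            if var in binding and "schema.org" in binding[var]["value"]:
                rows.append((prop, relation, binding[var]["value"]))
    return rows


def run(query):
    """Network call: execute the query and return the flattened rows."""
    with urlopen(sparql_request(query)) as resp:
        return mapping_rows(json.load(resp))
```

The rows could then be written back into schema.org's definitions, which would keep the two sides of the mapping symmetric.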

Dataliberate commented 8 years ago

The approach of using Wikidata to hold these mappings looks worth exploring further.

danbri commented 8 years ago

Here is another one (thanks to Denny):

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# which properties are most commonly found on things that are 
# 'instance of' (P31) the 'Cat' type (Q146)?
SELECT ?prop (count(?prop) as ?count) WHERE {
  ?i  wdt:P31 wd:Q146 .
  ?i ?prop ?val .
  FILTER(STRSTARTS(STR(?prop), "http://www.wikidata.org/prop/direct/"))
} group by ?prop order by desc(?count)

# TODO: 
# - figure out how to get the rdfs:label of these 
# - figure out how to handle v common types like human (Q5), can we sample e.g. 1000 items only?

... any help with the last parts gratefully received :)
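
A possible approach to both TODO items, offered as an untested-against-the-live-endpoint sketch: join each direct-claim predicate back to its property entity through `wikibase:directClaim` to reach the `rdfs:label`, and cap the instances with a `LIMIT` subquery. The Python wrapper only builds the query text; note that `LIMIT` without ordering gives an arbitrary subset, not a random sample.

```python
import re


def top_properties_query(class_qid: str, sample: int = 1000) -> str:
    """SPARQL text: most common labelled properties on a capped set of instances."""
    if not re.fullmatch(r"Q\d+", class_qid):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    return f"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?prop ?label (COUNT(*) AS ?count) WHERE {{
  # cap the number of instances so common types such as human (Q5) stay tractable
  {{ SELECT ?i WHERE {{ ?i wdt:P31 wd:{class_qid} . }} LIMIT {int(sample)} }}
  ?i ?prop ?val .
  # map the direct-claim predicate back to its property entity to get a label
  ?p wikibase:directClaim ?prop ;
     rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}} GROUP BY ?prop ?label ORDER BY DESC(?count)
"""
```

The `wikibase:directClaim` join also restricts `?prop` to direct-claim predicates, so the `STRSTARTS` filter from the original query is not repeated here.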

danbri commented 8 years ago

Here is the complementary query, which finds most common properties whose value is something 'instance of' 'Cat'.

The query is written more compactly here, and has the same issues/problems as noted above:

SELECT ?prop (count(?prop) as ?count) WHERE {

  # some thing with some property that is some item, where that ...
  ?x ?prop ?i . 

  # item instanceOf Cat.
  ?i  <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q146> . 

  FILTER(STRSTARTS(STR(?prop), "http://www.wikidata.org/prop/direct/"))
} group by ?prop order by desc(?count)

This corresponds loosely to the notion of properties whose http://schema.org/rangeIncludes is the type "Cat":

To compare, here are top results for the earlier query, i.e. properties whose domainIncludes the type "Cat". In other words, properties commonly found on items that are cats. Here is the earlier query in more compact form:

SELECT ?prop (count(?prop) as ?count) WHERE {

  ?i ?prop ?x . # some item has some property whose value x is an item, where that ...

  # item instanceOf Cat.
  ?i  <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q146> . 

  FILTER(STRSTARTS(STR(?prop), "http://www.wikidata.org/prop/direct/"))
} group by ?prop order by desc(?count)

Currently this gives 45 results, the most common properties (from 68 cats in wikidata) being:

Both Wikidata and schema.org vocabularies have a relatively loose, flexible and evolving association between types and properties; Wikidata even more so. While schema.org lists a current set of incoming and outgoing properties on each type, often adjusting these over time, Wikidata does not formally do this at all. There are currently some non-machine-readable notes on the relevant talk pages but nothing exposed via RDF/SPARQL. Consequently we need to mine this information from actual descriptions (such as the 68 cat descriptions in Wikidata) to get a sense of the emergent structure. This process also gives a feel for the "long tail" of property definitions that exists in Wikidata and which we can now re-use within schema.org descriptions across the Web.
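
The same mining step can be done offline once the triples are in hand. In this Python sketch, outgoing counts approximate a domainIncludes-style view and incoming counts a rangeIncludes-style view for one type. The input format and function name are assumptions, and the sample triples use readable names instead of real identifiers.

```python
from collections import Counter


def emergent_properties(triples, types, target):
    """Count properties used on (outgoing) and pointing at (incoming) instances of target.

    triples: iterable of (subject, property, value)
    types:   mapping from item to the set of classes it is an instance of
    """
    outgoing, incoming = Counter(), Counter()
    for subject, prop, value in triples:
        if target in types.get(subject, ()):
            outgoing[prop] += 1
        if target in types.get(value, ()):
            incoming[prop] += 1
    return outgoing, incoming


# Toy data loosely based on the cat examples in this thread.
types = {"Orangey": {"Q146"}, "Socks": {"Q146"}, "India": {"Q146"}}
triples = [
    ("Breakfast at Tiffany's", "P161", "Orangey"),  # cast member
    ("Socks", "P127", "Bill Clinton"),              # owned by
    ("India", "P127", "George W. Bush"),
]
outgoing, incoming = emergent_properties(triples, types, "Q146")
```

`outgoing.most_common()` and `incoming.most_common()` then give the ranked lists that the two SPARQL queries compute on the endpoint.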

danbri commented 8 years ago

We can use this to explore the data. For example, we see that one of the most common ways in which Wikidata references the Cat type is using property P161, 'cast member'. Who are these famous acting cats?

SELECT * WHERE {
 ?x <http://www.wikidata.org/prop/direct/P161> ?i ; 
    # ?x with a 'cast member' that is some thing ?i...
    <http://www.w3.org/2000/01/rdf-schema#label> ?label .                                       

    # where ?i is 'instance of' the 'Cat' type:
 ?i <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q146>;
    <http://www.w3.org/2000/01/rdf-schema#label> ?catname ;
    FILTER(LANG(?label) = "en")
    FILTER(LANG(?catname) = "en")
}

From this we learn, amongst other things, of a famous cat actor, Orangey (http://www.wikidata.org/entity/Q677525) that starred in several works including versions of Breakfast at Tiffany's, The Diary of Anne Frank, Village of the Giants. The creature has an IMDB page, if you are curious: http://www.imdb.com/name/nm1248838/ . If you scan that page for [embedded schema.org](https://developers.google.com/structured-data/testing-tool/?url=http://www.imdb.com/name/nm1248838/) you can find out more about Orangey expressed as schema.org, including an image, a jobTitle of "Actor", and a description ("Orangey the Cat is the only feline double-winner of the Patsy Award, the animal kingdom's equivalent of the Oscar. "...).

For completeness, let's look at outgoing properties of Cat too. Let's see well known cat ownership relationships. Try this in http://query.wikidata.org:

SELECT * WHERE {
  # ?c 'owned by' ?o, where ?c is a Cat:
  ?c <http://www.wikidata.org/prop/direct/P127> ?o .
  ?c <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q146> .

  ?c <http://www.w3.org/2000/01/rdf-schema#label> ?catname .
  ?o <http://www.w3.org/2000/01/rdf-schema#label> ?ownername.
  FILTER(LANG(?ownername) = "en")
  FILTER(LANG(?catname) = "en")
}

... you'll find Socks and Bill Clinton; India, owned by George W. and Laura Bush; Humphrey owned by the Cabinet Office etc.

Having got this far, there are a few things yet to investigate:

gkellogg commented 8 years ago

@danbri, interesting direction to go:

  1. Figure out an RDFS schema for WikiData based on these queries.
  2. Given that, a tool such as the [Ruby JSON-LD Context Generator](https://github.com/ruby-rdf/json-ld/blob/develop/script/gen_context) can be used to construct a JSON-LD context based on the range of properties described within that schema (see, for example, the D3 compatible schema.org context constructed from a version of the schema.org RDFa definition, which also includes the full vocabulary definition in JSON-LD).

As you suggest, vocabulary range information can also be used to create CSVW datatype definitions for mapping CSV tables to JSON or RDF with appropriate datatype fidelity.

Note that what's most useful for both CSVW and the JSON-LD context is the property ranges, but inferring a WikiData RDFS definition from SPARQL queries seems pretty useful.

ppKrauss commented 8 years ago

Thanks @danbri and @gkellogg, good work and results restarting this in 2016!

I imagine that you are looking for a SPARQL algorithm, that can do automatic recognition of each Wikidata item equivalent to a SchemaOrg item... Well, we can start with some sample of consensual item pairs, to check and/or discuss the behaviour of the proposed algorithms. Examples:

  1. Q82799 and schema/name: equivalent property, is ok?
  2. Q211198 and schema/audience: equivalent property, is ok?
  3. (Q482980 or P50) and schema/author: what to use?
  4. Q43229 and schema/Organization: equivalent class, is ok?
  5. ... more handmade examples here ...

Are these correspondences (1-4) consensual? Is each really a semantic equivalence relationship? How will the algorithm obtain these pairs?

Perhaps we can adapt this basic map algorithm as a first tool for query.wikidata.


PS: Wikidata "could look like an external schema.org extension" as shown here and here, but we could also conclude that Wikidata is a good replacement for SchemaOrg :-) I will keep using Schema.

nemobis commented 8 years ago

Peter, 24/01/2016 15:42:

Are these correspondences (1-4) consensual?

To find out, add a https://www.wikidata.org/wiki/Property:P1709 statement on the Wikidata property/entity and see what happens!

ppKrauss commented 8 years ago

Hi @nemobis, thanks, can you express your idea as a query at Wikidata? A query that demonstrates to us that an external concept (e.g. SchemaOrg's author) is equivalent to a Wikidata item (Q482980 in this example)? The problem is how to define each external concept in a generic SPARQL query.

The equivalence operator has two meanings here, in this issue (#280),

My use of the term "consensual" is not about the equivalence operator; it is about our "human understanding" and a community agreement (consensus) on how each of us (me, you and anyone else discussing here) understands the terms. Do you agree with my concept matching in the pairs listed as 1-4?


PS: the more Wikidata items there are in the handmade sample set, the harder it is to check them or reach consensus... So we need to start with good consensus before closing the sample set. An agreed (homologated) sample set is fundamental to test and discuss any kind of algorithm here.
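
To make "test any kind of algorithm against the sample set" concrete, here is a small Python sketch that compares an algorithm's output with the hand-made pairs. The reference pairs are the ones proposed earlier in this thread and are still open to discussion; the `schema:` key style and the function name are assumptions.

```python
# Hand-made reference pairs from this thread (not yet agreed by the community).
REFERENCE = {
    "schema:name": "Q82799",
    "schema:audience": "Q211198",
    "schema:Organization": "Q43229",
}


def score(candidate, reference=REFERENCE):
    """Split reference terms by how a candidate mapping treats them."""
    agree = sorted(t for t in reference if candidate.get(t) == reference[t])
    differ = sorted(t for t in reference if t in candidate and candidate[t] != reference[t])
    missing = sorted(t for t in reference if t not in candidate)
    return {"agree": agree, "differ": differ, "missing": missing}


# Output of some hypothetical matching algorithm.
report = score({"schema:name": "Q82799", "schema:Organization": "Q5"})
```

The "differ" list is the interesting one: each entry is either an algorithm error or a pair in the reference set that needs more discussion.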


NOTE about SPARQL algorithm aims

Oops, it is important to distinguish some types of algorithms (approaches)...