hbz / lobid-vocabs

General lobid vocabulary and controlled term lists.
https://skohub.io/hbz/lobid-vocabs/heads/master

Add concepts derived from wikidata to nwbib-spatial #85

Closed acka47 closed 4 years ago

acka47 commented 5 years ago

Similar to the process from https://github.com/hbz/nwbib/issues/397. Currently a skos:Concept in nwbib-spatial has the following information (example):

https://github.com/hbz/lobid-vocabs/blob/aab7725d7514dbe81c5394520f2a75afc9ec67ca/nwbib/nwbib-spatial.ttl#L95-L100

We will have to add a link to the wikidata entity the concept is derived from (using the property http://xmlns.com/foaf/0.1/focus).

Open questions:

acka47 commented 5 years ago

In a slide for the NWBib meeting I added an example of how it could/should look after transformation to SKOS:

<http://purl.org/lobid/nwbib-spatial#n9Q7924>
  a skos:Concept ;
  skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
  skos:prefLabel "Regierungsbezirk Arnsberg"@de ;
  foaf:focus <http://www.wikidata.org/entity/Q7924> ;
  skos:broader <http://purl.org/lobid/nwbib-spatial#n9> ;
  skos:narrower <http://purl.org/lobid/nwbib-spatial#9Q1295> ;
  skos:notation "9Q7924" .

<http://purl.org/lobid/nwbib-spatial#9Q1295>
  a skos:Concept ;
  skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
  skos:prefLabel "Dortmund"@de ;
  foaf:focus <http://www.wikidata.org/entity/Q1295> ;
  skos:broader <http://purl.org/lobid/nwbib-spatial#n9Q7924> ;
  skos:narrower .... ;
  skos:notation "9Q1295" .

<http://purl.org/lobid/nwbib-spatial#n9>
  a skos:Concept ;
  skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
  skos:prefLabel "Regierungsbezirke, Kreise, Orte. Euregio"@de ;
  skos:narrower <http://purl.org/lobid/nwbib-spatial#n9Q7924> ;
  skos:notation "9" .
acka47 commented 5 years ago

I started playing with SPARQL CONSTRUCT to create the file. I had several problems running the query via curl. In the end I found out that the query must not contain tabs; without them it runs. Here is the result of the current query (using Q1295/Dortmund, as in the example above):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix wd: <http://www.wikidata.org/entity/> .

wd:Q1295 a skos:Concept ;
    skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
    skos:prefLabel "Dortmund"@de ;
    foaf:focus wd:Q1295 ;
    skos:notation "9http://www.wikidata.org/entity/Q1295" ;
    skos:broader wd:Q7924 .

It already looks quite good. Here is the query for curl:

curl -H "Accept: text/turtle" -G "https://query.wikidata.org/sparql" --data-urlencode query='
CONSTRUCT {
  ?item a skos:Concept ;
    skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
    skos:prefLabel ?itemLabel ;
    foaf:focus ?item ;
    skos:notation ?notation ;
    skos:broader ?broader .
  }
WHERE {
 {
      { ?item wdt:P131* wd:Q1198 . }
   UNION
      { ?item p:P131 [ ps:P131 wd:Q1198 ] . }
      { ?item wdt:P131 ?broader . }
      { ?item p:P31 [ ps:P31 wd:Q829277 ] . } # government district (Regierungsbezirk) in NRW
  UNION
      { ?item p:P31 [ ps:P31 wd:Q106658 ] . } # rural district (Landkreis) in Germany
  UNION
      { ?item p:P31 [ ps:P31 wd:Q5283531 ] . } # rural district (Landkreis) in Prussia
  UNION
      { ?item p:P31 [ ps:P31 wd:Q262166 ] . } # municipality (Gemeinde) in Germany
  UNION
      { ?item p:P31 [ ps:P31 wd:Q22865 ] . } # independent city (kreisfreie Stadt) in Germany
  UNION
      { ?item p:P31 [ ps:P31 wd:Q253019 ] . } # locality (Ortsteil)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q2983893 ] . } # city quarter (Stadtteil)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q42744322 ] . } # urban municipality of Germany (Stadtgemeinde)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q134626 ] . } # district seat (Kreisstadt)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q448801 ] . } # Große Kreisstadt (major district town)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q1548518 ] . } # Große kreisangehörige Stadt (large town within a district)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q54935786 ] . } # Mittlere kreisangehörige Stadt (medium-sized town within a district)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q1852178 ] . } # quarter (Stadtteil) of Düsseldorf
  UNION
      { ?item p:P31 [ ps:P31 wd:Q15632166 ] . } # quarter (Stadtteil) of Cologne
  UNION
      { ?item wdt:P31/wdt:P279* wd:Q3146899 . } # diocese of the Catholic Church
  UNION
      { ?item p:P361 [ ps:P361 wd:Q1380992 ] . } # part of the Evangelical Church in the Rhineland
  UNION
      { ?item p:P361 [ ps:P361 wd:Q1381014 ] . } # part of the Evangelical Church of Westphalia
  UNION
      { ?item p:P31 [ ps:P31 wd:Q1780389 ] . } # Kommunalverband besonderer Art (currently only "Städteregion Aachen")
  UNION
      { ?item wdt:P31/wdt:P279* wd:Q4286337 . } # city district (Stadtbezirk); comment out for the geocache
 }
 FILTER (?item != wd:Q1787449 && ?item != wd:Q16500124 && ?item != wd:Q1465811 && ?item != wd:Q1787449
       && ?item != wd:Q16832627 && ?item != wd:Q1113210 && ?item != wd:Q19288281 && ?item != wd:Q1662807
        && ?item != wd:Q1351319 ) # filter out former districts that share a name with current districts
 BIND(CONCAT("9", STR(?item)) AS ?notation)
 SERVICE wikibase:label {  bd:serviceParam wikibase:language "de" }
}'
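
The tab problem mentioned above can be avoided before the query is ever sent. The following is only a sketch in Python, not what was actually used here: `clean_query` and `build_request_url` are hypothetical helper names, and replacing each tab with two spaces is an arbitrary choice. It builds a URL-encoded GET request roughly equivalent to the curl call, without sending it.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl call in this thread.
ENDPOINT = "https://query.wikidata.org/sparql"


def clean_query(query: str) -> str:
    """Replace every tab with two spaces so the query contains no tab characters."""
    return query.replace("\t", "  ")


def build_request_url(query: str) -> str:
    """Return a GET URL carrying the cleaned query as the `query` parameter."""
    return ENDPOINT + "?" + urlencode({"query": clean_query(query)})


if __name__ == "__main__":
    sample = "CONSTRUCT {\n\t?item a skos:Concept .\n} WHERE { ?item wdt:P131 wd:Q1198 . }"
    print(build_request_url(sample))
```

The request itself would still need the `Accept: text/turtle` header from the curl call to get Turtle back.
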
acka47 commented 5 years ago

There are some that have a statement skos:broader wd:Q1198. Q1198 is NRW and all of these statements have to be changed to skos:broader <http://purl.org/lobid/nwbib-spatial#n9>.

acka47 commented 5 years ago

> There are some that have a statement skos:broader wd:Q1198. Q1198 is NRW and all of these statements have to be changed to skos:broader <http://purl.org/lobid/nwbib-spatial#n9>.

Oops, this is not correct, as there are also clerical regions (Dekanate, Kirchenkreise etc.). We should just remove them from the query, resulting in this one:

curl -H "Accept: text/turtle" -G "https://query.wikidata.org/sparql" --data-urlencode query='
CONSTRUCT {
  ?item a skos:Concept ;
    skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
    skos:prefLabel ?itemLabel ;
    foaf:focus ?item ;
    skos:notation ?notation ;
    skos:broader ?broader .
  }
WHERE {
 {
      { ?item wdt:P131* wd:Q1198 . }
   UNION
      { ?item p:P131 [ ps:P131 wd:Q1198 ] . }
      { ?item wdt:P131 ?broader . }
      { ?item p:P31 [ ps:P31 wd:Q829277 ] . } # government district (Regierungsbezirk) in NRW
  UNION
      { ?item p:P31 [ ps:P31 wd:Q106658 ] . } # rural district (Landkreis) in Germany
  UNION
      { ?item p:P31 [ ps:P31 wd:Q5283531 ] . } # rural district (Landkreis) in Prussia
  UNION
      { ?item p:P31 [ ps:P31 wd:Q262166 ] . } # municipality (Gemeinde) in Germany
  UNION
      { ?item p:P31 [ ps:P31 wd:Q22865 ] . } # independent city (kreisfreie Stadt) in Germany
  UNION
      { ?item p:P31 [ ps:P31 wd:Q253019 ] . } # locality (Ortsteil)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q2983893 ] . } # city quarter (Stadtteil)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q42744322 ] . } # urban municipality of Germany (Stadtgemeinde)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q134626 ] . } # district seat (Kreisstadt)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q448801 ] . } # Große Kreisstadt (major district town)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q1548518 ] . } # Große kreisangehörige Stadt (large town within a district)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q54935786 ] . } # Mittlere kreisangehörige Stadt (medium-sized town within a district)
  UNION
      { ?item p:P31 [ ps:P31 wd:Q1852178 ] . } # quarter (Stadtteil) of Düsseldorf
  UNION
      { ?item p:P31 [ ps:P31 wd:Q15632166 ] . } # quarter (Stadtteil) of Cologne
  UNION
      { ?item p:P31 [ ps:P31 wd:Q1780389 ] . } # Kommunalverband besonderer Art (currently only "Städteregion Aachen")
  UNION
      { ?item wdt:P31/wdt:P279* wd:Q4286337 . } # city district (Stadtbezirk); comment out for the geocache
 }
 FILTER (?item != wd:Q1787449 && ?item != wd:Q16500124 && ?item != wd:Q1465811 && ?item != wd:Q1787449
       && ?item != wd:Q16832627 && ?item != wd:Q1113210 && ?item != wd:Q19288281 && ?item != wd:Q1662807
        && ?item != wd:Q1351319 ) # filter out former districts that share a name with current districts
 BIND(CONCAT("9", STR(?item)) AS ?notation)
 SERVICE wikibase:label {  bd:serviceParam wikibase:language "de" }
}'
acka47 commented 5 years ago

Finally, I managed to create the whole SKOS via SPARQL:

$ curl -H "Accept: text/turtle" -G "https://query.wikidata.org/sparql" --data-urlencode query='
CONSTRUCT {
    ?lobidURI a skos:Concept ;
    skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
    skos:prefLabel ?wikidataURILabel ;
    foaf:focus ?wikidataURI ;
    skos:notation ?QID ;
    skos:broader ?broaderURI .
  }
WHERE {
 {
      { ?wikidataURI wdt:P131* wd:Q1198 . }
   UNION
      { ?wikidataURI p:P131 [ ps:P131 wd:Q1198 ] . }
      { ?wikidataURI p:P31 [ ps:P31 wd:Q829277 ] . } # government district (Regierungsbezirk) in NRW
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q106658 ] . } # rural district (Landkreis) in Germany
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q5283531 ] . } # rural district (Landkreis) in Prussia
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q262166 ] . } # municipality (Gemeinde) in Germany
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q22865 ] . } # independent city (kreisfreie Stadt) in Germany
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q253019 ] . } # locality (Ortsteil)
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q2983893 ] . } # city quarter (Stadtteil)
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q42744322 ] . } # urban municipality of Germany (Stadtgemeinde)
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q134626 ] . } # district seat (Kreisstadt)
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q448801 ] . } # Große Kreisstadt (major district town)
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q1548518 ] . } # Große kreisangehörige Stadt (large town within a district)
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q54935786 ] . } # Mittlere kreisangehörige Stadt (medium-sized town within a district)
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q1852178 ] . } # quarter (Stadtteil) of Düsseldorf
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q15632166 ] . } # quarter (Stadtteil) of Cologne
  UNION
      { ?wikidataURI p:P31 [ ps:P31 wd:Q1780389 ] . } # Kommunalverband besonderer Art (currently only "Städteregion Aachen")
  UNION
      { ?wikidataURI wdt:P31/wdt:P279* wd:Q4286337 . } # city district (Stadtbezirk); comment out for the geocache
  OPTIONAL  { ?wikidataURI wdt:P131 ?broader . }
 }
# FILTER (?wikidataURI in (wd:Q1295))
 FILTER (?wikidataURI != wd:Q1787449 && ?wikidataURI != wd:Q16500124 && ?wikidataURI != wd:Q1465811 && ?wikidataURI != wd:Q1787449
       && ?wikidataURI != wd:Q16832627 && ?wikidataURI != wd:Q1113210 && ?wikidataURI != wd:Q19288281 && ?wikidataURI != wd:Q1662807
        && ?wikidataURI != wd:Q1351319 ) # filter out former districts that share a name with current districts
 BIND (STRAFTER (STR(?wikidataURI),"entity/") AS ?QID)
 BIND (STRAFTER (STR(?broader),"entity/") AS ?broaderQID)
 BIND (URI(CONCAT ("http://purl.org/lobid/nwbib-spatial#", ?QID)) AS ?lobidURI)
 BIND (URI(CONCAT ("http://purl.org/lobid/nwbib-spatial#", ?broaderQID)) AS ?broaderURI)
 SERVICE wikibase:label {  bd:serviceParam wikibase:language "de" }
}'

There is some post-processing to do anyway:

  1. Replace all occurrences of skos:broader <http://purl.org/lobid/nwbib-spatial#Q1198> with skos:broader <http://purl.org/lobid/nwbib-spatial#n9>.
  2. Remove all entities that we do not have any titles about in NWBib.
  3. There are several entries with two or more skos:broader entries. We will have to look into those and check what to do.
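
The three post-processing steps could be sketched roughly as follows. This is an illustration only, not the code that produced the SKOS file: it assumes the Turtle has already been parsed into plain `(subject, predicate, object)` string tuples, and the function names, the `used_subjects` set (standing in for "entities we have titles about") and the placeholder IDs `X`, `Y`, `Z` are all hypothetical.

```python
from collections import defaultdict

NS = "http://purl.org/lobid/nwbib-spatial#"
BROADER = "skos:broader"


def replace_nrw_broader(triples):
    """Step 1: point skos:broader at the n9 concept instead of the NRW item Q1198."""
    return [
        (s, p, NS + "n9" if p == BROADER and o == NS + "Q1198" else o)
        for s, p, o in triples
    ]


def keep_used(triples, used_subjects):
    """Step 2: keep only triples whose subject is used by at least one NWBib title."""
    return [t for t in triples if t[0] in used_subjects]


def multiple_broader(triples):
    """Step 3: report subjects with more than one distinct skos:broader value."""
    broader = defaultdict(set)
    for s, p, o in triples:
        if p == BROADER:
            broader[s].add(o)
    return {s: sorted(objs) for s, objs in broader.items() if len(objs) > 1}


if __name__ == "__main__":
    sample = [
        (NS + "Q7924", BROADER, NS + "Q1198"),
        (NS + "X", BROADER, NS + "Y"),  # placeholder IDs, not real concepts
        (NS + "X", BROADER, NS + "Z"),
    ]
    print(replace_nrw_broader(sample)[0])
    print(multiple_broader(sample))
```

Step 3 only lists the candidates; deciding which broader value to keep remains a manual decision.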

@fsteeg, will you do 1.) and 2.) and then add the result to the SKOS file? I will then see what to do regarding 3.).

acka47 commented 5 years ago

The SPARQL query from https://github.com/hbz/lobid-vocabs/issues/85#issuecomment-460636217 is fine, but the SPARQL endpoint does not seem to finish the CONSTRUCT for every entity. Take Q2362403, for example, which has only one triple in the resulting Turtle: <http://purl.org/lobid/nwbib-spatial#Q2362403> skos:prefLabel "Wingeshausen"@de .

But when I run the same query with a filter on only this resource (FILTER (?wikidataURI in (wd:Q2362403))), the result looks good:

<http://purl.org/lobid/nwbib-spatial#Q2362403> a skos:Concept ;
    skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
    skos:prefLabel "Wingeshausen"@de ;
    foaf:focus wd:Q2362403 ;
    skos:notation "Q2362403" ;
    skos:broader <http://purl.org/lobid/nwbib-spatial#Q10944> .

One solution might be to do this in steps or to use the LDF endpoint...
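
Doing this "in steps" could mean running the same CONSTRUCT once per batch of entities, each time with a FILTER like the one above. A minimal sketch of generating those FILTER clauses; the helper names and the batch size of 50 are assumptions, and whether batching actually avoids the incomplete results has not been tested here.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_filter(qids):
    """Build a FILTER clause restricting ?wikidataURI to the given QIDs."""
    return "FILTER (?wikidataURI in (" + ", ".join("wd:" + q for q in qids) + "))"


def batch_filters(qids, size=50):
    """Return one FILTER clause per batch of QIDs."""
    return [batch_filter(batch) for batch in chunks(qids, size)]


if __name__ == "__main__":
    for clause in batch_filters(["Q1295", "Q2103", "Q2362403"], size=2):
        print(clause)
```

Each clause would replace the commented-out `# FILTER (?wikidataURI in (wd:Q1295))` line in the query, and the resulting Turtle files would then be concatenated.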

fsteeg commented 5 years ago

As discussed offline, I generated a SKOS file from the current data at https://nwbib.de/spatial: https://github.com/hbz/lobid-vocabs/commit/660c0949dee6ec900d3ac058f023c629920d907c

(Raw file at https://raw.githubusercontent.com/hbz/lobid-vocabs/660c0949dee6ec900d3ac058f023c629920d907c/nwbib/nwbib-spatial.ttl)

We have the number of hits in NWBib at that point, so if that makes sense, we can add them to the file.

acka47 commented 5 years ago

Looks good except for one thing: foaf:focus should link to the wikidata entity the concept is based on, e.g.:

nwbib-spatial:Q1677185
        a               skos:Concept ;
        skos:broader    nwbib-spatial:Q2758 ;
        skos:inScheme   <http://purl.org/lobid/nwbib-spatial> ;
        skos:notation   "Q1677185" ;
        skos:prefLabel  "Wickrath"@de ;
        foaf:focus      wd:Q1677185 .

> We have the number of hits in NWBib at that point, so if that makes sense, we can add them to the file.

No, those numbers don't make sense in the skos file.

fsteeg commented 5 years ago

Fixed foaf:focus values and generated ConceptScheme data from the RDF model:

https://raw.githubusercontent.com/hbz/lobid-vocabs/297f28dd140e561697eb2c79382d952b263eed33/nwbib/nwbib-spatial.ttl

acka47 commented 5 years ago

I just noticed that the end date is also part of the prefLabel, e.g.:

nwbib-spatial:Q878752
        a               skos:Concept ;
        skos:broader    nwbib-spatial:Q7920 ;
        skos:inScheme   <http://purl.org/lobid/nwbib-spatial> ;
        skos:notation   "Q878752" ;
        skos:prefLabel  "Landkreis Münster (bis 1974)"@de ;
        foaf:focus      wd:Q878752 .

This is technically not correct. We will have to think about how to handle this. One option is to use skos:note or maybe even skos:scopeNote for the date.
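
One way to separate the date from the label, sketched under assumptions: the only suffix pattern seen in this thread is "(bis YYYY)", so the regular expression below handles just that form, and `split_label` is a hypothetical helper name. The extracted note could then go into skos:note or skos:scopeNote.

```python
import re

# Matches a trailing qualifier of the form " (bis 1974)"; other forms are not handled.
_TRAILING_DATE = re.compile(r"^(?P<label>.+?)\s*\((?P<note>bis \d{4})\)$")


def split_label(pref_label):
    """Return (label, note); note is None when there is no "(bis YYYY)" suffix."""
    match = _TRAILING_DATE.match(pref_label)
    if match is None:
        return pref_label, None
    return match.group("label"), match.group("note")


if __name__ == "__main__":
    print(split_label("Landkreis Münster (bis 1974)"))
    print(split_label("Wickrath"))
```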

acka47 commented 5 years ago

As discussed on the mailing list today, we should take care of identifying "Stadtbezirke" via the label when generating the SKOS file. I will add this as a task to the original issue: "Add suffix " (Stadtbezirk)" to the label when ?item wdt:P31/wdt:P279* wd:Q4286337 and "Stadtbezirk" is not already part of the label"
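
The labelling rule from that task could look roughly like this. It is a sketch only: `label_with_suffix` is a hypothetical name, the `is_stadtbezirk` flag stands for the result of the `wdt:P31/wdt:P279* wd:Q4286337` check, and the labels in the usage lines are made up for illustration.

```python
SUFFIX = " (Stadtbezirk)"


def label_with_suffix(label, is_stadtbezirk):
    """Append " (Stadtbezirk)" for city districts unless the label already mentions it."""
    if is_stadtbezirk and "Stadtbezirk" not in label:
        return label + SUFFIX
    return label


if __name__ == "__main__":
    print(label_with_suffix("Innenstadt", True))      # made-up label, gets the suffix
    print(label_with_suffix("Stadtbezirk 1", True))   # made-up label, left unchanged
    print(label_with_suffix("Dortmund", False))       # not a Stadtbezirk, left unchanged
```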

acka47 commented 5 years ago

We should directly resolve this together with #86 and #89.

fsteeg commented 5 years ago

Regenerated SKOS file for current spatial data:

https://raw.githubusercontent.com/hbz/lobid-vocabs/2ee36a63132d4a9cfb82c04f11eb2d92bd9e44ce/nwbib/nwbib-spatial.ttl

acka47 commented 5 years ago

> The SPARQL query from https://github.com/hbz/lobid-vocabs/issues/85#issuecomment-460636217 is fine, but the SPARQL endpoint does not seem to finish the CONSTRUCT for every entity. Take Q2362403, for example, which has only one triple in the resulting Turtle: <http://purl.org/lobid/nwbib-spatial#Q2362403> skos:prefLabel "Wingeshausen"@de .

BTW, this also does not work with a simpler SPARQL query based on the NWBib-ID property, like so:

curl -H "Accept: text/turtle" -G "https://query.wikidata.org/sparql" --data-urlencode query='
CONSTRUCT {
    ?lobidURI a skos:Concept ;
    skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
    skos:prefLabel ?wikidataURILabel ;
    foaf:focus ?wikidataURI ;
    skos:notation ?QID ;
    skos:broader ?broaderURI .
  }
WHERE {
 {
  ?wikidataURI wdt:P6814 ?nwbibId.
  OPTIONAL  { ?wikidataURI wdt:P131 ?broader . }
 }
# FILTER (?wikidataURI in (wd:Q1295))
 FILTER (?wikidataURI != wd:Q1787449 && ?wikidataURI != wd:Q16500124 && ?wikidataURI != wd:Q1465811 && ?wikidataURI != wd:Q1787449
       && ?wikidataURI != wd:Q16832627 && ?wikidataURI != wd:Q1113210 && ?wikidataURI != wd:Q19288281 && ?wikidataURI != wd:Q1662807
        && ?wikidataURI != wd:Q1351319 ) # filter out former districts that share a name with current districts
 BIND (STRAFTER (STR(?wikidataURI),"entity/") AS ?QID)
 BIND (STRAFTER (STR(?broader),"entity/") AS ?broaderQID)
 BIND (URI(CONCAT ("http://purl.org/lobid/nwbib-spatial#", ?QID)) AS ?lobidURI)
 BIND (URI(CONCAT ("http://purl.org/lobid/nwbib-spatial#", ?broaderQID)) AS ?broaderURI)
 SERVICE wikibase:label {  bd:serviceParam wikibase:language "de" }
}'

There is a Wikidata issue for this at https://phabricator.wikimedia.org/T211178, but I guess it will not be addressed soon, as Wikidata has to solve some more urgent performance issues first.

acka47 commented 5 years ago

To Dos for @fsteeg:

fsteeg commented 5 years ago

New SKOS file generated with P6814 query, AGS or KS as notation, tweaked prefLabel:

https://raw.githubusercontent.com/hbz/lobid-vocabs/b030e12d7a9031990a7e2ce5bfee584ecc3b0796/nwbib/nwbib-spatial.ttl

fsteeg commented 5 years ago

Regenerated after fixing an issue: https://raw.githubusercontent.com/hbz/lobid-vocabs/43f85d1fc0fbd7052ef8af646eed5a9f53293b0c/nwbib/nwbib-spatial.ttl

But I noticed a problem: many entries now have multiple broader values, coming from the Wikidata query and the non-90s-qids.json file. E.g. Kirchenkreis Aachen: 35 from https://github.com/hbz/nwbib/blob/f14873999b115475761d7041bacc93a460a5d439/conf/non-90s-qids.json#L190 and Q1198 from the Wikidata query.

acka47 commented 5 years ago

> many entries now have multiple broader values, coming from the Wikidata query and the non-90s-qids.json file. E.g. Kirchenkreis Aachen: 35 from https://github.com/hbz/nwbib/blob/f14873999b115475761d7041bacc93a460a5d439/conf/non-90s-qids.json#L190 and Q1198 from the Wikidata query.

Yes, I mentioned this yesterday. The solution is to discard the P131 information from Wikidata if an entity is covered in non-90s-qids.json.
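
As a sketch, the rule amounts to a simple precedence check. The structure of non-90s-qids.json is an assumption here: it is treated as a mapping from QID to the notation of the broader class (such as "35"), and both `broader_for` and the key `Q_EXAMPLE` are hypothetical placeholders, not real identifiers.

```python
def broader_for(qid, wikidata_broader, non_90s):
    """Prefer the class from non-90s-qids.json; fall back to Wikidata's P131 value.

    `non_90s` is assumed to map a QID to the notation of its broader class.
    """
    if qid in non_90s:
        return non_90s[qid]
    return wikidata_broader


if __name__ == "__main__":
    non_90s = {"Q_EXAMPLE": "35"}  # placeholder key standing in for a covered entity
    print(broader_for("Q_EXAMPLE", "Q1198", non_90s))
    print(broader_for("Q2103", "Q7924", non_90s))
```

With this rule each concept ends up with a single broader value: the manually maintained one where it exists, the Wikidata one otherwise.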

acka47 commented 5 years ago

Otherwise the file looks fine, except for one error in the foaf:focus statements: they currently point to the resource itself and not to Wikidata. This means

nwbib-spatial:Q2103  a  skos:Concept ;
        skos:broader    nwbib-spatial:Q7924 ;
        skos:inScheme   <https://nwbib.de/spatial> ;
        skos:notation   "05911000" ;
        skos:prefLabel  "Bochum"@de ;
        foaf:focus      nwbib-spatial:Q2103 .

should become

nwbib-spatial:Q2103  a  skos:Concept ;
        skos:broader    nwbib-spatial:Q7924 ;
        skos:inScheme   <https://nwbib.de/spatial> ;
        skos:notation   "05911000" ;
        skos:prefLabel  "Bochum"@de ;
        foaf:focus      wd:Q2103 .
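
The intended mapping from a concept URI to its foaf:focus value can be sketched as below. The concept namespace in the usage lines (`https://nwbib.de/spatial#`) is an assumption about what the nwbib-spatial: prefix expands to, and `focus_for` is a hypothetical helper name.

```python
import re

# Wikidata entity namespace, as in the wd: prefix used in this thread.
WD = "http://www.wikidata.org/entity/"

# A concept derived from Wikidata has a fragment of the form #Q<digits>.
_QID = re.compile(r"#(Q\d+)$")


def focus_for(concept_uri):
    """Return the Wikidata entity URI for a #Q<digits> concept URI, else None."""
    match = _QID.search(concept_uri)
    return WD + match.group(1) if match else None


if __name__ == "__main__":
    print(focus_for("https://nwbib.de/spatial#Q2103"))
    print(focus_for("https://nwbib.de/spatial#N05"))  # made-up non-QID fragment
```

Concepts whose fragment is not a QID get `None` here, so their focus would have to come from somewhere else, for example the original SKOS file.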

Another thing: The foaf:focus statements from the ttl vocab are missing, e.g. in:

https://github.com/hbz/lobid-vocabs/blob/7e54138dfd5d074944baa4c333aa5beacb2487d8/nwbib/nwbib-spatial.ttl#L54-L60

fsteeg commented 5 years ago

Latest version (no multiple broader, WD for focus): https://raw.githubusercontent.com/hbz/lobid-vocabs/c67454ce6f683e361e70fbd05bbede97f93e48e1/nwbib/nwbib-spatial.ttl

Remaining TODO in this issue: retain focus information from original SKOS file.

fsteeg commented 4 years ago

Latest version including original focus information: https://raw.githubusercontent.com/hbz/lobid-vocabs/ce6214c673e88af58f219330d0c0709ff843d1b6/nwbib/nwbib-spatial.ttl

acka47 commented 4 years ago

Looks good. I think we are done with this issue. +1

fsteeg commented 4 years ago

In the first comment here, there's a TODO:

> Add suffix " (Stadtbezirk)" to the label when ?item wdt:P31/wdt:P279* wd:Q4286337 and "Stadtbezirk" is not already part of the label

Is that (still) relevant?

acka47 commented 4 years ago

> In the first comment here, there's a TODO:
>
> > Add suffix " (Stadtbezirk)" to the label when ?item wdt:P31/wdt:P279* wd:Q4286337 and "Stadtbezirk" is not already part of the label
>
> Is that (still) relevant?

I don't think so. If I remember correctly, adding this created some other problems (e.g. double mention of "Stadtbezirk" in some labels). We won't implement this and will pick it up again if editors ask for it.