acka47 closed this issue 4 years ago.
In a slide for the NWBib meeting, I added an example of how it could/should look after transformation to SKOS:
<http://purl.org/lobid/nwbib-spatial#n9Q7924>
a skos:Concept ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:prefLabel "Regierungsbezirk Arnsberg"@de ;
foaf:focus <http://www.wikidata.org/entity/Q7924> ;
skos:broader <http://purl.org/lobid/nwbib-spatial#n9> ;
skos:narrower <http://purl.org/lobid/nwbib-spatial#n9Q1295> ;
skos:notation "9Q7924" .
<http://purl.org/lobid/nwbib-spatial#n9Q1295>
a skos:Concept ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:prefLabel "Dortmund"@de ;
foaf:focus <http://www.wikidata.org/entity/Q1295> ;
skos:broader <http://purl.org/lobid/nwbib-spatial#n9Q7924> ;
skos:narrower .... ;
skos:notation "9Q1295" .
<http://purl.org/lobid/nwbib-spatial#n9>
a skos:Concept ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:prefLabel "Regierungsbezirke, Kreise, Orte. Euregio"@de ;
skos:narrower <http://purl.org/lobid/nwbib-spatial#n9Q7924> ;
skos:notation "9" .
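The sketch below renders one concept in the target shape shown above. It is a minimal Python example using only the standard library. The function names are hypothetical. The "#n" + notation URI pattern is copied from the slide example only, and the URI scheme is still an open question in this issue. The skos: and foaf: prefixes are assumed to be declared elsewhere.

```python
SCHEME = "http://purl.org/lobid/nwbib-spatial"
WIKIDATA = "http://www.wikidata.org/entity/"


def concept_uri(notation: str) -> str:
    # Assumed local-name pattern from the slide: "n" + notation, e.g. n9Q7924.
    return f"<{SCHEME}#n{notation}>"


def turtle_literal(text: str) -> str:
    # Escape backslashes and double quotes for a Turtle string literal.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def concept_turtle(notation, label, qid=None, broader=None, narrower=()):
    """Render one skos:Concept (hypothetical helper, not the project's generator)."""
    lines = [
        concept_uri(notation),
        "    a skos:Concept ;",
        f"    skos:inScheme <{SCHEME}> ;",
        f"    skos:prefLabel {turtle_literal(label)}@de ;",
    ]
    if qid is not None:
        lines.append(f"    foaf:focus <{WIKIDATA}{qid}> ;")
    if broader is not None:
        lines.append(f"    skos:broader {concept_uri(broader)} ;")
    for child in narrower:
        lines.append(f"    skos:narrower {concept_uri(child)} ;")
    lines.append(f"    skos:notation {turtle_literal(notation)} .")
    return "\n".join(lines)


print(concept_turtle("9Q7924", "Regierungsbezirk Arnsberg",
                     qid="Q7924", broader="9", narrower=["9Q1295"]))
```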
I started playing with SPARQL CONSTRUCT to create the file. I had several problems running the query via curl. In the end, I found out that you should not have tabs in the query; without them it runs. Here is the result of the current query (example Q1295/Dortmund as in the example above):
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix wd: <http://www.wikidata.org/entity/> .
wd:Q1295 a skos:Concept ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:prefLabel "Dortmund"@de ;
foaf:focus wd:Q1295 ;
skos:notation "9http://www.wikidata.org/entity/Q1295" ;
skos:broader wd:Q7924 .
It already looks quite good. Here is the query for curl:
curl -H "Accept: text/turtle" -G "https://query.wikidata.org/sparql" --data-urlencode query='
CONSTRUCT {
?item a skos:Concept ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:prefLabel ?itemLabel ;
foaf:focus ?item ;
skos:notation ?notation ;
skos:broader ?broader .
}
WHERE {
{
{ ?item wdt:P131* wd:Q1198 . }
UNION
{ ?item p:P131 [ ps:P131 wd:Q1198 ] . }
{ ?item wdt:P131 ?broader . }
{ ?item p:P31 [ ps:P31 wd:Q829277 ] . } # Regierungsbezirk in NRW
UNION
{ ?item p:P31 [ ps:P31 wd:Q106658 ] . } # Landkreis in Deutschland
UNION
{ ?item p:P31 [ ps:P31 wd:Q5283531 ] . } # Landkreis in Preußen
UNION
{ ?item p:P31 [ ps:P31 wd:Q262166 ] . } # Gemeinde in Deutschland
UNION
{ ?item p:P31 [ ps:P31 wd:Q22865 ] . } # kreisfreie Stadt in Deutschland
UNION
{ ?item p:P31 [ ps:P31 wd:Q253019 ]. } # Ortsteil
UNION
{ ?item p:P31 [ ps:P31 wd:Q2983893 ]. } # Stadtteil
UNION
{ ?item p:P31 [ ps:P31 wd:Q42744322 ]. } # Stadtgemeinde Deutschlands
UNION
{ ?item p:P31 [ ps:P31 wd:Q134626 ]. } # Kreisstadt
UNION
{ ?item p:P31 [ ps:P31 wd:Q448801 ]. } # Große Kreisstadt
UNION
{ ?item p:P31 [ ps:P31 wd:Q1548518 ]. } # Große kreisangehörige Stadt
UNION
{ ?item p:P31 [ ps:P31 wd:Q54935786 ]. } # Mittlere kreisangehörige Stadt
UNION
{ ?item p:P31 [ ps:P31 wd:Q1852178 ] . } # Stadtteil von Düsseldorf
UNION
{ ?item p:P31 [ ps:P31 wd:Q15632166 ] . } # Stadtteil von Köln
UNION
{ ?item wdt:P31/wdt:P279* wd:Q3146899 . } # Diözese der katholischen Kirche
UNION
{ ?item p:P361 [ ps:P361 wd:Q1380992 ] . } # part of the Evangelische Kirche im Rheinland
UNION
{ ?item p:P361 [ ps:P361 wd:Q1381014 ] . } # part of ev. Kirche Westfalen
UNION
{ ?item p:P31 [ ps:P31 wd:Q1780389 ] . } # Kommunalverband der besonderen Art (currently only "Städteregion Aachen")
UNION
{ ?item wdt:P31/wdt:P279* wd:Q4286337 . } # Stadtbezirk; comment out for Geocache
}
FILTER (?item NOT IN (wd:Q1787449, wd:Q16500124, wd:Q1465811, wd:Q16832627,
                      wd:Q1113210, wd:Q19288281, wd:Q1662807, wd:Q1351319)) # filter out former districts whose names are identical to new districts
BIND(CONCAT("9", STR(?item)) AS ?notation)
SERVICE wikibase:label { bd:serviceParam wikibase:language "de" }
}'
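The tab problem mentioned above only showed up when sending the query. A Python equivalent of the curl call makes the encoding step explicit. This is a standard-library sketch:

- It swaps tabs for spaces but keeps line breaks, because the "#" comments in the query run to the end of the line.
- It builds the same GET request with an Accept: text/turtle header.

That tabs are the actual cause is taken from the observation above, not verified here. The request is only built, not sent.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def prepare_query(query: str) -> str:
    # Swap tabs for spaces; keep newlines so "#" comments stay on their own line.
    return query.replace("\t", "    ")


def build_request(query: str) -> urllib.request.Request:
    """Roughly: curl -H "Accept: text/turtle" -G ENDPOINT --data-urlencode query=..."""
    params = urllib.parse.urlencode({"query": prepare_query(query)})
    return urllib.request.Request(f"{ENDPOINT}?{params}",
                                  headers={"Accept": "text/turtle"})


# Sending it needs network access, so it is only indicated here:
# with urllib.request.urlopen(build_request(query)) as response:
#     turtle = response.read().decode("utf-8")
```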
There are some that have a statement skos:broader wd:Q1198. Q1198 is NRW, and all of these statements have to be changed to skos:broader <http://purl.org/lobid/nwbib-spatial#n9>.
Oops, this is not correct, as there are also clerical regions (Dekanate, Kirchenkreise etc.). We should just remove them from the query, resulting in this one:
curl -H "Accept: text/turtle" -G "https://query.wikidata.org/sparql" --data-urlencode query='
CONSTRUCT {
?item a skos:Concept ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:prefLabel ?itemLabel ;
foaf:focus ?item ;
skos:notation ?notation ;
skos:broader ?broader .
}
WHERE {
{
{ ?item wdt:P131* wd:Q1198 . }
UNION
{ ?item p:P131 [ ps:P131 wd:Q1198 ] . }
{ ?item wdt:P131 ?broader . }
{ ?item p:P31 [ ps:P31 wd:Q829277 ] . } # Regierungsbezirk in NRW
UNION
{ ?item p:P31 [ ps:P31 wd:Q106658 ] . } # Landkreis in Deutschland
UNION
{ ?item p:P31 [ ps:P31 wd:Q5283531 ] . } # Landkreis in Preußen
UNION
{ ?item p:P31 [ ps:P31 wd:Q262166 ] . } # Gemeinde in Deutschland
UNION
{ ?item p:P31 [ ps:P31 wd:Q22865 ] . } # kreisfreie Stadt in Deutschland
UNION
{ ?item p:P31 [ ps:P31 wd:Q253019 ]. } # Ortsteil
UNION
{ ?item p:P31 [ ps:P31 wd:Q2983893 ]. } # Stadtteil
UNION
{ ?item p:P31 [ ps:P31 wd:Q42744322 ]. } # Stadtgemeinde Deutschlands
UNION
{ ?item p:P31 [ ps:P31 wd:Q134626 ]. } # Kreisstadt
UNION
{ ?item p:P31 [ ps:P31 wd:Q448801 ]. } # Große Kreisstadt
UNION
{ ?item p:P31 [ ps:P31 wd:Q1548518 ]. } # Große kreisangehörige Stadt
UNION
{ ?item p:P31 [ ps:P31 wd:Q54935786 ]. } # Mittlere kreisangehörige Stadt
UNION
{ ?item p:P31 [ ps:P31 wd:Q1852178 ] . } # Stadtteil von Düsseldorf
UNION
{ ?item p:P31 [ ps:P31 wd:Q15632166 ] . } # Stadtteil von Köln
UNION
{ ?item p:P31 [ ps:P31 wd:Q1780389 ] . } # Kommunalverband der besonderen Art (currently only "Städteregion Aachen")
UNION
{ ?item wdt:P31/wdt:P279* wd:Q4286337 . } # Stadtbezirk; comment out for Geocache
}
FILTER (?item NOT IN (wd:Q1787449, wd:Q16500124, wd:Q1465811, wd:Q16832627,
                      wd:Q1113210, wd:Q19288281, wd:Q1662807, wd:Q1351319)) # filter out former districts whose names are identical to new districts
BIND(CONCAT("9", STR(?item)) AS ?notation)
SERVICE wikibase:label { bd:serviceParam wikibase:language "de" }
}'
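In both queries so far, BIND(CONCAT("9", STR(?item)) AS ?notation) concatenates the full entity IRI. That is why the output above shows skos:notation "9http://www.wikidata.org/entity/Q1295". The final query switches to STRAFTER(STR(...), "entity/"). The Python sketch below shows the same string logic. The strafter helper is written here for illustration and mirrors the SPARQL function.

```python
def strafter(text: str, sep: str) -> str:
    # Mirrors SPARQL STRAFTER: the part after the first match of sep,
    # or "" when sep does not occur.
    index = text.find(sep)
    return "" if index < 0 else text[index + len(sep):]


item = "http://www.wikidata.org/entity/Q1295"  # STR(?item) for Dortmund

concat_notation = "9" + item        # what CONCAT("9", STR(?item)) yields
qid = strafter(item, "entity/")     # what the final query binds to ?QID
print(concat_notation)  # 9http://www.wikidata.org/entity/Q1295
print(qid)              # Q1295
```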
Finally, I managed to create the whole SKOS via SPARQL:
$ curl -H "Accept: text/turtle" -G "https://query.wikidata.org/sparql" --data-urlencode query='
CONSTRUCT {
?lobidURI a skos:Concept ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:prefLabel ?wikidataURILabel ;
foaf:focus ?wikidataURI ;
skos:notation ?QID ;
skos:broader ?broaderURI .
}
WHERE {
{
{ ?wikidataURI wdt:P131* wd:Q1198 . }
UNION
{ ?wikidataURI p:P131 [ ps:P131 wd:Q1198 ] . }
{ ?wikidataURI p:P31 [ ps:P31 wd:Q829277 ] . } # Regierungsbezirk in NRW
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q106658 ] . } # Landkreis in Deutschland
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q5283531 ] . } # Landkreis in Preußen
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q262166 ] . } # Gemeinde in Deutschland
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q22865 ] . } # kreisfreie Stadt in Deutschland
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q253019 ]. } # Ortsteil
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q2983893 ]. } # Stadtteil
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q42744322 ]. } # Stadtgemeinde Deutschlands
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q134626 ]. } # Kreisstadt
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q448801 ]. } # Große Kreisstadt
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q1548518 ]. } # Große kreisangehörige Stadt
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q54935786 ]. } # Mittlere kreisangehörige Stadt
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q1852178 ] . } # Stadtteil von Düsseldorf
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q15632166 ] . } # Stadtteil von Köln
UNION
{ ?wikidataURI p:P31 [ ps:P31 wd:Q1780389 ] . } # Kommunalverband der besonderen Art (currently only "Städteregion Aachen")
UNION
{ ?wikidataURI wdt:P31/wdt:P279* wd:Q4286337 . } # Stadtbezirk; comment out for Geocache
OPTIONAL { ?wikidataURI wdt:P131 ?broader . }
}
# FILTER (?wikidataURI in (wd:Q1295))
FILTER (?wikidataURI NOT IN (wd:Q1787449, wd:Q16500124, wd:Q1465811, wd:Q16832627,
                             wd:Q1113210, wd:Q19288281, wd:Q1662807, wd:Q1351319)) # filter out former districts whose names are identical to new districts
BIND (STRAFTER (STR(?wikidataURI),"entity/") AS ?QID)
BIND (STRAFTER (STR(?broader),"entity/") AS ?broaderQID)
BIND (URI(CONCAT ("http://purl.org/lobid/nwbib-spatial#", ?QID)) AS ?lobidURI)
BIND (URI(CONCAT ("http://purl.org/lobid/nwbib-spatial#", ?broaderQID)) AS ?broaderURI)
SERVICE wikibase:label { bd:serviceParam wikibase:language "de" }
}'
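Two behaviours of this query follow from how BIND and CONSTRUCT work:

- When an item has no P131 value, ?broader stays unbound. The BINDs deriving ?broaderQID and ?broaderURI then stay unbound too, and CONSTRUCT omits that skos:broader triple.
- An item with several P131 values produces one solution per value, and so several skos:broader triples.

The Python sketch below models this roughly. None stands for an unbound variable. The row format is made up for illustration, and input IRIs are assumed to contain "entity/".

```python
LOBID = "http://purl.org/lobid/nwbib-spatial#"
WD = "http://www.wikidata.org/entity/"


def to_lobid(wikidata_uri):
    # None models an unbound variable: the derived URI stays unbound as well.
    if wikidata_uri is None:
        return None
    return LOBID + wikidata_uri.split("entity/", 1)[1]


def construct(rows):
    """Instantiate a reduced template per solution row; a set drops duplicates like an RDF graph."""
    graph = set()
    for row in rows:
        subject = to_lobid(row["wikidataURI"])
        graph.add((subject, "skos:notation", row["wikidataURI"].split("entity/", 1)[1]))
        graph.add((subject, "foaf:focus", row["wikidataURI"]))
        broader = to_lobid(row.get("broader"))
        if broader is not None:  # unbound -> this template triple is skipped
            graph.add((subject, "skos:broader", broader))
    return graph
```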
There is some post-processing to do anyway:
1.) Replace skos:broader <http://purl.org/lobid/nwbib-spatial#Q1198> by skos:broader <http://purl.org/lobid/nwbib-spatial#n9>.
skos:broader entries. We will have to look into those and check what to do.
@fsteeg, will you do 1.) and 2.) and then add the result to the SKOS file? I will then see what to do regarding 3.).
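Step 1.) could look roughly like this Python sketch on the CONSTRUCT result. It works on (subject, predicate, object) tuples and assumes the Turtle has already been parsed, for example with an RDF library. It also lists concepts that end up with more than one skos:broader, a problem that comes up later in this thread, so they can be reviewed by hand.

```python
from collections import defaultdict

LOBID = "http://purl.org/lobid/nwbib-spatial#"
BROADER = "http://www.w3.org/2004/02/skos/core#broader"
NRW = LOBID + "Q1198"  # NRW, as minted by the final query
TOP = LOBID + "n9"     # "Regierungsbezirke, Kreise, Orte. Euregio"


def replace_nrw_broader(triples):
    # 1.) Point skos:broader NRW at the top concept n9 instead.
    return {(s, p, TOP if p == BROADER and o == NRW else o) for s, p, o in triples}


def multiple_broader(triples):
    # Concepts with more than one skos:broader value, for manual review.
    broader = defaultdict(set)
    for s, p, o in triples:
        if p == BROADER:
            broader[s].add(o)
    return {s: sorted(objs) for s, objs in broader.items() if len(objs) > 1}
```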
The SPARQL query from https://github.com/hbz/lobid-vocabs/issues/85#issuecomment-460636217 is fine, but the SPARQL endpoint does not seem to finish the CONSTRUCT for every entity. Take, for example, Q2362403: it only has one triple in the resulting Turtle:
<http://purl.org/lobid/nwbib-spatial#Q2362403> skos:prefLabel "Wingeshausen"@de .
But when I run the same query with a filter on only this resource (FILTER (?wikidataURI IN (wd:Q2362403)))), it looks good:
<http://purl.org/lobid/nwbib-spatial#Q2362403> a skos:Concept ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:prefLabel "Wingeshausen"@de ;
foaf:focus wd:Q2362403 ;
skos:notation "Q2362403" ;
skos:broader <http://purl.org/lobid/nwbib-spatial#Q10944> .
One solution might be to do this in steps or to use the LDF endpoint...
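One way to do this in steps is sketched below in Python. It assumes the Turtle result has already been parsed into (subject, predicate, object) tuples. The sketch does two things:

- It finds subjects that lack rdf:type skos:Concept, like Q2362403 above, which only got a prefLabel.
- It generates FILTER lines, in the form used above, to re-query those subjects in small batches.

The batch size is an arbitrary guess, not a known endpoint limit.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def incomplete_subjects(triples):
    """Subjects that did not get their rdf:type skos:Concept triple."""
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == SKOS_CONCEPT}
    return sorted({s for s, _, _ in triples} - typed)


def filter_batches(lobid_uris, batch_size=50):
    """Yield FILTER clauses selecting the given concepts by QID, batch_size at a time."""
    qids = [uri.rsplit("#", 1)[1] for uri in lobid_uris]
    for start in range(0, len(qids), batch_size):
        batch = ", ".join("wd:" + qid for qid in qids[start:start + batch_size])
        yield f"FILTER (?wikidataURI IN ({batch}))"
```

Each generated clause would take the place of the commented-out "# FILTER" line in the final query, and the per-batch results would then be merged.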
As discussed offline, I generated a SKOS file from the current data at https://nwbib.de/spatial: https://github.com/hbz/lobid-vocabs/commit/660c0949dee6ec900d3ac058f023c629920d907c
(Raw file at https://raw.githubusercontent.com/hbz/lobid-vocabs/660c0949dee6ec900d3ac058f023c629920d907c/nwbib/nwbib-spatial.ttl)
We have the number of hits in NWBib at that point, so if that makes sense, we can add them to the file.
Looks good except for one thing: foaf:focus should link to the Wikidata entity the concept is based on, e.g.:
nwbib-spatial:Q1677185
a skos:Concept ;
skos:broader nwbib-spatial:Q2758 ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:notation "Q1677185" ;
skos:prefLabel "Wickrath"@de ;
foaf:focus wd:Q1677185 .
Regarding the number of hits in NWBib: no, those numbers don't make sense in the SKOS file.
Fixed foaf:focus values and generated ConceptScheme data from the RDF model:
I just noticed that the end date is also part of the prefLabel, e.g.:
nwbib-spatial:Q878752
a skos:Concept ;
skos:broader nwbib-spatial:Q7920 ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:notation "Q878752" ;
skos:prefLabel "Landkreis Münster (bis 1974)"@de ;
foaf:focus wd:Q878752 .
This is technically not correct. We will have to think about how to handle this. One option is to use skos:note or maybe even skos:scopeNote for the date.
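If the date is moved out of the label, the split could look like the Python sketch below. It assumes the date always appears as a trailing "(bis YYYY)", as in the example. Other date formats in the data would need additional patterns. Whether the note then goes into skos:note or skos:scopeNote is still open.

```python
import re

# Assumed pattern: a label followed by a trailing "(bis YYYY)".
END_DATE = re.compile(r"^(?P<label>.*?)\s*\((?P<note>bis \d{4})\)$")


def split_end_date(pref_label):
    """Return (label without date, date note or None)."""
    match = END_DATE.match(pref_label)
    if match is None:
        return pref_label, None
    return match.group("label"), match.group("note")


label, note = split_end_date("Landkreis Münster (bis 1974)")
print(label)  # Landkreis Münster
print(note)   # bis 1974
```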
As discussed on the mailing list today, we should take care of identifying "Stadtbezirke" via the label when generating the SKOS file. I will add this as a task to the original issue: "Add suffix " (Stadtbezirk)" to the label when ?item wdt:P31/wdt:P279* wd:Q4286337 and "Stadtbezirk" is not already part of the label".
We should directly resolve this together with #86 and #89.
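The labelling rule in that task is simple string logic. In the Python sketch below, is_stadtbezirk is assumed to come from a query flag such as ?item wdt:P31/wdt:P279* wd:Q4286337. How that flag is obtained is left open.

```python
SUFFIX = " (Stadtbezirk)"


def label_with_suffix(label: str, is_stadtbezirk: bool) -> str:
    # Only Stadtbezirke get the suffix, and only if "Stadtbezirk" is not
    # already part of the label (avoids "Stadtbezirk X (Stadtbezirk)").
    if is_stadtbezirk and "Stadtbezirk" not in label:
        return label + SUFFIX
    return label
```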
Regenerated SKOS file for current spatial data:
BTW, the incomplete CONSTRUCT output (e.g. only a single skos:prefLabel triple for Q2362403) also occurs with a simpler SPARQL query based on the NWBib-ID property (P6814), like so:
curl -H "Accept: text/turtle" -G "https://query.wikidata.org/sparql" --data-urlencode query='
CONSTRUCT {
?lobidURI a skos:Concept ;
skos:inScheme <http://purl.org/lobid/nwbib-spatial> ;
skos:prefLabel ?wikidataURILabel ;
foaf:focus ?wikidataURI ;
skos:notation ?QID ;
skos:broader ?broaderURI .
}
WHERE {
{
?wikidataURI wdt:P6814 ?nwbibId.
OPTIONAL { ?wikidataURI wdt:P131 ?broader . }
}
# FILTER (?wikidataURI in (wd:Q1295))
FILTER (?wikidataURI NOT IN (wd:Q1787449, wd:Q16500124, wd:Q1465811, wd:Q16832627,
                             wd:Q1113210, wd:Q19288281, wd:Q1662807, wd:Q1351319)) # filter out former districts whose names are identical to new districts
BIND (STRAFTER (STR(?wikidataURI),"entity/") AS ?QID)
BIND (STRAFTER (STR(?broader),"entity/") AS ?broaderQID)
BIND (URI(CONCAT ("http://purl.org/lobid/nwbib-spatial#", ?QID)) AS ?lobidURI)
BIND (URI(CONCAT ("http://purl.org/lobid/nwbib-spatial#", ?broaderQID)) AS ?broaderURI)
SERVICE wikibase:label { bd:serviceParam wikibase:language "de" }
}'
There is a Wikidata issue for this at https://phabricator.wikimedia.org/T211178, but I guess it will not be addressed soon, as Wikidata has to solve some more urgent performance issues first.
To Dos for @fsteeg:
prefLabel for NIDs
New SKOS file generated with P6814 query, AGS or KS as notation, tweaked prefLabel:
Regenerated after fixing an issue: https://raw.githubusercontent.com/hbz/lobid-vocabs/43f85d1fc0fbd7052ef8af646eed5a9f53293b0c/nwbib/nwbib-spatial.ttl
But I noticed a problem: many entries now have multiple broader values, coming from the Wikidata query and the non-90s-qids.json file. E.g. Kirchenkreis Aachen: 35 from https://github.com/hbz/nwbib/blob/f14873999b115475761d7041bacc93a460a5d439/conf/non-90s-qids.json#L190 and Q1198 from the Wikidata query.
Yes, I mentioned this yesterday. The solution is to discard the P131 information from Wikidata if an entity is covered in non-90s-qids.json.
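A minimal Python sketch of that rule follows. It assumes both sources have already been reduced to dictionaries keyed by QID. The real structure of non-90s-qids.json is not shown here, so mapping each QID to a single broader value is a hypothetical simplification.

```python
def merge_broader(wikidata_p131, non_90s):
    """
    wikidata_p131: QID -> set of broader QIDs from P131 (query result)
    non_90s:       QID -> broader value from non-90s-qids.json (assumed shape)
    Entities covered in non_90s ignore their P131 information entirely.
    """
    merged = {qid: set(values) for qid, values in wikidata_p131.items()
              if qid not in non_90s}
    for qid, broader in non_90s.items():
        merged[qid] = {broader}
    return merged
```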
Otherwise, the file looks fine except for one error in the foaf:focus statements: they currently point to the resource itself and not to Wikidata. This means that
nwbib-spatial:Q2103 a skos:Concept ;
skos:broader nwbib-spatial:Q7924 ;
skos:inScheme <https://nwbib.de/spatial> ;
skos:notation "05911000" ;
skos:prefLabel "Bochum"@de ;
foaf:focus nwbib-spatial:Q2103 .
should become
nwbib-spatial:Q2103 a skos:Concept ;
skos:broader nwbib-spatial:Q7924 ;
skos:inScheme <https://nwbib.de/spatial> ;
skos:notation "05911000" ;
skos:prefLabel "Bochum"@de ;
foaf:focus wd:Q2103 .
Another thing: the foaf:focus statements from the ttl vocab are missing, e.g. in:
Latest version (no multiple broader, WD for focus): https://raw.githubusercontent.com/hbz/lobid-vocabs/c67454ce6f683e361e70fbd05bbede97f93e48e1/nwbib/nwbib-spatial.ttl
Remaining TODO in this issue: retain focus information from the original SKOS file.
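The resulting rule for foaf:focus, as a Python sketch:

- Keep the value from the original SKOS file where there is one.
- Otherwise, point to the Wikidata entity named by the concept's QID.
- Never point to the concept itself.

original_focus is assumed to be a dict built from the original file. Concepts without a QID local name, such as n9, get no focus here; that is an assumption too.

```python
import re

LOBID = "http://purl.org/lobid/nwbib-spatial#"
WD = "http://www.wikidata.org/entity/"
QID = re.compile(r"Q\d+")


def focus_for(concept_uri, original_focus):
    if concept_uri in original_focus:        # retain the original focus information
        return original_focus[concept_uri]
    local_name = concept_uri.rsplit("#", 1)[-1]
    if QID.fullmatch(local_name):            # e.g. nwbib-spatial:Q2103 -> wd:Q2103
        return WD + local_name
    return None                              # no Wikidata entity to point to
```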
Latest version including original focus information: https://raw.githubusercontent.com/hbz/lobid-vocabs/ce6214c673e88af58f219330d0c0709ff843d1b6/nwbib/nwbib-spatial.ttl
Looks good. I think we are done with this issue. +1
In the first comment here, there's a TODO:
Add suffix " (Stadtbezirk)" to the label when ?item wdt:P31/wdt:P279* wd:Q4286337 and "Stadtbezirk" is not already part of the label
Is that (still) relevant?
I don't think so. If I remember correctly, adding this created some other problems (e.g. double mention of "Stadtbezirk" in some labels). We won't implement this and will pick it up again if editors ask for it.
Similar to the process from https://github.com/hbz/nwbib/issues/397. Currently a skos:Concept in nwbib-spatial has the following information (example): https://github.com/hbz/lobid-vocabs/blob/aab7725d7514dbe81c5394520f2a75afc9ec67ca/nwbib/nwbib-spatial.ttl#L95-L100
We will have to add a link to the Wikidata entity the concept is derived from (using the property http://xmlns.com/foaf/0.1/focus).
Open questions:
What URIs to use for Wikidata-derived concepts? Options:
http://www.wikidata.org/entity/Q884315
http://purl.org/lobid/nwbib-spatial#nQ884315 or http://purl.org/lobid/nwbib-spatial#Q884315
http://purl.org/lobid/nwbib-spatial#24Q884315
[ ] Add suffix " (Stadtbezirk)" to the label when ?item wdt:P31/wdt:P279* wd:Q4286337 and "Stadtbezirk" is not already part of the label