Closed acka47 closed 4 years ago
Looking at yesterday's efforts again, it seems that I actually was able to create a list of all Wikidata entries with P6814 (NWBib ID) that are not in the SKOS file, see https://gist.github.com/acka47/e6870164a0f7ef9d5772d0ee5f0e1827
Based on P6814
query: https://w.wiki/7c2
qid-p6814-missing-in-nwbib.csv.txt qid-p6814-missing-in-wiki.csv.txt
I added the NWBib ID to the one Wikidata entry where it was missing. It was a place (Lippischer Wald) that we only added to the matching process after ingesting NWBib IDs into Wikidata.
Furthermore, I looked at the 125 Wikidata entries with an NWBib ID that do not appear in the SKOS file. First I checked whether they actually exist in lobid-resources. I tested 30 resources, here is the result:
All Wikidata entries from 2.) received an NWBib ID through the QuickStatements upload I did. So it does not look as if we have a problem with anyone arbitrarily adding NWBib IDs to Wikidata. I assume that NWBib editors removed the corresponding entries from the catalog.
There are also some Stadtbezirke in there that need to be part of the classification but have no hits. These are Q54803600 and Q54803599.
Another possibility is that our matching has problems. Looking at the examples, this seems to be the case for:
coverage:"Castrop | 99": the entries only have Q3898 (Castrop-Rauxel) in the spatial array, although Q151243 is part of the geo index.
coverage:Oberbilk: the entries do not have any spatial entry, as Q8249663 currently is not part of the NWBib geo index. This will change with https://github.com/hbz/lobid-resources/commit/73b07a04ecbd7b4b0c522c3cd0937e0efc8dc163.
coverage:"Wengern | 99": the entries have no spatial entry, as Q2559367 is missing in the geo index.
We should update the geo index and SKOS file asap and take another look afterwards.
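A minimal sketch of how records like these could be spotted, assuming each record is a dict with a `coverage` list and an optional `spatial` list (the sample records and their `id` values are made up for illustration, not taken from the catalog):

```python
def unmatched(records):
    """Return (id, coverage) for records that have coverage values but no spatial entries."""
    result = []
    for record in records:
        coverage = record.get("coverage", [])
        spatial = record.get("spatial", [])
        if coverage and not spatial:
            result.append((record["id"], coverage))
    return result


# Hypothetical sample records, shaped after the cases described above
records = [
    {"id": "r1", "coverage": ["Oberbilk | 99"]},
    {"id": "r2", "coverage": ["Castrop | 99"],
     "spatial": [{"id": "https://nwbib.de/spatial#Q3898"}]},
    {"id": "r3", "coverage": ["Wengern | 99"], "spatial": []},
]

missing = unmatched(records)
for record_id, coverage in missing:
    print(record_id, coverage)
```

Note that this only catches records without any spatial entry. A case like Castrop, where a spatial entry exists but points to the wrong place, would still need a manual look.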
As discussed offline, most of the remaining missing items are caused by 0 hits in the catalog:
Q151243 Q1271768 ~Q1530668~ ~Q1607906~ Q11343 Q1821058 ~Q32054087~ Q1366743 Q882647 Q2559367 Q8249663
Two are part of a Gemarkung, which itself is not part of NWBib:
Q1301770
Q1760203
Finally, one is a former Kreis:
Q1759911
Re. Q1271768, which is Gangelt-Hastenrath: There exists another Hastenrath (Q1588790) which is already matched from coverage:hastenrath. Looking at the coverage entries, this mostly makes sense:
$ curl -s 'http://lobid.org/resources/search?q=coverage%3AHastenrath' | jq '.member[].coverage'
[
"Hastenrath | 99"
]
[
"Hastenrath <Eschweiler> | 99"
]
[
"Eschweiler-Hastenrath | 99"
]
[
"Süsterseel | 99",
"Hastenrath | 99"
]
[
"Eschweiler-Hastenrath | 99"
]
[
"Hastenrath, Eschweiler | 99"
]
[
"Eschweiler-Hastenrath | 99"
]
[
"Hastenrath, Eschweiler | 99"
]
[
"Hastenrath, Eschweiler | 99",
"Scherpenseel, Eschweiler | 99"
]
[
"Eschweiler-Scherpenseel | 99",
"Eschweiler-Hastenrath | 99"
]
[
"Scherpenseel, Eschweiler | 99",
"Hastenrath, Eschweiler | 99"
]
Both entries with coverage "Hastenrath | 99" refer to the other Hastenrath, though. Thus, I added it to the manual matching list with https://github.com/hbz/lobid-resources/pull/1013/commits/bec0584debe59419d17b568855d973058ebadd5e.
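To find the unqualified coverage values without reading the whole output, something like the following could be used. It is only a sketch: it assumes the `member` list of the search response has been loaded as a list of dicts, and the helper names are hypothetical.

```python
def place_names(coverage_values):
    """Strip the trailing ' | 99' part from each coverage string."""
    return [value.split("|")[0].strip() for value in coverage_values]


def unqualified_records(members, name):
    """Return records with a coverage value that is exactly `name`, without any qualifier."""
    return [m for m in members if name in place_names(m.get("coverage", []))]


# A few coverage values copied from the output above
members = [
    {"coverage": ["Hastenrath | 99"]},
    {"coverage": ["Hastenrath <Eschweiler> | 99"]},
    {"coverage": ["Süsterseel | 99", "Hastenrath | 99"]},
    {"coverage": ["Eschweiler-Hastenrath | 99"]},
]

hits = unqualified_records(members, "Hastenrath")
print(len(hits))
```

Only the exact name matches here, so qualified forms like "Hastenrath <Eschweiler>" or "Eschweiler-Hastenrath" are left out.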
Re. Q1530668 (Wingenbach): I removed the NWBib ID from Wikidata as there is another Wingenbach (Q2584381) that the one entry with the respective coverage is already successfully matched to: https://lobid.org/resources/HT016063885
Re. Q1607906 (Herbeck), there exists another Herbeck (Q55499627) that the entries with coverage:Herbeck are correctly linked to. Thus, I removed the NWBib ID from the Wikidata entry.
Re. Q32054087: Also removed NWBib ID because https://www.wikidata.org/wiki/Q4082
The other places from https://github.com/hbz/nwbib/issues/485#issuecomment-526158930 with 0 hits in the catalog will probably be fixed with https://github.com/hbz/lobid-resources/pull/1013.
Current state deployed to test: https://test.nwbib.de/spatial
+1
Will redeploy to test after https://github.com/hbz/lobid-resources/pull/1013 is deployed.
https://github.com/hbz/lobid-resources/pull/1013 is now deployed, please redeploy.
I still get a single missing entry with hits in the catalog: https://www.wikidata.org/wiki/Q11343
Is this due to a missing P131 (located in the administrative territorial entity)?
Added it: https://www.wikidata.org/w/index.php?title=Q11343&type=revision&diff=1007529025&oldid=993874056
Deployed to test: http://test.nwbib.de/spatial
Classification changes: https://github.com/hbz/lobid-vocabs/pull/97
It seems a previous workaround (for multiple P131, pick the last one) is not good enough: we're losing Regierungsbezirk Münster (https://www.wikidata.org/wiki/Q7920), probably due to its two P131 values. Note that it is missing in production too, probably because in master, multiple P131 are not yet handled at all (one of the original problems triggering this issue).
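To illustrate why "pick the last one" can lose an entry, here is a sketch of one possible alternative: take the first P131 value that is itself a known concept in the classification. This is an assumption for illustration, not what the actual fix does, and the QIDs below are placeholder inputs.

```python
def pick_broader(p131_values, known_concepts):
    """Return the first P131 value that is a known concept, or None if there is none."""
    for qid in p131_values:
        if qid in known_concepts:
            return qid
    return None


# Illustrative values only: one parent is in the classification, the other is not
known = {"Q1198"}
values = ["Q1198", "Q_PLACEHOLDER"]

last_is_known = values[-1] in known  # what "pick the last one" would rely on
chosen = pick_broader(values, known)
print(last_is_known, chosen)
```

With the last value being an unknown parent, the entry would be dropped from the hierarchy, while checking against the known concepts still finds a usable broader value.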
Fixed missing Regierungsbezirke on test and production with https://github.com/hbz/lobid-vocabs/commit/d304bacd482ef5bae695e6bc72e6a95f32fa6994:
https://test.nwbib.de/spatial https://nwbib.de/spatial
Remaining tasks here:
N9 like in https://github.com/hbz/lobid-vocabs/commit/d304bacd482ef5bae695e6bc72e6a95f32fa6994
Re. Grafschaft Rietberg (Q457468): After starting to write an email to NWBib editors, I decided it only makes sense to keep it under N74 instead of moving it to "48 Niederrheinisch-Westfälischer Reichskreis". Let's talk tomorrow about how to implement it.
@acka47 I think we're done here, see https://github.com/hbz/nwbib/issues/485#issuecomment-527439605
Grafschaft Rietberg is in too, by overriding the SKOS data (2 broader values, not actually incorrect) with the info from non-90s-qids.json (1 value) in the UI. The permanent real fix will come with #487.
As discussed offline, I rolled back overriding from non-90s-qids.json to avoid duplicates.
We're still done here, as we will resolve Grafschaft Rietberg in #487.
Closing
As far as I can see, there are about 100 more entities in Wikidata with an NWBib ID (as of now 4421, see https://w.wiki/7aR) than entries in the NWBib spatial vocab:
$ grep "skos:Concept" nwbib-spatial.ttl | wc -l
results in 4324. We should investigate this. It should rather be the other way around, with more skos:Concept entries in nwbib-spatial than NWBib-ID entities in Wikidata, as there are a few concepts without a foaf:focus link to Wikidata.
Trying to compare the WD entities on the command line with sort, uniq and comm, I did not get good results. @fsteeg, please run your comparison script again.
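As an alternative to sort, uniq and comm, the comparison could be sketched in Python with set differences. This is not the comparison script mentioned above. It assumes the SKOS file links to Wikidata via URIs of the form http://www.wikidata.org/entity/Q..., and the sample data below is made up.

```python
import re


def qids_from_skos(turtle_text):
    """Collect QIDs that appear as Wikidata entity URIs in a Turtle string."""
    return set(re.findall(r"wikidata\.org/entity/(Q\d+)", turtle_text))


def compare(wikidata_qids, skos_qids):
    """Return (only in Wikidata, only in SKOS), each sorted by numeric QID."""
    def by_number(qid):
        return int(qid[1:])
    only_wd = sorted(set(wikidata_qids) - set(skos_qids), key=by_number)
    only_skos = sorted(set(skos_qids) - set(wikidata_qids), key=by_number)
    return only_wd, only_skos


# Made-up sample data, not taken from the real files
sample_ttl = """
<https://nwbib.de/spatial#Q3898> a skos:Concept ;
    foaf:focus <http://www.wikidata.org/entity/Q3898> .
<https://nwbib.de/spatial#Q7920> a skos:Concept ;
    foaf:focus <http://www.wikidata.org/entity/Q7920> .
"""
sample_wikidata = ["Q3898", "Q151243"]

only_wd, only_skos = compare(sample_wikidata, qids_from_skos(sample_ttl))
print("in Wikidata only:", only_wd)
print("in SKOS only:", only_skos)
```

For the real data, the Wikidata side would come from the query result (for example a CSV export) and the SKOS side from reading nwbib-spatial.ttl into a string.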