Closed dr0i closed 4 years ago
@acka47 What to do with the `coverage` field, valued e.g. "Köln | 99"? See e.g. http://test.lobid.org/resources/HT015854197.json . This will be omitted going with 2) I think (at least the snippet above doesn't reflect these entries, and I assume nobody will catalogue them anymore, right?), but be necessary when going with 1), which we must use in parallel for at least some time.
Exactly, if entries start with `https`, the `coverage` field won't be necessary anymore. (That will also be a good way to check for resources already updated in hbz01, which will then be those with `_exists_:spatial AND NOT _exists_:coverage`.)
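The same check can be mirrored locally on already-fetched records. This is a minimal sketch, assuming each record is a plain dict with optional top-level `spatial` and `coverage` keys; the helper name `is_updated` and the sample records are hypothetical, and the "exists" test only approximates what the search index does.

```python
def is_updated(record: dict) -> bool:
    """Approximate `_exists_:spatial AND NOT _exists_:coverage` for one record.

    Assumption: a field counts as existing when its key is present and the
    value is neither None nor an empty list.
    """
    def exists(field: str) -> bool:
        value = record.get(field)
        return value is not None and value != []

    return exists("spatial") and not exists("coverage")


# Hypothetical sample records, only shaped like the real data.
records = [
    {"id": "a", "spatial": [{"label": "Köln"}], "coverage": ["Köln | 99"]},
    {"id": "b", "spatial": [{"label": "Duisburg"}]},
    {"id": "c"},
]
updated_ids = [r["id"] for r in records if is_updated(r)]
```

Only record `b` has `spatial` without `coverage`, so it is the only one treated as already updated.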
Thus, we have to build a comprehensive TSV file covering all Wikidata entities needed so that we get data (coordinates & type) for the `focus` object. There are three different sources to gather the entries from:
Next steps: add the 33 QIDs from 3.) to the SPARQL query for 2.), adjust the SPARQL query for 1.) to be concatenated with the rest.
So I added the 34 (!) IDs from 3.) to the SPARQL query with eeade06. However, this will not be enough for adding the `focus` object. The reason is that something like this will be in the data: https://nwbib.de/spatial#n03 while the corresponding QID to derive the `focus` information from is Q152243. We will have to think about a solution for this.
I also adjusted /src/main/resources/getNwbibSubjectLocationsAsWikidataEntities.sparql with 5492362. Now the results of both SPARQL queries can easily be concatenated for one TSV file to be used in the transformation process.
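Concatenating the two result sets could look roughly like the sketch below. It assumes both queries return TSV with an identical header row; the function name `concat_tsv` and the column names `item` and `label` are hypothetical, not taken from the actual queries.

```python
import csv
import io


def concat_tsv(*tsv_texts: str) -> str:
    """Concatenate TSV documents that share the same header row.

    The header is written once; exact duplicate data rows are dropped,
    keeping the first occurrence.
    """
    header = None
    seen = set()
    rows = []
    for text in tsv_texts:
        reader = csv.reader(io.StringIO(text), delimiter="\t")
        file_header = next(reader, None)
        if file_header is None:
            continue  # empty input, nothing to add
        if header is None:
            header = file_header
        elif file_header != header:
            raise ValueError(f"header mismatch: {file_header} != {header}")
        for row in reader:
            key = tuple(row)
            if row and key not in seen:
                seen.add(key)
                rows.append(row)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical query results with assumed column names.
first = "item\tlabel\nQ2100\tDuisburg\n"
second = "item\tlabel\nQ2066\tEssen\nQ2100\tDuisburg\n"
combined = concat_tsv(first, second)
```

Raising on a header mismatch is a deliberate choice here: silently merging differently shaped results would produce a TSV that the transformation could misread.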
> something like this will be in the data: https://nwbib.de/spatial#n03 while the corresponding QID to derive the `focus` information from is Q152243. We will have to think about a solution for this.
I will add a map URI to QID so that @dr0i can look it up.
Here is the map. Note that notation 70 has two WD entities as `focus`.
https://nwbib.de/spatial#N1 Q1198
https://nwbib.de/spatial#N3 Q152243
https://nwbib.de/spatial#N5 Q8614
https://nwbib.de/spatial#N10 Q462011
https://nwbib.de/spatial#N12 Q72931
https://nwbib.de/spatial#N13 Q2036208
https://nwbib.de/spatial#N14 Q4194
https://nwbib.de/spatial#N16 Q580471
https://nwbib.de/spatial#N18 Q881875
https://nwbib.de/spatial#N20 Q151993
https://nwbib.de/spatial#N22 Q153464
https://nwbib.de/spatial#N24 Q445609
https://nwbib.de/spatial#N28 Q152356
https://nwbib.de/spatial#N32 Q1380992
https://nwbib.de/spatial#N33 Q1381014
https://nwbib.de/spatial#N34 Q1413205
https://nwbib.de/spatial#N42 Q7904317
https://nwbib.de/spatial#N44 Q836937
https://nwbib.de/spatial#N45 Q641138
https://nwbib.de/spatial#N46 Q249428
https://nwbib.de/spatial#N47 Q152420
https://nwbib.de/spatial#N48 Q708742
https://nwbib.de/spatial#N57 Q698162
https://nwbib.de/spatial#N62 Q657241
https://nwbib.de/spatial#N63 Q649192
https://nwbib.de/spatial#N64 Q650645
https://nwbib.de/spatial#N65 Q697254
https://nwbib.de/spatial#N66 Q514557
https://nwbib.de/spatial#N68 Q700198
https://nwbib.de/spatial#N69 Q573290
https://nwbib.de/spatial#N70 Q14551680,Q835382
https://nwbib.de/spatial#N76 Q153943
https://nwbib.de/spatial#N77 Q829718
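For the lookup, a map in this shape can be read into a dictionary. This is a minimal sketch, assuming one entry per line with the URI and the QIDs separated by whitespace and multiple QIDs separated by commas, as in the list above; the function name `parse_uri_qid_map` is hypothetical.

```python
from typing import Dict, List


def parse_uri_qid_map(text: str) -> Dict[str, List[str]]:
    """Parse lines of the form '<URI> <QID>[,<QID>...]' into a dict."""
    mapping: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        uri, qids = line.split(maxsplit=1)
        # Always store a list, so entries with two QIDs need no special case.
        mapping[uri] = [qid.strip() for qid in qids.split(",")]
    return mapping


# Three entries copied from the map above.
SAMPLE = """\
https://nwbib.de/spatial#N1 Q1198
https://nwbib.de/spatial#N3 Q152243
https://nwbib.de/spatial#N70 Q14551680,Q835382
"""
uri_to_qids = parse_uri_qid_map(SAMPLE)
```

Storing a list for every URI keeps the entry for notation 70, which has two QIDs, in the same shape as all the others.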
@acka47 I took your example MABXml into the test. This is the latest result: https://gist.github.com/dr0i/c649f05af0cc32aa5baec4b3c04871d2. Re: subjects - please have a look, I didn't do anything in the morph for this, but it looks good, doesn't it?
+1 Everything looks good. As discussed offline, we will have to think about the broader regions (e.g. Eifel, Weserbergland, or Nordrhein-Westfalen itself) that have one geo point attached. It does not make sense to use those geo coordinates for the "result map". So we have to think about a way to handle this (ignoring these coordinates or not storing them to begin with). For now, we will leave it as is.
Deployed to production, closed.
Reopening. From https://github.com/hbz/nwbib/issues/470#issuecomment-541045997:
The way URI plus string are stored in 700n will now work differently than shown in https://github.com/hbz/nwbib/issues/470#issuecomment-483588151. The final version, which stores string and URI in separate subfields, is documented in the wiki. Here is the example:
<datafield tag="700" ind1="n" ind2="1">
<subfield code="a">Ruhrgebiet</subfield>
<subfield code="0">https://nwbib.de/spatial#N20</subfield>
</datafield>
<datafield tag="700" ind1="n" ind2="1">
<subfield code="a">Duisburg</subfield>
<subfield code="0">https://nwbib.de/spatial#Q2100</subfield>
</datafield>
<datafield tag="700" ind1="n" ind2="1">
<subfield code="a">Essen</subfield>
<subfield code="0">https://nwbib.de/spatial#Q2066</subfield>
</datafield>
<datafield tag="700" ind1="n" ind2="1">
<subfield code="a">Einzelne Autoren (Primärliteratur)</subfield>
<subfield code="0">https://nwbib.de/subjects#N768010</subfield>
</datafield>
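To illustrate how label and URI can be read from the separate subfields, here is a minimal Python sketch. It is not the morph itself: the wrapping `<record>` root element, the function name `extract_700n`, and the filter on the `https://nwbib.de/spatial#` prefix are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Two of the fields from the example above, wrapped in an assumed root element.
SNIPPET = """
<record>
  <datafield tag="700" ind1="n" ind2="1">
    <subfield code="a">Ruhrgebiet</subfield>
    <subfield code="0">https://nwbib.de/spatial#N20</subfield>
  </datafield>
  <datafield tag="700" ind1="n" ind2="1">
    <subfield code="a">Einzelne Autoren (Primärliteratur)</subfield>
    <subfield code="0">https://nwbib.de/subjects#N768010</subfield>
  </datafield>
</record>
"""


def extract_700n(xml_text):
    """Return (label, uri) pairs for all 700 fields with ind1="n"."""
    root = ET.fromstring(xml_text)
    entries = []
    for field in root.iter("datafield"):
        if field.get("tag") != "700" or field.get("ind1") != "n":
            continue
        # Map subfield code -> text; "a" holds the label, "0" the URI.
        subfields = {
            sf.get("code"): (sf.text or "").strip()
            for sf in field.findall("subfield")
        }
        entries.append((subfields.get("a"), subfields.get("0")))
    return entries


entries = extract_700n(SNIPPET)
# Keep only the spatial classification; subjects use a different namespace.
spatial = [
    (label, uri)
    for label, uri in entries
    if uri and uri.startswith("https://nwbib.de/spatial#")
]
```

Filtering on the URI prefix separates the spatial entries from the `subjects` entries, which use the same field and indicators.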
We will have to update the respective file in hbz01XmlClobs.tar.bz2 and adjust the morph accordingly. The new cataloging practice will begin by the end of November.
This should be resolved with #1036. There is no way of really testing this, as there is only a small, constructed test case. I am going to deploy it anyway to avoid conflicts in the morph when other work is done there. This should be reopened if real data comes in and misbehaves.
Result of hbz/nwbib#470:
New catalogued nwbib data will look like this:
The `spatial` entries in the JSON-LD will be generated in two ways:
1) If `7001n1` doesn't start with "https:", look up the literal in geo_nwbib and use the result to build the JSON-LD structure via ElasticsearchIndexer.java.
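The distinction between the two ways can be sketched as a simple branch on the field value. This is only an illustration under stated assumptions: geo_nwbib is modeled as an in-memory dict, the function name `resolve_spatial` and the returned keys are hypothetical, and the handling of values that do start with "https:" (using the value directly as the identifier) is an assumption, since that step is not spelled out here.

```python
def resolve_spatial(value, geo_nwbib):
    """Decide how a 700n value would be turned into a spatial entry.

    `geo_nwbib` stands in for the real lookup and maps literals to
    already-known data (assumption: a plain dict).
    """
    if value.startswith("https:"):
        # Assumed behaviour: the value already is the identifier.
        return {"route": "uri", "id": value}
    hit = geo_nwbib.get(value)
    if hit is None:
        return None  # literal unknown to the lookup
    return {"route": "lookup", "label": value, **hit}


# Hypothetical lookup content; the id is a placeholder, not real data.
fake_geo_nwbib = {"Köln": {"id": "placeholder-id"}}

by_uri = resolve_spatial("https://nwbib.de/spatial#Q2100", fake_geo_nwbib)
by_literal = resolve_spatial("Köln", fake_geo_nwbib)
unknown = resolve_spatial("Nirgendwo", fake_geo_nwbib)
```

Returning `None` for unknown literals leaves it to the caller to decide whether to skip the entry or log it.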