hbz / lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD
http://lobid.org/resources
Eclipse Public License 2.0

Add notations to spatial data #1017

Closed: acka47 closed this issue 4 years ago

acka47 commented 4 years ago

This is a prerequisite for https://github.com/hbz/lobid-vocabs/issues/89 and https://github.com/hbz/lobid-vocabs/issues/93. We need to add notations, where available, for the spatial objects. The notations will be taken from Wikidata using the values of the properties P440 (Kreisschlüssel) and P439 (Amtlicher Gemeindeschlüssel).

I suggest adding it to the geo index like this:

{
   "focus":{
      "id":"http://www.wikidata.org/entity/Q365",
      "geo":{
         "lat":50.942222222222,
         "lon":6.9577777777778
      },
      "type":[
         "http://www.wikidata.org/entity/Q22865",
         "http://www.wikidata.org/entity/Q707813",
         "http://www.wikidata.org/entity/Q200250",
         "http://www.wikidata.org/entity/Q2202509",
         "http://www.wikidata.org/entity/Q42744322",
         "http://www.wikidata.org/entity/Q1549591"
      ]
   },
   "aliases":[
      {
         "language":"de",
         "value":"Kölle"
      }
   ],
   "id":"https://nwbib.de/spatial#Q365",
   "type":[
      "Concept"
   ],
   "label":"Köln",
   "notation":"05315000",
   "source":{
      "id":"https://nwbib.de/spatial",
      "label":"Raumsystematik der Nordrhein-Westfälischen Bibliographie"
   },
   "locatedIn":{
      "language":"de",
      "value":"Regierungsbezirk Köln"
   }
}

If I understand correctly, this object is included in the lobid-resources data automatically, so the extra notation field will be added automatically as well. We will not have to update the context, as skos:notation is already in there:

"notation": {
  "@id": "http://www.w3.org/2004/02/skos/core#notation"
}
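A minimal Python sketch of how the proposed `notation` field could be filled from Wikidata. It assumes the standard Wikidata entity JSON layout (claims, then property ID, then a list of statements, each with `mainsnak.datavalue.value`). The helper names and the preference for P439 over P440 are assumptions for illustration, not how the transformation actually does it.

```python
from typing import Optional

# Assumed order: Amtlicher Gemeindeschlüssel (P439) first, then Kreisschlüssel (P440).
NOTATION_PROPERTIES = ("P439", "P440")


def notation_from_claims(claims: dict) -> Optional[str]:
    """Return the first string value found for the notation properties, or None."""
    for prop in NOTATION_PROPERTIES:
        for statement in claims.get(prop, []):
            value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
            if isinstance(value, str) and value:
                return value
    return None


def add_notation(geo_entry: dict, claims: dict) -> dict:
    """Return a copy of a geo index entry with "notation" set when one is found."""
    entry = dict(geo_entry)
    notation = notation_from_claims(claims)
    if notation is not None:
        entry["notation"] = notation
    return entry


# Illustrative claims, trimmed to the single value used in the example entry above.
claims_q365 = {"P439": [{"mainsnak": {"datavalue": {"value": "05315000", "type": "string"}}}]}
entry = add_notation({"id": "https://nwbib.de/spatial#Q365", "label": "Köln"}, claims_q365)
print(entry["notation"])  # 05315000
```

Entries without either property simply get no `notation` key in this sketch.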

In addition to the notations from Wikidata, we will have to add notations for the Regierungsbezirke manually. This could be done by adding a small map to the morph, here as TSV:

Q7926	051
Q7927	053
Q896929	054
Q7920	055
Q7923	057
Q7924	059

Note to self: Ask NWBib editors re. Q313969 (Regierungsbezirk Minden) as they proposed the same notation (057) for it as for Regierungsbezirk Detmold.
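A small Python sketch of loading such a map and checking it for clashes like the one described in the note. The inline TSV string reproduces the six rows above; the function names are hypothetical and this is not the morph itself.

```python
import csv
import io
from collections import defaultdict

# The Regierungsbezirk map from above as tab-separated text (QID, notation).
REGBEZ_TSV = "Q7926\t051\nQ7927\t053\nQ896929\t054\nQ7920\t055\nQ7923\t057\nQ7924\t059\n"


def load_map(tsv_text: str) -> dict:
    """Parse QID/notation rows; notations stay strings so leading zeros survive."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {qid: notation for qid, notation in reader}


def duplicate_notations(mapping: dict) -> dict:
    """Return every notation that is assigned to more than one QID."""
    by_notation = defaultdict(list)
    for qid, notation in mapping.items():
        by_notation[notation].append(qid)
    return {n: qids for n, qids in by_notation.items() if len(qids) > 1}


regbez = load_map(REGBEZ_TSV)
print(regbez["Q7926"])  # 051
# Adding Q313969 with the proposed 057 collides with the existing 057 entry:
print(duplicate_notations({**regbez, "Q313969": "057"}))  # {'057': ['Q7923', 'Q313969']}
```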

acka47 commented 4 years ago

As discussed on the phone, we will use the SKOS file for loading the notations (which is the best way to do it anyway, as there are lots of entries in the spatial classification whose notation is not recorded on GitHub). I have already updated the SKOS file, see https://github.com/hbz/lobid-vocabs/commit/d0b64aafab784245c726234308653f9ca74122f3
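A rough Python sketch of turning the SKOS file into a per-concept lookup. It assumes the file has already been parsed into (subject, predicate, object) triples by an RDF library and that labels come from `skos:prefLabel` (language tags ignored); the function name is hypothetical. Keying by concept URI means entries without a `foaf:focus` still get their notation.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]

# Predicates assumed to carry the fields of interest in the SKOS file.
FIELDS = {
    "http://www.w3.org/2004/02/skos/core#notation": "notation",
    "http://www.w3.org/2004/02/skos/core#prefLabel": "label",
    "http://xmlns.com/foaf/0.1/focus": "focus",
}


def build_index(triples: Iterable[Triple]) -> Dict[str, dict]:
    """Group notation, label and focus by concept URI; other predicates are ignored."""
    index: Dict[str, dict] = {}
    for subject, predicate, obj in triples:
        field = FIELDS.get(predicate)
        if field is not None:
            index.setdefault(subject, {})[field] = obj
    return index


# Triples trimmed for illustration, using values from this thread.
SKOS = "http://www.w3.org/2004/02/skos/core#"
triples = [
    ("https://nwbib.de/spatial#Q365", SKOS + "notation", "05315000"),
    ("https://nwbib.de/spatial#Q365", SKOS + "prefLabel", "Köln"),
    ("https://nwbib.de/spatial#Q365", "http://xmlns.com/foaf/0.1/focus",
     "http://www.wikidata.org/entity/Q365"),
    ("https://nwbib.de/spatial#N05", SKOS + "notation", "05"),
    ("https://nwbib.de/spatial#N05", SKOS + "prefLabel", "Westfalen"),
]
index = build_index(triples)
print(index["https://nwbib.de/spatial#N05"])  # {'notation': '05', 'label': 'Westfalen'}
```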

acka47 commented 4 years ago

As I am not sure we really need another issue for this, first a question: Is it possible to get notations for all spatial objects from the SKOS file, even for entries that have no associated QID or are not in the geo index?

Background: we will have to replace the URI and notation for Westfalen (currently 05) with 04 (similarly for Euregio, from 91 to 07).

Here is the spatial object for an example file HT020140915:

{
   "spatial":[
      {
         "id":"https://nwbib.de/spatial#N05",
         "type":[
            "Concept"
         ],
         "source":{
            "id":"https://nwbib.de/spatial",
            "label":"Raumsystematik der Nordrhein-Westfälischen Bibliographie"
         },
         "notation":"05",
         "label":"Westfalen"
      }
   ]
}

As said, all occurrences of 05 have to be replaced by 04. Is this also possible by looking up the SKOS file?
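A minimal Python sketch of the replacement itself, applied to spatial objects shaped like the example above. The old-to-new table is hard-coded here for illustration (whether the SKOS file can supply it is the open question), and the sketch assumes the new URI follows the same `#N<notation>` pattern.

```python
import copy

BASE = "https://nwbib.de/spatial#N"
# Old -> new notations named above; hard-coded here for illustration only.
REMAP = {"05": "04", "91": "07"}


def remap_spatial(record: dict, remap: dict = REMAP) -> dict:
    """Return a copy of the record with remapped spatial notations and URIs."""
    result = copy.deepcopy(record)
    for spatial in result.get("spatial", []):
        old = spatial.get("notation")
        new = remap.get(old)
        if new is None:
            continue  # notation not affected by the change
        spatial["notation"] = new
        # Assumes the URI follows the #N<notation> pattern seen in the example.
        if spatial.get("id") == BASE + old:
            spatial["id"] = BASE + new
    return result


record = {"spatial": [{"id": "https://nwbib.de/spatial#N05", "type": ["Concept"],
                       "notation": "05", "label": "Westfalen"}]}
updated = remap_spatial(record)
print(updated["spatial"][0]["id"], updated["spatial"][0]["notation"])
# https://nwbib.de/spatial#N04 04
```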

I wonder how the spatial object is actually built, as it cannot be generated completely from 700n due to missing information there; see this snippet from the example's source:

<datafield tag="700" ind1="n" ind2="1">
   <subfield code="a">05</subfield>
</datafield>
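Since 700n only carries the notation, one plausible way to build the full object is a notation-keyed lookup. The Python sketch below assumes such a lookup (hypothetical, e.g. derived from the SKOS file) and is not the actual Metafacture morph; the optional `focus` mirrors the shape used in the geo index.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<datafield tag="700" ind1="n" ind2="1">
   <subfield code="a">05</subfield>
</datafield>"""

SOURCE = {"id": "https://nwbib.de/spatial",
          "label": "Raumsystematik der Nordrhein-Westfälischen Bibliographie"}

# Hypothetical notation-keyed lookup, e.g. built from the SKOS file.
BY_NOTATION = {"05": {"id": "https://nwbib.de/spatial#N05", "label": "Westfalen"}}


def spatial_from_700n(xml_text: str, by_notation: dict) -> list:
    """Build spatial objects from the notations in a 700n field."""
    field = ET.fromstring(xml_text)
    if field.get("tag") != "700" or field.get("ind1") != "n":
        return []
    objects = []
    for sub in field.findall("subfield[@code='a']"):
        notation = (sub.text or "").strip()
        concept = by_notation.get(notation)
        if concept is None:
            continue  # notation unknown to the lookup: nothing to add
        obj = {"id": concept["id"], "type": ["Concept"], "source": SOURCE,
               "notation": notation, "label": concept["label"]}
        if "focus" in concept:
            obj["focus"] = {"id": concept["focus"]}
        objects.append(obj)
    return objects


print(spatial_from_700n(SNIPPET, BY_NOTATION)[0]["label"])  # Westfalen
```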

acka47 commented 4 years ago

It looks like https://github.com/hbz/lobid-resources/blob/master/src/main/resources/nwbib-spatial.tsv can be replaced by looking up the SKOS file.

dr0i commented 4 years ago

Re "https://nwbib.de/spatial#N05": it actually has a QID mapped in the SKOS file, as also stated in https://github.com/hbz/lobid-resources/issues/998#issuecomment-495588258. However, #N05 had no focus yet, as quoted in this ticket (https://github.com/hbz/lobid-resources/issues/1017#issuecomment-528809444). Which is correct: #N05 (or #N04 etc., respectively) with focus or without?
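To spot such cases systematically, a check along these lines could compare indexed records against the SKOS focus mappings. This is a Python sketch with hypothetical names; the QID below is a placeholder, not the real mapping for #N05.

```python
from typing import Dict, Iterator, List, Tuple


def missing_focus(records: List[dict], skos_focus: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield (record id, concept id) where the SKOS file maps the concept to a
    focus but the record's spatial object carries none."""
    for record in records:
        for spatial in record.get("spatial", []):
            concept = spatial.get("id")
            if concept in skos_focus and "focus" not in spatial:
                yield record.get("id"), concept


# Hypothetical input: the QID is a placeholder, not the real mapping for #N05.
skos_focus = {"https://nwbib.de/spatial#N05": "http://www.wikidata.org/entity/Q_PLACEHOLDER"}
records = [{"id": "HT020140915",
            "spatial": [{"id": "https://nwbib.de/spatial#N05", "notation": "05", "label": "Westfalen"}]}]
print(list(missing_focus(records, skos_focus)))
# [('HT020140915', 'https://nwbib.de/spatial#N05')]
```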

dr0i commented 4 years ago

Re my last comment: this is true only for the test resource, because it is based on data that is not yet in the hbz01 data:

https://nwbib.de/spatial#N05 Westfalen
https://nwbib.de/spatial#Q2742 Münster

So the outcome for the data described above (see here) can only be seen in the test data so far. As it was indeed there before, I assume the outcome is correct.

acka47 commented 4 years ago

In the meantime, some notations and two foaf:focus statements were added to nwbib-spatial, see the last two commits at https://github.com/hbz/lobid-vocabs/commits/master.

dr0i commented 4 years ago

With #1029, the problem with focus mentioned in https://github.com/hbz/lobid-resources/issues/1017#issuecomment-529524340 seems to be resolved; see e.g. http://stage.lobid.org/resources/search?q=spatial.focus.id%3A%22http%3A%2F%2Fwww.wikidata.org%2Fentity%2FQ8614%22 .
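The search link above encodes a field query on `spatial.focus.id`. A small Python sketch of building such links for other QIDs; the helper name is hypothetical, and the base URL and query syntax are taken from the link itself.

```python
from urllib.parse import urlencode


def focus_search_url(qid: str, base: str = "http://stage.lobid.org/resources/search") -> str:
    """Build a search URL matching resources whose spatial focus is the given QID."""
    entity = f"http://www.wikidata.org/entity/{qid}"
    query = f'spatial.focus.id:"{entity}"'
    # urlencode percent-encodes ':', '"' and '/' in the query value.
    return f"{base}?{urlencode({'q': query})}"


print(focus_search_url("Q8614"))
```

For Q8614 this reproduces the link quoted in the comment.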

acka47 commented 4 years ago

Closing. I could not test this very well, though, as the notation field is not indexed; see https://github.com/hbz/lobid-resources/blob/050efda9b1d68e74a132933f2ea304346ca90296/src/main/resources/index-config.json#L840-L843
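As a hedged illustration: in Elasticsearch 5 and later, a field mapped with `"index": false` stays in `_source` but generally cannot be searched, and a missing `index` option defaults to true. The Python helper below checks that option along a dotted path; the mapping fragment is hypothetical and not copied from index-config.json.

```python
def is_indexed(properties: dict, dotted_path: str) -> bool:
    """Follow nested "properties" along a dotted path and report whether the leaf
    field is indexed. A missing "index" option is treated as true."""
    node: dict = {}
    current = properties
    for part in dotted_path.split("."):
        if part not in current:
            raise KeyError(dotted_path)
        node = current[part]
        current = node.get("properties", {})
    return bool(node.get("index", True))


# Hypothetical mapping fragment; not copied from index-config.json.
mapping = {
    "spatial": {
        "properties": {
            "label": {"type": "text"},
            "notation": {"type": "keyword", "index": False},
        }
    }
}
print(is_indexed(mapping, "spatial.notation"))  # False
```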