Closed fsteeg closed 7 years ago
For testing the quality of the JSON-LD output you should take a look at entities with geo coordinates (which are added via a bnode). For example http://d-nb.info/gnd/4074335-4 (ttl). See the issue at https://github.com/lobid/lodmill/issues/503.
First results, for http://d-nb.info/gnd/2047974-8/about/lds:
{
"@graph" : [ {
"@id" : "http://d-nb.info/gnd/2047974-8",
"@type" : "organisation",
"http://d-nb.info/standards/elementset/dnb#deprecatedUri" : [ "http://d-nb.info/gnd/4194078-7" ],
"http://d-nb.info/standards/elementset/gnd#broaderTermInstantial" : [ {
"@id" : "http://d-nb.info/gnd/4630294-3"
} ],
"http://d-nb.info/standards/elementset/gnd#geographicAreaCode" : [ {
"@id" : "http://d-nb.info/standards/vocab/gnd/geographic-area-code#XA-DE-NW"
} ],
"http://d-nb.info/standards/elementset/gnd#gndIdentifier" : [ "2047974-8" ],
"http://d-nb.info/standards/elementset/gnd#gndSubjectCategory" : [ {
"@id" : "http://d-nb.info/standards/vocab/gnd/gnd-sc#6.7"
}, {
"@id" : "http://d-nb.info/standards/vocab/gnd/gnd-sc#2.2"
} ],
"homepage" : [ {
"@id" : "https://www.hbz-nrw.de/"
} ],
"http://d-nb.info/standards/elementset/gnd#oldAuthorityNumber" : [ "(DE-588)4194078-7", "(DE-588b)2047974-8", "(DE-588c)4194078-7" ],
"placeOfBusiness" : [ {
"@id" : "http://d-nb.info/gnd/4031483-2"
} ],
"preferredName:ForTheCorporateBody" : [ "Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen" ],
"http://d-nb.info/standards/elementset/gnd#spatialAreaOfActivity" : [ {
"@id" : "http://d-nb.info/gnd/4042570-8"
} ],
"topic" : [ {
"@id" : "http://d-nb.info/gnd/4132773-1"
} ],
"variantName:ForTheCorporateBody" : [ "Hochschulbibliothekszentrum NRW", "Hochschulbibliothekszentrum des Landes NRW", "Hochschulbibliothekszentrum", "hbz", "hbz Köln" ],
"http://www.w3.org/2002/07/owl#sameAs" : [ {
"@id" : "http://d-nb.info/gnd/4194078-7"
} ],
"url" : [ {
"@id" : "http://de.wikipedia.org/wiki/Hochschulbibliothekszentrum_des_Landes_Nordrhein-Westfalen"
} ]
} ]
}
For http://d-nb.info/gnd/4074335-4/about/lds:
{
"@graph" : [ {
"@id" : "_:t1",
"@type" : "http://www.opengis.net/ont/sf#Point",
"http://www.opengis.net/ont/geosparql#asWKT" : [ {
"@type" : "http://www.opengis.net/ont/geosparql#wktLiteral",
"@value" : "Point ( -000.125740 +051.508530 )"
} ]
}, {
"@id" : "http://d-nb.info/gnd/4074335-4",
"@type" : "http://d-nb.info/standards/elementset/gnd#TerritorialCorporateBodyOrAdministrativeUnit",
"http://d-nb.info/standards/elementset/dnb#deprecatedUri" : [ "http://d-nb.info/gnd/1005809-6" ],
"http://d-nb.info/standards/elementset/gnd#definition" : [ {
"@language" : "de",
"@value" : "Hauptstadt des Vereinigten Königreichs von Großbritannien und Nordirland, in Mittelsteinzeit besiedelt, 43 n.Chr. von Römern gegründet; das County of London war 1889-1965 Verwaltungsgrafschaft u. zeremonielle Grafschaft"
} ],
"http://d-nb.info/standards/elementset/gnd#geographicAreaCode" : [ {
"@id" : "http://d-nb.info/standards/vocab/gnd/geographic-area-code#XA-GB"
} ],
"http://d-nb.info/standards/elementset/gnd#gndIdentifier" : [ "4074335-4" ],
"homepage" : [ {
"@id" : "http://www.london.gov.uk"
} ],
"http://d-nb.info/standards/elementset/gnd#oldAuthorityNumber" : [ "(DE-588)1005809-6", "(DE-588b)1005809-6", "(DE-588c)4074335-4" ],
"preferredName:ForThePlaceOrGeographicName" : [ "London" ],
"http://d-nb.info/standards/elementset/gnd#relatedDdcWithDegreeOfDeterminacy4" : [ {
"@id" : "http://dewey.info/class/2--421/"
} ],
"variantName:ForThePlaceOrGeographicName" : [ "Londinum", "Londra", "Lundonia", "Augusta Trinobantum", "Westminster", "Lundun", "Landan", "Londyn", "Londres", "Londen", "London (Great Britain)", "Londinium" ],
"http://www.opengis.net/ont/geosparql#hasGeometry" : [ {
"@id" : "_:t1"
} ],
"http://www.w3.org/2002/07/owl#sameAs" : [ {
"@id" : "http://d-nb.info/gnd/1005809-6"
}, {
"@id" : "http://sws.geonames.org/2643743"
} ]
} ]
}
So the geo stuff is in there. However, we will need some post- and pre-processing to get the expected results.
In 1.0, we added some inferencing to get more general properties. I suggest doing similar things here:
1. Generalize the type-specific name properties such as preferredNameForThePlaceOrGeographicName and variantNameForThePlaceOrGeographicName. For all entities, we should just use preferredName and variantName.
2. Add the superclass types, here PlaceOrGeographicName and AuthorityResource.
Having done 1.) and 2.), the result would look like this:
{
"@graph" : [ {
"@id" : "_:t1",
"@type" : "http://www.opengis.net/ont/sf#Point",
"http://www.opengis.net/ont/geosparql#asWKT" : [ {
"@type" : "http://www.opengis.net/ont/geosparql#wktLiteral",
"@value" : "Point ( -000.125740 +051.508530 )"
} ]
}, {
"@id" : "http://d-nb.info/gnd/4074335-4",
"@type" : [ "http://d-nb.info/standards/elementset/gnd#TerritorialCorporateBodyOrAdministrativeUnit", "http://d-nb.info/standards/elementset/gnd#PlaceOrGeographicName", "http://d-nb.info/standards/elementset/gnd#AuthorityResource" ],
"http://d-nb.info/standards/elementset/dnb#deprecatedUri" : [ "http://d-nb.info/gnd/1005809-6" ],
"http://d-nb.info/standards/elementset/gnd#definition" : [ {
"@language" : "de",
"@value" : "Hauptstadt des Vereinigten Königreichs von Großbritannien und Nordirland, in Mittelsteinzeit besiedelt, 43 n.Chr. von Römern gegründet; das County of London war 1889-1965 Verwaltungsgrafschaft u. zeremonielle Grafschaft"
} ],
"http://d-nb.info/standards/elementset/gnd#geographicAreaCode" : [ {
"@id" : "http://d-nb.info/standards/vocab/gnd/geographic-area-code#XA-GB"
} ],
"http://d-nb.info/standards/elementset/gnd#gndIdentifier" : [ "4074335-4" ],
"homepage" : [ {
"@id" : "http://www.london.gov.uk"
} ],
"http://d-nb.info/standards/elementset/gnd#oldAuthorityNumber" : [ "(DE-588)1005809-6", "(DE-588b)1005809-6", "(DE-588c)4074335-4" ],
"http://d-nb.info/standards/elementset/gnd#preferredName" : [ "London" ],
"http://d-nb.info/standards/elementset/gnd#relatedDdcWithDegreeOfDeterminacy4" : [ {
"@id" : "http://dewey.info/class/2--421/"
} ],
"http://d-nb.info/standards/elementset/gnd#variantName" : [ "Londinum", "Londra", "Lundonia", "Augusta Trinobantum", "Westminster", "Lundun", "Landan", "Londyn", "Londres", "Londen", "London (Great Britain)", "Londinium" ],
"http://www.opengis.net/ont/geosparql#hasGeometry" : [ {
"@id" : "_:t1"
} ],
"http://www.w3.org/2002/07/owl#sameAs" : [ {
"@id" : "http://d-nb.info/gnd/1005809-6"
}, {
"@id" : "http://sws.geonames.org/2643743"
} ]
} ]
}
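The two inferencing steps above can be sketched roughly as follows. This is only an illustration in Python, not the actual transformation code: the function name generalize, the SUPERCLASSES excerpt and the NAME_KEY pattern are assumptions made for this example, and the real class hierarchy would have to be taken from the GND ontology. The sketch also assumes a node does not already carry the general preferredName or variantName property.

```python
import re

GND = "http://d-nb.info/standards/elementset/gnd#"

# Hypothetical excerpt of the class hierarchy, for illustration only.
SUPERCLASSES = {
    GND + "TerritorialCorporateBodyOrAdministrativeUnit": [
        GND + "PlaceOrGeographicName",
        GND + "AuthorityResource",
    ],
}

# Matches full IRIs ending in e.g. "preferredNameForThePlaceOrGeographicName"
# as well as compacted keys like "preferredName:ForThePlaceOrGeographicName".
NAME_KEY = re.compile(r"^(?:.*#)?(preferredName|variantName):?For.+$")


def generalize(node):
    """Return a copy of a node with general name properties and superclass types."""
    result = {}
    for key, value in node.items():
        match = NAME_KEY.match(key)
        if match:
            # Step 1: map the specific name property to the general one.
            general = GND + match.group(1)
            values = value if isinstance(value, list) else [value]
            result.setdefault(general, []).extend(values)
        else:
            result[key] = value
    # Step 2: add the superclasses of every type that we know about.
    types = node.get("@type", [])
    types = list(types) if isinstance(types, list) else [types]
    for known_type in list(types):
        for superclass in SUPERCLASSES.get(known_type, []):
            if superclass not in types:
                types.append(superclass)
    if types:
        result["@type"] = types
    return result
```

Applied to the London node from the first output, this would yield the general gnd#preferredName and gnd#variantName keys and the three-element @type array shown above.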
The result of framing the above output (based on the to-be-added AuthorityResource type) and adding the EntityFacts context can be viewed at http://tinyurl.com/y7n93utq. Obviously, this is not satisfying. For one, the EntityFacts context doesn't suffice and would have to be extended, as it doesn't cover the whole GND ontology. (EntityFacts is a simplification for use of GND by web developers.) However, using our current context from 1.0 already looks much better, see http://tinyurl.com/ychm4t92. Thus, I suggest we just update this one.
Furthermore, the @graph is still in there after framing and has to be removed by us. (It currently isn't possible to just leave it out, but it will be possible with the next JSON-LD version, see this thread on the linked-json mailing list and the issue resulting from the thread.)
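Removing the @graph wrapper ourselves could look roughly like the following Python sketch. The function name unwrap_graph is made up for this example, and it assumes the framed document contains exactly one top-level node inside @graph; anything else is returned unchanged.

```python
def unwrap_graph(framed):
    """Lift the single node inside "@graph" to the top level of the document."""
    graph = framed.get("@graph")
    if not isinstance(graph, list) or len(graph) != 1:
        # Nothing to unwrap, or more than one node: leave the document as it is.
        return framed
    # Keep everything next to "@graph" (typically "@context") and merge the node in.
    result = {key: value for key, value in framed.items() if key != "@graph"}
    result.update(graph[0])
    return result
```

The @context stays at the top level, so the result is still a valid JSON-LD document for the single entity.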
I just found out that I already created a context for the 2.0 GND API, see https://github.com/hbz/lobid-gnd/issues/1. (We should probably delete this repo as soon as we have moved the issue over here.) This context is also missing some things (e.g. the geo properties), see http://tinyurl.com/y8z3f3rl.
Another option would be direct transformation from MARC-XML to JSON, like in lobid-organisations.
We could adapt the existing mappings for the RDF conversion: https://github.com/culturegraph/metafacture-examples/tree/master/Linked-Data-Service-Gnd
Re. the framing output from http://tinyurl.com/ychm4t92, I just noticed that blank nodes get an id:
"hasGeometry": {
"@id": "_:b0",
"@type": "http://www.opengis.net/ont/sf#Point",
"asWKT": "Point ( -000.125740 +051.508530 )"
}
We should get rid of them. This has already been addressed in the JSON-LD Framing spec 1.1 ("pruneBlankNodeIdentifiers") but is currently only implemented in the Ruby library, see https://github.com/json-ld/json-ld.org/issues/293.
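Until the library we use supports that option, the blank node ids could be stripped in a post-processing step. The following Python sketch is a simplification and not the algorithm from the spec: as far as I understand it, the spec only prunes identifiers that are used once, while this hypothetical prune_blank_node_ids function removes every "@id" starting with "_:". That is only safe for framed output where each blank node is embedded exactly once, as in the hasGeometry example above.

```python
def prune_blank_node_ids(value):
    """Recursively drop "@id" entries whose value is a blank node identifier."""
    if isinstance(value, list):
        return [prune_blank_node_ids(item) for item in value]
    if isinstance(value, dict):
        return {
            key: prune_blank_node_ids(item)
            for key, item in value.items()
            # Blank node identifiers start with "_:"; regular IRIs are kept.
            if not (key == "@id" and isinstance(item, str) and item.startswith("_:"))
        }
    return value
```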
Input: http://d-nb.info/gnd/4074335-4/about/lds
Output:
{
"@id" : "http://d-nb.info/gnd/4074335-4",
"@type" : "TerritorialCorporateBodyOrAdministrativeUnit",
"http://d-nb.info/standards/elementset/dnb#deprecatedUri" : [ "http://d-nb.info/gnd/1005809-6" ],
"definition" : [ {
"@language" : "de",
"@value" : "Hauptstadt des Vereinigten Königreichs von Großbritannien und Nordirland, in Mittelsteinzeit besiedelt, 43 n.Chr. von Römern gegründet; das County of London war 1889-1965 Verwaltungsgrafschaft u. zeremonielle Grafschaft"
} ],
"geographicAreaCode" : [ "http://d-nb.info/standards/vocab/gnd/geographic-area-code#XA-GB" ],
"gndIdentifier" : [ "4074335-4" ],
"homepage" : [ "http://www.london.gov.uk" ],
"oldAuthorityNumber" : [ "(DE-588)1005809-6", "(DE-588b)1005809-6", "(DE-588c)4074335-4" ],
"preferredNameForThePlaceOrGeographicName" : [ "London" ],
"relatedDdcWithDegreeOfDeterminacy4" : [ "http://dewey.info/class/2--421/" ],
"variantNameForThePlaceOrGeographicName" : [ "Londinum", "Londra", "Lundonia", "Augusta Trinobantum", "Westminster", "Lundun", "Landan", "Londyn", "Londres", "Londen", "London (Great Britain)", "Londinium" ],
"http://www.opengis.net/ont/geosparql#hasGeometry" : [ {
"@id" : "_:b0",
"@type" : "http://www.opengis.net/ont/sf#Point",
"http://www.opengis.net/ont/geosparql#asWKT" : [ {
"@type" : "http://www.opengis.net/ont/geosparql#wktLiteral",
"@value" : "Point ( -000.125740 +051.508530 )"
} ]
} ],
"sameAs" : [ "http://d-nb.info/gnd/1005809-6", "http://sws.geonames.org/2643743" ]
}
@acka47 Except for the points you already mentioned (missing keys in context, blank node IDs) this looks OK. Did I understand correctly: the idea is to add the http://d-nb.info/standards/elementset/gnd#AuthorityResource type to all authorities?
Yes, this already looks quite good. And yes, as in 1.0 we should add type AuthorityResource to all entities.
Furthermore, we should have a type from the second level of the GND ontology attached to each resource. We will need this for faceting. The GND ontology has three levels in its type hierarchy (except for Person, where we have a fourth one added). See the overview of the GND class hierarchy at https://wiki1.hbz-nrw.de/x/CIeW. In the concrete example, PlaceOrGeographicName should be in the data.
Regarding the name properties, we should only use preferredName and variantName for all entities. This will allow us to query the whole data in a uniform way. (The type is made clear by other means so that we don't need the specific properties.)
Deployed current state to: http://test.lobid.org/authorities
Our London example: http://test.lobid.org/authorities/4074335-4.json
@acka47 The context is used directly from GitHub, so you can edit on GitHub to test context tweaks: https://github.com/hbz/lobid-authorities/blob/master/conf/context.jsonld
(Context content is from https://gist.githubusercontent.com/acka47/98035a3f215c783bdc00/raw/5699ab4e89b5e7ab896ac69442c84fcf7f50ad66/gnd-context_20160126.jsonld)
Before working on the details (2nd level superclasses, rename fields, remove blank node IDs), I suggest we continue with testing the actual indexing of this format in Elasticsearch. I'd suggest we resolve this issue, and open new issues for the things I mentioned above. Assigning @acka47 for functional review.
I just noticed that the language isn't indicated as we do in other lobid services:
"definition":[
{
"@language":"de",
"@value":"Hauptstadt des Vereinigten Königreichs von Großbritannien und Nordirland, in Mittelsteinzeit besiedelt, 43 n.Chr. von Römern gegründet; das County of London war 1889-1965 Verwaltungsgrafschaft u. zeremonielle Grafschaft"
}
]
We would rather have "@container": "@language" in the context and the following in the data:
"definition":[
{
"de":"Hauptstadt des Vereinigten Königreichs von Großbritannien und Nordirland, in Mittelsteinzeit besiedelt, 43 n.Chr. von Römern gegründet; das County of London war 1889-1965 Verwaltungsgrafschaft u. zeremonielle Grafschaft"
}
]
I updated the context accordingly, but we will also have to take this into account during transformation.
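As a rough idea of what the transformation step would have to do, here is a small Python sketch that turns expanded language-tagged values into a language map. The function name to_language_map is hypothetical. The sketch returns the bare map; whether it is wrapped in an array as in the example above is left to the caller. Values without a language tag are simply skipped here, which a real implementation would need to handle explicitly.

```python
def to_language_map(values):
    """Convert [{"@language": ..., "@value": ...}, ...] into {language: value}."""
    collected = {}
    for value in values:
        # Only language-tagged value objects are handled in this sketch.
        if isinstance(value, dict) and "@language" in value and "@value" in value:
            collected.setdefault(value["@language"], []).append(value["@value"])
    # A single value per language becomes a plain string, several become a list.
    return {
        language: texts[0] if len(texts) == 1 else texts
        for language, texts in collected.items()
    }
```

Note that the string-or-list result per language means the value type can vary between records, which may matter for indexing.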
Looks fine already, thus nothing more to do. (Also adjusted the context for biographicalOrHistoricalInformation.)
We will have to find out on what other properties language tags are used.
+1 Did some adjustments to the context and I am satisfied for now. Will open issues for the other things.
I don't think we need a separate beta/prod system yet, context is used from GitHub, so nothing to deploy, closing this.
Opened #5 for indexing.
Both dumps and updates (via OAI) are available as RDF-XML, so that would be a suitable source format:
http://datendienst.dnb.de/cgi-bin/mabit.pl?userID=opendata&pass=opendata&cmd=login
http://www.dnb.de/DE/Service/DigitaleDienste/OAI/oai_node.html (see "Formate")
We should test serializing that RDF-XML as compact JSON-LD using the entityfacts context:
http://hub.culturegraph.org/entityfacts/context/v1/entityfacts.jsonld
http://hub.culturegraph.org/entityfacts/118540238
If the result looks good, this might be the format to index in Elasticsearch. We might have to do some preprocessing to make sure the values always have the same type (see footnote 1 in http://blog.lobid.org/2017/06/08/lobid-api-why-how.html about compact JSON-LD serialization in Elasticsearch).
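The kind of preprocessing meant here could, as a first approximation, wrap every property value in an array so that a field never switches between a single value and a list. The Python sketch below illustrates that idea only; the names as_list and normalize are made up for this example. JSON-LD keywords (keys starting with "@", such as @id and @type) are left untouched, so @type would still need separate handling if it can be either a string or an array.

```python
def as_list(value):
    """Wrap a value in a list unless it already is one."""
    return value if isinstance(value, list) else [value]


def normalize(value):
    """Recursively make every non-keyword property value a list."""
    if isinstance(value, list):
        return [normalize(item) for item in value]
    if isinstance(value, dict):
        return {
            # Keywords keep their shape; all other properties become arrays.
            key: normalize(item) if key.startswith("@") else as_list(normalize(item))
            for key, item in value.items()
        }
    return value
```

Always using arrays costs a little verbosity in the JSON, but it keeps one field shape per property across all records.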