Closed acka47 closed 9 years ago
Subject headings and preferred labels are in 902, alternate labels in 952. To find out which type a GND entity has, look at the indicator of 902. From the MAB documentation:
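902 ELEMENT OF THE 1ST SUBJECT HEADING CHAIN (KETTENGLIED DER 1. SCHLAGWORTKETTE)
Indicator:
p = personal name subject heading (Personenschlagwort)
g = geographic-ethnographic subject heading (geographisch-ethnographisches Schlagwort)
s = topical subject heading (Sachschlagwort)
k = corporate body subject heading, entered under the individual name (Koerperschaftsschlagwort: Ansetzung unter dem Individualnamen)
c = corporate body subject heading, entered under the place of its seat (Koerperschaftsschlagwort: Ansetzung unter dem Ortssitz)
z = chronological subject heading (Zeitschlagwort)
f = form subject heading (Formschlagwort)
t = work title as subject heading (Werktitel als Schlagwort)
blank = subordinate heading of a heading chain (Unterschlagwort einer Ansetzungskette)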
Example 1 (without contributor and with only one type of subject heading): http://lobid.org/resource/HT010726584
The desired outcome is to have the preferred names associated with the GND objects as usual, and the alternate names along with the preferred names in the field subjectLabel to allow querying by all labels:
{
"@graph" : [ {
"@id" : "http://d-nb.info/gnd/4046259-6",
"preferredName" : "Plasmaphysik",
"preferredNameForTheSubjectHeading" : "Plasmaphysik"
}, {
"@id" : "http://d-nb.info/gnd/4067488-5",
"preferredName" : "Zeitschrift",
"preferredNameForTheSubjectHeading" : "Zeitschrift"
}, {
"@id" : "http://d-nb.info/gnd/4511937-5",
"preferredName" : "Online-Publikation",
"preferredNameForTheSubjectHeading" : "Online-Publikation"
}, {
"@id" : "http://dewey.info/class/530/",
"prefLabel" : [ {
"@language" : "en",
"@value" : "Physics"
}, {
"@language" : "de",
"@value" : "Physik"
} ]
}, {
"@id" : "http://lobid.org/resource/HT010726584",
...
"subject" : [ "http://d-nb.info/gnd/4067488-5", "http://dewey.info/class/530/", "http://d-nb.info/gnd/4046259-6", "http://d-nb.info/gnd/4511937-5" ],
"subjectLabel" : [ "On-line-Dokument", "Online-Dokument", "On-line-Publikation", "Online-Ressource", "Computerdatei im Fernzugriff (Formschlagwort)", "Netzpublikation", "Zeitschriften", "Online-Datenbank (Formschlagwort)", "Periodikum", "On-line-Datenbank (Formschlagwort)" ],
...
} ]
...
}
Aleph XML (snippet):
...
<datafield tag="902" ind1="-" ind2="1">
<subfield code="s">Plasmaphysik</subfield>
<subfield code="9">(DE-588)4046259-6</subfield>
</datafield>
<datafield tag="902" ind1="-" ind2="1">
<subfield code="s">Zeitschrift</subfield>
<subfield code="9">(DE-588)4067488-5</subfield>
</datafield>
<datafield tag="902" ind1="-" ind2="1">
<subfield code="s">Online-Publikation</subfield>
<subfield code="9">(DE-588)4511937-5</subfield>
</datafield>
...
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">Computerdatei im Fernzugriff</subfield>
<subfield code="h">Formschlagwort</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">Online-Datenbank</subfield>
<subfield code="h">Formschlagwort</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">Online-Dokument</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">On-line-Datenbank</subfield>
<subfield code="h">Formschlagwort</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">On-line-Dokument</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">Online-Ressource</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">On-line-Publikation</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">Netzpublikation</subfield>
</datafield>
The implementation looks quite straightforward: for subjectLabel, take all entries from 902 and 952; for preferredName, take only 902.
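The rule above can be sketched in Python roughly as follows. This is only an illustration, not the morph-based transformation discussed in this issue: the names `field_label` and `collect_labels` and the reduced sample record are made up, the record is assumed to carry no XML namespace, and the qualifier from subfield h is put in parentheses because that is how the desired subjectLabel values of Example 1 look.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record, reduced from the Example 1 snippet.
SAMPLE = """<record>
  <datafield tag="902" ind1="-" ind2="1">
    <subfield code="s">Online-Publikation</subfield>
    <subfield code="9">(DE-588)4511937-5</subfield>
  </datafield>
  <datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Online-Datenbank</subfield>
    <subfield code="h">Formschlagwort</subfield>
  </datafield>
  <datafield tag="952" ind1="-" ind2="1">
    <subfield code="s">Netzpublikation</subfield>
  </datafield>
</record>"""


def field_label(datafield):
    """Build one label from a 902/952 datafield.

    Subfield 9 holds the GND identifier and is skipped; subfield h is
    treated as a qualifier and appended in parentheses (assumption
    derived from Example 1).
    """
    main, qualifier = None, None
    for sub in datafield.findall("subfield"):
        code = sub.get("code")
        if code == "9":
            continue
        if code == "h":
            qualifier = sub.text
        elif main is None:
            main = sub.text
    if main is None:
        return None
    return f"{main} ({qualifier})" if qualifier else main


def collect_labels(xml_text):
    """Return (preferred_names, subject_labels).

    preferred_names come from 902 only, subject_labels from 902 and 952.
    """
    root = ET.fromstring(xml_text)
    preferred, all_labels = [], []
    for datafield in root.findall("datafield"):
        tag = datafield.get("tag")
        if tag not in ("902", "952"):
            continue
        label = field_label(datafield)
        if label is None:
            continue
        all_labels.append(label)
        if tag == "902":
            preferred.append(label)
    return preferred, all_labels


preferred, labels = collect_labels(SAMPLE)
print(preferred)
print(labels)
```

With the sample record, only "Online-Publikation" ends up among the preferred names, while all three labels are available for querying.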
Example 2 (with a corporate body as contributor and three different types of subject headings): http://lobid.org/resource/HT013077595/about
Desired outcome:
{
"@graph" : [ {
"@id" : "http://d-nb.info/gnd/109490312",
"preferredName" : "Boer, Hans-Peter",
"preferredNameForThePerson" : "Boer, Hans-Peter"
}, {
"@id" : "http://d-nb.info/gnd/11079267X",
"preferredName" : "Balke, Kirsten",
"preferredNameForThePerson" : "Balke, Kirsten"
}, {
"@id" : "http://d-nb.info/gnd/128755-2",
"preferredName" : "Kreisheimatverein <Coesfeld>",
"preferredNameForTheCorporateBody" : "Kreisheimatverein <Coesfeld>"
}, {
"@id" : "http://d-nb.info/gnd/4010355-9",
"preferredName" : "Coesfeld",
"preferredNameForThePlaceOrGeographicName" : "Coesfeld"
}, {
"@id" : "http://d-nb.info/gnd/4010356-0",
"preferredName" : "Kreis Coesfeld",
"preferredNameForThePlaceOrGeographicName" : "Kreis Coesfeld"
}, {
"@id" : "http://d-nb.info/gnd/4024116-6",
"preferredName" : "Heimatkundeunterricht",
"preferredNameForTheSubjectHeading" : "Heimatkundeunterricht"
}, {
"@id" : "http://lobid.org/resource/HT013077595",
"contributorLabel" : [ "Balke, Kirsten", "Boer, Hans Peter", "Boer, Hans-Peter" ],
"subjectLabel" : [ "Coesfeld. Hauptamt", "Landkreis Coesfeld", "Kreis Coesfeld. Kreistag", "Kreis Coesfeld. Hauptamt", "Kosfel'd", "Kreis Coesfeld. Oberkreisdirektor", "Coesfeld (Kreis)", "Kreis Coesfeld. Landrat", "Landrat (Kreis Coesfeld)", "Oberkreisdirektor (Kreis Coesfeld)", "Kreisverwaltung (Kreis Coesfeld)", "Kreistag (Kreis Coesfeld)", "Heimatkunde (Unterricht)", "Hauptamt (Kreis Coesfeld)", "Heimatkundedidaktik", "Stadtdirektor (Coesfeld)", "Pressestelle (Coesfeld)", "Hauptamt (Coesfeld)", "Coesfeld. Pressestelle", "Coesfeld. Stadtdirektor", "Heimatkunde / Didaktik", "Stadt Coesfeld", "Kreis Coesfeld. Kreisverwaltung" ],
"contributor" : [ "http://d-nb.info/gnd/11079267X", "http://d-nb.info/gnd/128755-2", "http://d-nb.info/gnd/109490312" ],
"subject" : [ "http://d-nb.info/gnd/4010355-9", "http://d-nb.info/gnd/4024116-6", "http://d-nb.info/gnd/4010356-0" ],
"subjectChain" : [ "Coesfeld | Heimatkundeunterricht | Lehrmittel", "Kreis Coesfeld | Heimatkundeunterricht | Lehrmittel (213)", "Kreis Coesfeld | Heimatkundeunterricht | Lehrmittel", "Coesfeld | Heimatkundeunterricht | Lehrmittel (213)" ],
...
}]
...
}
Source data (snippet):
<datafield tag="104" ind1="b" ind2="1">
<subfield code="p">Boer, Hans-Peter</subfield>
<subfield code="d">1949-</subfield>
<subfield code="b">[Red.]</subfield>
<subfield code="9">(DE-588)109490312</subfield>
</datafield>
<datafield tag="105" ind1="-" ind2="1">
<subfield code="p">Boer, Hans Peter</subfield>
<subfield code="d">1949-</subfield>
</datafield>
<datafield tag="200" ind1="b" ind2="1">
<subfield code="k">Kreisheimatverein</subfield>
<subfield code="h">Coesfeld</subfield>
<subfield code="9">(DE-588)128755-2</subfield>
</datafield>
<datafield tag="331" ind1="-" ind2="1">
<subfield code="a">Geschichte hier</subfield>
</datafield>
...
<datafield tag="902" ind1="-" ind2="1">
<subfield code="g">Coesfeld</subfield>
<subfield code="9">(DE-588)4010355-9</subfield>
</datafield>
<datafield tag="902" ind1="-" ind2="1">
<subfield code="s">Heimatkundeunterricht</subfield>
<subfield code="9">(DE-588)4024116-6</subfield>
</datafield>
<datafield tag="902" ind1="-" ind2="1">
<subfield code="f">Lehrmittel</subfield>
</datafield>
...
<datafield tag="902" ind1="-" ind2="1">
<subfield code="s">Heimatkundeunterricht</subfield>
<subfield code="9">(DE-588)4024116-6</subfield>
</datafield>
<datafield tag="902" ind1="-" ind2="1">
<subfield code="f">Lehrmittel</subfield>
</datafield>
<datafield tag="903" ind1="-" ind2="1">
<subfield code="a">213</subfield>
</datafield>
<datafield tag="907" ind1="-" ind2="1">
<subfield code="g">Kreis Coesfeld</subfield>
<subfield code="9">(DE-588)4010356-0</subfield>
</datafield>
<datafield tag="907" ind1="-" ind2="1">
<subfield code="s">Heimatkundeunterricht</subfield>
<subfield code="9">(DE-588)4024116-6</subfield>
</datafield>
<datafield tag="907" ind1="-" ind2="1">
<subfield code="f">Lehrmittel</subfield>
</datafield>
<datafield tag="908" ind1="-" ind2="1">
<subfield code="a">213</subfield>
</datafield>
<controlfield tag="SYS">011404221</controlfield>
<datafield tag="LOW" ind1="-" ind2="1">
<subfield code="a">M0001</subfield>
</datafield>
<datafield tag="LOW" ind1="-" ind2="1">
<subfield code="a">M1168</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="k">Coesfeld</subfield>
<subfield code="b">Hauptamt</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="k">Hauptamt</subfield>
<subfield code="h">Coesfeld</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="k">Coesfeld</subfield>
<subfield code="b">Stadtdirektor</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="k">Stadtdirektor</subfield>
<subfield code="h">Coesfeld</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="k">Coesfeld</subfield>
<subfield code="b">Pressestelle</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="k">Pressestelle</subfield>
<subfield code="h">Coesfeld</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="g">Kosfel'd</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="g">Stadt Coesfeld</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">Heimatkunde</subfield>
<subfield code="h">Unterricht</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">Heimatkunde</subfield>
<subfield code="x">Didaktik</subfield>
</datafield>
<datafield tag="952" ind1="-" ind2="1">
<subfield code="s">Heimatkundedidaktik</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="k">Kreis Coesfeld</subfield>
<subfield code="b">Oberkreisdirektor</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="k">Oberkreisdirektor</subfield>
<subfield code="h">Kreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="k">Kreis Coesfeld</subfield>
<subfield code="b">Kreistag</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="k">Kreistag</subfield>
<subfield code="h">Kreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="k">Kreis Coesfeld</subfield>
<subfield code="b">Hauptamt</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="k">Hauptamt</subfield>
<subfield code="h">Kreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="k">Kreis Coesfeld</subfield>
<subfield code="b">Landrat</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="k">Landrat</subfield>
<subfield code="h">Kreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="k">Kreis Coesfeld</subfield>
<subfield code="b">Kreisverwaltung</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="k">Kreisverwaltung</subfield>
<subfield code="h">Kreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="g">Landkreis Coesfeld</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="g">Coesfeld</subfield>
<subfield code="h">Kreis</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="s">Heimatkunde</subfield>
<subfield code="h">Unterricht</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="s">Heimatkunde</subfield>
<subfield code="x">Didaktik</subfield>
</datafield>
<datafield tag="957" ind1="-" ind2="1">
<subfield code="s">Heimatkundedidaktik</subfield>
</datafield>
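Comparing the 952/957 fields above with the desired subjectLabel values suggests how alternate labels with several subfields are joined: b follows after a period, h goes in parentheses, x follows after a slash. A minimal Python sketch of that formatting, inferred only from this one example; `format_label` is a hypothetical helper, and the fallback for any other subfield code is an assumption:

```python
def format_label(subfields):
    """Join the subfields of one 952/957 datafield into a single label.

    subfields: list of (code, value) pairs in document order.
    The separators are inferred from Example 2 only.
    """
    label = ""
    for code, value in subfields:
        if code == "9":
            # GND identifier, not part of the label.
            continue
        if not label:
            label = value
        elif code == "b":
            label += ". " + value
        elif code == "h":
            label += " (" + value + ")"
        elif code == "x":
            label += " / " + value
        else:
            # Assumption: unknown subfields are appended after a space.
            label += " " + value
    return label


print(format_label([("k", "Coesfeld"), ("b", "Hauptamt")]))
print(format_label([("k", "Hauptamt"), ("h", "Coesfeld")]))
print(format_label([("s", "Heimatkunde"), ("x", "Didaktik")]))
```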
As we currently do, we should record the preferred name in the RDF using both the general and the more specific property, e.g.:
"@id" : "http://d-nb.info/gnd/4076769-3",
"preferredName" : "Römerzeit",
"preferredNameForTheSubjectHeading" : "Römerzeit"
Mapping of the subfields from https://github.com/hbz/lobid/issues/139#issuecomment-94394197 to RDF properties (or rather their JSON object keys):
p: preferredNameForThePerson
g: preferredNameForThePlaceOrGeographicName
s: preferredNameForTheSubjectHeading
k: preferredNameForTheCorporateBody
c: :question:
z: No specific property, as these aren't GND entities; they are not linked and only occur as part of a subject chain in the RDF.
f: same as for z.
t: preferredNameForTheWork (:exclamation: We have to be careful here, as subfield t co-occurs with subfield p, see e.g. http://lobid.org/resource?id=HT018312899&format=source). For a start, we should map to preferredNameForTheWork if t occurs and prefix the creator name followed by a colon and a space (see e.g. http://193.30.112.134/F/?func=find-c&ccl_term=IDN%3DHT018312899 for an implementation).
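The mapping could be written down as a simple lookup table, for instance in Python. This is a sketch, not the actual morph definition; `SPECIFIC_PROPERTY` and `name_properties` are hypothetical names. For p it uses preferredNameForThePerson, the key that appears in the desired outcome of Example 2, and c is left out because its mapping is still open.

```python
GENERAL_PROPERTY = "preferredName"

# Subfield code -> specific property. "c" is an open question, and
# "z" and "f" are not GND entities, so none of them has an entry.
SPECIFIC_PROPERTY = {
    "p": "preferredNameForThePerson",
    "g": "preferredNameForThePlaceOrGeographicName",
    "s": "preferredNameForTheSubjectHeading",
    "k": "preferredNameForTheCorporateBody",
    "t": "preferredNameForTheWork",
}


def name_properties(code, label):
    """Return the general and the specific name property for one heading.

    Codes without a specific property yield an empty dict, because such
    headings only occur as part of a subject chain. Work titles (t) need
    additional handling for the creator prefix, which is not covered here.
    """
    specific = SPECIFIC_PROPERTY.get(code)
    if specific is None:
        return {}
    return {GENERAL_PROPERTY: label, specific: label}


print(name_properties("s", "Römerzeit"))
print(name_properties("z", "Geschichte"))
```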
Regarding subfield c, can you point me to an example, @dr0i?
At the NWBib meeting, customers asked for GND work titles having the author name in the label (see https://wiki1.hbz-nrw.de/x/DQBEB). Example: http://lobid.org/resource?id=HT018312899&format=full
Instead of:
{
"@id": "http://d-nb.info/gnd/7683386-0",
"preferredName": "Der Cid",
"preferredNameForTheWork": "Der Cid"
}
it should look like this:
{
"@id": "http://d-nb.info/gnd/7683386-0",
"preferredName": "Grabbe, Christian Dietrich: Der Cid",
"preferredNameForTheWork": "Der Cid"
}
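A small Python sketch of the requested change, assuming the creator name has already been read from the subfield p that co-occurs with subfield t; `work_entity` is a hypothetical helper, not part of the existing transformation:

```python
def work_entity(gnd_id, title, creator=None):
    """Build the JSON object for a GND work title.

    preferredName gets the creator prefixed, followed by a colon and a
    space; preferredNameForTheWork keeps the bare title.
    """
    preferred = f"{creator}: {title}" if creator else title
    return {
        "@id": f"http://d-nb.info/gnd/{gnd_id}",
        "preferredName": preferred,
        "preferredNameForTheWork": title,
    }


entity = work_entity("7683386-0", "Der Cid", "Grabbe, Christian Dietrich")
print(entity["preferredName"])
```

If no creator is available, the sketch falls back to the bare title for both properties.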
Ready for testing, e.g. http://lobid.org/resource/HT007496264 vs. http://test.lobid.org/resource/HT007496264. Transformation and indexing of all 20M docs (resulting in 66M docs) took 14h (formerly, with Hadoop: 35h). Still missing: enrichment with openlibrary, dbpedia and gutenberg. Made a ticket for this: lobid/lodmill#667.
I believe that restricting the type of a resource is now broken, e.g. http://test.lobid.org/resource?name=Tom%2BSawyer&from=0&size=10&type=http%3A%2F%2Fpurl.org%2Fontology%2Fbibo%2FBook returns resources that are not bibo:Book (e.g. http://lobid.org/resource/HT016678345).
@literarymachine's last commits (fixing the index config) seem to fix this problem. Test it, ideally with the following API call, which results in 15 hits: http://test.lobid.org/resource?name=Tom%2BSawyer%20detective&from=0&size=50&type=http%3A%2F%2Fpurl.org%2Fontology%2Fbibo%2FBook It yields the same results as the lobid production system.
edit dr0i: made a new issue hbz/lobid#150.
edit dr0i: put that comment into new issue hbz/lobid#150.
EDIT dr0i: made new issue #149.
Deployed to staging and production. @acka47, please have a look. Please also note the comment in lobid/lodmill#669.
We can close this one, as we have this in production and there will probably only be some minor adjustments in the future.
Currently, we are enriching the title data with GND labels using a Hadoop job. There are at least two problems with this approach: #84 and one undocumented problem that appeared after the last morph adjustment.
To avoid these problems and reduce transformation time, we will get the labels directly out of the Aleph XML using morph.
Amongst others, we need to know: