Closed acka47 closed 7 years ago
We might consider aligning the RDF for pre-RDA and RDA records by removing "Formschlagwörter" from the subject array for pre-RDA records. See also https://github.com/hbz/lobid-rdf-to-json/issues/23#issuecomment-243483195.
As nobody asked for this, I'd say it is sufficient to do this in API 2.0. Thus, adding the label.
Here is the core list of GND Formschlagwörter: http://access.rdatoolkit.org/document.php?id=nlgpschp7&target=nlgps07-27
Here is the extended list with all GND Formschlagwörter (PDF): https://wiki.dnb.de/download/attachments/106042227/AH-007.pdf
There is redundant information in MAB/Aleph fields 051/052. I wonder whether the information in 051 is generated automatically from the 064 information or not (if not, the two fields might even contradict each other). From the MAB documentation:
051 VEROEFFENTLICHUNGSSPEZIFISCHE ANGABEN ZU BEGRENZTEN
WERKEN
Indikator:
blank = nicht definiert
Datenelemente:
0 Erscheinungsform
a = unselbstaendig erschienenes Werk
f = Fortsetzung
m = einbaendiges Werk - nicht Teil eines
Gesamtwerks
n = mehrbaendiges begrenztes Werk - nicht Teil
eines Gesamtwerks
s = einbaendiges Werk u n d Teil (mit
Stuecktitel) eines Gesamtwerks
t = mehrbaendiges begrenztes Werk u n d
Teil (mit Stuecktitel) eines Gesamtwerks
1-3 Veroeffentlichungsart und Inhalt
a = Abstract (Referat)
b = Bibliographie
c = Katalog
d = Woerterbuch
e = Enzyklopaedie
f = Festschrift
g = Datenbank
h = Biographie
i = Registerwerk
j = Fortschrittsbericht
k = Konferenzschrift
l = Gesetz
m = Musikalia
n = Normschrift
o = Loseblattausgabe
p = Patentdokument
q = Lieferungswerk
r = Report
s = Statistik
t = Aufsatz
u = Universitaetsschrift
v = Sonderdruck
x = Schulbuch
z = sonstige Veroeffentlichungsart/-inhalt
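To make the positional layout above concrete, here is a minimal Python sketch that decodes a 051 value. The function name `decode_051` and the dictionaries are illustrative assumptions, not part of any lobid code. Only a subset of the codes from the documentation is included, and `|` is assumed to be a fill character that carries no value.

```python
# Hypothetical helper; code tables copied from the MAB excerpt above (subset only).
ERSCHEINUNGSFORM = {
    "a": "unselbstaendig erschienenes Werk",
    "f": "Fortsetzung",
    "m": "einbaendiges Werk - nicht Teil eines Gesamtwerks",
    "n": "mehrbaendiges begrenztes Werk - nicht Teil eines Gesamtwerks",
}
VEROEFFENTLICHUNGSART = {
    "b": "Bibliographie",
    "f": "Festschrift",
    "h": "Biographie",
    "k": "Konferenzschrift",
    "t": "Aufsatz",
    "u": "Universitaetsschrift",
    "x": "Schulbuch",
}


def decode_051(value):
    """Return (Erscheinungsform, [Veroeffentlichungsarten]) for a 051 string."""
    # Position 0 holds the Erscheinungsform code.
    form = ERSCHEINUNGSFORM.get(value[0:1])
    # Positions 1-3 hold up to three codes; '|' is assumed to mean "not set".
    # Codes missing from the subset table are passed through unchanged.
    kinds = [VEROEFFENTLICHUNGSART.get(c, c) for c in value[1:4] if c != "|"]
    return form, kinds


print(decode_051("at||||||"))
```

Keeping the code tables as plain dictionaries makes it easy to extend the subset with the remaining codes from the documentation.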
Some examples to take a closer look at: http://lobid.org/hbz01/HT019025947, http://lobid.org/hbz01/HT019025943, http://lobid.org/hbz01/HT018814546, http://lobid.org/hbz01/HT018913029, http://lobid.org/hbz01/HT018909174
By testing I found out that fields 051/052 aren't automatically generated from 064. For the core Formschlagwörter I took the first five hits of lobid.org/resource and looked at them at lobid.org/hbz01. For Autobiografie, Bibliografie, Biografie, Comic, Festschrift, Hochschulschrift, Hörbuch, Schulbuch, Website and Zeitschrift I found no file with a field 064.
@ChristophEwertowski We have to check the RDA titles to see whether the 051/052 are automatically generated from 064. RDA are those with creation date after 2015-10-01. You can limit a query to those using the Elasticsearch query DSL, see https://www.elastic.co/guide/en/elasticsearch/reference/2.3/search-uri-request.html
@dr0i showed me how to limit the queries to those created after a specific point in time using the URL. E.g. http://lobid.org/resources?q=describedby.dateCreated:%3E20151001
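As a small illustration of the URL-based date filter, the following Python sketch builds the same query string. The helper name `created_after_url` is an assumption for this example. The endpoint and the `describedby.dateCreated` field are taken from the URL above.

```python
from urllib.parse import quote

BASE_URL = "http://lobid.org/resources"  # endpoint taken from the example URL above


def created_after_url(date_yyyymmdd):
    """Build a query URL limited to records created after the given date."""
    query = f"describedby.dateCreated:>{date_yyyymmdd}"
    # Keep ':' readable and percent-encode '>' as %3E, as in the example URL.
    return f"{BASE_URL}?q={quote(query, safe=':')}"


print(created_after_url("20151001"))
```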
I confined my search to October 2015 and onwards and looked at it again. There are still cases where 064 doesn't exist but 051 does, so for these cases 051 isn't automatically generated from 064. Example: http://lobid.org/hbz01/HT018979011
In other cases both fields exist but contain different information. Example: http://lobid.org/hbz01/HT018976920
<controlfield tag="051">at||||||</controlfield>
which means "unselbstaendig erschienenes Werk, Aufsatz".
<datafield tag="064" ind1="a" ind2="1">
<subfield code="a">Biografie</subfield>
<subfield code="9">(DE-588)4006804-3</subfield>
<subfield code="y">1921-1978</subfield>
</datafield>
Since a biography could also exist in other forms, e.g. books, for this case 051 couldn't be generated from 064.
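For comparing both fields across several records, a sketch like the following could pull 051 and 064 out of a record. It assumes a plain `<record>` wrapper without XML namespaces, which may not match the actual hbz01 serialization. The function name `read_051_and_064` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Field content copied from the example above; the <record> wrapper is an assumption.
RECORD_XML = """<record>
  <controlfield tag="051">at||||||</controlfield>
  <datafield tag="064" ind1="a" ind2="1">
    <subfield code="a">Biografie</subfield>
    <subfield code="9">(DE-588)4006804-3</subfield>
    <subfield code="y">1921-1978</subfield>
  </datafield>
</record>"""


def read_051_and_064(xml_text):
    """Return the 051 value and a list of subfield dicts, one per 064 field."""
    root = ET.fromstring(xml_text)
    field_051 = root.findtext("controlfield[@tag='051']")
    fields_064 = [
        {sf.get("code"): sf.text for sf in df.findall("subfield")}
        for df in root.findall("datafield[@tag='064']")
    ]
    return field_051, fields_064


print(read_051_and_064(RECORD_XML))
```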
The Formschlagwörter are still kept separate from the other keywords, in field 064 rather than 952 (see first post from Nov. 2015; example: http://lobid.org/hbz01/HT019016389). As a result, they are also missing from subjectLabels (http://lobid.org/resource/HT019016389/about).
And if you look closer at the first example you can see that in the hbz01 file it's described as a newspaper (http://lobid.org/hbz01/HT017458093, field 064) and in the lobid-resource as a journal (http://lobid.org/resource/HT017458093, type:bibo/Journal) which are two different publication types.
The example you point to has p in 052 at position 0, which is correctly transformed to type "Journal". Thus, this rather seems to be a cataloging error to me.
Source data:
<controlfield tag="052">pag||||aw||||||</controlfield>
From the MAB documentation:
052 VEROEFFENTLICHUNGSSPEZIFISCHE ANGABEN ZU FORTLAUFENDEN
SAMMELWERKEN
Indikator:
blank = nicht definiert
Datenelemente:
0 Erscheinungsform
a = unselbstaendig erschienenes Werk
f = Fortsetzung
j = zeitschriftenartige Reihe
p = Zeitschrift
r = Schriftenreihe (Serie)
z = Zeitung
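A mismatch like the one discussed here (052 says "Zeitschrift", 064 says "Zeitung") could be flagged with a check along these lines. This is only a sketch: `conflicting_keywords` is a hypothetical helper, and it assumes that just the two labels occurring both as 052 values and as Formschlagwörter are worth comparing.

```python
# Position 0 codes of 052, copied from the MAB excerpt above.
ERSCHEINUNGSFORM_052 = {
    "a": "unselbstaendig erschienenes Werk",
    "f": "Fortsetzung",
    "j": "zeitschriftenartige Reihe",
    "p": "Zeitschrift",
    "r": "Schriftenreihe (Serie)",
    "z": "Zeitung",
}

# Assumption: only labels that exist in both vocabularies are compared.
COMPARABLE = {"Zeitschrift", "Zeitung"}


def conflicting_keywords(field_052, form_keywords):
    """Return the Formschlagwörter that contradict position 0 of field 052."""
    from_052 = ERSCHEINUNGSFORM_052.get(field_052[0:1])
    if from_052 not in COMPARABLE:
        return []
    return [kw for kw in form_keywords if kw in COMPARABLE and kw != from_052]


print(conflicting_keywords("pag||||aw||||||", ["Zeitung"]))
```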
To get back on track, here is a summary of the open points:
Do we really need Formschlagwörter?
Are the fields 051/052 derived from 064 for RDA? (Probably not.) @acka47, who would be the right contact person?
I'm going to tackle the first question by checking which and how many Formschlagwörter are already covered by the mapping of 050-052.
R.D. (Edoweb) just asked for the 064 in an email:
We have only just noticed that MARC category 064 is not carried over into the Lobid API and therefore not into Edoweb either. Example: It contains important information for subject indexing. Can you tell us whether this is an oversight and whether it can be remedied?
Here is a link to the example from the screenshot: http://lobid.org/resources/HT019149667
I think it will be hard to align 064 ("Nature of Content"/"Art des Inhalts") with the information we already have about a resource from other fields (including Formschlagwörter). Thus, the easiest way might be to just add 064 independently to the RDF. The fitting property from the RDA registry is http://rdaregistry.info/Elements/u/P60584 "has nature of content". I couldn't find controlled vocabulary for the values. The controlled value list appears to be DACH-specific, so that is not surprising.
I couldn't find controlled vocabulary for the values.
As there are GND URIs given (I already linked to the PDF above that also lists the GND URIs), we will just use these along with the label given in subfield a, e.g. for the example:
{
  "@context": "http://lobid.org/resources/context.jsonld",
  "id": "http://lobid.org/resources/HT019149667#!",
  "natureOfContent": [
    {
      "id": "http://d-nb.info/gnd/4048476-2",
      "label": "Ratgeber"
    },
    {
      "id": "http://d-nb.info/gnd/4142300-8",
      "label": "Amtliche Publikation"
    }
  ]
}
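The mapping from 064 subfields to `natureOfContent` entries might look roughly like this in Python. This is a sketch, not the actual transformation: the helper name `nature_of_content` is hypothetical, and the `(DE-588)`-prefixed input values are inferred from the JSON above and the 064 example earlier in this thread, not copied from the record.

```python
import json

GND_PREFIX = "(DE-588)"  # prefix seen in subfield 9 of the 064 example in this thread
GND_BASE = "http://d-nb.info/gnd/"


def nature_of_content(fields_064):
    """Turn 064 subfield dicts into id/label entries using GND URIs."""
    entries = []
    for subfields in fields_064:
        gnd = subfields.get("9", "")
        if not gnd.startswith(GND_PREFIX):
            continue  # skip entries without a GND identifier
        entries.append({
            "id": GND_BASE + gnd[len(GND_PREFIX):],
            "label": subfields.get("a"),
        })
    return entries


# Assumed input values, reconstructed from the JSON example above.
fields = [
    {"a": "Ratgeber", "9": "(DE-588)4048476-2"},
    {"a": "Amtliche Publikation", "9": "(DE-588)4142300-8"},
]
print(json.dumps({"natureOfContent": nature_of_content(fields)}, indent=2, ensure_ascii=False))
```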
NatureOfContent is added. Example (production) / example (test).
Looks good. +1
Deployed to production, closing.
Sub-issue of #161. The "Formschlagwörter" are in field 064 in RDA instead of being listed with the other subject headings.
Examples
http://lobid.org/resource/HT017458093 which has Formschlagwort "Zeitung" but isn't typed as such yet
http://lobid.org/resource/HT018781721 (snippet) which has Formschlagwort "Zeitschrift" and is already typed as bibo:Journal
http://lobid.org/resource/HT018772904 (Formschlagwort "Bibliographische Reihe" and already typed as bibo:Series)